python - Why does my plot have criss-crossing lines when I convert the index from string to datetime? - Stack Overflow




I am trying to plot a time series with plt.plot(), but the output looks strange. A sample of the dataset is below (the full dataset has roughly 150,000 entries). One column holds time values and serves as the index. Depending on whether I convert the index from strings to datetime objects, I get two different plots against the target variable.

Scenario 1: string index

import matplotlib.pyplot as plt
import pandas as pd

sample_df = pd.DataFrame()
sample_df["Date"] = [ "2002-12-31 01:00:00", "2002-06-06 10:00:00", "2003-11-10 19:00:00",  
                      "2003-04-15 04:00:00", "2004-09-19 14:00:00", "2004-02-24 23:00:00",  
                      "2005-07-30 08:00:00", "2005-01-03 17:00:00", "2006-06-08 02:00:00",  
                      "2007-11-12 11:00:00", "2007-04-18 20:00:00", "2008-09-21 06:00:00",  
                      "2008-02-26 15:00:00", "2009-08-03 00:00:00", "2009-01-05 09:00:00",  
                      "2010-06-11 19:00:00", "2011-11-14 04:00:00", "2011-04-20 13:00:00",  
                      "2012-09-24 23:00:00", "2012-02-28 08:00:00", "2013-08-04 17:00:00",  
                      "2013-01-07 02:00:00", "2014-06-13 09:00:00", "2015-11-17 18:00:00",
                      "2015-04-22 01:00:00", "2016-09-26 09:00:00", "2016-03-02 18:00:00",
                      "2017-08-06 01:00:00", "2017-01-10 10:00:00", "2018-01-16 19:00:00" ]
sample_df["Energy_MW"] = [ 26498.0, 39167.0, 36614.0, 21837.0, 26644.0,
                           33574.0, 30255.0, 33781.0, 24344.0, 34708.0,
                           33996.0, 21127.0, 36255.0, 31982.0, 35448.0,
                           37066.0, 22116.0, 31326.0, 26569.0, 33565.0,
                           34649.0, 25709.0, 33516.0, 33032.0, 22333.0,
                           28064.0, 33905.0, 25304.0, 41505.0, 39543.0 ]
sample_df = sample_df.set_index("Date")

# Basic plot.
fig = plt.figure( figsize = (10,5) )
plt.plot(sample_df.index, sample_df["Energy_MW"], 'b')
plt.grid(True)
plt.show()

This is the plot corresponding to a string index:

Scenario 2: datetime index

import matplotlib.pyplot as plt
import pandas as pd

sample_df = pd.DataFrame()
sample_df["Date"] = [ "2002-12-31 01:00:00", "2002-06-06 10:00:00", "2003-11-10 19:00:00",  
                      "2003-04-15 04:00:00", "2004-09-19 14:00:00", "2004-02-24 23:00:00",  
                      "2005-07-30 08:00:00", "2005-01-03 17:00:00", "2006-06-08 02:00:00",  
                      "2007-11-12 11:00:00", "2007-04-18 20:00:00", "2008-09-21 06:00:00",  
                      "2008-02-26 15:00:00", "2009-08-03 00:00:00", "2009-01-05 09:00:00",  
                      "2010-06-11 19:00:00", "2011-11-14 04:00:00", "2011-04-20 13:00:00",  
                      "2012-09-24 23:00:00", "2012-02-28 08:00:00", "2013-08-04 17:00:00",  
                      "2013-01-07 02:00:00", "2014-06-13 09:00:00", "2015-11-17 18:00:00",
                      "2015-04-22 01:00:00", "2016-09-26 09:00:00", "2016-03-02 18:00:00",
                      "2017-08-06 01:00:00", "2017-01-10 10:00:00", "2018-01-16 19:00:00" ]
sample_df["Energy_MW"] = [ 26498.0, 39167.0, 36614.0, 21837.0, 26644.0,
                           33574.0, 30255.0, 33781.0, 24344.0, 34708.0,
                           33996.0, 21127.0, 36255.0, 31982.0, 35448.0,
                           37066.0, 22116.0, 31326.0, 26569.0, 33565.0,
                           34649.0, 25709.0, 33516.0, 33032.0, 22333.0,
                           28064.0, 33905.0, 25304.0, 41505.0, 39543.0 ]
sample_df = sample_df.set_index("Date")
sample_df.index = pd.to_datetime(sample_df.index)

# Basic plot.
fig = plt.figure( figsize = (10,5) )
plt.plot(sample_df.index, sample_df["Energy_MW"], 'b')
plt.grid(True)
plt.show()

And this is the plot corresponding to a datetime index:

Why does converting the index to datetime make the plot so much worse, and how can I fix it? Please explain in simple terms. I plotted both scenarios to try to locate the source of the problem. I would like to get a correct plot when the index also has the correct data type (datetime). I also came across a YouTube video that uses the same dataset and the same time series; the plot there looked fine even though the DataFrame index had been converted to datetime.

Asked 15 hours ago by Lyudmil Yovkov

1 Answer

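You need to sort the index. plt.plot() connects the points in the order they appear in the data, and your rows are not in chronological order. With a string index, matplotlib treats each label as a category and places the labels at evenly spaced positions in order of appearance, so the line always moves left to right. With a datetime index, each point is placed at its actual date, so the unsorted rows make the line jump back and forth in time, which produces the criss-crossing.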

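# Sort the index so the rows are in chronological order. Sorting the strings
# works here because "YYYY-MM-DD HH:MM:SS" sorts lexicographically in date order.
sample_df = sample_df.set_index("Date").sort_index()

# Same as in your code.
sample_df.index = pd.to_datetime(sample_df.index)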

# Basic plot.
fig = plt.figure( figsize = (10,5) )
plt.plot(sample_df.index, sample_df["Energy_MW"], 'b')
plt.grid(True)
plt.show()
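If the date strings were in a format that does not sort chronologically as text, sorting before the conversion could give the wrong order. A minimal sketch of the alternative, converting first and sorting afterwards, is below. It uses only the first four rows of the sample data from the question; the names df, was_sorted, sorted_df and is_sorted are illustrative and not part of the original code.

```python
import pandas as pd

# First four rows of the question's sample, which are out of chronological order.
df = pd.DataFrame(
    {
        "Date": [
            "2002-12-31 01:00:00",
            "2002-06-06 10:00:00",
            "2003-11-10 19:00:00",
            "2003-04-15 04:00:00",
        ],
        "Energy_MW": [26498.0, 39167.0, 36614.0, 21837.0],
    }
)

# Parse the strings into datetimes first, then use them as the index.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date")

# Check whether the index is already in chronological order.
was_sorted = df.index.is_monotonic_increasing

# Sort on the datetime values, so the order is chronological
# regardless of how the original strings were formatted.
sorted_df = df.sort_index()
is_sorted = sorted_df.index.is_monotonic_increasing

print(was_sorted, is_sorted)
```

Passing sorted_df.index and sorted_df["Energy_MW"] to plt.plot() as in the code above should then draw the line from left to right without doubling back. Checking is_monotonic_increasing before plotting is a quick way to spot this problem on the full dataset.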