python - Why does my plot have criss-crossing lines when I convert the index from string to datetime? - Stack Overflow

Posted: 2025-01-06 by admin, category: Industry

I am trying to plot a time series with plt.plot(), but I keep getting strange output. I am providing a sample of the dataset below (the whole dataset has roughly 150,000 entries). One of the columns contains time values and serves as the index. Depending on whether or not I convert the index from strings to datetime objects, I get two different plots of the target variable.

Scenario 1: string index

import matplotlib.pyplot as plt
import pandas as pd

sample_df = pd.DataFrame()
sample_df["Date"] = [ "2002-12-31 01:00:00", "2002-06-06 10:00:00", "2003-11-10 19:00:00",  
                      "2003-04-15 04:00:00", "2004-09-19 14:00:00", "2004-02-24 23:00:00",  
                      "2005-07-30 08:00:00", "2005-01-03 17:00:00", "2006-06-08 02:00:00",  
                      "2007-11-12 11:00:00", "2007-04-18 20:00:00", "2008-09-21 06:00:00",  
                      "2008-02-26 15:00:00", "2009-08-03 00:00:00", "2009-01-05 09:00:00",  
                      "2010-06-11 19:00:00", "2011-11-14 04:00:00", "2011-04-20 13:00:00",  
                      "2012-09-24 23:00:00", "2012-02-28 08:00:00", "2013-08-04 17:00:00",  
                      "2013-01-07 02:00:00", "2014-06-13 09:00:00", "2015-11-17 18:00:00",
                      "2015-04-22 01:00:00", "2016-09-26 09:00:00", "2016-03-02 18:00:00",
                      "2017-08-06 01:00:00", "2017-01-10 10:00:00", "2018-01-16 19:00:00" ]
sample_df["Energy_MW"] = [ 26498.0, 39167.0, 36614.0, 21837.0, 26644.0,
                           33574.0, 30255.0, 33781.0, 24344.0, 34708.0,
                           33996.0, 21127.0, 36255.0, 31982.0, 35448.0,
                           37066.0, 22116.0, 31326.0, 26569.0, 33565.0,
                           34649.0, 25709.0, 33516.0, 33032.0, 22333.0,
                           28064.0, 33905.0, 25304.0, 41505.0, 39543.0 ]
sample_df = sample_df.set_index("Date")

# Basic plot.
fig = plt.figure( figsize = (10,5) )
plt.plot(sample_df.index, sample_df["Energy_MW"], 'b')
plt.grid(True)
plt.show()

This is the plot corresponding to a string index:

Scenario 2: datetime index

import matplotlib.pyplot as plt
import pandas as pd

sample_df = pd.DataFrame()
sample_df["Date"] = [ "2002-12-31 01:00:00", "2002-06-06 10:00:00", "2003-11-10 19:00:00",  
                      "2003-04-15 04:00:00", "2004-09-19 14:00:00", "2004-02-24 23:00:00",  
                      "2005-07-30 08:00:00", "2005-01-03 17:00:00", "2006-06-08 02:00:00",  
                      "2007-11-12 11:00:00", "2007-04-18 20:00:00", "2008-09-21 06:00:00",  
                      "2008-02-26 15:00:00", "2009-08-03 00:00:00", "2009-01-05 09:00:00",  
                      "2010-06-11 19:00:00", "2011-11-14 04:00:00", "2011-04-20 13:00:00",  
                      "2012-09-24 23:00:00", "2012-02-28 08:00:00", "2013-08-04 17:00:00",  
                      "2013-01-07 02:00:00", "2014-06-13 09:00:00", "2015-11-17 18:00:00",
                      "2015-04-22 01:00:00", "2016-09-26 09:00:00", "2016-03-02 18:00:00",
                      "2017-08-06 01:00:00", "2017-01-10 10:00:00", "2018-01-16 19:00:00" ]
sample_df["Energy_MW"] = [ 26498.0, 39167.0, 36614.0, 21837.0, 26644.0,
                           33574.0, 30255.0, 33781.0, 24344.0, 34708.0,
                           33996.0, 21127.0, 36255.0, 31982.0, 35448.0,
                           37066.0, 22116.0, 31326.0, 26569.0, 33565.0,
                           34649.0, 25709.0, 33516.0, 33032.0, 22333.0,
                           28064.0, 33905.0, 25304.0, 41505.0, 39543.0 ]
sample_df = sample_df.set_index("Date")
sample_df.index = pd.to_datetime(sample_df.index)

# Basic plot.
fig = plt.figure( figsize = (10,5) )
plt.plot(sample_df.index, sample_df["Energy_MW"], 'b')
plt.grid(True)
plt.show()

And this is the plot corresponding to a datetime index:

Why does converting the index to datetime make the plot so much worse, and how can I fix it? Please explain in simple terms. I plotted both scenarios to try to locate the source of the problem. I would like to get a correct plot while the index still has the correct data type (datetime). I also came across a YouTube video that uses the same dataset and the same time series, and its plot looked fine even though the DataFrame's index had been converted to datetime.

asked by Lyudmil Yovkov

1 Answer


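You need to sort the index. plt.plot() connects the points in the order the rows appear, not in the order of their x values. With a string index, matplotlib treats each string as a category and places the categories at evenly spaced x positions in the order they first appear, so the line always moves to the right (the axis labels are simply not in chronological order). With a datetime index, every point is placed at its real position on the time axis. Your rows are not in chronological order (for example, 2002-12-31 comes before 2002-06-06), so the line keeps jumping backwards and forwards in time, which produces the criss-crossing. Sorting the index puts the rows in chronological order before they are plotted: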
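To see this mechanism directly, here is a small sketch (not part of the original answer) that plots four of the question's unsorted rows twice, once with string x values and once with datetime x values, and then reads back the x positions matplotlib actually draws via `Line2D.get_xdata(orig=False)`. It assumes a matplotlib version with string-category support (2.1 or later) and uses the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four rows from the question's sample, deliberately left unsorted.
dates = ["2002-12-31 01:00:00", "2002-06-06 10:00:00",
         "2003-11-10 19:00:00", "2003-04-15 04:00:00"]
energy = [26498.0, 39167.0, 36614.0, 21837.0]

fig, (ax_str, ax_dt) = plt.subplots(1, 2)

# String x values: matplotlib treats them as categories and places them
# at 0, 1, 2, ... in the order they appear.
line_str, = ax_str.plot(dates, energy)
x_str = np.asarray(line_str.get_xdata(orig=False), dtype=float)

# Datetime x values: each point goes to its real position on the time axis.
line_dt, = ax_dt.plot(pd.to_datetime(dates), energy)
x_dt = np.asarray(line_dt.get_xdata(orig=False), dtype=float)

print("string x positions:  ", x_str)
print("datetime x positions:", x_dt)

# The line is drawn point to point in row order, so every negative step
# means the line travels backwards along the x axis and can cross itself.
print("backward steps (string):  ", int((np.diff(x_str) < 0).sum()))
print("backward steps (datetime):", int((np.diff(x_dt) < 0).sum()))
plt.close(fig)
```

The string version never steps backwards, while the datetime version steps backwards wherever a row is earlier than the row before it, which is exactly where the criss-crossing comes from.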

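import matplotlib.pyplot as plt
import pandas as pd

# sample_df is built exactly as in the question (columns "Date" and "Energy_MW").
sample_df = sample_df.set_index("Date")
sample_df.index = pd.to_datetime(sample_df.index)
sample_df = sample_df.sort_index()  # sort rows chronologically before plotting

# Basic plot.
fig = plt.figure(figsize=(10, 5))
plt.plot(sample_df.index, sample_df["Energy_MW"], 'b')
plt.grid(True)
plt.show()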
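For the question's ISO-style strings ("YYYY-MM-DD HH:MM:SS"), alphabetical order happens to match chronological order, so sorting before or after the conversion gives the same result. With other date formats that is not the case, so converting first and then sorting is the safer habit. The sketch below uses made-up day/month/year dates (hypothetical, not from the dataset) to show the difference, and uses `Index.is_monotonic_increasing` as a quick check that the rows really are in time order before plotting.

```python
import pandas as pd

# Hypothetical dates in day/month/year format (not from the question's data),
# listed out of chronological order.
df = pd.DataFrame(
    {"Energy_MW": [1.0, 2.0, 3.0]},
    index=pd.Index(["20/06/2002", "05/01/2003", "31/12/2002"], name="Date"),
)

# Sorting the raw strings compares them character by character,
# so "05/01/2003" (the latest date) ends up first.
by_string = df.sort_index()
print(by_string.index.tolist())
string_order_ok = pd.to_datetime(by_string.index, format="%d/%m/%Y").is_monotonic_increasing
print("string sort is chronological:", string_order_ok)

# Converting to datetime first, then sorting, orders the rows by actual time.
df.index = pd.to_datetime(df.index, format="%d/%m/%Y")
by_time = df.sort_index()
print("datetime sort is chronological:", by_time.index.is_monotonic_increasing)
print(by_time.index.strftime("%Y-%m-%d").tolist())
print(by_time["Energy_MW"].tolist())
```

On the full dataset, checking `sample_df.index.is_monotonic_increasing` right before `plt.plot()` is a cheap way to catch unsorted rows.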