Time series scatter lines connecting non-sequentially - jumbled lines

Hi, I have encountered this problem a couple of times now and have not been able to find a solution. I use a lot of time series data for environmental monitoring. I connect to SQL or Access databases for this.

Sometimes when I use ‘lines+markers’ for Scattergl style, the lines seem to connect data points in the wrong order, resulting in lines crossing eachother like so:

In this instance I have extracted the dates from a sample code and formatted it to a date in pandas using the following:

tblResult['DateString1'] = tblResult['SampleNo'].str[-12:]
tblResult['DateString2'] = tblResult['DateString1'].str[:8]
tblResult['SampleDate'] = pd.to_datetime(tblResult['DateString2'], errors='coerce')
tblResult.sort_values(by='SampleDate', ascending=True)

I would appreciate help in understanding why this issue occurs and how I can fix it. I have checked the same data and it plots fine in Excel. My entire code is below.

# Define Access connection and connect to table

conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};' 
                      r'DBQ=H:\Python\Database.accdb;'
                     )


cursor = conn.cursor()

qry_tblResult = ("select SampleNo, LocationCode, ParameterCode, Unit, ResultValue, ResultComment, AnalysisCode, QAFlag from tblResult WHERE QAFlag = 0 AND LocationCode = 'GW-MJA-10A' OR LocationCode = 'GW-MRA-31'")
cursor.execute(qry_tblResult)

result = cursor.fetchall()

# Convert table to pandas dataframe

tblResult = pd.DataFrame([[ij for ij in i] for i in result])
tblResult.rename(columns={0: 'SampleNo', 1: 'LocationCode', 
                          2: 'ParameterCode', 3: 'Unit', 
                          4: 'ResultValue', 5: 'ResultComment', 
                          6: 'AnalysisCode', 7: 'QAFlag'}, 
                 inplace=True)

# Extract sample date from SampleNo

tblResult['DateString1'] = tblResult['SampleNo'].str[-12:]
tblResult['DateString2'] = tblResult['DateString1'].str[:8]
tblResult['SampleDate'] = pd.to_datetime(tblResult['DateString2'], errors='coerce')
tblResult.sort_values(by='SampleDate', ascending=True)

# Plot chart
locations = list(tblResult.LocationCode.unique())
parameters = sorted(list(tblResult.ParameterCode.unique()))

data = []
list_updatemenus = []
for n, location in enumerate(locations):
    visible = [False] * len(locations)
    visible[n] = True
    temp_dict = dict(label = str(location),
                 method = 'update',
                 args = [{'visible': visible},
                         {'title': '{}'.format(location)}])
    list_updatemenus.append(temp_dict)

for n, parameter in enumerate(parameters):
    for location in locations:
        mask = (tblResult.LocationCode.values == location) & (tblResult.ParameterCode.values == parameter)
        trace = go.Scattergl(dict(visible = False,
            text = tblResult.Unit[mask],      
            name = parameter,
            x = tblResult.loc[mask, 'SampleDate'],
            y = tblResult.loc[mask, 'ResultValue'],
            mode = 'lines+markers',
            marker = dict(size = 3))
                   )
        data.append(trace)
    data[int(n*len(locations))]['visible'] = True
    
layout = dict(updatemenus=list([dict(buttons= list_updatemenus)]),
              xaxis=dict(title = 'Date', range=['2001-01-01 00:00:00', '2019-01-01 00:00:00']),
              yaxis=dict(title = 'Value'),
              title='Parameter Result')

fig = dict(data=data, layout=layout)
offline.plot(fig, filename="H:/3982_NorthMara_WaterBalance/500_Processed/540_Database/Python/WQ_TS_v1.html", auto_open=False)

Thank you!

plotly by design doesn’t sort values before plotting, so this kind of output usually indicates some kind of sorting issue in the data itself.

I haven’t tried to run your code but the line tblResult.sort_values(by='SampleDate', ascending=True) seems like a possible culprit. This Pandas function doesn’t actually result in any change in tblResult unless you also pass in the inplace=True argument, or unless you reassign its output to tblResult.

Makes sense! I sorted my data in the database query itself and that solved the problem. Thanks very much.

three reasons why you should buy plotly pro: support open source, get great support, host your plots and dashboards online