How to show overlap points in scatter plot

Hello. When two points have the same [x, y] values, the scatter plot shows only one point. Is there a way to display all points even when they overlap? Thank you!


Hi @wangziheng,

How would you like both points to be displayed? A common workaround for this kind of overplotting situation is to give the markers an opacity < 1 so that overlapping points are darker. Opacity is controlled by the scatter.marker.opacity property.

If you have lots of overplotting, you may want to consider a histogram2dcontour trace.

-Jon

Hi Jon,

Thank you for the great advice. I played with the opacity property as demonstrated in https://plot.ly/python/marker-style/.

But even when overlapping points are displayed, hovering the mouse over them shows the details of only one point. Sorry I didn’t make this clear. It’s OK if the overlapping points are shown as a single point in the plot, but I would like to see the details of all overlapping points on hover. Is there a property that addresses this? Thank you.

Hi @wangziheng,

Ok, I understand your question now. Unfortunately, I don’t think this is possible right now. Feel free to open a feature request issue with the Plotly.js project at https://github.com/plotly/plotly.js/issues to discuss the possibility.

-Jon

Is it possible now to show two or more points with the same y-value in a scatter plot?

I would like to have

newplot-1

instead of

newplot-2

using this code

import plotly.express as px
import pandas as pd

df = pd.read_excel('/Users/Jakob/Documents/python_notebooks/data/tips_2.xlsx')

fig = px.scatter(df, x='day', y='total_bill', color="day")

#  Customization of y-axis
#fig.update_yaxes(range=[0, 10])

# Figure layout
fig.update_layout(template='simple_white',  width=400, height=500, title='Main Title', yaxis_title='Distance moved',
                  legend=dict(title='', itemclick='toggle', itemsizing='constant', traceorder='normal',
                  bgcolor='rgba(0,0,0,0)', x=1),
                  xaxis=dict(title='This is a title', showticklabels=True, ticks='outside', type='category')
                 )
# Scroll zoom is disabled here; set 'scrollZoom' to True to make the figure zoomable
config = dict({'scrollZoom': False})

fig.show(config=config)

Data is here > https://www.dropbox.com/s/za5e81lksyipztm/tips_2.xlsx?dl=0


Hi @windrose,

I think the `strip` function from `plotly.express` does what you want.

import pandas as pd
import plotly.express as px

df = pd.read_excel("https://www.dropbox.com/s/za5e81lksyipztm/tips_2.xlsx?dl=1")

fig = px.strip(df, x='day', y='total_bill', color="day")
#  Customization of y-axis
#fig.update_yaxes(range=[0, 10])

# Figure layout
fig.update_layout(template='simple_white',  width=400, height=500, title='Main Title', yaxis_title='Distance moved',
                  legend=dict(title='', itemclick='toggle', itemsizing='constant', traceorder='normal',
                  bgcolor='rgba(0,0,0,0)', x=1),
                  xaxis=dict(title='This is a title', showticklabels=True, ticks='outside', type='category')
                 )
# Scroll zoom is disabled here; set 'scrollZoom' to True to make the figure zoomable
config = dict({'scrollZoom': False})

fig.show(config=config)


Awesome, @Alexboiboi, thanks a lot! Is it also possible to use the jitter parameter or something like that to control the spacing between individual dots?

fig = px.strip(df, x='day', y='total_bill', color="day").update_traces(jitter = 1)

actually works quite well; thanks again.

import pandas as pd
import plotly.express as px

df = pd.read_excel("https://www.dropbox.com/s/za5e81lksyipztm/tips_2.xlsx?dl=1")

fig = px.strip(df, x='day', y='total_bill', color="day").update_traces(jitter = 1)
#  Customization of y-axis
#fig.update_yaxes(range=[0, 10])

# Figure layout
fig.update_layout(template='simple_white',  width=400, height=500, title='Main Title', yaxis_title='Distance moved',
                  legend=dict(title='', itemclick='toggle', itemsizing='constant', traceorder='normal',
                  bgcolor='rgba(0,0,0,0)', x=1),
                  xaxis=dict(title='This is a title', showticklabels=True, ticks='outside', type='category')
                 )
# Scroll zoom is disabled here; set 'scrollZoom' to True to make the figure zoomable
config = dict({'scrollZoom': False})

fig.show(config=config)

newplot-13

One more thing I would like to get in there is error bars.

Like here

index

import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
import pandas as pd
import numpy as np
import math

sns.set(style="white")

# Load the data as `tips`, the name used by the rest of this script
tips = pd.read_csv('/Users/Jakob/Documents/python_notebooks/data/tips.csv')

# Standard error of the mean over the whole total_bill column

std = tips['total_bill'].std()
mean = tips['total_bill'].mean()
count = tips['total_bill'].count()
sem = std/math.sqrt(count)


#define sd and sem
mean = tips.groupby('day').total_bill.mean()
sem = tips.groupby('day').total_bill.std() / np.sqrt(tips.groupby('day').total_bill.count())
plt.errorbar(range(len(mean)), mean, yerr=sem, capsize=5, color='black', alpha=0.8,
             linewidth=2, linestyle='', marker='o')

#sns.barplot(x="day", y="total_bill", data=tips, capsize=0.1, ci="sd",
            #errwidth=1, linewidth=5, palette = 'Blues', alpha=0.3)
sns.swarmplot(x="day", y="total_bill", data=tips, color="black", alpha=1, palette='rainbow', zorder=1)
#sns.pointplot(x='day', y='total_bill', data=tips, #ci=95, linestyles='None',
              #color="grey", capsize=0.1, errwidth=1.5, opacity=0.1, estimator=np.mean)


sns.despine(left=True, bottom=True)
rcParams['figure.figsize'] = 10,8
plt.show()
print(sem)
print(count)

Do you also have a suggestion for that perhaps @Alexboiboi?

yes

just update the figure:

yourjittervalue = 1
fig.update_traces(jitter=yourjittervalue)

You could maybe make use of the `px.box` function:

fig = px.box(df, x='day', y='total_bill', color="day", points='all')

or the `px.violin` function:

fig = px.violin(df, x='day', y='total_bill', color="day", points='all')


Thanks a lot @Alexboiboi; however, I would like to plot just the data points (dots without boxes or violins), the mean, and the standard error of the mean.

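Do you mean something like this? You can add an additional trace to your code:

# Select total_bill explicitly so non-numeric columns are not aggregated
dm = df.groupby('day')[['total_bill']].mean()
ds = df.groupby('day')[['total_bill']].std()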

fig.add_scatter(x=dm.index, y=dm['total_bill'], 
                error_y_array=ds['total_bill'],
                mode='markers', showlegend=False)


Yes, that is right!
And what if I want a horizontal line for the sem (shown in magenta below) instead of the green dot?
Can I calculate the mean and sem as in my earlier post in this thread and then tell plotly to plot these values in the graph?
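As a small sanity check on the calculation itself, here is a sketch showing that the per-group standard error computed by hand (standard deviation divided by the square root of the group size) matches what pandas returns from `.sem()`. The numbers are made up for illustration and are not from tips_2.xlsx.

```python
import numpy as np
import pandas as pd

# Made-up bills for two days, purely for illustration.
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri", "Fri"],
    "total_bill": [10.0, 12.0, 14.0, 8.0, 9.0, 10.0, 13.0],
})

grouped = df.groupby("day")["total_bill"]

stats = pd.DataFrame({
    "mean": grouped.mean(),
    # Manual SEM: sample standard deviation (ddof=1) over sqrt(group size).
    "sem_manual": grouped.std() / np.sqrt(grouped.count()),
    # Built-in SEM, which uses ddof=1 by default as well.
    "sem_pandas": grouped.sem(),
})

print(stats)
```

Either column can then be passed to the error bar array of an extra trace, with the group means as the y-values.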

Hi @Alexboiboi, thanks a lot for your help with this; I really appreciate it.

I have now got what I wanted with this code:

import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

df = pd.read_excel("https://www.dropbox.com/s/za5e81lksyipztm/tips_2.xlsx?dl=1")

fig = px.strip(df, x='day', y='total_bill', color="day").update_traces(jitter = 1,
                                                                       opacity=0.8,
                                                                       marker_size=10,
                                                                       marker_line_width=1)

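# Group by day and calculate the mean and sem of total_bill
# (the column is selected explicitly so non-numeric columns are not aggregated)
mean = df.groupby('day')[['total_bill']].mean()
sem = df.groupby('day')[['total_bill']].sem()


# Add a trace for the mean (horizontal line marker) with sem error bars
fig.add_trace(
    go.Scatter(
        mode='markers',
        x=mean.index, y=mean['total_bill'],
        error_y_array=sem['total_bill'],
        marker=dict(symbol='141', color='rgba(0,0,0,0.6)', size=30,
                    line=dict(width=2)
        ),
        showlegend=False
    )
)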

#  Customization of y-axis
#fig.update_yaxes(range=[0, 10])

# Figure layout
fig.update_layout(template='simple_white',  width=400, height=500, title='Main Title', yaxis_title='Distance moved',
                  legend=dict(title='', itemclick='toggle', itemsizing='constant', traceorder='normal',
                  bgcolor='rgba(0,0,0,0)', x=1),
                  #margin=dict(color="black",width=3),
                  xaxis=dict(title='This is a title', showticklabels=True, ticks='outside', type='category')
                 )

# Make figure zoomable
config = dict({'scrollZoom':True})

fig.show(config=config)

newplot-4

I still don’t understand the difference between fig.add_scatter and fig.add_trace, though. The result appears fine either way.