Plotly Express: performance with huge amounts of data

I have a question about the performance of some Plotly Express figures. If I use parallel coordinates or density heatmap with a dataframe of 100k rows and 4 columns, it's not possible to show the figure: Jupyter and JupyterLab freeze. Is there any possibility of using some method arguments to disable some interactivity, bin some points, or anything else to make this possible?


Hi @Varlor, could you help us narrow down the diagnosis by benchmarking on dummy data? For example, the code below (corresponding to one million rows) executes correctly on my Ubuntu laptop in Firefox. How is it for you? What is the size limit causing a freeze of Jupyter/JupyterLab?

import plotly.express as px
import numpy as np

N = 1_000_000                       # one million points
x, y = np.random.randn(2, N)        # two arrays of normally distributed values
fig = px.density_heatmap(x=x, y=y)
fig.show()
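
If this one-million-point example already freezes for you, one rough way to find the limit is to rerun it with increasing N until the notebook starts to struggle (a minimal sketch; the sizes below are just example values):

import plotly.express as px
import numpy as np

# Try increasing sizes and note the N at which rendering slows down or freezes
for N in [10_000, 100_000, 1_000_000]:
    x, y = np.random.randn(2, N)
    fig = px.density_heatmap(x=x, y=y)
    fig.show()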

As a rule of thumb, the browser has a hard time when the data is on the order of 100 MB (here a one-million-element float64 array corresponds to 8 MB, I believe; it then depends on the number of other arrays that the JavaScript code has to create in order to build the figure).
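
As a quick sanity check on that estimate, NumPy can report an array's raw size directly (this only counts the array itself, not the extra copies the figure machinery creates):

import numpy as np

x = np.random.randn(1_000_000)
print(x.nbytes / 1e6)  # ~8.0 MB: one million float64 values at 8 bytes each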

In order to downsample your data, you can either slice it (x[::5]) or take random samples from your data:

import plotly.express as px
import numpy as np

N = 1_000_000
x, y = np.random.randn(2, N)
mask = np.random.random(N) > 0.9  # keep roughly 1/10th of the data
fig = px.density_heatmap(x=x[mask], y=y[mask])
fig.show()

or do some binning of the data (e.g. x = 0.5 * (x[1:] + x[:-1]) averages neighboring points), if it makes sense to average the data together; see the sketch below.
The best method depends on the type of data you have :-).
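
For instance, a minimal binning sketch (assuming it is acceptable to average fixed-size groups of consecutive points; the bin size of 10 is just an illustrative value, and N must be divisible by it):

import numpy as np

N = 1_000_000
bin_size = 10  # illustrative value; N must be divisible by it
x = np.random.randn(N)
# Average consecutive groups of bin_size points: 1,000,000 -> 100,000 values
x_binned = x.reshape(-1, bin_size).mean(axis=1)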