Issues with ff.create_distplot()

#1

I have a number of issues with ff.create_distplot():

1. Consider the following data:

import numpy as np

import plotly.graph_objs as go
import plotly.figure_factory as ff
m = np.random.normal(loc=0.08, scale=0.0008, size=5000)

Histogram of the data:

fig = go.FigureWidget()
fig.add_histogram(x=m)
fig

However, when I try to produce a density plot using the figure factory, the result is not what I expect:

hist_data = [m]

group_labels = ['m1']
colors = ['#333F44']

# Create distplot
fig = go.FigureWidget(ff.create_distplot(hist_data, group_labels, show_hist=True, colors=colors))
fig.layout.update(title='Density curve')
fig

I could probably tinker with it until it gives me the right plot, but I think there is an underlying issue.

If I set show_hist=False, the plot looks much better:

The problem seems to be with the bins of the histogram. If we set scale=0.08 we can see that the histogram is displayed only in one bin:
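As far as I can tell, create_distplot takes a bin_size argument that defaults to 1.0, which is far wider than the spread of this data, so everything lands in a single bin. Below is a minimal sketch of passing an explicit width instead. It assumes the Freedman-Diaconis rule is an acceptable way to pick the width; the helper names fd_bin_size and distplot_with_bins are mine and not part of plotly.

```python
import numpy as np


def fd_bin_size(x):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return float(2.0 * (q75 - q25) / np.cbrt(x.size))


def distplot_with_bins(x, label="m1"):
    """Build a distplot whose bin width is derived from the data."""
    # Imported lazily so the bin-width helper can be used without plotly.
    import plotly.figure_factory as ff

    return ff.create_distplot([x], [label], bin_size=fd_bin_size(x))


rng = np.random.default_rng(0)
m = rng.normal(loc=0.08, scale=0.0008, size=5000)

# The width comes out several orders of magnitude below the default of 1.0.
print(fd_bin_size(m))
```

Any other rule for the width would do; the point is only that bin_size has to be on the same scale as the data.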


2. Even though the histnorm is set to probability density by default, I did not manage to make it look like a probability density. It looks more like a frequency “distplot”.
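To be fair, part of this may be expected rather than a bug: a probability density is not bounded by 1, only its integral is. For data with scale=0.0008 the normal density peaks near 1/(0.0008*sqrt(2*pi)), roughly 499, so a y-axis in the hundreds is what a correct density looks like. A small numpy-only sketch of that check (it does not use plotly, and the choice of 50 bins is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(loc=0.08, scale=0.0008, size=5000)

# density=True normalizes the bars so that sum(height * bin_width) == 1.
heights, edges = np.histogram(m, bins=50, density=True)
area = float(np.sum(heights * np.diff(edges)))

# Theoretical peak of the normal density: 1 / (sigma * sqrt(2 * pi)).
peak = 1.0 / (0.0008 * np.sqrt(2.0 * np.pi))

print(area, heights.max(), peak)
```

If the bars in the distplot do not integrate to 1 in this sense, or sit on a completely different scale from the curve, that would point back at the bin width rather than at histnorm.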


3. The curve_type is set to kde. What kind of KDE is being used? I would like to try the Epanechnikov kernel, for instance.
Is a kde curve type meant to produce something like the density function in R?
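From reading the source, my understanding is that the figure factory computes the curve with scipy.stats.gaussian_kde, i.e. a Gaussian kernel with scipy's default bandwidth, and I could not find an argument to choose another kernel. Here is a rough sketch of what I mean by an Epanechnikov KDE, written in plain numpy. The function name epanechnikov_kde and the Silverman-style bandwidth are my own choices for illustration, not something plotly provides.

```python
import numpy as np


def epanechnikov_kde(samples, grid, bandwidth=None):
    """Evaluate an Epanechnikov kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Silverman-style rule of thumb, used here only as a starting point.
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
    # Scaled distance between every grid point and every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    # K(u) = 0.75 * (1 - u^2) for |u| <= 1, and 0 elsewhere.
    kernel = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return kernel.sum(axis=1) / (samples.size * bandwidth)


rng = np.random.default_rng(0)
m = rng.normal(loc=0.08, scale=0.0008, size=5000)

# Pad the grid so the kernels at the extremes are fully covered.
x = np.linspace(m.min() - 0.001, m.max() + 0.001, 400)
y = epanechnikov_kde(m, x)
```

The resulting curve can be drawn on an existing figure with fig.add_scatter(x=x, y=y), but it would be nicer to have the kernel as an option of the distplot itself.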


4. When several distplots are combined, e.g.:

hist_data = [m, m+0.001]

group_labels = ['m1', 'm2']
colors = ['#333F44', '#37AA9C']

# Create distplot
fig = go.FigureWidget(ff.create_distplot(hist_data, group_labels, show_hist=False, colors=colors))
fig.layout.update(title='Density curve')
fig

Neither the rug plots nor the legend entries appear in a logical order.
Sure, we can set

fig.layout.update(legend=dict(traceorder='normal'))

but I think the default should be the order in which they were added.

I also think that the distance between the rug plots is disproportionately large.

#2

Thanks for the detailed description of the issues you’re having with distplot @ursus,

I haven’t actually dug into how this figure factory works yet, so unfortunately I don’t have much guidance to offer at the moment.

@nicolaskruchten, what do you think about eventually adding a px.kde or px.distplot function to plotly_express (https://github.com/plotly/plotly_express) to handle the distplot use case?

@ursus, if we decide this is something that makes sense to implement in plotly_express we’ll likely direct our efforts there since plotly_express provides a much more unified and powerful API than the distplot figure factory currently does.

Thanks,
-Jon

#3

Thank you!

plotly_express is great!
:slight_smile:

#4

We could do a few things with px here.

  1. We could add a marginal kwarg to px.histogram to get the ability to do rug, violin and box marginals similar to what we have in px.scatter and px.density_contour
  2. We could add a ‘px.kde’ function that leverages go.Violin under the hood and uses its built-in points system to get the rug. (With an optional marginal kwarg too, why not!)
  3. We could convince the JS guys to add a KDE option to go.Histogram
  4. We could convince the JS guys to add points to go.Histogram

#5

I mentioned R’s density function above.
Mathematica also has a similar command which is really nice: SmoothKernelDistribution.

#6

I’ve implemented idea 1 above: px.histogram() now has a marginal option so you can add the rug there. Still no KDE option though. Toying with the idea of a new kde trace type in plotly.js at the moment… basically a blend of violin and histogram minus histfunc. Would also allow for smooth cumulative density functions which would be nice.