Automatic generation of multifactorial boxplots with plotly in Python


#1

Hi all,

I am trying to use plotly to create inline boxplots in Jupyter notebooks running on a Python kernel. My problem is that all the example code I was able to find requires defining a trace for each box explicitly, e.g.:

import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np

y0 = np.random.randn(50)-1
y1 = np.random.randn(50)+1

trace0 = go.Box(
    y=y0
)
trace1 = go.Box(
    y=y1
)
data = [trace0, trace1]
py.iplot(data)

This is fine if I only have a few boxplots, but I generally work with multifactorial data. I am looking for a way to use grouping variables to generate boxplots. This is reasonably straightforward with plotly in R:

p <- plot_ly(diamonds, y = ~price, color = I("black"),
             alpha = 0.1, boxpoints = "suspectedoutliers")
p1 <- p %>% add_boxplot(x = "Overall")
p2 <- p %>% add_boxplot(x = ~cut)
subplot(
  p1, p2, shareY = TRUE,
  widths = c(0.2, 0.8), margin = 0
) %>% hide_legend()

However, I am still not sure this has the same flexibility that R base plotting allows via the formula syntax, where I can easily plot multiple boxplots arranged by factor levels (y ~ x*u*v etc.). Is there any way to achieve this with plotly in Python without manually defining a trace for each individual box? (I tried running the R code in my Python kernel with rpy2, but the plot doesn't show. It works on an IRkernel, though.)

Thanks
D


#2

@Thriceguy To plot many boxplots, you should preprocess your data and define a suitable function
that returns a boxplot trace. More precisely, your function should take arguments and kwargs that
cover all the features your boxplots need. Here is an example.
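
As a rough sketch of that idea (this is not the linked example, only an illustration): the helper names make_box_trace and box_traces_by_group, the toy data, and its column names are all made up here. The traces are built as plain dicts, which plotly accepts in place of go.Box objects, so the grouping logic itself runs without plotly installed.

```python
import random
from collections import defaultdict


def make_box_trace(y, name, **kwargs):
    """Return one box trace as a plain dict.

    Extra keyword arguments are passed through unchanged as trace
    attributes (e.g. boxpoints="suspectedoutliers").
    """
    trace = dict(type="box", y=list(y), name=name)
    trace.update(kwargs)
    return trace


def box_traces_by_group(rows, value, factors, **kwargs):
    """Build one box trace per observed combination of factor levels.

    rows    -- iterable of dicts, one per observation (hypothetical layout)
    value   -- key of the numeric variable to plot
    factors -- list of keys used as grouping variables
    """
    groups = defaultdict(list)
    for row in rows:
        level = tuple(row[f] for f in factors)
        groups[level].append(row[value])
    return [
        make_box_trace(ys, name=" / ".join(str(v) for v in level), **kwargs)
        for level, ys in sorted(groups.items())
    ]


# Toy two-factor data set (made up for illustration).
random.seed(0)
rows = [
    {"price": random.gauss(mu, 1.0), "cut": cut, "color": color}
    for cut, color, mu in [("Fair", "D", -1), ("Fair", "E", 0),
                           ("Ideal", "D", 1), ("Ideal", "E", 2)]
    for _ in range(50)
]

traces = box_traces_by_group(rows, "price", ["cut", "color"],
                             boxpoints="suspectedoutliers")

# To draw the result in a notebook, something like the following should work
# (assumes plotly is installed; with the older setup from post #1,
# py.iplot(traces) plays the same role):
#   import plotly.graph_objs as go
#   go.Figure(data=traces).show()
```

If you only have a single grouping variable, note that a box trace also accepts an x array of category labels alongside y, which draws one box per distinct x value from a single trace.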


#3

Thanks a lot! I somehow was unable to find this example…