Preventing Wasteful Parallel Callbacks


#1

In my app multiple callbacks depend on a shared result. Calculating this shared result once is slow enough, but having multiple processes (or threads) repeatedly doing the calculation is even slower.

I think I am looking for a cache-like solution where, on a cache miss, the requesting callbacks simply wait until the first callback has finished and then grab the same result. I have tried Flask-Caching, but in my tests it seems that on a cache miss all workers/threads still execute the memoized function.

At the end of this post I have some code to demonstrate the problem. Notice that “Calculating” is printed at least twice to the console (and perhaps more times if you are using more than one worker).

I have a solution in place using Python’s thread synchronisation tools (locks, etc.), but this only works within a single worker/process. What can I do to fix this when using multiple workers/processes?

import dash
import dash_html_components as html
from flask_caching import Cache
import time

app = dash.Dash(__name__)

server = app.server

cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',
    'CACHE_DIR': 'cache-directory'
})

app.layout = html.Div(children=[
    html.Button('Submit', id='button'),
    html.Div(id='output-container-button1', children=[]),
    html.Div(id='output-container-button2', children=[]),
])

# This should only ever be called once!
@cache.memoize()
def slow_function(argument_1):
    print("Calculating")
    time.sleep(3)
    return 1

@app.callback(
    dash.dependencies.Output('output-container-button1', 'children'),
    [dash.dependencies.Input('button', 'n_clicks')])
def update_output1(n_clicks):
    value = slow_function(3)
    return 'The button has been clicked {} times'.format(value)

@app.callback(
    dash.dependencies.Output('output-container-button2', 'children'),
    [dash.dependencies.Input('button', 'n_clicks')])
def update_output2(n_clicks):
    value = slow_function(3)
    return 'The button has been clicked {} times'.format(value)

if __name__ == '__main__':
    app.run_server(debug=True)

#2

I was able to solve this using a combination of:

  • multiprocessing.Lock() (instead of threading.Lock())
  • gunicorn preloading

If you use the preload option with gunicorn then you can share objects between processes using the multiprocessing family of proxy objects.
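A minimal sketch of that idea (assuming gunicorn is started with `--preload` so the module below is imported once in the master process before the workers fork; the names are illustrative, not jitcache’s actual internals):

```python
from multiprocessing import Manager

# With `gunicorn --preload`, this module-level setup runs once in the
# master process, so every forked worker inherits the same proxies.
manager = Manager()
lock = manager.Lock()      # a single lock visible to all workers
results = manager.dict()   # shared store of computed results

def slow_function(argument_1):
    # Fast path: a cache hit returns immediately without touching the lock.
    if argument_1 in results:
        return results[argument_1]
    with lock:
        # Re-check: another worker may have computed it while we waited.
        if argument_1 not in results:
            print("Calculating")
            results[argument_1] = 1
    return results[argument_1]
```

Then something like `gunicorn --preload -w 4 app:server` ensures the `Manager` proxies are created before forking, so all four workers coordinate through the same lock and dict.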


#3

I’ve spun my code off into a module, which is installable via pip: https://github.com/sjtrny/jitcache

I hope that others get some use out of this. I have an example using Dash here https://jitcache.readthedocs.io/en/latest/dash.html, which I have copied below:

REDACTED: CODE OUTDATED. Refer to the following post for updated code https://community.plot.ly/t/preventing-wasteful-parallel-callbacks/18956/5?u=sjtrny

#4

Nice! Thanks for sharing!


#5

Since the other day I have changed the design of jitcache to be more in line with LRU Cache and Flask-Caching by using a decorator instead.

from jitcache import Cache

cache = Cache()

@cache.memoize
def slow_fn(input_1, input_2, input_3=10):
    return input_1 * input_2 * input_3

print(slow_fn(10, 2))

For plot.ly you can either decorate entire callbacks (just like in Dash’s Performance Docs) or you can decorate a subroutine. Below I demonstrate how to decorate callbacks (you can find more documentation here):

import dash
import dash_html_components as html
from jitcache import Cache
import dash_core_components as dcc

cache = Cache()

app = dash.Dash(__name__)

server = app.server
app.layout = html.Div(
    children=[
        html.Div(id="output-container-dropdown1", children=[]),
        html.Div(id="output-container-dropdown2", children=[]),
        dcc.Dropdown(
            options=[
                {"label": "New York City", "value": "NYC"},
                {"label": "Montréal", "value": "MTL"},
                {"label": "San Francisco", "value": "SF"},
            ],
            value="MTL",
            id="dropdown",
        ),
    ]
)

@app.callback(
    dash.dependencies.Output("output-container-dropdown1", "children"),
    [dash.dependencies.Input("dropdown", "value")],
)
@cache.memoize
def update_output1(input_dropdown):
    print("run1")

    return input_dropdown

@app.callback(
    dash.dependencies.Output("output-container-dropdown2", "children"),
    [dash.dependencies.Input("dropdown", "value")],
)
@cache.memoize
def update_output2(input_dropdown):
    print("run2")

    return input_dropdown

if __name__ == "__main__":
    app.run_server(debug=True)

#6

I know that jitcache is not mature, but perhaps one day it could be added to the Performance Docs, as it provides improved functionality over both LRU Cache and Flask-Caching: duplicate calls to a function are held back when there is a cache miss, so your computer doesn’t waste cycles computing the same result when you trigger a change in your app. Moreover, it is thread/process-safe and has no external dependencies such as Redis.


#7

This looks really great. Once we investigate this a little bit more, we’ll consider adding it to the docs. Thanks for sharing this and keep the thread updated with your progress!