I have some callback functions that take forever. I was already using RQ as a task queue, but some tasks were still too slow because they needed more processing power, and you can't call multiprocessing.Pool.apply_async from gunicorn with eventlet workers… so I made multi-rq to do the same thing with RQ.
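For reference, here's the standard multiprocessing pattern that multi-rq stands in for (a minimal sketch; the function and pool size are just placeholders):

from multiprocessing import Pool

def add(i, j):
    return i + j

if __name__ == '__main__':
    with Pool(4) as pool:
        # the classic pattern: collect AsyncResult objects, then .get() each one
        async_results = [pool.apply_async(add, (i, i)) for i in range(10)]
        print([r.get() for r in async_results])  # [0, 2, 4, ..., 18]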
It's a little different from that pattern: you don't create a list of result objects and then call .get() on them, and unlike plain RQ you don't have to poll manually until the jobs finish. Instead, you just provide the function to apply and a list of argument iterables. You can optionally provide your own completion-checking and data-processing functions.
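For comparison, here's roughly the manual polling you'd otherwise write with plain RQ (a sketch, not multi-rq's actual internals; it assumes a Redis server and an rq worker are running locally):

import operator
import time
from redis import Redis
from rq import Queue

q = Queue(connection=Redis())
jobs = [q.enqueue(operator.add, i, i) for i in range(10)]
# poll until every job reports finished (ignoring failed jobs for brevity)
while not all(job.is_finished for job in jobs):
    time.sleep(0.1)
results = [job.result for job in jobs]  # [0, 2, 4, ..., 18]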
Basic use
# basic_test.py
import time

def wait(i, j):
    time.sleep(1)  # simulate a slow task (presumably why time is imported)
    print(i)
    return sum((i, j))
from multi_rq import MultiRQ
from basic_test import wait

# requires a Redis server and at least one running `rq worker` process
mrq = MultiRQ()
nums = [(i, j) for i, j in zip(range(10), range(10))]
mrq.apply_async(wait, nums, mode='results')
# >>> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
mode='results' means you get the results of the jobs; mode='jobs' means you get the jobs themselves.
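With mode='jobs' you can inspect the jobs yourself (a sketch, assuming the returned objects are standard rq Job instances):

jobs = mrq.apply_async(wait, nums, mode='jobs')
finished = [j.result for j in jobs if j.is_finished]
failed = [j for j in jobs if j.is_failed]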
Use with Dash
Example: Doing a big processing task asynchronously in chunks of 5000 rows
def df_process_func(df, my_arg):
    # do something with a df
    return df
import pandas as pd

@app.callback(...)
def do_this(...):
    mrq = MultiRQ()
    df = pd.read_csv('large_data.csv')
    # split the frame into 5000-row chunks, each paired with the extra argument
    chunks = [(df.iloc[i:i+5000], 'arg') for i in range(0, df.shape[0], 5000)]
    # a list of the outputs of df_process_func, one per chunk
    output = mrq.apply_async(df_process_func, chunks)
    # put them all back together, for example
    df = pd.concat(output)
This is a little more about web development in general than about Dash, but it's an issue I ran into while learning how to deploy Dash. I welcome feedback, issues, and pull requests.