Same task runs multiple times at once? #4400
Maybe this is related to the visibility timeout?
@georgepsarakis Could you please elaborate on your suspicion?
As far as I know, this is a known issue for broker transports that do not have the built-in acknowledgement characteristics of AMQP. The task will be assigned to a new worker if the task completion time exceeds the visibility timeout, thus you may see tasks being executed in parallel.
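For reference, on the Redis transport the visibility timeout is set through the broker transport options; a minimal configuration sketch (the 7200-second value is an example, not a recommendation):

```python
# Celery settings sketch: raise the Redis visibility timeout above the
# longest expected ETA/countdown so unacked messages are not redelivered.
broker_transport_options = {
    'visibility_timeout': 7200,  # seconds (example value)
}
```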
@georgepsarakis So if the task is scheduled far ahead in the future, then I might see the above? The “visibility timeout” addresses that? From the documentation you linked:
Meaning that if within the hour the worker does not ack the task (i.e. run it?) that task is being sent to another worker which wouldn’t ack, and so on… Indeed this seems to be the case looking at the caveats section of the documentation; this related issue celery/kombu#337; or quoting from this blog:
Looks like setting the …
I would say that if you increase the visibility timeout to 2 hours, your tasks will be executed only once. So if you combine:
I think what happens is:
Looking into the Redis transport implementation, you will notice that it uses Sorted Sets, passing the queued time as a score to zadd. The message is restored based on that timestamp, compared against an interval equal to the visibility timeout. Hope this explains a bit of the internals of the Redis transport.
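To illustrate, here is a rough stand-in for that restore logic, using a plain dict in place of the Redis sorted set (the name `restore_visible` is mine, not kombu's):

```python
def restore_visible(unacked, now, visibility_timeout):
    """Return ids of messages whose queued time (the zadd score in the
    real transport) is older than the visibility timeout."""
    ceiling = now - visibility_timeout
    return [mid
            for mid, queued in sorted(unacked.items(), key=lambda kv: kv[1])
            if queued <= ceiling]

# A message queued at t=0 with a one-hour visibility timeout is restored
# after an hour, even if a worker is still holding it for a far-future ETA.
```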
@georgepsarakis, I’m now thoroughly confused. If a task’s ETA is set for two months from now, why would a worker pick it up one hour after the task has been scheduled? Am I missing something? My (incorrect?) assumption is that:
Your “I think what happens is:” above is quite different from my assumption.
I also encountered the same problem; have you solved it? @jenstroeger Thanks!
@jenstroeger that does not sound like a feasible flow, I think the worker just continuously requeues the message in order to postpone execution until the ETA condition is finally met. The concept of the queue is to distribute messages as soon as they arrive, so the worker examines the message and just requeues. Please note that this is my guess, I am not really aware of the internals of the ETA implementation.
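A toy model of that guessed flow (explicitly not Celery's actual internals): the worker pops each message once and, if its ETA hasn't arrived, puts it back instead of executing it.

```python
import heapq

def drain_once(queue, now):
    """Pop every message once; 'execute' those whose ETA has passed and
    requeue the rest. `queue` is a heap of (eta, message) tuples."""
    executed, pending = [], []
    while queue:
        eta, msg = heapq.heappop(queue)
        (executed if eta <= now else pending).append((eta, msg))
    for item in pending:
        heapq.heappush(queue, item)
    return [msg for _, msg in executed]
```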
@zivsu, as mentioned above I’ve set the … I do not know the cause of the original problem nor how to address it properly.
@jenstroeger I read some blog, change …
@zivsu, can you please share the link to the blog? Did you use Redis before? |
@jenstroeger I can't find the blog; I used Redis as the broker before. For scheduled tasks, I chose RabbitMQ to avoid the error happening again.
I have exactly the same issue; my config is: settings: …
And that is the only working combination of these settings. Also, I have to schedule my periodic tasks in the UTC timezone.
I have the same problem: the ETA task executes multiple times when using Redis as a broker.
Using Redis, there is a well-known issue when you specify a timezone other than UTC. To work around the issue, subclass the default app and add your own timezone handling function:
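A minimal sketch of such a timezone handling function, assuming the fix is to always hand Celery timezone-aware UTC datetimes (the subclass wiring in the comment is my guess and is not confirmed by this thread):

```python
from datetime import datetime, timezone

def tz_aware_now():
    # Always return an aware UTC datetime; naive or localized datetimes
    # are what skew ETA comparisons under a non-UTC app timezone (assumption).
    return datetime.now(timezone.utc)

# Hypothetical wiring inside the subclassed app:
#
# class MyCelery(Celery):
#     def now(self):
#         return tz_aware_now()
```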
Hope that helps anyone else who is running into this problem.
I get this problem sometimes when frequently restarting celery jobs with beat on multicore machines. I've gotten into the habit of running … The best advice I have is to always make sure you see the "restart DONE" message before disconnecting from the machine.
If we check the Received task timestamps, every hour the worker gets a new task with the same id. The result is that all ETA messages are sent more than 10 times. Looks like RabbitMQ is the only option if we want to use ETA.
Recently met a similar bug. Also …
Might another solution be disabling ack emulation entirely? i.e. …
Can someone check this PR? https://github.com/vinayinvicible/kombu/commit/a755ba14def558f2983b3ff3358086ba55521dcc
This seems to be a feasible approach: https://github.com/cameronmaske/celery-once
Hi, did this actually work? I have the same issue where the celery workers execute the same task multiple times. Does this actually prevent multiple workers from executing the same task, or does it prevent them from executing the same code?
@omya3 I tried it and it doesn’t seem to work. I suggest changing the …
I have the same problem. ETA tasks are executed multiple times with a very short interval in between (around 0.001 seconds).
Looking at the same problem right now. gevent seems to be what's causing it - or at least, switching to eventlet does not show this behaviour. Seems to be a few seconds between the workers receiving the task again. Sometimes, the first worker has started the task before it is received again. It can be received many times by the same worker, or it can be received many times by different workers.
Did not work for me. Same issue with eventlet. |
Same for me, having the same issues with eventlet.
I had the same problem: I set my task to run in the future (24 hours later) and the same problem occurred. Using Redis as a broker for celery causes the issue.
I had the same problem, but it seems that if a task executes multiple times, all the executions run sequentially on the same worker and their ids are the same. So an efficient (but not really good) solution is to cache the id of the task on its first execution for a fairly long time and, after the first execution, block the other ones that have the same id. Combining this idea with a fairly high … :

```python
# celery version 5.2.1
from celery import shared_task
from django.core.cache import cache


def block_multiple_celery_task_execution(task, prefix):
    # The prefix is not strictly necessary, but I think it's good practice
    # to separate the cache keys of each task.
    is_task_executed = cache.get(f"{prefix}_{task.request.id}")
    if not is_task_executed:
        cache.set(f"{prefix}_{task.request.id}", True, timeout=60 * 30)  # 30 min
        return False
    return True


@shared_task(bind=True)
def some_task(self):
    prefix = "some_task"
    if block_multiple_celery_task_execution(self, prefix):
        return
    # task code goes here
```

I tested it in development with just …
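One caveat with the snippet above: `cache.get` followed by `cache.set` leaves a race window in which two workers can both see a miss. Django's `cache.add` is a set-if-absent operation that is atomic on backends like Redis and Memcached, so a tighter variant might look like this (with a dict-backed stand-in cache so the sketch is self-contained):

```python
class DictCache:
    """Minimal stand-in for Django's cache API (illustration only)."""
    def __init__(self):
        self._store = {}

    def add(self, key, value, timeout=None):
        # Set-if-absent: the first caller wins, all later callers get False.
        # (Real cache backends make this atomic; a plain dict is not.)
        if key in self._store:
            return False
        self._store[key] = value
        return True


def should_run(cache, prefix, task_id, timeout=60 * 30):
    """True only for the first execution of a given task id."""
    return cache.add(f"{prefix}_{task_id}", True, timeout=timeout)
```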
Could Celery create a delay queue using a Redis zset, and use that delay queue to host ETA tasks?
Are there any updates on this issue? |
I was having the same issue; we managed to solve the problem. The problem was that we scheduled the task in Django as (m/h/dm/my/d) * 13 * * 1-5. We meant this to run the task at 13:00 every day of the week, but in Django we need to specify the minutes! After editing the schedule to (m/h/dm/my/d) 00 13 * * 1-5, problem solved!
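Spelled out, the difference between the two cron entries is the minute field: `*` matches every minute, so the task fires 60 times during hour 13 instead of once.

```python
def firings_in_hour(minute_field):
    """Times a cron entry fires within its matching hour, for the simple
    cases here: '*' matches all 60 minutes, a pinned value matches one."""
    return 60 if minute_field == '*' else 1

# (m/h/dm/my/d)  * 13 * * 1-5  -> fires every minute of hour 13
# (m/h/dm/my/d) 00 13 * * 1-5  -> fires once, at 13:00
```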
|
It is interesting, but my fix for this issue was just adding names for the tasks. For me this was simply @shared_task(name="yourname").
This issue is a repost of an unattended Google Groups post, “Same task runs multiple times?”
My application schedules a single group of two, sometimes three tasks, each with its own ETA within one hour. When the ETA arrives, I see the following in my celery log:
This can repeat dozens of times. Note the first task’s 33-second execution time, and the use of different workers!
I have no explanation for this behavior, and would like to understand what’s going on here.