Description
This issue is a repost of an unanswered Google Groups post, "Same task runs multiple times?"
> ./bin/celery -A celery_app report
software -> celery:4.1.0 (latentcall) kombu:4.1.0 py:3.6.1
billiard:3.5.0.3 redis:2.10.6
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:redis results:redis://localhost:6379/2
broker_url: 'redis://localhost:6379/2'
result_backend: 'redis://localhost:6379/2'
task_serializer: 'json'
result_serializer: 'json'
accept_content: ['json']
timezone: 'Europe/Berlin'
enable_utc: True
imports: 'tasks'
task_routes: {
'tasks': {'queue': 'celery-test-queue'}}
My application schedules a single group of two, sometimes three, tasks, each with its own ETA within one hour. When the ETA arrives, I see the following in my celery log:
[2017-11-20 09:55:34,470: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 33.81780316866934s: None
[2017-11-20 09:55:34,481: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.009824380278587341s: None
[2017-11-20 09:55:34,622: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.14010038413107395s: None
…
[2017-11-20 09:55:37,890: INFO/ForkPoolWorker-8] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.012678759172558784s: None
[2017-11-20 09:55:37,891: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.01177949644625187s: None
[2017-11-20 09:55:37,899: INFO/ForkPoolWorker-8] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.008250340819358826s: None
…
This can repeat dozens of times. Note the first task's 33-second execution time, and the use of different workers!
I have no explanation for this behavior, and would like to understand what’s going on here.
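For reference, a minimal sketch of how such a group might be scheduled (not the reporter's actual code: the task name _test_exec comes from the log excerpts, while the helper name, payloads, and ETA spacing are made up for illustration):

# Hypothetical reconstruction of the scheduling described above.
from datetime import datetime, timedelta, timezone

from celery import group
from tasks import _test_exec  # task seen in the log excerpts

def schedule_batch(payloads):
    """Schedule one group of two or three tasks, each with its own ETA within one hour."""
    now = datetime.now(timezone.utc)
    signatures = [
        # Signature.set() attaches per-task execution options such as eta.
        _test_exec.s(payload).set(eta=now + timedelta(minutes=15 * (i + 1)))
        for i, payload in enumerate(payloads)
    ]
    return group(signatures).apply_async()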
Activity
georgepsarakis commented on Nov 22, 2017
Maybe this is related to visibility timeout?
jenstroeger commented on Nov 23, 2017
@georgepsarakis Could you please elaborate on your suspicion?
georgepsarakis commented on Nov 24, 2017
As far as I know, this is a known issue for broker transports that do not have the built-in acknowledgement characteristics of AMQP. The task will be assigned to a new worker if its completion time exceeds the visibility timeout, so you may see tasks being executed in parallel.
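For reference, the visibility timeout of the Redis transport is configured through broker_transport_options. A minimal sketch (the module layout and the 2-hour value are illustrative; the broker URL is taken from the report above):

# celery_app.py — sketch only, not the reporter's configuration.
from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/2',
             backend='redis://localhost:6379/2')

# Messages left unacknowledged longer than this are re-delivered to another
# worker, so the value must exceed the longest ETA/countdown being scheduled
# (the Redis default is 3600 seconds).
app.conf.broker_transport_options = {'visibility_timeout': 7200}  # 2 hours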
jenstroeger commented on Nov 24, 2017
@georgepsarakis So if the task is scheduled far ahead in the future, then I might see the above? The “visibility timeout” addresses that? From the documentation you linked:
Meaning that if, within the hour, the worker does not ack the task (i.e. run it?), that task is sent to another worker, which wouldn't ack it either, and so on… Indeed this seems to be the case, looking at the caveats section of the documentation, this related issue celery/kombu#337, or quoting from this blog:
Looks like setting the visibility_timeout to 31,540,000 seconds (one year) might be a quick fix.
georgepsarakis commented on Nov 25, 2017
I would say that if you increase the visibility timeout to 2 hours, your tasks will be executed only once.
So if you combine an ETA or countdown that exceeds the visibility timeout with this re-delivery behaviour, you get multiple executions of the task.
I think what happens is the following: looking into the Redis transport implementation, you will notice that it uses Sorted Sets, passing the queued time as the score to zadd. The message is restored based on that timestamp, by comparing its age against the visibility timeout.
Hope this explains the internals of the Redis transport a bit.
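A much-simplified sketch of that restore mechanism (not kombu's actual code: the key name, queue name, and polling function are invented, and the dict-style zadd call assumes redis-py 3.0+):

# Simplified illustration of the restore behaviour described above.
import time
import redis

VISIBILITY_TIMEOUT = 3600  # seconds

r = redis.Redis()

def on_deliver(delivery_tag):
    # On delivery, the message's tag is recorded in a sorted set with the
    # current time as its score (this is what zadd is used for).
    r.zadd('unacked_index', {delivery_tag: time.time()})

def restore_visible():
    # Entries older than (now - visibility_timeout) are considered abandoned
    # and pushed back onto the queue, where any worker may pick them up again.
    cutoff = time.time() - VISIBILITY_TIMEOUT
    for tag in r.zrangebyscore('unacked_index', 0, cutoff):
        r.zrem('unacked_index', tag)
        r.lpush('celery-test-queue', tag)  # kombu restores the full message body here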
jenstroeger commented on Nov 27, 2017
@georgepsarakis, I’m now thoroughly confused. If a task’s ETA is set for two months from now, why would a worker pick it up one hour after the task has been scheduled? Am I missing something?
My (incorrect?) assumption is that:
Your “I think what happens is:” above is quite different from my assumption.
zivsu commented on Dec 17, 2017
I also encountered the same problem; have you solved it? @jenstroeger
Thanks!
georgepsarakis commented on Dec 17, 2017
@jenstroeger that does not sound like a feasible flow, I think the worker just continuously requeues the message in order to postpone execution until the ETA condition is finally met. The concept of the queue is to distribute messages as soon as they arrive, so the worker examines the message and just requeues.
Please note that this is my guess, I am not really aware of the internals of the ETA implementation.
jenstroeger commented on Dec 21, 2017
@zivsu, as mentioned above I’ve set the visibility_timeout to a very large number and that seems to have resolved the symptoms. However, as @georgepsarakis points out, that seems to be a poor approach. I do not know the cause of the original problem nor how to address it properly.
zivsu commented on Dec 21, 2017
@jenstroeger I read some blog posts; changing the visibility_timeout cannot solve the problem completely, so I changed my broker to rabbitmq.
jenstroeger commented on Dec 22, 2017
@zivsu, can you please share the link to the blog? Did you use Redis before?
zivsu commented on Dec 22, 2017
@jenstroeger I can't find the blog. I used Redis as the broker before. For scheduled tasks, I chose rabbitmq to avoid the error happening again.
Anton-Shutik commented on Apr 10, 2018
I have exactly the same issue; my config is:
django==1.11.6
celery==4.2rc2
django-celery-beat==1.0.1
settings:
And that is the only working combination of these settings. Also, I have to schedule my periodic tasks in the UTC timezone.
If you enable CELERY_TIMEZONE or disable CELERY_ENABLE_UTC, it starts running periodic tasks multiple times.
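For reference, the working combination described here would look roughly like this in Django settings (a sketch only; it assumes the usual app.config_from_object('django.conf:settings', namespace='CELERY') setup, and none of it is quoted verbatim from the comment):

# settings.py — sketch of the combination described above.
CELERY_ENABLE_UTC = True
# CELERY_TIMEZONE is deliberately left unset: per the comment, enabling it
# (or disabling CELERY_ENABLE_UTC) made periodic tasks run multiple times.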