
Same task runs multiple times at once? #4400

Open

Description

@jenstroeger

This issue is a repost of an unanswered Google Groups post, Same task runs multiple times?

> ./bin/celery -A celery_app report

software -> celery:4.1.0 (latentcall) kombu:4.1.0 py:3.6.1
            billiard:3.5.0.3 redis:2.10.6
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:redis://localhost:6379/2

broker_url: 'redis://localhost:6379/2'
result_backend: 'redis://localhost:6379/2'
task_serializer: 'json'
result_serializer: 'json'
accept_content: ['json']
timezone: 'Europe/Berlin'
enable_utc: True
imports: 'tasks'
task_routes: {
 'tasks': {'queue': 'celery-test-queue'}}

My application schedules a single group of two, sometimes three, tasks, each with its own ETA within one hour. When the ETA arrives, I see the following in my celery log:

[2017-11-20 09:55:34,470: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 33.81780316866934s: None
[2017-11-20 09:55:34,481: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.009824380278587341s: None
[2017-11-20 09:55:34,622: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.14010038413107395s: None
…
[2017-11-20 09:55:37,890: INFO/ForkPoolWorker-8] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.012678759172558784s: None
[2017-11-20 09:55:37,891: INFO/ForkPoolWorker-2] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.01177949644625187s: None
[2017-11-20 09:55:37,899: INFO/ForkPoolWorker-8] Task tasks._test_exec[bd08ab85-28a8-488f-ba03-c2befde10054] succeeded in 0.008250340819358826s: None
…

This can repeat dozens of times. Note the first task’s 33-second execution time, and that different workers are used!

I have no explanation for this behavior, and would like to understand what’s going on here.
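
A minimal sketch of the scheduling pattern described above (the task body and the schedule_batch helper are illustrative, not the reporter’s code; only the task name and the celery_app module come from the report):

# Sketch: a group of ETA tasks, as described in the issue.
from datetime import datetime, timedelta, timezone

from celery import group
from celery_app import app  # the app module named in the report


@app.task(name="tasks._test_exec")
def _test_exec():
    return None


def schedule_batch():
    now = datetime.now(timezone.utc)
    # Two, sometimes three, tasks, each with its own ETA within one hour.
    job = group(
        _test_exec.s().set(eta=now + timedelta(minutes=20)),
        _test_exec.s().set(eta=now + timedelta(minutes=45)),
    )
    return job.apply_async()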

Activity

georgepsarakis (Contributor) commented on Nov 22, 2017

Maybe this is related to visibility timeout?

jenstroeger (Author) commented on Nov 23, 2017

@georgepsarakis Could you please elaborate on your suspicion?

georgepsarakis (Contributor) commented on Nov 24, 2017

As far as I know, this is a known issue for broker transports that do not have the built-in acknowledgement characteristics of AMQP. The task will be assigned to a new worker if its completion time exceeds the visibility timeout, so you may see tasks being executed in parallel.
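
For reference, the visibility timeout for the Redis transport is configured through the broker_transport_options setting; a minimal sketch, raising it above the longest expected ETA:

# Raise the Redis visibility timeout above the longest expected ETA or
# runtime so unacknowledged messages are not redelivered prematurely.
app.conf.broker_transport_options = {'visibility_timeout': 2 * 60 * 60}  # 2 hours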

jenstroeger (Author) commented on Nov 24, 2017

@georgepsarakis So if the task is scheduled far ahead in the future, then I might see the above? The “visibility timeout” addresses that? From the documentation you linked:

The default visibility timeout for Redis is 1 hour.

Meaning that if the worker does not ack the task within the hour (i.e. run it?), the task is sent to another worker, which wouldn’t ack either, and so on… Indeed, this seems to be the case judging by the caveats section of the documentation, the related issue celery/kombu#337, and this blog:

But when developers just start using it, they regularly face abnormal behaviour of workers, specifically multiple execution of the same task by several workers. The reason which causes it is a visibility timeout setting.

Looks like setting the visibility_timeout to 31,540,000 seconds (one year) might be a quick fix.
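
That quick fix would amount to something like:

# The "quick fix" above: a visibility timeout of roughly one year.
app.conf.broker_transport_options = {'visibility_timeout': 31540000}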

georgepsarakis (Contributor) commented on Nov 25, 2017

I would say that if you increase the visibility timeout to 2 hours, your tasks will be executed only once.

So if you combine:

  • Redis broker
  • Late acknowledgement
  • ETA equal to or above the visibility timeout

you get multiple executions of the task (see the sketch below).
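
A minimal sketch of that combination, assuming the broker from the report (the task and its two-hour ETA are illustrative):

# Illustrative: the three ingredients combined. With the default Redis
# visibility_timeout of 3600 seconds, a message carrying a two-hour ETA
# stays unacknowledged past the timeout and gets redelivered to other workers.
from datetime import datetime, timedelta, timezone

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/2')  # Redis broker

@app.task(acks_late=True)  # late acknowledgement
def distant_eta_task():
    ...

# ETA above the visibility timeout:
distant_eta_task.apply_async(eta=datetime.now(timezone.utc) + timedelta(hours=2))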

I think what happens is:

  • After one hour passes, one worker process starts processing the task.
  • A second worker sees that the message has been in the queue for longer than the visibility timeout, even though it is being processed by another worker.
  • The message is restored to the queue.
  • Another worker starts processing the same message.
  • This repeats across all worker processes.

Looking into the Redis transport implementation, you will notice that it uses Sorted Sets, passing the time the message was reserved as the score to zadd. Messages are restored by comparing that timestamp against an interval equal to the visibility timeout.
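
A simplified sketch of that restore mechanism in plain redis-py (this is not kombu’s actual code; the key names and helpers are illustrative):

import time

import redis

r = redis.Redis()
VISIBILITY_TIMEOUT = 3600  # the Redis transport default: one hour

def reserve(delivery_tag, payload):
    # On delivery, index the message by the time it was reserved (zadd score).
    r.zadd('unacked_index', {delivery_tag: time.time()})
    r.hset('unacked', delivery_tag, payload)

def restore_visible(queue):
    # Anything reserved longer ago than the visibility timeout is treated
    # as lost and pushed back onto the queue for redelivery.
    cutoff = time.time() - VISIBILITY_TIMEOUT
    for tag in r.zrangebyscore('unacked_index', 0, cutoff):
        payload = r.hget('unacked', tag)
        if payload is not None:
            r.rpush(queue, payload)
        r.zrem('unacked_index', tag)
        r.hdel('unacked', tag)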

Hope this explains the internals of the Redis transport a bit.

jenstroeger (Author) commented on Nov 27, 2017

@georgepsarakis, I’m now thoroughly confused. If a task’s ETA is set for two months from now, why would a worker pick it up one hour after the task has been scheduled? Am I missing something?

My (incorrect?) assumption is that:

  • I schedule a task with an ETA at any time in the future; then
  • the task (i.e. its marshaled arguments) and its ETA will sit in the queue until the ETA arrives; then
  • a worker begins processing the task at ETA.

Your “I think what happens is:” above is quite different from my assumption.

zivsu commented on Dec 17, 2017

I also encountered the same problem. Have you solved it, @jenstroeger?

Thanks!

georgepsarakis (Contributor) commented on Dec 17, 2017

@jenstroeger that does not sound like a feasible flow. I think the worker just continuously requeues the message to postpone execution until the ETA condition is finally met. The concept of a queue is to distribute messages as soon as they arrive, so the worker examines the message and simply requeues it.

Please note that this is just my guess; I am not really familiar with the internals of the ETA implementation.

jenstroeger (Author) commented on Dec 21, 2017

@zivsu, as mentioned above I’ve set the visibility_timeout to a very large number and that seems to have resolved the symptoms. However, as @georgepsarakis points out, that seems to be a poor approach.

I do not know the cause of the original problem nor how to address it properly.

zivsu commented on Dec 21, 2017

@jenstroeger I read a blog saying that changing visibility_timeout cannot solve the problem completely, so I changed my broker to RabbitMQ.

jenstroeger (Author) commented on Dec 22, 2017

@zivsu, can you please share the link to the blog? Did you use Redis before?

zivsu commented on Dec 22, 2017

@jenstroeger I can’t find the blog anymore. I used Redis as the broker before; for scheduled tasks I chose RabbitMQ to avoid the error happening again.

Anton-Shutik commented on Apr 10, 2018

I have exactly the same issue. My config is:
django==1.11.6
celery==4.2rc2
django-celery-beat==1.0.1

settings:

CELERY_ENABLE_UTC = True
# CELERY_TIMEZONE = 'America/Los_Angeles'

And that is the only working combination of these settings; I also have to schedule my periodic tasks in the UTC timezone. If you set CELERY_TIMEZONE or disable CELERY_ENABLE_UTC, periodic tasks start running multiple times.
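
For what it’s worth, a sketch of that working combination (the schedule entry below is illustrative, not my actual task):

# settings.py -- CELERY_ENABLE_UTC on, CELERY_TIMEZONE left unset.
from celery.schedules import crontab

CELERY_ENABLE_UTC = True

CELERY_BEAT_SCHEDULE = {
    'example-periodic-task': {           # hypothetical entry
        'task': 'myapp.tasks.example',   # hypothetical task path
        'schedule': crontab(hour=2, minute=0),  # interpreted in UTC
    },
}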

80 remaining items
