Description
celery: 3.1.23 (Cipater)
rabbitmq-server:3.2.4-1
i found some task send to rabbitmq and then celery worker get duplicate task .
function:
def do_backup(name):
....
for example:
server: do_backup.delay("/tmp/__36")
worker02:
[2016-06-23 20:38:27,182: INFO/MainProcess] Received task: tasks.do_backup[40159186-1b88-4d98-aabf-839bb4b7a852]
[2016-06-23 20:38:27,183: INFO/MainProcess] tasks.do_backup[40159186-1b88-4d98-aabf-839bb4b7a852]: get file name: /tmp/__36
worker12:
[2016-06-23 20:36:21,647: INFO/MainProcess] Received task: tasks.do_backup[d265ffba-fa21-48b6-baad-2c0185a3168f]
........
[2016-06-23 20:37:01,492: INFO/MainProcess] tasks.do_backup[d265ffba-fa21-48b6-baad-2c0185a3168f]: get file name: /tmp/__36
[2016-06-23 20:37:01,554: INFO/MainProcess] tasks.do_backup[d265ffba-fa21-48b6-baad-2c0185a3168f]: do task
i make sure the server only send a "/tmp/__36"
Activity
ask commentedon Jun 28, 2016
They don't seem to be duplicates as they have different task ids?
anan-lee commentedon Jul 5, 2016
thanks for your reply, It is a fact that the same data (/tmp/__36) send to two works . I think the one task don't send ack in time, and so the rabbitmq send the this task to another one?
DurandA commentedon Sep 25, 2016
I have the same issue, I can see on the beat log that the task is only sent once but received twice by the worker a few seconds apart. When using
CELERY_ENABLE_UTC=True
, the worker receive two tasks, one in UTC time and the other in local time.I'm using Celery 3.1.23 (Cipater) with Redis 3.2.3.
cesterlizi commentedon Oct 7, 2016
Same Issue, but taskid is the same @ask @DurandA. Celery 3.1.23, Rabbit 3.6.5
I've also added some debug to verify that is ack -> https://github.com/cesterlizi/celery/blob/3daf891f7db10f3f4c6cd89da3c39cb637400b1f/celery/worker/job.py#L532-L534
late ack is enabled
[2016-10-07 12:08:25,875: INFO/MainProcess] Received task: recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]
[2016-10-07 12:08:25,876: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fa3f9104e60> (args:(u'recrepli.tasks.replicate', u'7c508306-772f-464e-bb7c-103bc1dca784', [u'frogger', u'/home/gc/recordings_tmp/919/call_98277604.wav'], {}, {u'task': u'recrepli.tasks.replicate', u'group': None, u'is_eager': False, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': u'replicate', u'exchange': u'celery'}, u'args': [u'frogger', u'/home/gc/recordings_tmp/919/call_98277604.wav'], u'headers': {}, u'correlation_id': None, u'hostname': 'replicate_worker@toro', u'kwargs': {}, u'reply_to': None, u'id': u'7c508306-772f-464e-bb7c-103bc1dca784'}) kwargs:{})
[2016-10-07 12:08:25,876: DEBUG/MainProcess] on_accepted() for task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]
[2016-10-07 12:08:25,876: INFO/Worker-8] recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]: Intentando replicar 0/None /home/gc/recordings_tmp/919/call_98277604.wav from AS frogger
[2016-10-07 12:08:25,876: DEBUG/MainProcess] Task accepted: recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784] pid:26289
[2016-10-07 12:08:25,876: DEBUG/Worker-8] recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]: Colectando /home/gc/recordings_tmp/919/call_98277604.wav from as frogger
......
[2016-10-07 12:08:26,587: DEBUG/MainProcess] on_sucess() for task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784] return '0'
[2016-10-07 12:08:26,587: DEBUG/MainProcess] acknowledge(): for task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]
[2016-10-07 12:08:26,587: INFO/MainProcess] Task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784] succeeded in 0.710850441828s: 0
[2016-10-07 12:08:27,504: INFO/MainProcess] Received task: recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]
[2016-10-07 12:08:32,776: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fa3f9104e60> (args:(u'recrepli.tasks.replicate', u'7c508306-772f-464e-bb7c-103bc1dca784', [u'frogger', u'/home/gc/recordings_tmp/919/call_98277604.wav'], {}, {u'task': u'recrepli.tasks.replicate', u'group': None, u'is_eager': False, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': u'replicate', u'exchange': u'celery'}, u'args': [u'frogger', u'/home/gc/recordings_tmp/919/call_98277604.wav'], u'headers': {}, u'correlation_id': None, u'hostname': 'replicate_worker@toro', u'kwargs': {}, u'reply_to': None, u'id': u'7c508306-772f-464e-bb7c-103bc1dca784'}) kwargs:{})
[2016-10-07 12:08:40,273: INFO/Worker-5] recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]: Intentando replicar 0/None /home/gc/recordings_tmp/919/call_98277604.wav from AS frogger
[2016-10-07 12:08:40,273: DEBUG/Worker-5] recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]: Colectando /home/gc/recordings_tmp/919/call_98277604.wav from as frogger
[2016-10-07 12:08:40,273: DEBUG/MainProcess] on_accepted() for task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]
[2016-10-07 12:08:40,273: DEBUG/MainProcess] Task accepted: recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784] pid:26286
....
[2016-10-07 12:08:40,286: DEBUG/MainProcess] on_sucess() for task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784] return '1'
[2016-10-07 12:08:40,286: DEBUG/MainProcess] acknowledge(): for task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784]
[2016-10-07 12:08:40,286: INFO/MainProcess] Task recrepli.tasks.replicate[7c508306-772f-464e-bb7c-103bc1dca784] succeeded in 0.0134955914691s: 1
cesterlizi commentedon Oct 7, 2016
@ask I'am also attaching a log where yo can see that the task is duplicated and executed in parallel
[2016-10-07 12:13:25,966: INFO/MainProcess] Received task: recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]
[2016-10-07 12:13:25,966: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fa3f9104e60> (args:(u'recrepli.tasks.replicate', u'8935556a-f6c5-4e0e-8709-d5958e437e08', [u'pacman', u'/home/gc/recordings_tmp/239/call_98278775.wav'], {}, {u'task': u'recrepli.tasks.replicate', u'group': None, u'is_eager': False, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': u'replicate', u'exchange': u'celery'}, u'args': [u'pacman', u'/home/gc/recordings_tmp/239/call_98278775.wav'], u'headers': {}, u'correlation_id': None, u'hostname': 'replicate_worker@toro', u'kwargs': {}, u'reply_to': None, u'id': u'8935556a-f6c5-4e0e-8709-d5958e437e08'}) kwargs:{})
[2016-10-07 12:13:25,967: DEBUG/MainProcess] on_accepted() for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]
[2016-10-07 12:13:25,967: INFO/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Intentando replicar 0/None /home/gc/recordings_tmp/239/call_98278775.wav from AS pacman
[2016-10-07 12:13:25,967: DEBUG/MainProcess] Task accepted: recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] pid:26287
[2016-10-07 12:13:25,967: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Colectando /home/gc/recordings_tmp/239/call_98278775.wav from as pacman
[2016-10-07 12:13:26,481: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Colectado /home/gc/recordings_collector/call_239_98278775.wav
[2016-10-07 12:13:26,482: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Convirtiendo a ogg /home/gc/recordings_collector/call_239_98278775.wav
[2016-10-07 12:13:26,754: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Ogg file /home/gc/recordings_collector/call_239_98278775.ogg
[2016-10-07 12:13:26,754: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Intentando replicar a los hosts
[2016-10-07 12:13:27,464: INFO/MainProcess] Received task: recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]
[2016-10-07 12:13:27,464: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fa3f9104e60> (args:(u'recrepli.tasks.replicate', u'8935556a-f6c5-4e0e-8709-d5958e437e08', [u'pacman', u'/home/gc/recordings_tmp/239/call_98278775.wav'], {}, {u'task': u'recrepli.tasks.replicate', u'group': None, u'is_eager': False, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': u'replicate', u'exchange': u'celery'}, u'args': [u'pacman', u'/home/gc/recordings_tmp/239/call_98278775.wav'], u'headers': {}, u'correlation_id': None, u'hostname': 'replicate_worker@toro', u'kwargs': {}, u'reply_to': None, u'id': u'8935556a-f6c5-4e0e-8709-d5958e437e08'}) kwargs:{})
[2016-10-07 12:13:27,465: DEBUG/MainProcess] on_accepted() for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]
[2016-10-07 12:13:27,465: DEBUG/MainProcess] Task accepted: recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] pid:26291
[2016-10-07 12:13:27,465: INFO/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Intentando replicar 0/None /home/gc/recordings_tmp/239/call_98278775.wav from AS pacman
[2016-10-07 12:13:27,465: DEBUG/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Colectando /home/gc/recordings_tmp/239/call_98278775.wav from as pacman
[2016-10-07 12:13:27,980: DEBUG/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Colectado /home/gc/recordings_collector/call_239_98278775.wav
[2016-10-07 12:13:27,980: DEBUG/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Convirtiendo a ogg /home/gc/recordings_collector/call_239_98278775.wav
[2016-10-07 12:13:27,986: DEBUG/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Ogg file /home/gc/recordings_collector/call_239_98278775.ogg
[2016-10-07 12:13:27,986: DEBUG/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Intentando replicar a los hosts
[2016-10-07 12:13:28,627: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Replicado /home/gc/recordings_collector/call_239_98278775.ogg to hosts ['toro', 'bermeja', 'marsupial', 'pepe']
[2016-10-07 12:13:28,714: INFO/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Indexado: /home/gc/recordings_collector/call_239_98278775.ogg : {'timestamp': 1475853206, '_rev': u'1-ff292a5338b1a81855f44f1774b10010', 'storage': ['toro', 'bermeja', 'marsupial', 'pepe'], 'doctype': 'recording', 'filename': 'call_98278775.ogg', 'channels': 1, 'audioformat': 'ogg', 'duration': 193.44, '_id': u'98278775', 'tenantid': 239, 'size': 558034}
[2016-10-07 12:13:28,715: DEBUG/Worker-6] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: Enviando a replicar a google: /home/gc/recordings/239/call_98278775.ogg
[2016-10-07 12:13:28,723: DEBUG/MainProcess] on_sucess() for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] return '0'
[2016-10-07 12:13:28,723: DEBUG/MainProcess] acknowledge(): for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]
[2016-10-07 12:13:28,724: INFO/MainProcess] Task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] succeeded in 2.75686794892s: 0
[2016-10-07 12:13:28,910: ERROR/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: [Error 127] 'WebDav: LocalResourceNotFound()'
[2016-10-07 12:13:28,910: WARNING/Worker-10] recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]: No se pudo replicar: /home/gc/recordings_tmp/239/call_98278775.wav, reintentar
[2016-10-07 12:13:28,912: DEBUG/MainProcess] on_sucess() for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] return <ExceptionInfo: Retry(u'No se pudo replicar', 120)>
[2016-10-07 12:13:28,912: WARNING/MainProcess] on_success() for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] with ExceptionInfo: <ExceptionInfo: Retry(u'No se pudo replicar', 120)>
[2016-10-07 12:13:28,912: WARNING/MainProcess] on_failure(): for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] ExceptionInfo: <ExceptionInfo: Retry(u'No se pudo replicar', 120)>
[2016-10-07 12:13:28,912: WARNING/MainProcess] on_retry(): for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08] ExceptionInfo: <ExceptionInfo: Retry(u'No se pudo replicar', 120)>
[2016-10-07 12:13:28,912: DEBUG/MainProcess] acknowledge(): for task recrepli.tasks.replicate[8935556a-f6c5-4e0e-8709-d5958e437e08]
cesterlizi commentedon Oct 13, 2016
Hi
I found that duplication is generated when I publish a task. I this case I'am sending the task to rabbit using JAVA rabbitmq API with message confirmation. For some reason (yet not understood) confirmation fails and I am sending the task again and two task message are enqueued to rabbitmq so two tasks are consumed and executed by celery worker
@ask dismiss my assertion in this issue
Carlos
Skyross commentedon Nov 24, 2016
Hi all
We've found same issue in several our projects when we used Redis as broker but seems it's not broker related problem. It's really looks like (we've checked the logs) any delayed task (ETA used, for example with
countdown
argument) several workers pulls same task (with same Task ID). And I found a lot of posts in Internet that describes that situation but in different words (post, issue). And there is repository specifically for that case. May be that lock should be implemented at Celery level for any broker?We using this our snippet to prevent such task behavior:
Feel free to test and point to any errors in this snippet.
andynguyen11 commentedon Nov 28, 2016
@Skyross I've been investigating a similar issue this morning as we are using Redis as our broker too. How long are your ETAs set to? We have multiple tasks with ETAs set anywhere in between 1-3 days. I suspect our issue (maybe yours as well) could possibly be this:
http://docs.celeryproject.org/en/latest/getting-started/brokers/redis.html#redis-visibility-timeout
topalhan commentedon Nov 28, 2016
@andynguyen11, I am not familiar with Redis but duplicate tasks happen on other brokers for various reasons (see #2976) and this solution would help.
@Skyross - Thanks for this. I see your solution uses django cache, which works for me but I am thinking if we can find a more generic solution. How about checking the list of existing scheduled tasks, which one can retrieve by using
celery.control.inspect().scheduled()
.Skip if the task is already scheduled e.g.
Also, are you using this for workflows or single tasks ?
Skyross commentedon Nov 30, 2016
Hi all again,
The previous version of my code contains at least one error: all tasks with
retry
will be skipped too. I'm testing new version of that class. As you may see now we don't release lock but just waiting for its expiration.Also I created public repository for that sandbox purpose (maybe later I'll move it to GitHub later if needed)
@andynguyen11 Yes, you were right! I came to the same reason. Anyway, sometimes we need to create delay further in the future than the
visibility_timeout
value.@topalhan Feel free to provide your version and compare performance 👍
UPD: moved repo to https://gitlab.com/the-past/sandbox/Django-1.8.13-Celery-3.1.23
auvipy commentedon Jan 15, 2018
what's it's situation after 4.1/master?
54 remaining items
auvipy commentedon Feb 25, 2021
can you check this celery/kombu#1098 (comment)
noamkush commentedon Feb 28, 2021
The task is received from sqs but isn't acked since there's a countdown. So celery waits till countdown is over but doesn't ack and meanwhile the visibility timeout expires and the task is redelivered (this is actually noted in caveats). What I suggested is that the celery process that is waiting for countdown would extend the visibility timeout so it won't be redelivered.
rui-ren commentedon Sep 8, 2021
If you have any insight for this case. How to solve it? I got the same issue. I only have one job and it is accepted on one instance and
1800 seconds
later another job is triggered to another instance. Because I was query databases and it crashes all my jobs.auvipy commentedon Sep 8, 2021
https://github.com/celery/celery/pull/6651/files
rui-ren commentedon Sep 8, 2021
Thank you ! I am using
rabbitmq
andcelery==4.2.0
, all the questions were forredis
, do you have any good links forrabittmq
setup?auvipy commentedon Sep 9, 2021
that is not a supported version.
rui-ren commentedon Sep 10, 2021