
NSQ Requeue and Backoff

Jud White edited this page Sep 1, 2016 · 2 revisions

NSQ docs: http://nsq.io/clients/building_client_libraries.html#a-namebackoffbackoffa

When an exception occurs, throwing it from your handler is often the best way to deal with it. The NSQ client wraps your handler's Execute method in a try/catch: your message will be requeued, and the exception details will be sent to the IMessageAuditor you have configured for logging.

The default when an exception is thrown is "requeue with backoff". This is the behavior you want if your dependencies are hiccuping and need some time to recover; if you don't know the source of the problem, it's good to keep the default by letting the exception bubble up. If your code throws when it should be handling a situation gracefully (i.e. the cause is not a timeout or service fault), this can significantly reduce your throughput on that topic, since you'll be in a backoff state much more often. The upside is that you'll probably notice a queue that's draining slowly and go fix the problem.

It's often sufficient to know that your messages will eventually be redelivered and that throwing exceptions when you don't need to will hurt throughput. If you get an exception and wish to requeue without causing a Consumer Backoff, use _bus.CurrentThreadMessage.RequeueWithoutBackoff and then throw the exception. Use this with caution, because backoff serves a purpose in allowing your dependencies to recover.

Details:

There are two things at work when exceptions are thrown from handlers – Requeue (of the message) and Backoff (slowing consumer processing).

Message Requeue:

  • You can explicitly requeue with your own delay value, and you have the option of requeuing with or without backoff.
  • Manual requeue still increases the Attempts count of the message; after the first requeue, for example, the next time you receive the message you'll see an Attempts count of 2.
  • The effect of an increasing Attempts count is:
    • When MaxAttempts is exceeded the message will be given up on permanently.
    • Logging (if you're logging the Attempts value)
    • Default requeue time (described below)
    • Any custom logic you've built around the Attempts count. It's not recommended to take special steps based on the count, but the option is there if you need it.
  • When you let the client requeue your message automatically it's requeued according to the following formula and configuration options:
    • min(message.Attempts * config.DefaultRequeueDelay, config.MaxRequeueDelay)
    • Message requeue is always linear.
    • Default config.DefaultRequeueDelay = 90s
    • Default config.MaxRequeueDelay = 15m
    • Default config.MaxAttempts = 5 (if you set MaxAttempts = 0 it will try indefinitely)
    • Using the defaults, a continually failing message would be requeued with delays of 90s, 180s, 270s, and 360s, and then given up on. IMessageAuditor will receive detailed information.
    • These requeue times are minimums; when a queue is under heavy load there is no guarantee of when you will see the message again. It wouldn't be surprising to see a message take hours to reappear on a topic under heavy load.
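
The automatic requeue formula above can be sketched in a few lines. This is a minimal illustration in Python rather than the client's actual code; the function name `requeue_delay` and the module-level constants are hypothetical stand-ins for the configuration options, expressed in seconds.

```python
# Hypothetical stand-ins for the config defaults listed above, in seconds.
DEFAULT_REQUEUE_DELAY = 90        # config.DefaultRequeueDelay
MAX_REQUEUE_DELAY = 15 * 60       # config.MaxRequeueDelay
MAX_ATTEMPTS = 5                  # config.MaxAttempts


def requeue_delay(attempts, default_delay=DEFAULT_REQUEUE_DELAY,
                  max_delay=MAX_REQUEUE_DELAY):
    """Linear requeue delay: min(attempts * default_delay, max_delay)."""
    return min(attempts * default_delay, max_delay)


# Delays applied after failed attempts 1..4; per the text above, the message
# is given up on after these four requeues when the defaults are used.
delays = [requeue_delay(a) for a in range(1, MAX_ATTEMPTS)]
print(delays)  # [90, 180, 270, 360]
```

With a larger MaxAttempts the delay would keep growing linearly until it reaches the 15m cap, which in this sketch happens at the tenth attempt (10 * 90s = 900s).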

Consumer Backoff:

  • When a backoff occurs the Consumer temporarily sets its RDY (ready) count to 0, which prevents nsqd from sending it more messages.
  • Backoff happens at a per-Consumer level, which typically means per-machine and per-topic. One bad handler or bad machine won't affect other handlers' or machines' processing rates.
  • After the backoff time has expired the Consumer will try 1 message. If it's successful the Consumer goes back to full speed; if not, it will back off for an increasing amount of time, up to a maximum.
  • There are three configuration options which affect backoff:
    • BackoffMultiplier (default = 1s)
    • MaxBackoffDuration (default = 2m)
    • BackoffStrategy (default = ExponentialStrategy)
  • When a backoff occurs an internal Consumer-level backoff counter is incremented. The backoff time is determined by the BackoffStrategy, BackoffMultiplier, and backoff counter.
  • ExponentialStrategy:
    • Backoff time = min(BackoffMultiplier * 2^backoff counter, MaxBackoffDuration)
    • Example (BackoffMultiplier = 1s, backoff counter = 1, 2, 3, 4): 2s, 4s, 8s, 16s
  • FullJitterStrategy:
    • Backoff time = min(rand(0, ExponentialStrategy), MaxBackoffDuration)
    • FullJitterStrategy exists to prevent lock-step retries; evidence shows it allows a system to recover faster in the typical case.
    • There's a full discussion here: http://www.awsarchitectureblog.com/2015/03/backoff.html
  • Once a message is processed successfully, messages are allowed to flow again and the backoff counter is decremented for each successful message.
  • If another exception occurs before the backoff counter reaches 0, the Consumer will back off according to the current backoff counter value. This gives the dependency more time to recover, because the backoff is not reset by the first successful message.
  • If the duration returned by the IBackoffStrategy implementation exceeds MaxBackoffDuration, the backoff counter is not incremented.
    • This keeps the counter from running away. Suppose, for example, that 100 backoffs occur due to a network issue. MaxBackoffDuration would still be respected, but without the cap it would take 100 successful messages to return the backoff counter to 0; if one message failed after 50 had processed successfully, the Consumer would still back off for MaxBackoffDuration. The cap is especially helpful for lower volume topics, where retries could run the counter up quickly and successful messages would take much longer to run it back down.
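
The backoff rules above can be sketched as follows. This is a simplified Python model, not the client's implementation: the names `exponential_backoff`, `full_jitter_backoff`, and `BackoffCounter` are hypothetical, durations are in seconds, and the model assumes the counter is incremented before the duration is computed, which is what the 2s, 4s, 8s, 16s example implies.

```python
import random

BACKOFF_MULTIPLIER = 1.0      # BackoffMultiplier default (1s)
MAX_BACKOFF_DURATION = 120.0  # MaxBackoffDuration default (2m)


def exponential_backoff(counter, multiplier=BACKOFF_MULTIPLIER):
    """Uncapped exponential duration: multiplier * 2^counter."""
    return multiplier * 2 ** counter


def full_jitter_backoff(counter, multiplier=BACKOFF_MULTIPLIER, rng=random):
    """Uniform random duration between 0 and the exponential duration."""
    return rng.uniform(0.0, exponential_backoff(counter, multiplier))


class BackoffCounter:
    """Toy model of the Consumer-level backoff counter (an assumption)."""

    def __init__(self, strategy=exponential_backoff,
                 max_duration=MAX_BACKOFF_DURATION):
        self.strategy = strategy
        self.max_duration = max_duration
        self.counter = 0

    def on_failure(self):
        """Return the backoff duration to wait after a failed message."""
        duration = self.strategy(self.counter + 1)
        if duration > self.max_duration:
            # The strategy exceeded the max: cap the duration and leave the
            # counter where it is so it cannot run away.
            return self.max_duration
        self.counter += 1
        return duration

    def on_success(self):
        """Each successful message walks the counter back toward 0."""
        self.counter = max(self.counter - 1, 0)


consumer = BackoffCounter()
durations = [consumer.on_failure() for _ in range(10)]
print(durations)         # grows exponentially, then stays at the cap
print(consumer.counter)  # stops growing once the cap is reached
```

In this model, ten consecutive failures with the defaults leave the counter at 6 (a 64s backoff), because the next step (128s) would exceed the 2m maximum. Six successful messages are then enough to return the counter to 0, however long the outage lasted.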