
Add support for max.poll.records consumer configuration #1653

Closed
sumitjainn opened this issue Jan 29, 2018 · 14 comments


@sumitjainn

This config allows consumers to control their rate of consumption, which would be very helpful in upstream throttling scenarios:

From the Kafka documentation:
max.poll.records: The maximum number of records returned in a single call to poll()

It has been available since Kafka 0.11.

Can this be added please?

@edenhill
Contributor

There are two options (the first is sketched below):

  • extract the consumer queue (queue_get_consumer()) and then use consume_batch_queue()
  • make your own batch reader by wrapping consumer_poll() until you get the desired number of messages.
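
A minimal C sketch of the first option, assuming rk is an already-created and subscribed high-level consumer; the batch size, timeout and error handling here are illustrative only:

```c
#include <librdkafka/rdkafka.h>
#include <stdlib.h>

/* Sketch: consume up to batch_size messages per call by extracting the
 * consumer queue and using rd_kafka_consume_batch_queue().
 * rk is assumed to be a configured, subscribed high-level consumer. */
static void consume_one_batch(rd_kafka_t *rk, size_t batch_size) {
        rd_kafka_queue_t *rkqu = rd_kafka_queue_get_consumer(rk);
        rd_kafka_message_t **msgs = malloc(sizeof(*msgs) * batch_size);

        /* Blocks for up to 1000 ms and returns at most batch_size messages. */
        ssize_t cnt = rd_kafka_consume_batch_queue(rkqu, 1000, msgs, batch_size);

        for (ssize_t i = 0; i < cnt; i++) {
                /* ... check msgs[i]->err and process the message ... */
                rd_kafka_message_destroy(msgs[i]);
        }

        free(msgs);
        rd_kafka_queue_destroy(rkqu);
}
```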

@sumitjainn
Author

Actually I am using the node wrapper https://github.com/Blizzard/node-rdkafka, and so can't directly use these options.

Wouldn't it be useful to support this config so that all the wrappers get the functionality without modifying their code?

@edenhill
Contributor

This would require a new API since the current consumer_poll() API returns a single message, so it wouldn't automatically trickle down into the bindings.

@sumitjainn
Author

sumitjainn commented Jan 29, 2018

Correct me if I am wrong, but you don't fetch single messages from Kafka, right? Because that's not possible to control without using max.poll.records. So it means you are fetching multiple messages, storing them, and only providing single messages in the API.

So an API change is not required: this config only controls the rate of fetching from Kafka, and you can still provide messages one at a time to the library user as before.

@edenhill
Contributor

Not sure I follow what you are asking for. Do you want:

  • a consume API that returns a set of messages according to max.poll.records?
  • being able to control the number of messages fetched from the broker?

librdkafka pre-fetches messages from the broker into an internal queue, which is then served to the application when it calls consumer_poll() (et al.).

@sumitjainn
Author

I only need to control the fetch rate from Kafka. I was saying it might not require any changes in the API, since the API is independent of the fetch rate.

@edenhill
Contributor

Can you explain your use-case in more detail?
There are existing configuration options to modify the fetching behaviour:
queued.min.messages, queued.max.messages.kbytes, fetch.wait.max.ms, etc.

https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
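
For reference, a sketch of how those options could be set through librdkafka's standard conf API; the values below are arbitrary illustrations, not recommendations:

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch: tune librdkafka's pre-fetching via existing config options.
 * The values are examples only. */
static rd_kafka_conf_t *make_tuned_conf(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Minimum number of messages to keep pre-fetched in the local queue. */
        if (rd_kafka_conf_set(conf, "queued.min.messages", "1000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        /* Upper bound on the total size (kbytes) of pre-fetched messages. */
        if (rd_kafka_conf_set(conf, "queued.max.messages.kbytes", "65536",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        /* Maximum time the broker may wait to fill a fetch response. */
        if (rd_kafka_conf_set(conf, "fetch.wait.max.ms", "100",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        return conf;
}
```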

@edenhill
Contributor

Here's an example of how to implement a batch consume interface:
https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_consume_batch.cpp#L97
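
The linked example is C++; in plain C, roughly the same idea could look like the sketch below (the function name, timeout handling and caller-provided msgs array are assumptions for illustration):

```c
#include <librdkafka/rdkafka.h>
#include <time.h>

/* Sketch: emulate a max.poll.records-style cap by wrapping
 * rd_kafka_consumer_poll() until batch_size messages have been
 * collected or timeout_ms has elapsed. Returns the number of
 * messages stored in msgs; the caller must destroy each of them. */
static size_t consume_batch(rd_kafka_t *rk, rd_kafka_message_t **msgs,
                            size_t batch_size, int timeout_ms) {
        size_t cnt = 0;
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);

        while (cnt < batch_size) {
                clock_gettime(CLOCK_MONOTONIC, &now);
                int elapsed = (int)((now.tv_sec - start.tv_sec) * 1000 +
                                    (now.tv_nsec - start.tv_nsec) / 1000000);
                int remaining = timeout_ms - elapsed;
                if (remaining <= 0)
                        break;

                rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, remaining);
                if (!msg)
                        break;                          /* poll timed out */
                if (msg->err) {
                        rd_kafka_message_destroy(msg);  /* skip error events */
                        continue;
                }
                msgs[cnt++] = msg;
        }
        return cnt;
}
```

Note that this only caps what the application sees per batch; as discussed above, librdkafka still pre-fetches messages into its internal queue.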

@sumitjainn
Author

Can't consumer->consume(remaining_timeout) give you more messages than batch_size?

queued.min.messages, queued.max.messages.kbytes and fetch.wait.max.ms only allow us to limit the rate to a certain extent, and are not a foolproof way to limit the fetch rate to, say, 100 messages/sec, especially if there is great variation in message sizes.

Instead, a single config, max.poll.records, can be used to guarantee that we don't fetch more messages from Kafka than configured. I think this is a very valuable feature; the lack of throttling capability is a major hindrance to Kafka adoption in my org. The official Java client already has this feature, but we are restricted to Node.js.

@edenhill
Contributor

edenhill commented Feb 22, 2018

Actually, there is no way to ask the broker for a maximum number of messages, only a maximum total size of messages. This means the Java consumer will also fetch more than max.poll.records messages, but only return a subset of the fetched messages per poll() call.

Even with librdkafka's pre-fetching, the fetch rate of the consumer will over time correspond to the consume rate of the application: as the internal fetch queue fills up with pre-fetched messages, the fetcher will stop fetching until the application has consumed enough messages to make the queue drop below the configured thresholds (be it queued.min.messages or queued.max.messages.kbytes).

@choudhary001

@edenhill Is there any plan to provide support for Java's 'max.poll.records' property in C/C++?

@edenhill
Contributor

edenhill commented Apr 2, 2019

@choudhary001 You mean for the consume_batch() API?

@nick-zh
Contributor

nick-zh commented Jul 25, 2019

@edenhill sorry, I was reading this issue because there was a question in another issue about adding batch consume to PHP.
Do I understand correctly that with rd_kafka_consume_batch and its parameter rkmessages_size I can control the number of messages I get, and that this discussion is more about whether a batch consume function could/should respect max.poll.records instead of relying on a parameter for the batch size?

@edenhill
Contributor

@nick-zh Yep, that sounds about right.
