Closed
Description
This config allows consumers to control their rate of consumption, which would be very helpful in upstream throttling scenarios.
From the Kafka documentation:
max.poll.records: The maximum number of records returned in a single call to poll()
It has been available since Kafka 0.11.
Can this be added please?
Activity
edenhill commented on Jan 29, 2018
There are two options:
sumitjainn commented on Jan 29, 2018
Actually I am using the Node wrapper https://github.com/Blizzard/node-rdkafka, and so can't use these options directly.
Wouldn't it be useful to support this config, so that all the wrappers get the functionality without modifying their code?
edenhill commented on Jan 29, 2018
This would require a new API since the current consumer_poll() API returns a single message, so it wouldn't automatically trickle down into the bindings.
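For reference, a minimal sketch of that one-message-per-call loop using the C++ API (the broker address, group id and topic name are placeholders; error handling is omitted):

```cpp
#include <iostream>
#include <string>
#include <vector>
#include <librdkafka/rdkafkacpp.h>

int main() {
  std::string errstr;

  // "localhost:9092", "example-group" and "mytopic" are placeholders.
  RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
  conf->set("bootstrap.servers", "localhost:9092", errstr);
  conf->set("group.id", "example-group", errstr);

  RdKafka::KafkaConsumer *consumer =
      RdKafka::KafkaConsumer::create(conf, errstr);
  delete conf; // create() copied the configuration

  std::vector<std::string> topics = {"mytopic"};
  consumer->subscribe(topics);

  for (int i = 0; i < 100; i++) {
    // consume() returns exactly one message (or event/timeout) per call.
    RdKafka::Message *msg = consumer->consume(1000 /* timeout ms */);
    if (msg->err() == RdKafka::ERR_NO_ERROR)
      std::cout << "offset " << msg->offset() << std::endl;
    delete msg;
  }

  consumer->close();
  delete consumer;
  return 0;
}
```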
sumitjainn commented on Jan 29, 2018
Correct me if I am wrong, but you don't fetch single messages from Kafka, right? Because that's not possible to control without using max.poll.records. So it means you are fetching multiple messages, storing them, and only handing out single messages in the API.
So an API change is not required: this config only controls the rate of fetching from Kafka, and you can keep providing messages one at a time to the library user as before.
edenhill commented on Jan 29, 2018
Not sure I follow what you are asking for. Do you want max.poll.records?
librdkafka pre-fetches messages from the broker into an internal queue, which is then served to the application when it calls consumer_poll() (et al.).
sumitjainn commented on Jan 29, 2018
I only need to control the rate of fetching from Kafka. I was saying it might not require any changes in the API, since that is independent of the fetch rate.
edenhill commented on Jan 29, 2018
Can you explain your use-case in more detail?
There are existing configuration options to modify the fetching behaviour:
queued.min.messages, queued.max.messages.kbytes, fetch.wait.max.ms, etc.
https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
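For illustration, a sketch of setting those options via the C++ API; the values below are arbitrary examples, not recommendations:

```cpp
#include <string>
#include <librdkafka/rdkafkacpp.h>

// Bound librdkafka's internal pre-fetch queue; values are illustrative only.
RdKafka::Conf *make_conf() {
  std::string errstr;
  RdKafka::Conf *conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);

  // Pre-fetch until at least this many messages sit in the local queue.
  conf->set("queued.min.messages", "1000", errstr);
  // ...but never hold more than this many kbytes of queued messages.
  conf->set("queued.max.messages.kbytes", "1024", errstr);
  // Maximum time the broker may wait to fill up a fetch response.
  conf->set("fetch.wait.max.ms", "100", errstr);

  return conf;
}
```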
edenhill commented on Feb 22, 2018
Here's an example of how to implement a batch consume interface:
https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_consume_batch.cpp#L97
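Along the lines of that example, a rough sketch of such a batch helper (consume_batch and now_ms here are hypothetical application-side helpers, not librdkafka APIs):

```cpp
#include <chrono>
#include <cstdint>
#include <vector>
#include <librdkafka/rdkafkacpp.h>

// Hypothetical helper: current time in milliseconds on a monotonic clock.
static int64_t now_ms() {
  return std::chrono::duration_cast<std::chrono::milliseconds>(
             std::chrono::steady_clock::now().time_since_epoch()).count();
}

// Collect up to batch_size messages, waiting at most timeout_ms in total.
// Caller owns (and must delete) the returned messages.
static std::vector<RdKafka::Message *>
consume_batch(RdKafka::KafkaConsumer *consumer, size_t batch_size,
              int timeout_ms) {
  std::vector<RdKafka::Message *> msgs;
  msgs.reserve(batch_size);

  int64_t end = now_ms() + timeout_ms;

  while (msgs.size() < batch_size) {
    int remaining = static_cast<int>(end - now_ms());
    if (remaining <= 0)
      break;

    // consume() returns at most one message per call.
    RdKafka::Message *msg = consumer->consume(remaining);
    switch (msg->err()) {
    case RdKafka::ERR__TIMED_OUT:
      delete msg;
      return msgs;
    case RdKafka::ERR_NO_ERROR:
      msgs.push_back(msg);
      break;
    default:
      delete msg; // other events/errors are ignored in this sketch
      break;
    }
  }
  return msgs;
}
```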
sumitjainn commented on Feb 22, 2018
Can't consumer->consume(remaining_timeout) give you more messages than batch_size?
queued.min.messages, queued.max.messages.kbytes and fetch.wait.max.ms only allow us to limit the rate to a certain extent; they are not a foolproof way to limit the fetch rate to, say, 100 messages/sec, especially if there is great variation in message sizes.
Instead, a single config, max.poll.records, can be used to guarantee that we don't fetch more messages from Kafka than configured. I think this is a very valuable feature; the lack of throttling capability is a major hindrance to Kafka adoption in my org. The official Java client already has this feature, but we are restricted to Node.js.
edenhill commented on Feb 22, 2018
Actually, there is no way to ask the broker for a maximum number of messages, only a maximum total size of messages. This means the Java consumer will also fetch more than max.poll.records, but only return a subset of the fetched messages per poll() call.
Even with librdkafka's pre-fetching, the fetch rate of the consumer will over time correspond to the consume rate of the application: as the internal fetchq fills up with pre-fetched messages, the fetcher will stop fetching until the application has consumed enough messages to make the fetchq drop below the configured thresholds (be it queued.min.messages or queued.max.messages.kbytes).
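To make that concrete, a hypothetical sketch of pacing the consume rate on the application side (paced_consume, max_per_sec and handle_message are assumptions, not librdkafka APIs); per the above, the broker fetch rate will over time settle to this pace:

```cpp
#include <chrono>
#include <thread>
#include <librdkafka/rdkafkacpp.h>

// Cap the application's consume rate at max_per_sec messages per second
// (assumes max_per_sec > 0). Pre-fetching still happens internally, but
// the fetcher stops once the fetchq thresholds are reached, so the broker
// fetch rate converges to this consume rate.
void paced_consume(RdKafka::KafkaConsumer *consumer, int max_per_sec) {
  const auto interval = std::chrono::microseconds(1000000 / max_per_sec);
  auto next = std::chrono::steady_clock::now();

  while (true) {
    std::this_thread::sleep_until(next);
    next += interval;

    RdKafka::Message *msg = consumer->consume(0 /* non-blocking */);
    if (msg->err() == RdKafka::ERR_NO_ERROR) {
      // handle_message(msg): application-specific processing (hypothetical)
    }
    delete msg;
  }
}
```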
choudhary001 commented on Apr 2, 2019
@edenhill Is there any plan to support Java's max.poll.records property in C/C++?
edenhill commented on Apr 2, 2019
@choudhary001 You mean for the consume_batch() API?