
Add support for max.poll.records consumer configuration #1653

Closed
sumitjainn opened this issue Jan 29, 2018 · 14 comments


@sumitjainn

This config allows consumers to control their rate of consumption, which would be very helpful in upstream throttling scenarios:

From the Kafka documentation:
max.poll.records: The maximum number of records returned in a single call to poll()

It has been available since Kafka 0.11.

Can this be added please?

@edenhill
Contributor

There are two options (the first is sketched below):

  • extract the consumer queue (queue_get_consumer()) and then use consume_batch_queue()
  • make your own batch reader by wrapping consumer_poll() until you get the desired number of messages.
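
A minimal C sketch of the first option, assuming rk is an already-created and subscribed high-level consumer; the batch size, timeout and error handling here are illustrative only:

```c
#include <librdkafka/rdkafka.h>
#include <stdlib.h>

/* Sketch: consume up to batch_size messages per call by extracting the
 * consumer queue and using rd_kafka_consume_batch_queue().
 * rk is assumed to be a configured, subscribed high-level consumer. */
static void consume_one_batch(rd_kafka_t *rk, size_t batch_size) {
        rd_kafka_queue_t *rkqu = rd_kafka_queue_get_consumer(rk);
        rd_kafka_message_t **msgs = malloc(sizeof(*msgs) * batch_size);

        /* Blocks for up to 1000 ms and returns at most batch_size messages. */
        ssize_t cnt = rd_kafka_consume_batch_queue(rkqu, 1000, msgs, batch_size);

        for (ssize_t i = 0; i < cnt; i++) {
                /* ... check msgs[i]->err and process the message ... */
                rd_kafka_message_destroy(msgs[i]);
        }

        free(msgs);
        rd_kafka_queue_destroy(rkqu);
}
```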

@sumitjainn
Author

Actually I am using the node wrapper https://github.com/Blizzard/node-rdkafka, and so can't directly use these options.

Wouldn't it be useful to support this config so that all the wrappers get the functionality without modifying their code?

@edenhill
Contributor

This would require a new API since the current consumer_poll() API returns a single message, so it wouldn't automatically trickle down into the bindings.

@sumitjainn
Author

sumitjainn commented Jan 29, 2018

Correct me if I am wrong, but you don't fetch single messages from Kafka, right? Because that's not possible to control without using max.poll.records. So it means you are fetching multiple messages, storing them, and only providing single messages in the API.

So an API change is not required: this config only controls the rate of fetching from Kafka, and you can still provide messages one at a time to the library user as before.

@edenhill
Contributor

Not sure I follow what you are asking for. Do you want:

  • a consume API that returns a set of messages according to max.poll.records?
  • being able to control the number of messages fetched from the broker?

librdkafka pre-fetches messages from the broker into an internal queue, which is then served to the application when it calls consumer_poll() (et al.).

@sumitjainn
Author

I only need to control the fetch rate from Kafka. I was saying it might not require any changes in the API, since the API is independent of the fetch rate.

@edenhill
Contributor

Can you explain your use-case in more detail?
There are existing configuration options to modify the fetching behaviour:
queued.min.messages, queued.max.messages.kbytes, fetch.wait.max.ms, etc.

https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
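
For reference, a sketch of how those options could be set through librdkafka's standard conf API; the values below are arbitrary illustrations, not recommendations:

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch: tune librdkafka's pre-fetching via existing config options.
 * The values are examples only. */
static rd_kafka_conf_t *make_tuned_conf(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Minimum number of messages to keep pre-fetched in the local queue. */
        if (rd_kafka_conf_set(conf, "queued.min.messages", "1000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        /* Upper bound on the total size (kbytes) of pre-fetched messages. */
        if (rd_kafka_conf_set(conf, "queued.max.messages.kbytes", "65536",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        /* Maximum time the broker may wait to fill a fetch response. */
        if (rd_kafka_conf_set(conf, "fetch.wait.max.ms", "100",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        return conf;
}
```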

@edenhill
Contributor

Here's an example of how to implement a batch consume interface:
https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_consume_batch.cpp#L97
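
The linked example is C++; in plain C, roughly the same idea could look like the sketch below (the function name, timeout handling and caller-provided msgs array are assumptions for illustration):

```c
#include <librdkafka/rdkafka.h>
#include <time.h>

/* Sketch: emulate a max.poll.records-style cap by wrapping
 * rd_kafka_consumer_poll() until batch_size messages have been
 * collected or timeout_ms has elapsed. Returns the number of
 * messages stored in msgs; the caller must destroy each of them. */
static size_t consume_batch(rd_kafka_t *rk, rd_kafka_message_t **msgs,
                            size_t batch_size, int timeout_ms) {
        size_t cnt = 0;
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);

        while (cnt < batch_size) {
                clock_gettime(CLOCK_MONOTONIC, &now);
                int elapsed = (int)((now.tv_sec - start.tv_sec) * 1000 +
                                    (now.tv_nsec - start.tv_nsec) / 1000000);
                int remaining = timeout_ms - elapsed;
                if (remaining <= 0)
                        break;

                rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, remaining);
                if (!msg)
                        break;                          /* poll timed out */
                if (msg->err) {
                        rd_kafka_message_destroy(msg);  /* skip error events */
                        continue;
                }
                msgs[cnt++] = msg;
        }
        return cnt;
}
```

Note that this only caps what the application sees per batch; as discussed above, librdkafka still pre-fetches messages into its internal queue.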

@sumitjainn
Author

Can't consumer->consume(remaining_timeout) give you more messages than batch_size?

queued.min.messages, queued.max.messages.kbytes and fetch.wait.max.ms only allow us to limit the rate to a certain extent, and are not a foolproof way to limit the fetch rate to, say, 100 messages/sec, especially if there is great variation in message sizes.

Instead, a single config, max.poll.records, can be used to guarantee that we don't fetch more messages from Kafka than configured. I think this is a very valuable feature; the lack of throttling capability is a major hindrance to Kafka adoption in my org. The official Java client already has this feature, but we are restricted to Node.js.

@edenhill
Contributor

edenhill commented Feb 22, 2018

Actually, there is no way to ask the broker for a maximum number of messages, only a maximum total size of messages. This means the Java consumer will also fetch more than max.poll.records messages, but only return a subset of the fetched messages per poll() call.

Even with librdkafka's pre-fetching, the fetch rate of the consumer will over time correspond to the consume rate of the application: as the internal fetch queue fills up with pre-fetched messages, the fetcher will stop fetching until the application has consumed enough messages to make the queue drop below the configured thresholds (be it queued.min.messages or queued.max.messages.kbytes).

@choudhary001

@edenhill Is there any plan to provide support for Java's 'max.poll.records' property in C/C++?

@edenhill
Contributor

edenhill commented Apr 2, 2019

@choudhary001 You mean for the consume_batch() API?

@nick-zh
Contributor

nick-zh commented Jul 25, 2019

@edenhill sorry, I was reading this issue because there was a question in another issue about adding batch consume to PHP.
Do I understand correctly that with rd_kafka_consume_batch and its parameter rkmessages_size I can control the number of messages I get, and that this discussion is more about whether a batch consume function could/should respect max.poll.records instead of relying on a parameter for the batch size?

@edenhill
Contributor

@nick-zh Yep, that sounds about right.
