
Add support for max.poll.records consumer configuration #1653

Closed
@sumitjainn

Description

This config allows consumers to control their rate of consumption, which would be very helpful in upstream throttling scenarios:

From the Kafka documentation:
max.poll.records: The maximum number of records returned in a single call to poll()

It has been available since Kafka 0.11.

Can this be added please?

Activity

edenhill (Contributor) commented on Jan 29, 2018

There are two options:

  • extract the consumer queue with queue_get_consumer() and then use consume_batch_queue() (see the sketch below)
  • make your own batch reader by wrapping consumer_poll() until you get the desired number of messages.
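A minimal sketch of the first option with the C API; the helper name read_batch, the 1000 ms timeout, and max_records are illustrative choices, not part of librdkafka:

```c
#include <librdkafka/rdkafka.h>

/* Sketch: redirect reads through the consumer queue and fetch up to
 * `max_records` already-queued messages in one call. `rk` is an
 * already-subscribed high-level consumer. */
static ssize_t read_batch(rd_kafka_t *rk, rd_kafka_message_t **msgs,
                          size_t max_records) {
        rd_kafka_queue_t *cq = rd_kafka_queue_get_consumer(rk);
        ssize_t n = rd_kafka_consume_batch_queue(cq, 1000 /*ms*/, msgs,
                                                 max_records);

        rd_kafka_queue_destroy(cq);  /* drops our queue reference only */
        return n;                    /* caller destroys each message when done */
}
```
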
sumitjainn (Author) commented on Jan 29, 2018

Actually I am using the node wrapper https://github.com/Blizzard/node-rdkafka, so I can't directly use these options.

Wouldn't it be useful to support this config so that all the wrappers get the functionality without modifying their code?

edenhill (Contributor) commented on Jan 29, 2018

This would require a new API since the current consumer_poll() API returns a single message, so it wouldn't automatically trickle down into the bindings.

sumitjainn (Author) commented on Jan 29, 2018

Correct me if I am wrong, but you don't fetch single messages from Kafka, right? Because that's not possible to control without using max.poll.records. So you must be fetching multiple messages, storing them, and only providing single messages through the API.

So an API change is not required: since this config only controls the rate of fetching from Kafka, you can still provide messages one at a time to the library user as before.

edenhill (Contributor) commented on Jan 29, 2018

Not sure I follow what you are asking for. Do you want:

  • a consume API that returns a set of messages according to max.poll.records?
  • being able to control the number of messages fetched from the broker?

librdkafka pre-fetches messages from the broker into an internal queue, which is then drained by the application when it calls consumer_poll() (et al.).

sumitjainn (Author) commented on Jan 29, 2018

I only need to control the rate of fetching from Kafka. I was saying it might not require any changes to the API, since that is independent of the fetch rate.

edenhill (Contributor) commented on Jan 29, 2018

Can you explain your use-case in more detail?
There are existing configuration options to modify the fetching behaviour:
queued.min.messages, queued.max.messages.kbytes, fetch.wait.max.ms, etc.

https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
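
As an illustration, these options can be set on the consumer configuration before the client is created; the values below are arbitrary examples, not recommendations:

```c
#include <librdkafka/rdkafka.h>
#include <stdio.h>

/* Sketch: cap librdkafka's local pre-fetch queue at ~1000 messages or
 * ~64 MB, and shorten the broker fetch wait. Values are illustrative. */
static rd_kafka_conf_t *make_conf(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        if (rd_kafka_conf_set(conf, "queued.min.messages", "1000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "queued.max.messages.kbytes", "65536",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK ||
            rd_kafka_conf_set(conf, "fetch.wait.max.ms", "500",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
                fprintf(stderr, "configuration failed: %s\n", errstr);
                rd_kafka_conf_destroy(conf);
                return NULL;
        }
        return conf;  /* pass to rd_kafka_new(RD_KAFKA_CONSUMER, ...) */
}
```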

edenhill (Contributor) commented on Feb 22, 2018

Here's an example of how to implement a batch consume interface:
https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_consume_batch.cpp#L97
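
That example is C++; a rough C-API sketch of the same idea might look like the following (helper names and error handling are illustrative). Note that the loop stops at batch_size regardless of how much of the timeout remains:

```c
#include <librdkafka/rdkafka.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in milliseconds (POSIX). */
static int64_t now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Sketch: collect up to `batch_size` messages, waiting at most `timeout_ms`
 * in total. Never returns more than `batch_size` messages. Caller destroys
 * each returned message. */
static size_t consume_batch(rd_kafka_t *rk, rd_kafka_message_t **msgs,
                            size_t batch_size, int timeout_ms) {
        size_t cnt = 0;
        int64_t deadline = now_ms() + timeout_ms;

        while (cnt < batch_size) {
                int remaining = (int)(deadline - now_ms());
                if (remaining <= 0)
                        break;

                rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, remaining);
                if (!msg)
                        break;          /* nothing more within the timeout */
                if (msg->err) {
                        rd_kafka_message_destroy(msg);  /* skip error events here */
                        continue;
                }
                msgs[cnt++] = msg;
        }
        return cnt;
}
```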

sumitjainn (Author) commented on Feb 22, 2018

Can't consumer->consume(remaining_timeout) give you more messages than batch_size?

queued.min.messages, queued.max.messages.kbytes, and fetch.wait.max.ms only allow us to limit the rate to a certain extent; they are not a foolproof way to limit the fetch rate to, say, 100 messages/sec, especially if there is great variation in message sizes.

Instead, a single config, max.poll.records, could be used to guarantee that we don't fetch more messages from Kafka than configured. I think this is a very valuable feature; the lack of throttling capability is a major hindrance to Kafka adoption in my org. The official Java client already has this feature, but we are restricted to Node.js.

edenhill (Contributor) commented on Feb 22, 2018

Actually, there is no way to ask the broker for a maximum number of messages, only a maximum total size of messages. This means the Java consumer will also fetch more than max.poll.records but only return a subset of the fetched messages per poll() call.

Even with librdkafka's pre-fetching, the fetch rate of the consumer will over time correspond to the consume rate of the application: as the internal fetchq fills up with pre-fetched messages, the fetcher will stop fetching until the application has consumed enough messages to make the fetchq drop below the configured thresholds (be it queued.min.messages or queued.max.messages.kbytes).
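
To illustrate the point (this sketch is an illustration, not from the thread): because pre-fetching pauses once the internal queue is above its thresholds, simply pacing the application's own poll loop bounds the effective fetch rate over time, e.g. to roughly 100 messages/sec:

```c
#include <librdkafka/rdkafka.h>
#include <unistd.h>

/* Sketch: pace consumption to roughly `rate` messages/sec by sleeping
 * between polls. Over time the broker fetch rate settles to the same
 * average, since pre-fetching stops while the internal queue is full. */
static void paced_consume_loop(rd_kafka_t *rk, int rate) {
        const useconds_t interval_us = 1000000 / rate;  /* 10 ms at 100/s */

        for (;;) {                      /* run until the application stops */
                rd_kafka_message_t *msg = rd_kafka_consumer_poll(rk, 100);
                if (!msg)
                        continue;       /* poll timed out, try again */
                if (!msg->err) {
                        /* process the message here */
                }
                rd_kafka_message_destroy(msg);
                usleep(interval_us);    /* crude rate limit */
        }
}
```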

choudhary001 commented on Apr 2, 2019

@edenhill Is there any plan to provide support for Java's 'max.poll.records' property in C/C++?

edenhill (Contributor) commented on Apr 2, 2019

@choudhary001 You mean for the consume_batch() API?

3 remaining items