Skip to content

[ISSUE #3585] [Part K] move execution of notifyMessageArriving() from ReputMessageService thread to PullRequestHoldService thread #3659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

areyouok
Copy link
Contributor

This commit speed up consume qps greatly, in our test up to 200,000 qps.

Make sure set the target branch to develop

What is the purpose of the change

XXXXX

Brief changelog

XX

Verifying this change

XXXX

Follow this checklist to help us incorporate your contribution quickly and easily. Notice, it would be helpful if you could finish the following 5 checklist(the last one is not necessary)before request the community to review your PR.

  • Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test(over 80% coverage) to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add integration-test in test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Sorry, something went wrong.

@areyouok areyouok changed the title [ISSUE #3585] move execution of notifyMessageArriving() from ReputMessageService thread to PullRequestHoldService thread [ISSUE #3585] [Part K] move execution of notifyMessageArriving() from ReputMessageService thread to PullRequestHoldService thread Dec 16, 2021
@areyouok areyouok linked an issue Dec 16, 2021 that may be closed by this pull request
@coveralls
Copy link

coveralls commented Dec 16, 2021

Coverage Status

Coverage decreased (-0.09%) to 51.183% when pulling aa69a93 on areyouok:492_PartK into ca92d36 on apache:develop.

@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2021

Codecov Report

Merging #3659 (9d4b981) into develop (ecb061a) will decrease coverage by 2.35%.
The diff coverage is 46.95%.

Impacted file tree graph

@@              Coverage Diff              @@
##             develop    #3659      +/-   ##
=============================================
- Coverage      49.69%   47.33%   -2.36%     
- Complexity      4725     5050     +325     
=============================================
  Files            555      628      +73     
  Lines          36798    41496    +4698     
  Branches        4853     5395     +542     
=============================================
+ Hits           18286    19643    +1357     
- Misses         16214    19429    +3215     
- Partials        2298     2424     +126     
Impacted Files Coverage Δ
...a/org/apache/rocketmq/store/StoreStatsService.java 29.96% <ø> (-0.20%) ⬇️
...tmq/broker/longpolling/PullRequestHoldService.java 17.19% <17.77%> (-3.29%) ⬇️
.../java/org/apache/rocketmq/common/BrokerConfig.java 32.55% <25.00%> (-0.38%) ⬇️
...e/rocketmq/broker/longpolling/PullNotifyQueue.java 98.27% <98.27%> (ø)
.../rocketmq/broker/filter/ConsumerFilterManager.java 72.19% <0.00%> (-0.90%) ⬇️
...he/rocketmq/client/impl/consumer/ProcessQueue.java 60.55% <0.00%> (-0.85%) ⬇️
...a/org/apache/rocketmq/client/impl/MQAdminImpl.java 5.09% <0.00%> (-0.03%) ⬇️
...ain/java/org/apache/rocketmq/store/MappedFile.java 50.18% <0.00%> (-0.01%) ⬇️
... and 88 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ecb061a...9d4b981. Read the comment docs.

@areyouok
Copy link
Contributor Author

areyouok commented Dec 16, 2021

Flame chart before this commit (only of the ReputMessageService thread):

thread

@areyouok areyouok added the progress/wip Work in progress. Accept or refused to be continue. label Dec 17, 2021
@caigy
Copy link
Contributor

caigy commented Dec 21, 2021

I found System.nanoTime() was used instead of System.currentTimeMillis(), will it bring benefits to performance improvement? It also brings some conversion between nanosecond variables and config values defined in millisecond.

@areyouok
Copy link
Contributor Author

I found System.nanoTime() was used instead of System.currentTimeMillis(), will it bring benefits to performance improvement? It also brings some conversion between nanosecond variables and config values defined in millisecond.

There is no performance benefit.

nanoTime() are more strictly than currentTimeMillis() for measuring time elapse. For example, NTP service may adjust system clock, the nanoTime() method will not affect by it. See javadoc of the method.

@areyouok
Copy link
Contributor Author

I found the current commit may increase consume lantency in performance test (abount 45ms).

I'm working on it.

@areyouok areyouok removed the progress/wip Work in progress. Accept or refused to be continue. label Dec 31, 2021
@areyouok
Copy link
Contributor Author

finished

for (int i = 0; i < batch; i++) {
Runnable runnable = notifyList.poll();
if (runnable != null) {
runnable.run();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if runnable.run() throw throwable, the checkHoldRequest method in the below will not invoked.

@dongeforever
Copy link
Member

is there any impact on the "end to end" latency?

Maybe a performance test report is needed.

LinkedList<T> result = new LinkedList<>();
long tps = config.getTps();
if (tps <= config.getTpsThreshold()) {
T data = queue.poll(100, TimeUnit.MILLISECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use the store put Tps to control the long polling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to improve throughput and decrease cpu cost when tps is greater than threshold.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end-to-end latency is more important in most cases for rocketmq. How about adjusting the default tps threshold to Integer.MaxValue?
if someone wants to promote throughput, it could be adjusted.

BTW, The cpu cost is a problem. The tag filter and property filter are unnecessary in the notification process, It will be done twice for the pull process will do that work too.
The notification should be lightweight as far as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most case tps is below 100000, we enable this when tps is greater than 100000 to protect rocketmq.

in our tests, 600 queue, 150000+ tps, end-to-end latency is 18ms; 18 queue, 150000+ tps, end-to-end latency is 2ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can modify the threshold to a greater number, such as 150000.

In case tps greater than 150000 and a lot of queue is writing, I think the latency is not so important.

if (data != null) {
result.add(data);
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectation here is :

  • if the queue has data, poll it as quickly as possible
  • if the queue doesn't have data, just return the already polled data.

The code maybe be simplified as:
int pollWaitTimeMs = 0;
while(true) {
T data = queue.poll(pollWaitTimeMs, TimeUnit.MILLISECONDS);
if (data == null) {
pollWaitTimeMs = 100;
} else {
pollWaitTimeMs = 0;
result.add(data);
}
if(result.size() > batchMax) {
break;
}
if(result.size() >0 && pollWaitTimeMs > 0) {
break;
}
if(System.nanoTime() - start > maxWaitTimeNonas) {
break;
}
}

The tps control seems unnecessary.

@dongeforever dongeforever self-requested a review January 20, 2022 08:34
@vongosling
Copy link
Member

Any update?

@areyouok
Copy link
Contributor Author

Any update?

There is no update. But we have run it in out product environment for 3 months.

@github-actions
Copy link

This PR is stale because it has been open for 365 days with no activity. It will be closed in 3 days if no further activity occurs. If you wish not to mark it as stale, please leave a comment in this PR.

@github-actions github-actions bot added the stale label Jul 19, 2023
@areyouok areyouok closed this Jul 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve performance for 4.9.2
10 participants