[Improvement] historical fast restart by lazy load columns metadata(20X faster) #6988
Conversation
Hi @pzhdfy, have you had a chance to test this?
Interesting idea, I too have felt the pain of multi-day rollouts for tiers of densely loaded historical servers. I've only scanned the changes so far; I'll try to do a full review later. In the meantime, if possible, could you run query benchmarks before and after this patch? I would also be interested in the performance cost for queries that do have to eat the deserialization when lazy loading is enabled.
IMO a lazy loading option is nice for very dense historical nodes, so I am supportive of this idea. It should be off by default, though, since it defers work from startup to query time, and for a medium/low-density historical node you'd prefer to do that work at startup to keep queries fast.
historical size: 100k segments and 10TB

1. without this patch
   1) druid.segmentCache.numBootstrapThreads = 1
2. with this patch and lazyLoadOnStart = false
3. with this patch and lazyLoadOnStart = true
@pzhdfy I did a similar lazy-load thing in my code base, but I found there is a consequence. If the historical node is force-killed (kill -9) while it is downloading a new segment, the unzip will very likely fail and leave a corrupted segment folder. When lazy load=false, this segment is ignored during historical startup (loading will fail), but when lazy load=true, the corruption only becomes known when a query comes in. Currently there is no interface to tell SegmentLoaderLocalCacheManager to unload or reload this segment, so queries always fail. The only workaround today is to delete the segment folder and restart the historical node so the segment is ignored (because the folder is gone). It would be better to introduce an unload/reload interface together with this PR.
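One possible mitigation, sketched below, is a cheap completeness check per segment directory before it is registered for lazy loading, so an obviously truncated unzip is skipped even when full deserialization is deferred. This is not part of this PR; it assumes the standard v9 segment layout (version.bin, meta.smoosh, 00000.smoosh), and the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical startup-time sanity check; not code from this PR.
final class SegmentDirChecks
{
  // Returns true only if the segment directory contains the expected v9
  // files with non-zero size. A killed or interrupted unzip typically
  // leaves some of these missing or truncated.
  static boolean looksComplete(Path segmentDir) throws IOException
  {
    for (String required : new String[]{"version.bin", "meta.smoosh", "00000.smoosh"}) {
      Path f = segmentDir.resolve(required);
      if (!Files.isRegularFile(f) || Files.size(f) == 0) {
        return false;
      }
    }
    return true;
  }

  private SegmentDirChecks() {}
}
```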
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
Commenting, as I'm not sure whether we want this to go stale or not.
@bputt thanks for commenting! I think this PR is worthwhile. Would you please fix the conflicts?
Thanks, the feature sounds useful.
I think that in your case the historicals don't have enough memory to keep all segments in the page cache; otherwise, on process restart, not much disk activity would be expected because the segment data would already be in the page cache. If you do have enough memory, then the behavior described in the PR description looks suspicious and maybe something else is going on.
processing/src/main/java/org/apache/druid/segment/SimpleQueryableIndex.java
server/src/main/java/org/apache/druid/segment/loading/SegmentLoaderConfig.java
Recently I wanted to apply this PR to 0.12.1, but there were many conflicts.
I want to know which commit the historical_fast_restart branch was forked from.
@JackyYangPassion
@pzhdfy yes! I have manually applied this PR to 0.12.1;
apply this pr to 0.12.1
I applied it to 0.12.3, but it was inoperative. I just applied it and restarted one historical node of the cluster.
Just on the historical.
@pzhdfy does this change push the work to query time only for the first time a segment is queried? Or is this metadata loading pushed to query time for every query run against segments, instead of only at historical startup?
It just pushes the work to query time for the first time a segment is queried.
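For readers skimming the thread, the general shape of that behavior is a memoized supplier per column, so the expensive deserialization runs at most once, on first access. This is only an illustrative sketch (the class names and deserializeColumn are placeholders, not the PR's actual code), using Guava's Suppliers.memoize, which Druid already depends on:

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

// Illustrative sketch only: "ColumnMetadata" and "deserializeColumn" are
// placeholders, not classes from this PR or from Druid.
class LazyColumnHolder
{
  private final Supplier<ColumnMetadata> metadata;

  LazyColumnHolder(String columnName)
  {
    // Suppliers.memoize defers the expensive read until get() is first
    // called, then caches the result for every later query.
    this.metadata = Suppliers.memoize(() -> deserializeColumn(columnName));
  }

  ColumnMetadata getMetadata()
  {
    // The first call pays the disk I/O; subsequent calls return the cached value.
    return metadata.get();
  }

  private static ColumnMetadata deserializeColumn(String columnName)
  {
    // Stand-in for the random reads that dominate startup on HDDs.
    return new ColumnMetadata(columnName);
  }
}

class ColumnMetadata
{
  final String name;

  ColumnMetadata(String name)
  {
    this.name = name;
  }
}
```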
@clintropolis @gianm @pzhdfy We love this patch. It makes our lives much easier when rolling out changes and running through upgrades.
I am nervous about this, though. As of now we have accepted the risk and are willing to intervene when needed. Do you have any plans to address this comment in this PR, or would that be an additional PR? I do think it would get more support if we analyzed this and solved it if it is indeed a problem.
Regarding Clint's comment on performance: we have been running it for nearly a week on a cluster with hundreds of thousands of segments and hundreds of thousands of queries a day, and our metrics collection shows a negligible change in performance (but this is vs. Druid 11; if Druid 15 had large performance gains across the board before this patch, there could be a bigger change). We added this as part of our upgrade to Druid 15, so we have not seen performance with it off in Druid 15, just in Druid 11. Regardless, we are eager to get this reviewed and accepted upstream. Our experience with it so far has been great and we'd love to help in whatever way possible to get it merged.
Force-pushed from ec4f0d4 to 7d11ca2.
Throwing another comment at this to make sure it doesn't get marked as stale. We continue to run in production with no apparent issues so far (cluster with hundreds of thousands of segments). It allowed us to complete a rolling prod upgrade of 70+ historical nodes in a single day, which was not possible for us before. If there is anything that @mohammadjkhan or I can do to help get this moving forward again, let us know!
Hi @pzhdfy, my apologies for totally forgetting about this PR. Could you please fix the conflicts and address the comments @himanshug had, at least about adding this to the documentation (I think here would be most appropriate), so we can try to get this merged? Overall lgtm 👍
OK, I will fix the conflicts and add documentation later.
@clintropolis done
I did some testing with the latest version of this PR and everything is still working for me. I also noticed that segment metadata queries performed by the broker to collect schema information for Druid SQL will likely do a fair bit of the work of loading the segments after the historical initializes, by lucky side effect. Though I suspect that will not cover all replicas, it should definitely lessen the impact of using lazy loading, I think.
Force-pushed from 70b66e2 to e37ac12.
lgtm, thanks 👍
@pzhdfy thanks for the updates, I should be able to review it early next week. However, please don't force-push changes once a PR review is started, or else it is difficult for me, as the reviewer, to see what changed since I last reviewed.
great!!!
Hi there. I have made PR #10688, which may solve the small shortcomings of lazily restarting a historical node. Please let me know if you have any questions!
We have a large amount of data in Druid; a historical (12 * 2T SATA HDD) holds over 100k segments and 10TB of data.
When we want to restart a historical to change configuration or after an OOM, it takes 40 minutes, which is too slow.
We profiled the restart process and made a flame graph.

We can see that io.druid.segment.IndexIO$V9IndexLoader.deserializeColumn costs the most time.
This is because an HDD delivers only about 100 IOPS, and a segment often has over 100 columns, so loading generates too much random I/O and disk utilization reaches 100%.
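A rough back-of-the-envelope using the numbers above (our estimate, not from the original description): ~100k segments × ~100 columns is on the order of 10 million column-metadata reads. If most of them miss the page cache, 12 HDDs at roughly 100 random IOPS each give about 1,200 IOPS, i.e. thousands of seconds of seek-bound work in the worst case, which is consistent with restarts dominated by deserializeColumn and measured in tens of minutes.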
So we make column metadata lazy-loaded, deferring deserialization until a column is first used.
After this optimization, a historical restart takes only 2 minutes (20X faster).
The flame graph after the optimization is below; we can see that loading metadata now takes little time.

We add a new config, druid.segmentCache.lazyLoadOnStart (default: false), to control whether this optimization is applied.
We suggest enabling it on HDD historicals; SSD historicals are fast enough without it.
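For reference, this is roughly how the option might be set in a historical's runtime.properties. Only druid.segmentCache.lazyLoadOnStart and its default come from this PR; the other property and its values are illustrative, not taken from the source:

```properties
# Illustrative historical runtime.properties fragment.
# The cache location below is an example value, not from this PR.
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":10000000000000}]

# Defer column-metadata deserialization until a segment is first queried.
# Suggested only for dense, HDD-backed historicals; leave false (the default) elsewhere.
druid.segmentCache.lazyLoadOnStart=true
```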