(improvement)[bucket] Add auto bucket implement #15250
5ad9d10 to 62117bf
TeamCity pipeline, clickbench performance test result:
5ae2d53 to e136c9d
SHOW CREATE TABLE needs to include the autobucket and estimate_partition_size settings.
Better using
Please add some unit tests and regression tests for this
fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java
int buckets = 0;
for (Backend backend : backends.values()) {
    if (!backend.isLoadAvailable()) {
Why check isLoadAvailable?
If a backend is not load-available, it should not be treated as a machine that can take on data.
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java
fe/fe-core/src/main/java/org/apache/doris/common/util/AutoBucketUtils.java
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java
fe/fe-core/src/main/java/org/apache/doris/clone/DynamicPartitionScheduler.java
… property auto_bucket to _auto_bucket (apache#15250)
fe/fe-core/src/main/java/org/apache/doris/analysis/CreateTableStmt.java
fe/fe-core/src/main/java/org/apache/doris/analysis/DistributionDesc.java
table.getPartitions() in DynamicPartitionScheduler::getBucketsNum (apache#15250)
…ions sort by PartitionItem (apache#15250)
… property auto_bucket to _auto_bucket (apache#15250)
table.getPartitions() in DynamicPartitionScheduler::getBucketsNum (apache#15250)
025ca63 to 1f3ae47
…ions sort by PartitionItem (apache#15250)
…ons sort by PartitionItem (apache#15250)
e425a96 to 3b3e76d
LGTM
int buckets = 0;
for (Backend backend : backends.values()) {
    if (!backend.isLoadAvailable()) {
        break;
Suggested change: replace "break;" with "continue;"
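To illustrate why this suggestion matters, here is a minimal sketch, not the code from the PR. The Backend class below is a hypothetical stand-in that models only isLoadAvailable() and an assumed per-backend contribution to the bucket count. With break, the loop stops at the first backend that is not load-available and ignores every backend after it; with continue, only the unavailable backend is skipped.

```java
final class LoadAvailableCountSketch {
    // Hypothetical stand-in for the real Backend class; only what the loop needs.
    static final class Backend {
        private final boolean loadAvailable;
        private final int units; // assumed per-backend contribution to the bucket count

        Backend(boolean loadAvailable, int units) {
            this.loadAvailable = loadAvailable;
            this.units = units;
        }

        boolean isLoadAvailable() {
            return loadAvailable;
        }
    }

    // Stops at the first unavailable backend, dropping all later ones.
    static int countWithBreak(Backend[] backends) {
        int buckets = 0;
        for (Backend backend : backends) {
            if (!backend.isLoadAvailable()) {
                break;
            }
            buckets += backend.units;
        }
        return buckets;
    }

    // Skips only the unavailable backend and keeps counting the rest.
    static int countWithContinue(Backend[] backends) {
        int buckets = 0;
        for (Backend backend : backends) {
            if (!backend.isLoadAvailable()) {
                continue;
            }
            buckets += backend.units;
        }
        return buckets;
    }
}
```

With backends (available, 2), (unavailable, 2), (available, 3), the break version counts only the first backend, while the continue version counts the first and the third.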
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
How can it support Colocation Join?
+1, same question.
Proposed changes
Problem summary
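Users often set an unsuitable bucket count, which causes various problems. This change provides a way to set the number of buckets automatically. For now it only takes effect for OLAP tables.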
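Implementation approach
Compute the number of buckets from the data volume.
For partitioned tables, a bucket count can be determined from the data volume of historical partitions, the number of machines, and the number of disks.
The main problem is that the initial bucket count is hard to determine.
Two approaches are provided here: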
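Detailed design
Initial bucket calculation
This is basically not very reliable. It uses default_bucket_num (10).
Here we assume for now that the given size is the data volume of a single replica in text format.
First derive a bucket count N from the data volume:
First divide the data volume by 5 (assuming a 5:1 compression ratio).
< 100 MB: 1
< 1 GB: 2
Then derive a bucket count M from the machines and disks (the product of the two terms below):
Each BE node counts as 1.
Disk capacity: every 50 GB counts as 1.
Take min(M, N, 128). If this value is smaller than N and also smaller than the number of machines, use the number of machines instead.
Example: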
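The rules above can be sketched roughly as follows. This is a sketch under stated assumptions, not the implementation in the PR: the class and method names are hypothetical, the rule for sizes above 1 GB (one bucket per started GB) is not given in the description, and M is computed by summing one unit per started 50 GB over every disk of every BE node, which equals the node-times-disk product when all nodes are identical.

```java
final class AutoBucketSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // N: bucket count derived from the single-replica, text-format data size.
    static int bucketsBySize(long rawBytes) {
        long compressed = rawBytes / 5; // 5:1 compression ratio from the description
        if (compressed < 100 * MB) {
            return 1;
        }
        if (compressed < GB) {
            return 2;
        }
        // Assumption: above 1 GB, one bucket per started GB.
        return (int) ((compressed - 1) / GB + 1);
    }

    // M: assumption, one unit per started 50 GB of each disk, summed over all BE nodes.
    // diskBytesPerBackend[i] holds the disk capacities (in bytes) of BE node i.
    static int bucketsByDisks(long[][] diskBytesPerBackend) {
        int m = 0;
        for (long[] disks : diskBytesPerBackend) {
            for (long capacity : disks) {
                m += (int) ((capacity - 1) / (50 * GB) + 1);
            }
        }
        return m;
    }

    // min(M, N, 128), falling back to the machine count when the result
    // is below both N and the number of machines.
    static int initialBuckets(long rawBytes, long[][] diskBytesPerBackend) {
        int n = bucketsBySize(rawBytes);
        int m = bucketsByDisks(diskBytesPerBackend);
        int beNum = diskBytesPerBackend.length;
        int result = Math.min(128, Math.min(m, n));
        if (result < n && result < beNum) {
            result = beNum;
        }
        return result;
    }
}
```

For instance, under these assumptions 50 GB of raw data compresses to 10 GB, giving N = 10; three BE nodes with one 100 GB disk each give M = 6, so the result is min(6, 10, 128) = 6.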
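Calculating buckets for future partitions
This applies to partitioned tables only.
The exponential average of the data volume of up to the 7 most recent partitions is used as estimate_partition_size for the estimate.
The trend of the historical partitions needs to be checked:
For example, if each of the previous five partitions is larger than the one before it, the data is growing. In that case the average should not be used; the trend value should be used instead.
Only the increasing and decreasing cases are considered. In all other cases, take the average.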
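One way to read the estimation rule above is sketched below. The description does not define the exponential average or the "trend value" precisely, so the details here are assumptions: the smoothing factor is alpha = 2 / (period + 1), and the trend value is taken as the newest partition size plus the exponential average of the size differences between consecutive partitions. All names are hypothetical.

```java
final class PartitionSizeEstimateSketch {
    static final int MAX_HISTORY = 7;

    // Exponential moving average; alpha = 2 / (period + 1) is an assumed smoothing factor.
    static long ema(long[] values, int period) {
        double alpha = 2.0 / (period + 1);
        double avg = values[0];
        for (int i = 1; i < values.length; i++) {
            avg = alpha * values[i] + (1 - alpha) * avg;
        }
        return (long) avg;
    }

    // history is ordered oldest to newest and must contain at least one entry.
    static long estimateNext(long[] history) {
        int n = Math.min(history.length, MAX_HISTORY);
        long[] recent = java.util.Arrays.copyOfRange(history, history.length - n, history.length);
        if (n < 2) {
            return recent[0];
        }
        boolean increasing = true;
        boolean decreasing = true;
        long[] deltas = new long[n - 1];
        for (int i = 1; i < n; i++) {
            deltas[i - 1] = recent[i] - recent[i - 1];
            if (deltas[i - 1] <= 0) {
                increasing = false;
            }
            if (deltas[i - 1] >= 0) {
                decreasing = false;
            }
        }
        if (increasing || decreasing) {
            // Assumed trend value: newest size plus the averaged step, never below zero.
            return Math.max(0L, recent[n - 1] + ema(deltas, MAX_HISTORY));
        }
        // No clear trend: fall back to the exponential average of the sizes.
        return ema(recent, MAX_HISTORY);
    }
}
```

Under these assumptions, a history of 100, 200, 300, 400 is treated as growing and yields an estimate of 500, while a history without a consistent direction is simply averaged.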
Checklist(Required)
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...