-
Notifications
You must be signed in to change notification settings - Fork 3.6k
[pulsar-sql] Make partition as internal column #4888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
|
||
@Override | ||
public Object getData(RawMessage message) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get partition
from the messageId, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawMessage does not contains partition now and get partition by split will reduce memory usage more than storing a partition per RawMessage.
} | ||
|
||
int actualNumSplits = Math.max(numPartitions, numSplits); | ||
List<Integer> predicatePartitions = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move line 152-181 to a function "getPredicatedPartitions" or "getFilteredPartitions"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
2b8d7b0
to
9e8fd54
Compare
run cpp tests |
@codelipenghui I think the tests are failing due to this pull request.
|
@@ -388,4 +431,8 @@ public boolean apply(Entry entry) { | |||
} | |||
}); | |||
} | |||
|
|||
public PulsarAdmin getPulsarAdmin() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this?
high = range.getHigh().getValueBlock().get().getInt(0, 0); | ||
} | ||
while (low <= high) { | ||
if (!predicatePartitions.contains(low)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this conditional check?
if (!range.getHigh().isLowerUnbounded() && range.getHigh().getValueBlock().isPresent()) { | ||
high = range.getHigh().getValueBlock().get().getInt(0, 0); | ||
} | ||
while (low <= high) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simpler and easier to read if we just use a for loop e.g.
for (int i = low; i <= high; i++)
@@ -91,6 +92,7 @@ | |||
// are empty or not | |||
private final long splitSize; | |||
private long entriesProcessed = 0; | |||
private int partition = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-partitioned topics, should the partition column be NULL? Currently, for internal columns, if there is not an applicable value, the value is displayed as "NULL"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to be consistent with the messageId? If not, agree use NULL for non-partitioned topics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally I am in favor of using -1 to be consistent with how we use it in MessageId and it make a consistent view for both partitioned topic and non-partitioned topic.
A partitioned topic has N partitions, a non-partitioned topic has one partition with partition id "-1".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with either. @merlimat any opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after discussing with @karthikramasamy since he has experience with databases, it seems that "null" can have all sort of meanings. Perhaps its just better to use -1 for non-partitioned topics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good! I left a few comments
@jerrypeng I have addressed your comments, please take a look. |
run cpp tests |
} | ||
for (int i = low; i <= high; i++) { | ||
predicatePartitions.add(i); | ||
low++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, fixed
run java8 tests |
53dce27
to
5ad6e99
Compare
conflict with #4890 mark as 2.4.1 |
Fixes #4785 ### Motivation 1. Stop return partition name in table list, just return the partitioned topic name in table list. This will avoid huge tables while user create large number of partition. 2. Make partition as internal column, provide users with the ability to get which partition data in and filtration based on partition. For example: ``` SELECT * FROM "my-table" WHERE "__partition__" = 0; SELECT * FROM "my-table" WHERE "__partition__" in (2,3); SELECT * FROM "my-table" WHERE "__partition__" < 1; ``` ### Modifications 1. Add "__partition__" internal column. 2. Add domain handle for "__partition__". ### Verifying this change Added new unit test to verify this change ### Does this pull request potentially affect one of the following parts: *If `yes` was chosen, please highlight the changes* - Dependencies (does it add or upgrade a dependency): (no) - The public API: (yes) - The schema: (no) - The default values of configurations: (no) - The wire protocol: ( no) - The rest endpoints: (no) - The admin cli options: (no) - Anything that affects deployment: (no) ### Documentation - Does this pull request introduce a new feature? (yes) (cherry picked from commit 5adc522)
Fixes #4785
Motivation
Modifications
Verifying this change
Added new unit test to verify this change
Does this pull request potentially affect one of the following parts:
If
yes
was chosen, please highlight the changesDocumentation