-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[flink] Assign splits with fixed batch size in StaticFileStoreSplitEnumerator #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -79,10 +82,15 @@ public void handleSplitRequest(int subtask, @Nullable String hostname) { | |||
// The following batch assignment operation is for two purposes: | |||
// To distribute splits evenly when batch reading to prevent a few tasks from reading all | |||
// the data (for example, the current resource can only schedule part of the tasks). | |||
// TODO: assignment is already created in constructor, here can just assign per batch | |||
List<FileStoreSourceSplit> splits = pendingSplitAssignment.remove(subtask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make pendingSplitAssignment
to Map<Integer, Queue<FileStoreSourceSplit>>
?
And poll a batch from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -37,6 +37,9 @@ | |||
public class StaticFileStoreSplitEnumerator | |||
implements SplitEnumerator<FileStoreSourceSplit, PendingSplitsCheckpoint> { | |||
|
|||
/** Default batch splits size to avoid exceed `akka.framesize`. */ | |||
private static final int DEFAULT_SPLITS_SIZE = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we introduce an option for this? Can be an option in FlinkConnectorOptions
?
@@ -26,6 +26,12 @@ | |||
<td>String</td> | |||
<td>The log system used to keep changes of the table.<br /><br />Possible values:<br /><ul><li>"none": No log system, the data is written only to file store, and the streaming read will be directly read from the file store.</li></ul><ul><li>"kafka": Kafka log system, the data is double written to file store and kafka, and the streaming read will be read from kafka.</li></ul></td> | |||
</tr> | |||
<tr> | |||
<td><h5>scan.bounded.splits.size</h5></td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scan.split-enumerator.batch-size
?
@@ -94,6 +98,6 @@ public SplitEnumerator<FileStoreSourceSplit, PendingSplitsCheckpoint> restoreEnu | |||
} | |||
|
|||
Snapshot snapshot = snapshotId == null ? null : snapshotManager.snapshot(snapshotId); | |||
return new StaticFileStoreSplitEnumerator(context, snapshot, splits); | |||
return new StaticFileStoreSplitEnumerator(context, snapshot, splits, splitsSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we can get splitBatchSize
from table. (via table.options())
List<FileStoreSourceSplit> splits = pendingSplitAssignment.remove(subtask); | ||
if (splits != null && splits.size() > 0) { | ||
context.assignSplits(new SplitsAssignment<>(Collections.singletonMap(subtask, splits))); | ||
Queue<FileStoreSourceSplit> allSubTaskSplits = pendingSplitAssignment.get(subtask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: rename to taskSplits
if (splits != null && splits.size() > 0) { | ||
context.assignSplits(new SplitsAssignment<>(Collections.singletonMap(subtask, splits))); | ||
Queue<FileStoreSourceSplit> allSubTaskSplits = pendingSplitAssignment.get(subtask); | ||
List<FileStoreSourceSplit> batchAssignment = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: rename to assignment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wxplovecc , +1 for the pr
Purpose
Assign all splits in one time will produce exceed
akka.framesize
to exceptionsAdd
scan.bounded.splits.size
config option to detemine how many splits will be assign per batchTests
StaticFileStoreSplitEnumeratorTest.testSplitBatch