Skip to content

[FLINK-9031] Fix DataSet Union operator translation bug. #5742

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

fhueske
Copy link
Contributor

@fhueske fhueske commented Mar 21, 2018

What is the purpose of the change

  • Fixes a bug reported in FLINK-9031
  • Union nodes had a partitioning strategy on the outgoing channel, that was (intentionally) not translated by the JobGraphGenerator because the JobGraphGenerator assumed that Union nodes would always have outgoing FORWARD strategies.
  • Not translating the partitioning resulted in an incorrect result because data was not correctly distributed.

Brief change log

  • Add a check in JobGraphGenerator to fail if a union node with non-FORWARD outgoing strategy is found.
  • Add a pre-optimization plan traversal that fixes the strategy of union outputs to FORWARD.
  • Add a test based on a simplified version of the reported program.

Verifying this change

  • Run the added test.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? n/a

fhueske added 2 commits March 21, 2018 22:40
- Adds a pass over the pre-optimized plan that fixes the output strategy of union nodes to FORWARD.
Copy link
Contributor

@StephanEwen StephanEwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, with one minor comment.

/**
* FilterFunction for optimizer tests.
*/
public class IdentityFilter<T> implements FilterFunction<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just using lambdas (filter always works) instead of adding this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
Didn't think about that as the other tests follow that pattern.
Will remove the class.

@fhueske
Copy link
Contributor Author

fhueske commented Mar 22, 2018

The fix is still broken. Some tests are failing.
Need to look into it a bit more.

@fhueske
Copy link
Contributor Author

fhueske commented Mar 22, 2018

Updated the PR.
Tests are passing (mod an unrelated failure).

@StephanEwen
Copy link
Contributor

Thanks for updating this.
The code looks good and the fix conceptually make perfect sense.

Merging this...

StephanEwen pushed a commit to StephanEwen/flink that referenced this pull request Mar 27, 2018
- Adds a pass over the pre-optimized plan that fixes the output strategy of union nodes to FORWARD.

This closes apache#5742
@asfgit asfgit closed this in 87ff6eb Mar 29, 2018
asfgit pushed a commit that referenced this pull request Apr 4, 2018
- Adds a pass over the pre-optimized plan that fixes the output strategy of union nodes to FORWARD.

This closes #5742
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
- Adds a pass over the pre-optimized plan that fixes the output strategy of union nodes to FORWARD.

This closes apache#5742
@fhueske fhueske deleted the dataSetUnionBug branch February 25, 2019 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants