[SPARK-23679][YARN] Setting RM_HA_URLS for AmIpFilter to avoid redirect failure in YARN mode #22164

Closed
jerryshao wants to merge 1 commit

Conversation

jerryshao
Contributor

What changes were proposed in this pull request?

YARN's AmIpFilter added a new parameter, "RM_HA_URLS", to support RM HA, but Spark on YARN does not set this parameter, so redirects fail when running with RM HA. The detailed exception can be found in the JIRA. This PR fixes the issue by setting the "RM_HA_URLS" parameter.
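
For illustration, a minimal sketch of how an "RM_HA_URLS" value might be assembled from the RM HA settings. This is not the code in the patch: a plain `Map[String, String]` stands in for the Hadoop `Configuration`, the helper names `rmWebAppUrl` and `withRmHaUrls` are hypothetical, and the property keys are assumed to be the standard YARN ones.

```scala
// Sketch only: a plain Map stands in for Hadoop's Configuration, and the
// helper names below are hypothetical, not taken from the patch.
object RmHaUrlsSketch {
  // YARN property keys, written out as literals (assumed, not read from YarnConfiguration).
  val RmHaIds = "yarn.resourcemanager.ha.rm-ids"
  val RmWebAppAddress = "yarn.resourcemanager.webapp.address"
  val RmWebAppHttpsAddress = "yarn.resourcemanager.webapp.https.address"
  val HttpPolicy = "yarn.http.policy"

  // Look up the web app address of one RM, using the HTTPS key when the
  // HTTP policy is HTTPS_ONLY. Returns None if the address is not configured.
  def rmWebAppUrl(conf: Map[String, String], rmId: String): Option[String] = {
    val base =
      if (conf.get(HttpPolicy).contains("HTTPS_ONLY")) RmWebAppHttpsAddress
      else RmWebAppAddress
    val key = if (rmId.isEmpty) base else s"$base.$rmId"
    conf.get(key)
  }

  // Add RM_HA_URLS to the existing filter parameters when RM HA ids are configured;
  // otherwise return the parameters unchanged. RMs without an address are skipped.
  def withRmHaUrls(
      params: Map[String, String],
      conf: Map[String, String]): Map[String, String] = {
    val rmIds = conf.get(RmHaIds).toSeq
      .flatMap(_.split(",").toSeq)
      .map(_.trim)
      .filter(_.nonEmpty)
    if (rmIds.isEmpty) params
    else params + ("RM_HA_URLS" -> rmIds.flatMap(rmWebAppUrl(conf, _)).mkString(","))
  }
}
```

With two RM ids configured, `withRmHaUrls` returns the original parameters plus one comma-separated "RM_HA_URLS" entry; without RM HA it leaves the parameters untouched, so non-HA deployments keep their existing behavior.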

How was this patch tested?

Local verification.

@SparkQA

SparkQA commented Aug 21, 2018

Test build #94991 has finished for PR 22164 at commit da33554.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

@vanzin @tgravescs could you please help review this? Thanks!

@jerryshao
Contributor Author

Gentle ping again, @vanzin @tgravescs. Thanks!

@vanzin
Contributor

vanzin commented Aug 24, 2018

Is this something that changed in Hadoop 3 (or some post-2.6 version)? I'm pretty sure the existing PROXY_URI_BASES config has been working with RM HA here...

@vanzin
Contributor

vanzin commented Aug 24, 2018

Also, usual nit: PR title should explain the fix, not the problem.

@jerryshao changed the title from "[SPARK-23679][YARN] Fix AmIpFilter cannot work in RM HA scenario" to "[SPARK-23679][YARN] Setting RM_HA_URLS for AmIpFilter to avoid redirect failure in YARN mode" on Aug 27, 2018
@jerryshao
Contributor Author

I think this is related to YARN-7269 (https://issues.apache.org/jira/browse/YARN-7269). It seems to be a Hadoop 2.9/3.0+ issue.

@@ -126,4 +136,21 @@ private[spark] class YarnRMClient extends Logging {
}
}

private def getUrlByRmId(conf: Configuration, rmId: String): String = {
Contributor


The code looks ok, but it also looks similar to this:

https://github.com/apache/hadoop/blob/branch-2.6/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java

I'm wondering if we could just call that class instead, somehow? It seems available in 2.6 which is the oldest version we support.

Contributor Author


For Spark's usage, I don't think AmFilterInitializer would help much. We need to pass the filter parameters to the driver either over RPC (client mode) or through configuration (cluster mode), and either way we have to know how to set each parameter ourselves.
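
A rough sketch of the two delivery paths mentioned above, to show why each parameter has to be known by name. The `spark.<filter class>.param.<name>` key layout follows Spark's documented convention for `spark.ui.filters`; the object, the `AddFilterMessage` case class, and the helper names are hypothetical stand-ins, not the classes Spark actually uses.

```scala
// Sketch only: names below are illustrative stand-ins, not Spark's own classes.
object FilterParamsSketch {
  // Fully qualified class name of YARN's AM IP filter.
  val AmFilter = "org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter"

  // Hypothetical message carrying the filter and its parameters to the driver (client mode).
  final case class AddFilterMessage(filterName: String, params: Map[String, String])

  // Client mode: wrap the parameters in a message to send over RPC.
  def asRpcMessage(params: Map[String, String]): AddFilterMessage =
    AddFilterMessage(AmFilter, params)

  // Cluster mode: express the filter and each parameter as configuration entries.
  def asConfEntries(params: Map[String, String]): Map[String, String] = {
    val paramEntries = params.map { case (k, v) => s"spark.$AmFilter.param.$k" -> v }
    paramEntries + ("spark.ui.filters" -> AmFilter)
  }
}
```

In both paths the same name-to-value map is the input, which is why a new filter parameter such as "RM_HA_URLS" has to be added to that map explicitly.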

Contributor


Ah, right, it would be hard to use that class in the client case.

@vanzin
Contributor

vanzin commented Aug 28, 2018

Merging to master.

@asfgit asfgit closed this in 4e3f3ce Aug 28, 2018
@jerryshao
Contributor Author

Thanks @vanzin .

3 participants