
[SPARK-28895][K8S] Support defining HADOOP_CONF_DIR and config map at the same time #25609


Closed · wants to merge 9 commits

Conversation

@yaooqinn (Member) commented Aug 28, 2019

What changes were proposed in this pull request?

Changes in this pull request allow users to define HADOOP_CONF_DIR and spark.kubernetes.hadoop.configMapName at the same time. When both are defined, Spark gives precedence to the config map for what is mounted on the driver pod. This enables the spark client process to communicate with any Hadoop cluster it needs to reach.
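The proposed precedence can be sketched in a few lines of Scala (a minimal illustration only; `mountSource` and its signature are hypothetical, not Spark's actual API):

```scala
// Hypothetical sketch of the proposed precedence: when both sources are
// defined, the config map wins for what is mounted on the driver pod.
object HadoopConfPrecedence {
  // configMapName: value of spark.kubernetes.hadoop.configMapName, if set
  // hadoopConfDir: value of the HADOOP_CONF_DIR environment variable, if set
  def mountSource(configMapName: Option[String], hadoopConfDir: Option[String]): Option[String] =
    configMapName.orElse(hadoopConfDir)

  def main(args: Array[String]): Unit = {
    // Both defined: the config map takes precedence.
    assert(mountSource(Some("hz10-hadoop-dir"), Some("/etc/hadoop/conf")).contains("hz10-hadoop-dir"))
    // Only HADOOP_CONF_DIR defined: fall back to it.
    assert(mountSource(None, Some("/etc/hadoop/conf")).contains("/etc/hadoop/conf"))
    println("ok")
  }
}
```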

Why are the changes needed?

The BasicDriverFeatureStep for Spark on Kubernetes uploads the files/jars specified by --files/--jars to a Hadoop-compatible file system configured via spark.kubernetes.file.upload.path. With HADOOP_CONF_DIR set, the spark-submit process can resolve that file system; but spark.kubernetes.hadoop.configMapName is only mounted on the pods and is never applied back to the client process, so the upload fails:

 Kent@KentsMacBookPro~/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3  bin/spark-submit --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf --jars /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar --conf spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab --conf spark.kerberos.principal=kyuubi/dev@HADOOP.HZ.NETEASE.COM --conf  spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf  --name hehe --deploy-mode cluster --class org.apache.spark.examples.HdfsTest   local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl
Listening for transport dt_socket at address: 50014
19/08/27 17:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/08/27 17:21:07 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Listening for transport dt_socket at address: 50014
Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar failed...
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:287)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:237)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:165)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:101)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10(KubernetesClientApplication.scala:236)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10$adapted(KubernetesClientApplication.scala:229)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2567)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:229)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:198)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hz-cluster10
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:278)
	... 30 more
Caused by: java.net.UnknownHostException: hz-cluster10
	... 43 more

Other related Spark configurations:

spark.master=k8s://https://10.120.238.100:7443
# spark.master=k8s://https://10.120.238.253:7443
spark.kubernetes.container.image=harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3.0.0-20190813
# spark.kubernetes.driver.container.image=harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3.0.0-20190813
# spark.kubernetes.executor.container.image=harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3.0.0-20190813
spark.executor.instances=5
spark.kubernetes.namespace=ns1
spark.kubernetes.container.image.pullSecrets=mysecret
spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir
spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf
spark.kerberos.principal=kyuubi/dev@HADOOP.HZ.NETEASE.COM
spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab

Does this PR introduce any user-facing change?

I guess this PR will now apply the Hadoop config map from the k8s cluster to the local client process if there are files to upload to Hadoop.

How was this patch tested?

Manually tested with Spark + a k8s cluster + a standalone kerberized HDFS cluster.
Added a unit test.

@SparkQA commented Aug 28, 2019

Test build #109859 has finished for PR 25609 at commit ad10f94.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 28, 2019

Test build #109861 has finished for PR 25609 at commit 528dfc9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 28, 2019

Test build #109862 has finished for PR 25609 at commit b31d487.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// should download the configmap to our client side.
// 1. add configurations from k8s configmap to hadoopConf
conf.get(KUBERNETES_HADOOP_CONF_CONFIG_MAP).foreach { cm =>
val hadoopConfFiles = client.configMaps().withName(cm).get().getData.asScala
@skonto (Contributor) commented Sep 3, 2019


@yaooqinn This happens at submission time on the launcher machine; fetching the config map from the cluster locally is weird and not the way to go, in my opinion. You could just point to the right Hadoop config and spark-submit will pick it up. spark.kubernetes.hadoop.configMapName was meant to be used on the driver pod, so that the Hadoop files can be mounted on the fly within the cluster. Even if you launch cluster mode from inside the cluster you can do the same: mount the config map and point to the files via the HADOOP_CONF_DIR var. @erikerlandson @dongjoon-hyun @ifilonenko fyi.
Btw, config maps are namespaced.

@yaooqinn (Member, Author) commented Sep 4, 2019


  1. spark.kubernetes.hadoop.configMapName has no "driver" word in its name, so I guess it is OK for it to be used across the whole Spark application, including the client process.
  2. Using spark.kubernetes.hadoop.configMapName and HADOOP_CONF_DIR is currently an either-or thing (check here). They both contain the same thing and are used by the client to define the driver pod, so they should be treated as equal. For now, the only difference between them is that HADOOP_CONF_DIR is on the classpath of the client process and the other is not. With spark.kubernetes.hadoop.configMapName, if our application has no extra files (--jars/--files) to upload, it works; but once it needs one or more, it fails. I think this behavior for spark.kubernetes.hadoop.configMapName is kind of unacceptable.
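The either-or restriction mentioned in point 2 amounts to a mutual-exclusion check at submit time. A self-contained sketch of such a check (illustrative only; this is not Spark's actual validation helper):

```scala
// Sketch of an either-or validation: defining both HADOOP_CONF_DIR and
// spark.kubernetes.hadoop.configMapName is rejected before submission.
object EitherOrCheck {
  def validate(configMapName: Option[String], hadoopConfDir: Option[String]): Unit =
    require(configMapName.isEmpty || hadoopConfDir.isEmpty,
      "Do not set spark.kubernetes.hadoop.configMapName when HADOOP_CONF_DIR is defined")

  def main(args: Array[String]): Unit = {
    validate(Some("hz10-hadoop-dir"), None)   // passes: only one source defined
    validate(None, Some("/etc/hadoop/conf"))  // passes
    // Defining both fails under the either-or rule this PR relaxes:
    try {
      validate(Some("hz10-hadoop-dir"), Some("/etc/hadoop/conf"))
      assert(false, "expected IllegalArgumentException")
    } catch { case _: IllegalArgumentException => () }
    println("ok")
  }
}
```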

@skonto (Contributor) commented Sep 4, 2019


I see that they are either-or (the config map is meant for cluster deployment, not this new feature), but I think if you specify them both (modify the code there to allow both to be defined rather than either-or, and select according to the user's preference, or use the pod-template feature to emulate the config map mounting) it should work, since spark-submit is supposed to pick up the Hadoop credentials (e.g.

if ((clusterManager == MESOS || clusterManager == KUBERNETES)

). Initially the config map was not meant for uploading files from the client machine (or, in general, accessing Hadoop from there), so the logic may need to be modified to play well with HADOOP_CONF_DIR, but I do find fetching the config map from the cluster redundant. If you can download the configuration, you could just add it at spark-submit time on the client machine (it is not safer or anything if you fetch it, afaik).

@skonto (Contributor) commented Sep 4, 2019


@ifilonenko thoughts here? Do you think we can get the kerberos integration tests PR merged? That is the only viable way to make sure things are stable in the future.

Contributor:

If they are both specified, the main corner case seems like the scenario where they aren't consistent, but this could be checked for.

Member (Author):

I agree with allowing them both to be set, which was my original idea too.

Contributor:

@skonto I'll make sure to get that PR resolved so that we can use it for testing, yeah.

// 2. add configurations from arguments or spark properties file to hadoopConf
SparkHadoopUtil.appendS3AndSparkHadoopConfigurations(conf, hadoopConf)
// 3. set or reset user group information
UserGroupInformation.setConfiguration(hadoopConf)
Contributor:

Spark submit should take care of the security stuff.

Member (Author):

Yes, I agree. But for now, spark-submit is quite YARN-specific; for security settings contained in a k8s config map, it does not work.
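The numbered steps quoted above from the patch boil down to layering configuration sources into one client-side Hadoop configuration before handing it to UserGroupInformation. A self-contained sketch of that layering, with plain Maps standing in for Hadoop's Configuration and the Kubernetes client (both assumed, not shown):

```scala
// Sketch of building a client-side Hadoop configuration by layering sources,
// mirroring the patch: config map entries first, then spark properties.
object ClientHadoopConf {
  // Later layers override earlier ones.
  def layered(fromConfigMap: Map[String, String], fromSparkConf: Map[String, String]): Map[String, String] =
    fromConfigMap ++ fromSparkConf

  def main(args: Array[String]): Unit = {
    // Hypothetical entries; real ones would come from the mounted config map.
    val cm = Map("fs.defaultFS" -> "hdfs://hz-cluster10", "dfs.nameservices" -> "hz-cluster10")
    val sc = Map("fs.defaultFS" -> "hdfs://override")
    val merged = layered(cm, sc)
    assert(merged("fs.defaultFS") == "hdfs://override")   // spark properties win
    assert(merged("dfs.nameservices") == "hz-cluster10")  // config map entry survives
    println("ok")
  }
}
```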

This reverts commit b31d487.
This reverts commit 528dfc9.
…nfigmap if the client process has files to upload"

This reverts commit ad10f94.
@yaooqinn yaooqinn changed the title [SPARK-28896][K8S] Download hadoop configurations from k8s configmap if the client process has files to upload [SPARK-28896][K8S] Support HADOOP_CONF_DIR and config map at the same time Sep 5, 2019
@yaooqinn yaooqinn changed the title [SPARK-28896][K8S] Support HADOOP_CONF_DIR and config map at the same time [SPARK-28896][K8S] Support defining HADOOP_CONF_DIR and config map at the same time Sep 5, 2019
@SparkQA commented Sep 5, 2019

Test build #110165 has finished for PR 25609 at commit 04a6aab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 16, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/15644/

@SparkQA commented Sep 16, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/15644/

@yaooqinn (Member, Author):

gentle ping @skonto

is configured, the hadoop configurations will only be used by the Driver and its Executors. If your client process has
extra dependencies to upload to `spark.kubernetes.file.upload.path`, you may need to configure `HADOOP_CONF_DIR` too.
When these two variables are both set, Spark will prefer `spark.kubernetes.hadoop.configMapName` to be mounted on the
Driver/Executor pods.
Contributor:

The concept looks good to me, @ifilonenko any corner cases?

Member:

I think it is still worth warning the user that the configuration is being picked up from the configMap, in spite of the fact that HADOOP_CONF_DIR is defined.

@skonto (Contributor) commented Oct 11, 2019

@erikerlandson LGTM. We need an integration test for this; @ifilonenko gentle ping.

@SparkQA commented Oct 12, 2019

Test build #111938 has finished for PR 25609 at commit 7f2c957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 12, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/16953/

@SparkQA commented Oct 12, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/16953/

@yaooqinn (Member, Author):

Anything blocking this? @skonto

@yaooqinn (Member, Author):

retest this please

@yaooqinn (Member, Author):

cc @cloud-fan @vanzin. Could any active committers help review this? Thanks in advance.

@SparkQA commented Dec 10, 2019

Test build #115066 has finished for PR 25609 at commit 7f2c957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 10, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/19884/

@SparkQA commented Dec 10, 2019

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/19884/

@yaooqinn (Member, Author):

retest this please

@SparkQA commented Dec 10, 2019

Test build #115069 has finished for PR 25609 at commit 7f2c957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 10, 2019

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/19886/

@SparkQA commented Dec 10, 2019

Test build #115072 has finished for PR 25609 at commit 7f2c957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 10, 2019

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/19886/

@erikerlandson (Contributor):

Does this have an integration test now?
cc @ifilonenko @skonto

@SparkQA commented Jan 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/21297/

@SparkQA commented Jan 11, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/21297/

@SparkQA commented May 18, 2020

Test build #122767 has finished for PR 25609 at commit 7f2c957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions bot:

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Aug 26, 2020
@yaooqinn (Member, Author):

retest this please

@SparkQA commented Aug 26, 2020

Test build #127901 has finished for PR 25609 at commit 7f2c957.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 26, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32527/

@SparkQA commented Aug 26, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32527/

@github-actions github-actions bot closed this Aug 27, 2020
@yaooqinn yaooqinn changed the title [SPARK-28896][K8S] Support defining HADOOP_CONF_DIR and config map at the same time [SPARK-28895][K8S] Support defining HADOOP_CONF_DIR and config map at the same time Sep 29, 2020
7 participants