[SUPPORT] Cannot create table via Spark thrift server #6185

Closed
Opened by @paul8263 (Contributor)

Description

Describe the problem you faced

Cannot create table via Spark thrift server.

Create table SQL:

create table test_hudi_cow_pt_tbl (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
 )
partitioned by (dt, hh)
location '/zy/spark_test/hudi_cow_pt_tbl';

The exception is:

SQL error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path Unable to find a hudi table for the user provided paths.
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path Unable to find a hudi table for the user provided paths.
	at org.apache.hudi.DataSourceUtils.getTablePath(DataSourceUtils.java:88)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:94)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:352)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:325)
	... 16 more

The same SQL executes successfully via spark-sql, and the table location folder was created in advance.

To Reproduce

Steps to reproduce the behavior:

  1. Connect to Spark thrift server via jdbc:hive2://{ip}:10016
  2. Execute the create table SQL shown above.
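
The two steps above can be scripted with beeline. This is a minimal sketch, not part of the original report: the host value, the SQL file name, and the helper function names are hypothetical, while port 10016 is taken from the JDBC URL in step 1.

```shell
# Sketch only: helper names, host, and file name are illustrative assumptions.

# Build the JDBC URL from the report (port 10016 unless overridden).
thrift_url() {
  local host="$1" port="${2:-10016}"
  printf 'jdbc:hive2://%s:%s' "$host" "$port"
}

# Run a SQL file against the thrift server.
# -u (JDBC URL) and -f (script file) are standard beeline options.
run_sql_file() {
  local host="$1" sql_file="$2"
  beeline -u "$(thrift_url "$host")" -f "$sql_file"
}

# Example (hypothetical host and file):
#   run_sql_file 10.0.0.5 create_table.sql
```

Saving the create table statement to a file keeps the reproduction identical between spark-sql and the thrift server, so only the entry point differs.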

Expected behavior

Create table successfully via Spark thrift server.

Environment Description

  • Hudi version : 0.11.1

  • Spark version : 3.1.1

  • Hive version : 3.1.0

  • Hadoop version : 3.1.1

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) : no

Activity

Labels added on Jul 26, 2022: priority:major (degraded perf; unable to move forward; potential bugs), spark (Issues related to spark)

KnightChess (Contributor) commented on Aug 3, 2022

Please confirm that the user has permission to access and create files under the given path.

paul8263 (Contributor, Author) commented on Aug 3, 2022

Hi @KnightChess
Thank you for your suggestion.

I can successfully query data with the same Spark thrift server environment.

I also tested creating the table and inserting data at the same table path (using spark-sql), and the data can be queried via the Spark thrift server. So it is probably not a permissions problem.

YannByron (Contributor) commented on Aug 12, 2022

@paul8263 Check whether these configs are set correctly on the Spark thrift server:
spark.sql.extensions -> org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog -> org.apache.spark.sql.hudi.catalog.HoodieCatalog
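
One way to check what the running thrift server actually sees is to ask it over JDBC. The sketch below assumes beeline is on the PATH; the function names and the URL in the example are hypothetical. It relies on Spark SQL's `SET <key>;` statement, which prints the current value of a property for the session.

```shell
# Sketch only: helper names and the example URL are illustrative assumptions.

# Build the Spark SQL statement that prints one property's current value.
conf_query() {
  printf 'SET %s;' "$1"
}

# Print the properties discussed in this thread, as seen by the server session.
show_hudi_confs() {
  local url="$1" key
  for key in spark.sql.extensions \
             spark.sql.catalog.spark_catalog \
             spark.serializer; do
    # -e runs a single statement and exits.
    beeline -u "$url" --silent=true -e "$(conf_query "$key")"
  done
}

# Example (hypothetical host):
#   show_hudi_confs "jdbc:hive2://10.0.0.5:10016"
```

Querying through the same JDBC endpoint that fails avoids guessing which spark-defaults.conf or command-line flags the server was started with.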

paul8263 (Contributor, Author) commented on Aug 15, 2022

Hi @YannByron ,

Thank you very much for your reply.
It seems that hudi-spark3.1-bundle_2.12-0.11.1.jar does not contain org.apache.spark.sql.hudi.catalog.HoodieCatalog, as the Spark version is 3.1.1. So I added spark.serializer=org.apache.spark.serializer.KryoSerializer to Spark thrift server conf. But it still does not work.

Does it mean that Spark 3.1.1 thrift server cannot support Hudi create table statement?
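
Whether a bundle jar ships a given class can be confirmed by listing the archive. This is a rough sketch that assumes `unzip` is installed; the function names and the jar location in the example are hypothetical.

```shell
# Sketch only: helper names and the jar path are illustrative assumptions.

# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
  printf '%s.class' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed (exit 0) if the jar listing contains the class entry.
jar_has_class() {
  local jar="$1" class="$2"
  unzip -l "$jar" | grep -qF "$(class_to_entry "$class")"
}

# Example (hypothetical jar location):
#   jar_has_class ./hudi-spark3.1-bundle_2.12-0.11.1.jar \
#     org.apache.spark.sql.hudi.catalog.HoodieCatalog && echo present || echo missing
```

The same check can be pointed at org.apache.spark.sql.hudi.HoodieSparkSessionExtension to see whether the extension class is available in the bundle.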

nsivabalan (Contributor) commented on Aug 28, 2022

@YannByron: we don't need

spark.sql.extensions -> org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog -> org.apache.spark.sql.hudi.catalog.HoodieCatalog

for Spark 3.1, right? I thought it's mandatory only for 3.2 and above.

xushiyan (Member) commented on Oct 29, 2022

This is most likely a setup and configuration issue. I tried the following locally, and it worked with the latest master version.

# download derby and spark and install to /opt
export DERBY_HOME=/opt/db-derby-10.14.1.0-bin
export SPARK_HOME=/opt/spark-3.1.3-bin-hadoop3.2

# copy derby driver jars to spark
cp $DERBY_HOME/lib/{derby,derbyclient}.jar $SPARK_HOME/jars/

# start derby local server
$DERBY_HOME/bin/startNetworkServer -h 0.0.0.0

# start spark thrift server
$SPARK_HOME/sbin/start-thriftserver.sh \
--jars hudi-spark3.1-bundle_2.12-0.13.0-SNAPSHOT.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
--conf spark.sql.warehouse.dir=/tmp/hudi/hive/warehouse \
--hiveconf hive.aux.jars.path=hudi-hadoop-mr-bundle-0.13.0-SNAPSHOT.jar \
--hiveconf hive.metastore.warehouse.dir=/tmp/hudi/hive/warehouse \
--hiveconf hive.metastore.schema.verification=false \
--hiveconf datanucleus.schema.autoCreateAll=true \
--hiveconf javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver \
--hiveconf javax.jdo.option.ConnectionURL='jdbc:derby://localhost:1527/default;create=true'

# start beeline to query
$SPARK_HOME/bin/beeline \
--hiveconf hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat \
-u "jdbc:hive2://localhost:10000/default;user=${USER};password="

Then run the query:

create table mytable (
  id bigint,
  name string,
  ts bigint,
  dt string,
  hh string
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
 )
partitioned by (dt, hh)
location '/tmp/mytable';

It showed the created table:

0: jdbc:hive2://localhost:10000/default> show tables;
+-----------+------------+--------------+
| database  | tableName  | isTemporary  |
+-----------+------------+--------------+
| default   | mytable    | false        |
+-----------+------------+--------------+
3 rows selected (0.059 seconds)

Will proceed to close this now.

Metadata

Labels: priority:major, spark, spark-sql
Status: Done
Participants: @nsivabalan, @xushiyan, @paul8263, @YannByron, @KnightChess