In PySpark you can define a schema and read a data source with this predefined schema, e.g.:
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Each StructField is (column name, data type, nullable)
schema = StructType([
    StructField("temperature", DoubleType(), True),
    StructField("temperature_unit", StringType(), True),
    StructField("humidity", DoubleType(), True),
    StructField("humidity_unit", StringType(), True),
    StructField("pressure", DoubleType(), True),
    StructField("pressure_unit", StringType(), True),
])
For some data sources it is possible to infer the schema from the data source and get a DataFrame with this schema definition.
Is it possible to get the schema definition (in the form described above) from a DataFrame whose schema was inferred earlier?
df.printSchema()
prints the schema as a tree, but I need to reuse the schema in the form defined above, so that I can read a data source with a schema that was previously inferred from another data source.