PySpark: rename columns
There can be multiple reasons to rename columns in a Spark DataFrame. Although withColumnRenamed can rename columns at the root level, it does not work on fields nested inside a struct column. Nested fields can be renamed using the following method.
Example
from pyspark.sql import Row

df = spark.createDataFrame([
    Row(struct_type_A=Row(col_1=99, col_2=88.21),
        struct_type_B=Row(col_3="New York", col_4=True))
])
df.printSchema()

root
 |-- struct_type_A: struct (nullable = true)
 |    |-- col_1: long (nullable = true)
 |    |-- col_2: double (nullable = true)
 |-- struct_type_B: struct (nullable = true)
 |    |-- col_3: string (nullable = true)
 |    |-- col_4: boolean (nullable = true)
To rename nested fields, define a new schema with StructType() that uses the desired field names but keeps the original field types, then cast the existing struct column to it. Because only the names differ, the cast leaves the data untouched.
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, LongType, DoubleType

new_struct = StructType([
    StructField("new_Column_1", LongType()),
    StructField("new_Column_2", DoubleType())
])
df_renamed = df.withColumn("struct_type_A", col("struct_type_A").cast(new_struct))
df_renamed.printSchema()
root
 |-- struct_type_A: struct (nullable = true)
 |    |-- new_Column_1: long (nullable = true)
 |    |-- new_Column_2: double (nullable = true)
 |-- struct_type_B: struct (nullable = true)
 |    |-- col_3: string (nullable = true)
 |    |-- col_4: boolean (nullable = true)