concat_ws : With concat_ws () function you can concatenates multiple input string columns together into a single string column, using the given separator. You need to include concat_ws from pyspark.sql.functions
from pyspark.sql import SparkSession from pyspark.sql.functions import concat_ws spark = SparkSession.builder.appName("Concatenate Sample @ Freshers.in ").getOrCreate() raw_data =[('01','BMW','1-Series','2021'), ('02','BMW','3-Series','2021'), ('03','BMW','5-Series','2021'), ('04','BMW','7-Series','2021'), ('05','Audi','Q3','2021'), ('06','Audi','Q5','2021'), ('07','Audi','Q7','2021'), ('08','Benz','GLS','2021')] raw_data_column_name = ["ID","Make","Model","Year"] df = spark.createDataFrame(data=raw_data, schema = raw_data_column_name) df.show(truncate=False) df2=df.select(concat_ws('_',df.ID,df.Make,df.Model).\ alias("concatenated_value"),"ID","Make","Model") df2.show(truncate=False)
Result : Before and After concat_ws
>>> df.show(truncate=False) [Stage 6:> (0 + 0) / 1] +---+----+--------+----+ |ID |Make|Model |Year| +---+----+--------+----+ |01 |BMW |1-Series|2021| |02 |BMW |3-Series|2021| |03 |BMW |5-Series|2021| |04 |BMW |7-Series|2021| |05 |Audi|Q3 |2021| |06 |Audi|Q5 |2021| |07 |Audi|Q7 |2021| |08 |Benz|GLS |2021| +---+----+--------+----+ >>> df2=df.select(concat_ws('_',df.ID,df.Make,df.Model).alias("concatenated_value"),"ID","Make","Model") >>> df2.show(truncate=False) +------------------+---+----+--------+ |concatenated_value|ID |Make|Model | +------------------+---+----+--------+ |01_BMW_1-Series |01 |BMW |1-Series| |02_BMW_3-Series |02 |BMW |3-Series| |03_BMW_5-Series |03 |BMW |5-Series| |04_BMW_7-Series |04 |BMW |7-Series| |05_Audi_Q3 |05 |Audi|Q3 | |06_Audi_Q5 |06 |Audi|Q5 | |07_Audi_Q7 |07 |Audi|Q7 | |08_Benz_GLS |08 |Benz|GLS | +------------------+---+----+--------+