createOrReplaceTempView
There are many scenarios in which you can perform a transformation using SQL instead of direct Spark DataFrame operations. There may also be SQL migration tasks in which you need to convert existing SQL into Spark SQL.
Function
pyspark.sql.DataFrame.createOrReplaceTempView
createOrReplaceTempView creates or replaces a local temporary view backed by the given DataFrame. The view's lifetime is tied to the SparkSession that was used to create the DataFrame, so it disappears when that session ends. You can also drop the view explicitly before the session exits with spark.catalog.dropTempView("table_name_to_drop").
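A minimal sketch of that lifecycle is shown below; the app name, sample data and the view name tbl_demo are only illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("temp_view_demo").getOrCreate()

# Any DataFrame can back a temporary view; this one is just sample data.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Register (or overwrite) the view; it is visible only within this SparkSession.
df.createOrReplaceTempView("tbl_demo")
spark.sql("select count(*) from tbl_demo").show()

# Drop the view explicitly if you no longer need it before the session ends.
spark.catalog.dropTempView("tbl_demo")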
Example
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName('www.freshers.in training').getOrCreate()

# State sample data
state_data = [
    (1, "California", 90201, "Los Angeles County"),
    (2, "California", 90213, "Los Angeles County"),
    (3, "California", 90222, "Los Angeles County"),
    (4, "California", 90295, "Los Angeles County"),
    (5, "California", 90240, "Los Angeles County")]

# Zip code and city name
zip_city = [
    (90201, "Bell"),
    (90213, "Beverly Hills"),
    (90222, "Compton"),
    (90295, "Venice"),
    (90240, "Downey")]

# Predefined schema for state_data
state_data_schema = StructType([
    StructField("si_no", IntegerType(), True),
    StructField("state", StringType(), True),
    StructField("zip_code", IntegerType(), True),
    StructField("county", StringType(), True)])

# Predefined schema for zip code and city
zip_city_schema = StructType([
    StructField("zip_code", IntegerType(), True),
    StructField("city_name", StringType(), True)])

state_data_df = spark.createDataFrame(data=state_data, schema=state_data_schema)
zip_city_df = spark.createDataFrame(data=zip_city, schema=zip_city_schema)
state_data_df.show(20, False)
zip_city_df.show(20, False)

# Creating temp views
state_data_df.createOrReplaceTempView("tbl_state_data")
zip_city_df.createOrReplaceTempView("tbl_zip_city")

# Executing as SQL
spark.sql("select * from tbl_state_data").show(20, False)

# Executing SQL with a sample join
spark.sql("select st.si_no, st.state, st.zip_code, z.city_name, st.county from tbl_state_data st join tbl_zip_city z on st.zip_code = z.zip_code").show(20, False)

# Dropping the temp views before the session exits
spark.catalog.dropTempView("tbl_state_data")
spark.catalog.dropTempView("tbl_zip_city")
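For comparison, the same join can also be written with direct DataFrame operations instead of a temp view plus spark.sql. The sketch below reuses state_data_df and zip_city_df from the example above.

# Equivalent join expressed with the DataFrame API
joined_df = state_data_df.join(zip_city_df, on="zip_code", how="inner") \
    .select("si_no", "state", "zip_code", "city_name", "county")
joined_df.show(20, False)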