Tag: PySpark

PySpark @ Freshers.in

Convert PySpark DataFrame column data to row format, or collect a column's elements into a row

pyspark.sql.functions.collect_list(col): an aggregate function that returns a list of objects, duplicates included. To retrieve the data from the PySpark…

PySpark: How to add months to a date column in a Spark DataFrame (add_months)

I have a use case where I want to add months to a date column in a Spark DataFrame. Function:…

PySpark: How to find the difference between two dates and round it down to whole days without decimals (datediff, floor)

pyspark.sql.functions.datediff and pyspark.sql.functions.floor: In this article we will learn two functions, datediff and floor. pyspark.sql.functions.datediff: To get…

PySpark – How to convert a string date to the Date datatype

pyspark.sql.functions.to_date: This article gives a brief overview of how you can convert a string date to the Date datatype. With…

PySpark: How to return the first column that is not null

pyspark.sql.functions.coalesce: If you want to return the first non-null value from a list of columns, you can use the coalesce function in…

How can you convert a PySpark DataFrame to JSON?

pyspark.sql.DataFrame.toJSON: There may be situations where you need to send your DataFrame to a file, to a server, or…

How can I see the full column values in a Spark DataFrame?

When we call dataframe.show(), we can see that some of the column values are truncated. Here we…

What is the difference between repartition() and coalesce()?

The repartition algorithm performs a full shuffle and creates new partitions with data that's distributed evenly. The repartition algorithm makes…