Tag: Spark_Interview

PySpark @ Freshers.in

PySpark: How to return the first column that is not null

pyspark.sql.functions.coalesce: If you want to return the first non-null value from a list of columns, you can use the coalesce function in…

How can you convert a PySpark DataFrame to JSON?

pyspark.sql.DataFrame.toJSON: There may be situations where you need to send your DataFrame to a file, to a server, or…

How can I see the full column values in a Spark DataFrame?

When we call dataframe.show(), we can see that some of the column values get truncated. Here we…

What is the difference between repartition() and coalesce()?

The repartition algorithm performs a full shuffle and creates new partitions with data that’s distributed evenly. The repartition algorithm makes…

Convert a column containing a StructType, ArrayType, or MapType into a JSON string in PySpark (to_json)

You can convert a column containing a StructType, ArrayType, or MapType into a JSON string using the to_json function. pyspark.sql.functions.to_json…

How to round the given value to scale decimal places using HALF_EVEN rounding in Spark – PySpark

bround function: The bround function returns the rounded expr using the HALF_EVEN rounding mode. That means bround will round the given value…

How to create a UDF in PySpark? What are the different ways you can call a PySpark UDF (with example)?

PySpark UDF: A PySpark UDF is used to extend PySpark's built-in capabilities. UDFs (User Defined Functions) are used to…

How to convert a MapType to multiple columns based on key using PySpark?

Use case: Converting a map to multiple columns. There can be raw data with a MapType column holding multiple key-value pairs…

What is the difference between concat and concat_ws in PySpark?

concat vs concat_ws. Syntax: pyspark.sql.functions.concat(*cols) and pyspark.sql.functions.concat_ws(sep, *cols). concat: concat concatenates multiple input columns together into a single column. The…