The pyspark.sql.functions.decode Function in PySpark
pyspark.sql.functions.decode PySpark is a popular library for processing big data…
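A minimal sketch of decode in action, assuming an illustrative binary column and session name that are not from the post itself:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("decode_example").getOrCreate()

# Hypothetical sample: a binary column holding UTF-8 encoded bytes
df = spark.createDataFrame([(bytearray(b"hello"),), (bytearray(b"world"),)], ["raw"])

# decode(col, charset) turns binary data into a string using the given character set
df.select(F.decode(F.col("raw"), "UTF-8").alias("text")).show()
```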
Kafka streaming with PySpark – Things you need to know – With Example
To use Kafka streaming with PySpark, you will need to have a good understanding of the following concepts: Kafka: Kafka…
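As a rough sketch of what a Kafka-backed Structured Streaming job looks like; the broker address, topic name, and app name are assumptions, and the spark-sql-kafka connector package must be on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka_stream_example").getOrCreate()

# Assumed broker address and topic name; the spark-sql-kafka connector
# package must be available for format("kafka") to work
stream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key and value as binary, so cast them to strings before use
parsed = stream_df.select(
    F.col("key").cast("string").alias("key"),
    F.col("value").cast("string").alias("value"),
)

# Write the parsed records to the console; awaitTermination() blocks until stopped
query = parsed.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```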
How do you break a lineage in Apache Spark? Why do we need to break a lineage in Apache Spark?
In Apache Spark, a lineage refers to the series of RDD (Resilient Distributed Dataset) operations that are performed on a…
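One common way to break a lineage is to checkpoint the DataFrame; the sketch below assumes a local checkpoint directory and a synthetic workload purely for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lineage_example").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  # assumed local path

df = spark.range(1_000_000)
for i in range(10):
    # Each transformation extends the logical lineage of the DataFrame
    df = df.withColumn(f"c{i}", df["id"] * i)

# checkpoint() materializes the data to the checkpoint directory and truncates
# the lineage, so downstream stages no longer recompute the full chain
df = df.checkpoint()
print(df.count())
```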
When should you not use Apache Spark? Explain with reasons.
There are a few situations where it may not be appropriate to use Apache Spark, which is a powerful open-source…
PySpark : How to create a map from a column of structs : map_from_entries
pyspark.sql.functions.map_from_entries map_from_entries(col) is a function in PySpark that creates a map from a column of structs, where the structs have…
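A minimal sketch of map_from_entries, using a Spark SQL expression to build an illustrative array of (key, value) structs:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("map_from_entries_example").getOrCreate()

# Hypothetical column: an array of (key, value) structs built with Spark SQL
df = spark.sql("SELECT array(struct(1, 'a'), struct(2, 'b')) AS entries")

# map_from_entries converts the array of structs into a map column
df.select(F.map_from_entries("entries").alias("as_map")).show(truncate=False)
```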
PySpark : Converting Unix timestamp to a string representing the timestamp in a specific format
pyspark.sql.functions.from_unixtime The “from_unixtime()” function is a PySpark function that allows you to convert a Unix timestamp (a long integer representing…
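A short sketch of from_unixtime; the epoch values and column name are made up for the example:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("from_unixtime_example").getOrCreate()

# Hypothetical epoch seconds
df = spark.createDataFrame([(1672531200,), (1672617600,)], ["epoch"])

# from_unixtime converts epoch seconds to a formatted timestamp string
df.select(
    F.from_unixtime("epoch").alias("default_format"),
    F.from_unixtime("epoch", "yyyy-MM-dd").alias("date_only"),
).show()
```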
PySpark : Check if two arrays in a DataFrame column have any common elements
pyspark.sql.functions.arrays_overlap The arrays_overlap function is a PySpark function that allows you to check whether two arrays in a…
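A minimal sketch of arrays_overlap on two assumed array columns:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("arrays_overlap_example").getOrCreate()

df = spark.createDataFrame(
    [(["a", "b"], ["b", "c"]), (["x"], ["y", "z"])],
    ["left", "right"],
)

# arrays_overlap returns true when the two arrays share at least one non-null element
df.select("left", "right", F.arrays_overlap("left", "right").alias("has_common")).show()
```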
PySpark : Combine the elements of two arrays in a DataFrame column
pyspark.sql.functions.array_union The array_union function is a PySpark function that allows you to combine the elements of two arrays…
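A quick sketch of array_union on two assumed array columns; note that duplicates are removed in the merged result:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("array_union_example").getOrCreate()

df = spark.createDataFrame([([1, 2, 3], [3, 4])], ["a", "b"])

# array_union merges the two arrays and removes duplicate elements
df.select(F.array_union("a", "b").alias("merged")).show(truncate=False)
```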
PySpark : Sort an array of elements in a DataFrame column
pyspark.sql.functions.array_sort The array_sort function is a PySpark function that allows you to sort an array of elements in a DataFrame…
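A minimal sketch of array_sort on an assumed array column containing a null element:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("array_sort_example").getOrCreate()

df = spark.createDataFrame([([3, 1, None, 2],)], ["values"])

# array_sort sorts the array in ascending order and places null elements last
df.select(F.array_sort("values").alias("sorted_values")).show(truncate=False)
```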
PySpark : How to sort a DataFrame column in ascending order while putting the null values first?
pyspark.sql.Column.asc_nulls_first In PySpark, the asc_nulls_first() function is used to sort a column in ascending order while putting the null values…
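A minimal sketch of asc_nulls_first; the sample rows and column names are assumptions for the example:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("asc_nulls_first_example").getOrCreate()

# Hypothetical data with a null score
df = spark.createDataFrame([("a", 3), ("b", None), ("c", 1)], ["name", "score"])

# asc_nulls_first() sorts ascending and places null values before non-null values
df.orderBy(F.col("score").asc_nulls_first()).show()
```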
PySpark : How to round a number up to the nearest integer
pyspark.sql.functions.ceil In PySpark, the ceil() function is used to round a number up to the nearest integer. This function is…
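A short sketch of ceil on an assumed numeric column:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ceil_example").getOrCreate()

df = spark.createDataFrame([(1.2,), (2.0,), (-3.7,)], ["value"])

# ceil() rounds each value up to the nearest integer (-3.7 becomes -3)
df.select("value", F.ceil("value").alias("rounded_up")).show()
```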