Map-side join is a method of joining two datasets in PySpark where one dataset is…
Tag: PySpark
PySpark : How to create a map from a column of arrays of structs : map_from_entries
pyspark.sql.functions.map_from_entries map_from_entries(col) is a function in PySpark that creates a map from a column holding an array of structs, where the structs have…
PySpark : Converting Unix timestamp to a string representing the timestamp in a specific format
pyspark.sql.functions.from_unixtime The “from_unixtime()” function is a PySpark function that allows you to convert a Unix timestamp (a long integer representing…
PySpark : Check if two array columns in a DataFrame have any common elements
pyspark.sql.functions.arrays_overlap The arrays_overlap function is a PySpark function that allows you to check if two arrays in a…
PySpark : Combine the elements of two array columns in a DataFrame
pyspark.sql.functions.array_union The array_union function is a PySpark function that allows you to combine the elements of two arrays…
PySpark : Sort an array of elements in a DataFrame column
pyspark.sql.functions.array_sort The array_sort function is a PySpark function that allows you to sort an array of elements in a DataFrame…
PySpark : How to sort a DataFrame column in ascending order while putting the null values first?
pyspark.sql.Column.asc_nulls_first In PySpark, the asc_nulls_first() method is used to sort a column in ascending order while putting the null values…
PySpark : How to round a number up to the nearest integer
pyspark.sql.functions.ceil In PySpark, the ceil() function is used to round a number up to the nearest integer. This function is…
Learn about PySpark's broadcast variable with an example
In PySpark, the broadcast variable is used to cache a read-only variable on all the worker nodes, which can be…
PySpark : Removing all occurrences of a specified element from an array column in a DataFrame
pyspark.sql.functions.array_remove Syntax: pyspark.sql.functions.array_remove(col, element). pyspark.sql.functions.array_remove is a function that removes all occurrences of a specified element from an array column…
PySpark : Finding the position of a given value in an array column
pyspark.sql.functions.array_position The array_position function is used to find the position of a given value in an array column. This is…