pyspark.sql.functions.arrays_overlap The arrays_overlap function is a PySpark function that allows you to check if two…
Category: article
How to get the common elements from two arrays in two columns in PySpark (array_intersect)
array_intersect When you want to get the common elements from two arrays in two columns in PySpark, you can use…
How to find the difference between two arrays in PySpark (array_except)
array_except In PySpark, array_except returns an array of the elements in one column but not in another column…
How to convert Array elements to Rows in PySpark? PySpark – Explode example code.
Function: pyspark.sql.functions.explode To convert the Array of Array columns to rows in PySpark, we use the “explode” function. Explode returns…
How to find whether an array contains a given value or values using PySpark (PySpark search in array)
array_contains You can find a specific value or values in an array using the Spark SQL function array_contains. array_contains(array, value) returns true if…
How to remove duplicate values from an array in PySpark
This blog will show you how to remove the duplicates in a column with array elements. Consider the example below…
How to extract Date and Time into two separate cells from a Timestamp in Excel?
Consider you have a date timestamp value in cell A1: A1 = 01/21/2022 11:35 PM. You want to extract date…
What are the Python libraries provided by AWS Glue Version 2.0
The default Python libraries available in AWS Glue version 2.0 are as below: boto3==1.12.4 botocore==1.15.4 certifi==2019.11.28 chardet==3.0.4 cycler==0.10.0 Cython==0.29.15 docutils==0.15.2…
How to add additional Python Libraries in an AWS Glue Development Endpoint
There are multiple scenarios in which you may need to use a different set of Python libraries in your Python code or…
AWS Glue: Example of how to read a sample CSV file with PySpark
Reading a sample CSV file using PySpark Here, assume that you have your CSV data in an AWS S3 bucket. The…
How to rename columns of a Spark DataFrame having a complex schema with AWS Glue – PySpark: pyspark rename columns
pyspark rename columns There can be multiple reasons to rename the columns of a Spark DataFrame. Even though withColumnRenamed can be…