How to get the common elements from two arrays in two columns in PySpark (array_intersect)

array_intersect: When you want to get the common elements from two arrays in two columns in PySpark, you can use…

How to find the difference between two arrays in PySpark (array_except)

array_except: In PySpark, array_except returns an array of the elements in one column but not in another column…

How to convert Array elements to Rows in PySpark? PySpark – Explode example code.

Function: pyspark.sql.functions.explode. To convert array of array columns to rows in PySpark, we use the “explode” function. Explode returns…

How to find whether an array contains a given value or values using PySpark (PySpark search in array)

array_contains: You can find a specific value or values in an array using the Spark SQL function array_contains. array_contains(array, value) returns true if…

How to remove duplicate values from an array in PySpark

This blog will show you how to remove the duplicates in a column with array elements. Consider the below example…

What are the Python libraries provided by AWS Glue Version 2.0

The default Python libraries available in AWS Glue version 2.0 are as below: boto3==1.12.4 botocore==1.15.4 certifi==2019.11.28 chardet==3.0.4 cycler==0.10.0 Cython==0.29.15 docutils==0.15.2…

AWS Glue: Example of how to read a sample CSV file with PySpark

Here, assume that you have your CSV data in an AWS S3 bucket. The next step is to crawl the data…

How to rename a Spark DataFrame having a complex schema with AWS Glue – PySpark

There can be multiple reasons to rename the Spark DataFrame. Even though withColumnRenamed can be used to rename…

PySpark: how to get rows having nulls for a column, columns without nulls, or a count of non-nulls

pyspark.sql.Column.isNotNull isNotNull(): True if the current expression is NOT null. isNull(): True if the current expression is null. With…