Category: spark

Spark User

PySpark

How to get the common elements from two arrays in two columns in PySpark (array_intersect)

array_intersect: When you want to get the common elements from two arrays in two columns in PySpark, you can use…

PySpark

How to find the difference between two arrays in PySpark (array_except)

array_except: In PySpark, array_except returns an array of the elements in one column but not in another column…

PySpark

How to convert Array elements to Rows in PySpark? PySpark – Explode Example code

Function: pyspark.sql.functions.explode. To convert the elements of array columns (including arrays of arrays) into rows in PySpark, we use the “explode” function. Explode returns…

PySpark

How to find whether an array contains a given value or values using PySpark (PySpark search in array)

array_contains: You can find a specific value or values in an array using the Spark SQL function array_contains. array_contains(array, value) will return true if…

PySpark

How to remove duplicate values from an array in PySpark

This blog will show you how to remove the duplicates in a column with array elements. Consider the example below…

AWS Glue

What are the Python libraries provided by AWS Glue Version 2.0?

The default Python libraries available in AWS Glue version 2.0 are as follows: boto3==1.12.4, botocore==1.15.4, certifi==2019.11.28, chardet==3.0.4, cycler==0.10.0, Cython==0.29.15, docutils==0.15.2…

PySpark

AWS Glue: Example of how to read a sample CSV file with PySpark

Here we assume that you have your CSV data in an AWS S3 bucket. The next step is to crawl the data…

PySpark

How to rename a Spark DataFrame having a complex schema with AWS Glue – PySpark

There can be multiple reasons to rename a Spark DataFrame. Even though withColumnRenamed can be used to rename…

PySpark

PySpark: how to get rows having nulls for a column or columns, rows without nulls, or the count of non-null values

pyspark.sql.Column.isNotNull: isNotNull() is True if the current expression is NOT null. isNull() is True if the current expression is null. With…