pyspark.sql.functions.arrays_zip In PySpark, the arrays_zip function can be used to combine two or more arrays…
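Since the excerpt above is truncated, here is a minimal sketch of arrays_zip; the session setup and sample data are illustrative, not from the original post:

from pyspark.sql import SparkSession
from pyspark.sql.functions import arrays_zip

spark = SparkSession.builder.getOrCreate()
# hypothetical sample data; arrays_zip pairs elements of the arrays by position
df = spark.createDataFrame([([1, 2, 3], ["a", "b", "c"])], ["nums", "letters"])
df.select(arrays_zip("nums", "letters").alias("zipped")).show(truncate=False)
# -> [{1, a}, {2, b}, {3, c}]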
How to find the difference between two arrays in PySpark (array_except)
array_except In PySpark, array_except returns an array of the elements in one column but not in another column…
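A minimal sketch of array_except, with hypothetical sample columns; the result keeps elements of col1 that are absent from col2:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_except

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1, 2, 3, 4], [3, 4, 5])], ["col1", "col2"])
# elements of col1 not present in col2, without duplicates
df.select(array_except("col1", "col2").alias("diff")).show()
# -> [1, 2]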
How to convert Array elements to Rows in PySpark? PySpark – Explode Example code.
Function: pyspark.sql.functions.explode To convert array columns to rows in PySpark we use the “explode” function. Explode returns…
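A minimal sketch of explode with hypothetical data; explode emits one output row per array element:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("x", [1, 2, 3])], ["id", "values"])
# each element of "values" becomes its own row, paired with "id"
df.select("id", explode("values").alias("value")).show()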
How to check whether an array contains a given value or values using PySpark (PySpark search in array)
array_contains You can find a specific value or values in an array using the Spark SQL function array_contains. array_contains(array, value) will return true if…
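A minimal sketch of array_contains with hypothetical data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ["nums"])
# true when the array holds the value, false otherwise, null if the array is null
df.select("nums", array_contains("nums", 2).alias("has_2")).show()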
How to remove duplicate values from an array in PySpark
This blog will show you how to remove duplicates in a column with array elements. Consider the example below….
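The post's own example is cut off above; one standard way to do this (available since Spark 2.4) is array_distinct, sketched here with hypothetical data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_distinct

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([1, 2, 2, 3, 3],)], ["nums"])
# array_distinct drops repeated elements from an array column
df.select(array_distinct("nums").alias("deduped")).show()
# -> [1, 2, 3]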
What are the Python libraries provided by AWS Glue Version 2.0
The default Python libraries available in AWS Glue version 2.0 are as below: boto3==1.12.4 botocore==1.15.4 certifi==2019.11.28 chardet==3.0.4 cycler==0.10.0 Cython==0.29.15 docutils==0.15.2…
AWS Glue: Example of how to read a sample CSV file with PySpark
Reading a sample CSV file using PySpark Here, assume that you have your CSV data in an AWS S3 bucket. The…
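A minimal sketch of reading a CSV from S3 with plain PySpark; s3://my-bucket/sample.csv is a placeholder path, not the one from the original post:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# header=True uses the first row as column names; inferSchema=True guesses column types
df = spark.read.csv("s3://my-bucket/sample.csv", header=True, inferSchema=True)
df.printSchema()
df.show(5)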
How to rename a Spark DataFrame having a complex schema with AWS Glue – PySpark: pyspark rename columns
pyspark rename columns There can be multiple reasons to rename columns of a Spark DataFrame. Even though withColumnRenamed can be…
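The article's full approach for complex schemas is truncated above; for simple top-level columns, a minimal sketch with hypothetical column names looks like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["old_id", "old_name"])
# withColumnRenamed renames one top-level column at a time; nested struct fields need a schema rebuild
df.withColumnRenamed("old_id", "id").withColumnRenamed("old_name", "name").printSchema()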
PySpark – How to read a text file as an RDD using Spark 3 and display the result on Windows 10
Here we will see how to read a sample text file as an RDD using Spark. Environment and version which we…
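A minimal sketch of reading a text file as an RDD; C:/data/sample.txt is a placeholder local Windows path, not the post's actual file:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# textFile returns an RDD of lines; take(5) pulls the first five to the driver
rdd = spark.sparkContext.textFile("C:/data/sample.txt")
for line in rdd.take(5):
    print(line)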
PySpark – how to get rows having nulls for a column, rows without nulls, or a count of non-null values
pyspark.sql.Column.isNotNull isNotNull(): True if the current expression is NOT null. isNull(): True if the current expression is null. With…
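A minimal sketch of both predicates plus a non-null count, with hypothetical data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, None)], ["id", "name"])
df.filter(col("name").isNull()).show()       # rows where name is null
df.filter(col("name").isNotNull()).show()    # rows where name is not null
df.select(count("name")).show()              # count() skips nulls, giving the non-null count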
PySpark – groupby with aggregation (count, sum, mean, min, max)
pyspark.sql.DataFrame.groupBy PySpark's groupBy function groups the DataFrame using the specified columns so that aggregations (count, sum, mean, min, max) can be run on them….
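A minimal sketch of groupBy with all five aggregates, using hypothetical data:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 3), ("b", 2)], ["key", "val"])
# one pass computes all five aggregates per group
df.groupBy("key").agg(
    F.count("val"), F.sum("val"), F.mean("val"), F.min("val"), F.max("val")
).show()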