Category: spark


PySpark @ Freshers.in

How to remove duplicate values from an array in PySpark

This post shows how to remove duplicates from a column that contains array elements. Consider the example below…

AWS Glue @ Freshers.in

What are the Python libraries provided by AWS Glue Version 2.0

The default Python libraries available in AWS Glue version 2.0 are listed below: boto3==1.12.4, botocore==1.15.4, certifi==2019.11.28, chardet==3.0.4, cycler==0.10.0, Cython==0.29.15, docutils==0.15.2…

PySpark @ Freshers.in

PySpark – groupby with aggregation (count, sum, mean, min, max)

pyspark.sql.DataFrame.groupBy: the PySpark groupBy function groups the DataFrame by the specified columns so that aggregations (count, sum, mean, min, max) can be run on them…
