Tag: Big Data

PySpark @ Freshers.in

How to remove duplicate values from an array in PySpark

This post shows how to remove duplicates from a column that holds array elements. Consider the example below…

AWS Glue @ Freshers.in

What are the Python libraries provided by AWS Glue Version 2.0?

The default Python libraries available in AWS Glue version 2.0 are listed below: boto3==1.12.4, botocore==1.15.4, certifi==2019.11.28, chardet==3.0.4, cycler==0.10.0, Cython==0.29.15, docutils==0.15.2…
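To confirm which of these versions a given job environment actually provides, the installed packages can be queried at runtime. The sketch below uses only the standard library, with a pkg_resources fallback for Python 3.7; the helper name installed_versions is an assumption for this example, and the package names are just the ones visible in the excerpt above.

```python
try:
    # Python 3.8+: query package metadata with the standard library.
    from importlib.metadata import PackageNotFoundError as _NotFound
    from importlib.metadata import version as _version
except ImportError:
    # Python 3.7 fallback: pkg_resources ships with setuptools.
    import pkg_resources

    _NotFound = pkg_resources.DistributionNotFound

    def _version(name):
        return pkg_resources.get_distribution(name).version


def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = _version(name)
        except _NotFound:
            result[name] = None
    return result


if __name__ == "__main__":
    # Package names copied from the excerpt; the full list is truncated there.
    packages = ["boto3", "botocore", "certifi", "chardet",
                "cycler", "Cython", "docutils"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this at the start of a job and logging the output is a simple way to spot a mismatch between the documented pins and the environment the job really runs in.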


Explain distributed cache in Hadoop?

Distributed cache is a facility provided by the Hadoop MapReduce framework to access small files needed by an application during its…
