Tag: Big Data

Copying files from Hadoop’s HDFS (Hadoop Distributed File System) to your local machine

To copy files from Hadoop’s HDFS (Hadoop Distributed File System) to your local machine, you can use the hadoop fs…

PySpark @ Freshers.in

Optimizing PySpark queries with Adaptive Query Execution (AQE) – Example included

Spark 3+ brought numerous enhancements and features, and one of the notable ones is Adaptive Query Execution (AQE). AQE is…

AWS Glue @ Freshers.in

Navigating job dependencies in AWS Glue – Managing ETL workflows

AWS Glue manages dependencies between jobs using triggers. Triggers can start jobs based on the completion status of other jobs,…

PySpark @ Freshers.in

Spark repartition() vs coalesce() – A complete guide

In PySpark, managing data across different partitions is crucial for optimizing performance, especially for large-scale data processing tasks. Two methods…
