Tag: big_data_interview

PySpark @ Freshers.in

Optimizing PySpark queries with Adaptive Query Execution (AQE): example included

Spark 3.0 brought numerous enhancements, and one of the most notable is Adaptive Query Execution (AQE). AQE is…

AWS Glue @ Freshers.in

Navigating job dependencies in AWS Glue: managing ETL workflows

AWS Glue manages dependencies between jobs using triggers. Triggers can start jobs based on the completion status of other jobs,…

PySpark @ Freshers.in

Spark repartition() vs coalesce(): a complete guide

In PySpark, managing data across different partitions is crucial for optimizing performance, especially for large-scale data processing tasks. Two methods…
