PySpark is the Python API for Apache Spark. It allows developers to interface with RDDs…
Tag: PySpark
PySpark : What is a map-side join and how to perform a map-side join in PySpark
A map-side join is a method of joining two datasets in PySpark in which one dataset is broadcast to all executors, and…
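The idea behind a map-side (broadcast) join can be sketched in plain Python, without a Spark cluster. All names and data below are illustrative, not taken from the post: the small table is copied to every partition of the large table, so each partition joins locally and the large dataset is never shuffled.

```python
# "Broadcast" side: a small lookup table copied to every partition.
small_table = {"US": "United States", "IN": "India"}

# Large side, already split into partitions (as Spark would hold it).
large_table_partitions = [
    [("order-1", "US"), ("order-2", "IN")],
    [("order-3", "US")],
]

joined = []
for partition in large_table_partitions:
    # Each partition joins against its own full copy of small_table,
    # so no rows of the large table move between partitions.
    joined.extend((order, code, small_table[code]) for order, code in partition)
```

In PySpark itself the same effect comes from joining against a broadcast DataFrame; the sketch above only shows why the join needs no shuffle.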
Installing Apache Spark standalone on Linux
Installing Spark on a Linux machine can be done in a few steps. The following is a detailed guide on…
How to use an if condition in Spark SQL, explained with an example
In PySpark, you can use the IF function within a SQL query to conditionally return a value based on a…
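Spark SQL's `IF(cond, a, b)` returns `a` when the condition holds and `b` otherwise. The semantics can be mirrored in plain Python; the column names and threshold below are illustrative, not from the post:

```python
def sql_if(condition, true_value, false_value):
    """Mimics Spark SQL's IF(cond, a, b): a when cond is true, else b."""
    return true_value if condition else false_value

# Roughly what a query like
#   SELECT name, IF(salary > 3000, 'high', 'low') AS band FROM employees
# computes per row (names and values here are made up for illustration):
employees = [("Ann", 4000), ("Bob", 2500)]
bands = [(name, sql_if(salary > 3000, "high", "low"))
         for name, salary in employees]
```

A `CASE WHEN … THEN … ELSE … END` expression in Spark SQL expresses the same branching and generalizes to more than two outcomes.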
What is GC (Garbage Collection) time in the Spark UI?
In the Spark UI, GC (Garbage Collection) time refers to the amount of time spent by the JVM (Java Virtual…
PySpark : How do I read a Parquet file in Spark
To read a Parquet file in Spark, you can use the spark.read.parquet() method, which returns a DataFrame. Here is an…
Learn how to connect Hive with Apache Spark.
HiveContext is a Spark SQL module that allows you to work with Hive data in Spark. It provides a way…
PySpark : Connecting to and updating a PostgreSQL table in Spark SQL
Apache Spark is an open-source, distributed computing system that can process large amounts of data quickly. Spark SQL is a…
Kafka streaming with PySpark – things you need to know – with an example
To use Kafka streaming with PySpark, you will need to have a good understanding of the following concepts: Kafka: Kafka…
How do you break a lineage in Apache Spark? Why do we need to break a lineage in Apache Spark?
In Apache Spark, a lineage refers to the series of RDD (Resilient Distributed Dataset) operations that are performed on a…
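Why a long lineage is expensive, and how materializing an intermediate result "breaks" it, can be sketched in plain Python. This is an analogy for Spark's `rdd.checkpoint()` and caching, not Spark code; all names are illustrative:

```python
calls = {"source": 0}

def source():
    calls["source"] += 1          # count how often the base data is recomputed
    return range(5)

def pipeline():
    # A deferred chain of transformations: every "action" that calls this
    # re-runs the whole chain starting from source(), like replaying a lineage.
    return [x * 2 for x in source()]

pipeline()                        # first action recomputes from the source
pipeline()                        # second action recomputes it all again

materialized = pipeline()         # materialize once, truncating the chain
downstream_a = materialized       # later work reuses the saved result:
downstream_b = materialized       # the source is not touched again
```

After materializing, `calls["source"]` stops growing, which is the point of breaking a lineage: downstream stages restart from the saved data instead of replaying every earlier transformation.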
When should you not use Apache Spark? Explained with reasons.
There are a few situations where it may not be appropriate to use Apache Spark, which is a powerful open-source…