If you have a situation where you can easily get the result using SQL / SQL…
PySpark : Inserting a row into an Apache Spark DataFrame
In PySpark, you can insert a row into a DataFrame by first converting the DataFrame to an RDD (Resilient Distributed…
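The full post is cut off above; as a minimal sketch of the RDD round-trip it describes (column names and values are illustrative):

```python
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("insert-row").getOrCreate()

df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

# Drop to the RDD level, append the new row, and rebuild the DataFrame
new_row = Row(id=3, name="Carol")
rdd = df.rdd.union(spark.sparkContext.parallelize([new_row]))
spark.createDataFrame(rdd, df.schema).show()
```

A simpler alternative is to union the original DataFrame with a one-row DataFrame built against the same schema.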
PySpark : How to write Scala code in the Spark shell ?
To write Scala code in the Spark shell, you can simply start the Spark shell by running the command “spark-shell”…
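As a sketch, such a session might look like the following; the one-liner evaluated at the REPL prompt is illustrative:

```
$ spark-shell
scala> val df = spark.range(5)
scala> df.show()
```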
PySpark : What happens once you run a spark-submit command ?
When you submit a Spark application using the spark-submit command, a series of steps occur to start and execute the…
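Those steps begin the moment the command is issued; a typical invocation, with illustrative flag values and application name, looks like this:

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  my_app.py
```

From there the cluster manager launches the driver (in cluster mode), the driver requests executors from the cluster manager, and tasks are scheduled onto them.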
PySpark : What is predicate pushdown in Spark and how to enable it ?
Predicate pushdown is a technique used in Spark to filter data as early as possible in the query execution process,…
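The setting involved can be toggled per data source; a minimal, self-contained sketch for Parquet (the path and values are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()

# Parquet filter pushdown is on by default; this just makes the setting explicit
spark.conf.set("spark.sql.parquet.filterPushdown", "true")

# Write a small Parquet file so the example is self-contained
spark.createDataFrame([(1, 2019), (2, 2021), (3, 2022)], ["id", "year"]) \
    .write.mode("overwrite").parquet("/tmp/events.parquet")

# The pushed predicate appears as PushedFilters in the physical plan
df = spark.read.parquet("/tmp/events.parquet")
df.filter(df.year > 2020).explain()
```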
PySpark : How would you set the number of executors in a Spark application ? On what basis should the number of executors be set ?
The number of executors in a Spark-based application can be set by passing the --num-executors command-line argument to the…
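For example, with illustrative sizing (the right numbers depend on cluster cores, memory, and workload):

```
spark-submit \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  my_app.py
```

The same values can also be set through the spark.executor.instances, spark.executor.cores, and spark.executor.memory configuration properties.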
PySpark : What is a map-side join and how to perform a map-side join in PySpark
Map-side join is a method of joining two datasets in PySpark where one dataset is broadcast to all executors, and…
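A minimal sketch of that broadcast pattern (the two DataFrames here are toy stand-ins for a large and a small dataset):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("map-side-join").getOrCreate()

large = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "val"])
small = spark.createDataFrame([(1, "x"), (2, "y")], ["id", "tag"])

# broadcast() ships `small` to every executor, so the join is performed
# map-side without shuffling `large`
large.join(broadcast(small), on="id", how="inner").show()
```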
Installing Apache Spark standalone on Linux
Installing Spark on a Linux machine can be done in a few steps. The following is a detailed guide on…
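The outline below is a sketch of the usual steps; the release version and install path are illustrative, so pick a current build from spark.apache.org:

```
wget https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
tar -xzf spark-3.4.1-bin-hadoop3.tgz
sudo mv spark-3.4.1-bin-hadoop3 /opt/spark

# Make Spark available on the PATH (e.g. in ~/.bashrc)
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin

spark-submit --version
```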
SQL : How to execute a large dynamic query in SQL
There are a few ways to execute large dynamic queries in SQL, but one common method is to use a…
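The post's specific method is cut off above; in this blog's Spark setting, one common pattern is to assemble the statement as a string and hand it to spark.sql (the table, column, and threshold here are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dynamic-sql").getOrCreate()
spark.createDataFrame([(1, 2021), (2, 2022)], ["id", "year"]) \
    .createOrReplaceTempView("sales")

# Build the statement from parts, then execute it as a single string
table, column, threshold = "sales", "year", 2021
query = f"SELECT * FROM {table} WHERE {column} > {threshold}"
spark.sql(query).show()
```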
How to use an if condition in Spark SQL, explained with an example
In PySpark, you can use the if() expression within a SQL query to conditionally return a value based on a…
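A minimal, runnable sketch of that pattern (the table and values are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("if-demo").getOrCreate()
spark.createDataFrame([(85,), (40,)], ["score"]) \
    .createOrReplaceTempView("results")

# Spark SQL's if(condition, true_value, false_value) expression
spark.sql(
    "SELECT score, if(score >= 50, 'pass', 'fail') AS outcome FROM results"
).show()
```

An equivalent CASE WHEN expression works the same way and is more portable across SQL dialects.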
What is GC (Garbage Collection) time in the Spark UI ?
In the Spark UI, GC (Garbage Collection) time refers to the amount of time spent by the JVM (Java Virtual…
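To see the activity behind that metric, GC logging can be turned on for the executors; the flags below assume a JDK 8 runtime (newer JDKs use -Xlog:gc instead), and the application name is illustrative:

```
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  my_app.py
```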