In PySpark, spark.table() is used to read a table from the Spark catalog, whereas spark.read.table()…
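For illustration, a minimal sketch of the two calls side by side, assuming a table named sales (a placeholder) already exists in the catalog; both return a DataFrame:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("table-read-demo").getOrCreate()

# Both calls read the catalog table "sales" (a placeholder name for an
# existing table) and return an equivalent DataFrame.
df1 = spark.table("sales")
df2 = spark.read.table("sales")

df1.show(5)
```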
PySpark : What happens when you run a spark-submit command?
When you submit a Spark application using the spark-submit command, a series of steps occurs to start and execute the…
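As a sketch, a minimal PySpark script that spark-submit could launch; the file name my_app.py and the local[*] master are placeholders:

```python
# A minimal application (my_app.py, a placeholder name) that could be launched with:
#   spark-submit --master local[*] my_app.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("submit-demo").getOrCreate()

# The driver runs this code; Spark schedules the actual work on executors.
print(spark.range(100).count())

spark.stop()
```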
PySpark : What is predicate pushdown in Spark and how do you enable it?
Predicate pushdown is a technique used in Spark to filter data as early as possible in the query execution process,…
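A minimal sketch of pushdown in action, assuming a Parquet dataset at a placeholder path; spark.sql.parquet.filterPushdown is Spark's Parquet pushdown flag and is on by default:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pushdown-demo").getOrCreate()

# Parquet filter pushdown is enabled by default; set explicitly for illustration.
spark.conf.set("spark.sql.parquet.filterPushdown", "true")

# "/data/events.parquet" is a placeholder path. The filter can be pushed down
# to the Parquet reader so non-matching row groups are skipped at scan time.
df = spark.read.parquet("/data/events.parquet").filter("event_date >= '2023-01-01'")
df.explain()  # look for PushedFilters in the physical plan
```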
PySpark : How do you set the number of executors in a Spark application? On what basis should the number of executors be set?
The number of executors in a Spark-based application can be set by passing the --num-executors command line argument to the…
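The same sizing can be sketched from inside a script via the spark.executor.instances property; the values below are placeholders and only take effect on cluster managers such as YARN:

```python
from pyspark.sql import SparkSession

# Equivalent to `spark-submit --num-executors 4 ...`; the numbers here are
# placeholders to be sized against your cluster's cores and memory.
spark = (
    SparkSession.builder
    .appName("executor-config-demo")
    .config("spark.executor.instances", "4")
    .config("spark.executor.cores", "2")
    .config("spark.executor.memory", "4g")
    .getOrCreate()
)
```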
PySpark : What is a map-side join and how to perform a map-side join in PySpark
Map-side join is a method of joining two datasets in PySpark where one dataset is broadcast to all executors, and…
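A minimal sketch using broadcast() from pyspark.sql.functions; the tiny DataFrames are placeholders standing in for a large fact table and a small lookup table:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

large_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
small_df = spark.createDataFrame([(1, "US"), (2, "IN")], ["id", "country"])

# broadcast() ships the small DataFrame to every executor, so the join
# happens map-side without shuffling the large DataFrame.
joined = large_df.join(broadcast(small_df), on="id")
joined.show()
```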
Installing Apache Spark standalone on Linux
Installing Spark on a Linux machine can be done in a few steps. The following is a detailed guide on…
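Once installed, a quick sanity check from Python, assuming pyspark is importable (for example via pip, or with SPARK_HOME's python directory on PYTHONPATH):

```python
# Verifies the installation by starting a local session and printing the version.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("install-check").getOrCreate()
print("Spark version:", spark.version)
spark.stop()
```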
How to use an if condition in Spark SQL, explained with an example
In PySpark, you can use the IF() function within a SQL query to conditionally return a value based on a…
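A minimal sketch using Spark SQL's IF(condition, value_if_true, value_if_false) function; the results view and score column are placeholder names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-if-demo").getOrCreate()

spark.createDataFrame([(75,), (42,)], ["score"]).createOrReplaceTempView("results")

# IF() returns the second argument when the condition holds, else the third.
spark.sql("""
    SELECT score,
           IF(score >= 50, 'pass', 'fail') AS outcome
    FROM results
""").show()
```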
What is GC (Garbage Collection) time in the Spark UI?
In the Spark UI, GC (Garbage Collection) time refers to the amount of time spent by the JVM (Java Virtual…
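Long GC pauses are easier to diagnose with GC logging enabled on the executors; a sketch using the spark.executor.extraJavaOptions key with the standard JVM flag -verbose:gc:

```python
from pyspark.sql import SparkSession

# spark.executor.extraJavaOptions passes extra JVM flags to executors;
# -verbose:gc logs each collection, making the "GC Time" column in the
# Spark UI easier to interpret.
spark = (
    SparkSession.builder
    .appName("gc-logging-demo")
    .config("spark.executor.extraJavaOptions", "-verbose:gc")
    .getOrCreate()
)
```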
PySpark : How do I read a Parquet file in Spark?
To read a Parquet file in Spark, you can use the spark.read.parquet() method, which returns a DataFrame. Here is an…
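A minimal sketch, with a placeholder path standing in for an existing Parquet file:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-read-demo").getOrCreate()

# "/data/people.parquet" is a placeholder path; Parquet files carry their
# own schema, so no schema needs to be supplied.
df = spark.read.parquet("/data/people.parquet")
df.printSchema()
df.show(5)
```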
Learn how to connect Hive with Apache Spark.
HiveContext is a Spark SQL entry point that allows you to work with Hive data in Spark. It provides a way…
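In Spark 2.0+ the HiveContext capability is exposed through SparkSession.enableHiveSupport(); a sketch assuming hive-site.xml is available to Spark, with a placeholder table name:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() connects the session to the Hive metastore
# (assuming hive-site.xml is on the classpath).
spark = (
    SparkSession.builder
    .appName("hive-demo")
    .enableHiveSupport()
    .getOrCreate()
)

# "default.employees" is a placeholder Hive table.
spark.sql("SELECT * FROM default.employees LIMIT 5").show()
```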
PySpark : Connecting to and updating a PostgreSQL table with Spark SQL
Apache Spark is an open-source, distributed computing system that can process large amounts of data quickly. Spark SQL is a…
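A minimal sketch of reading a PostgreSQL table over JDBC; the URL, table, and credentials are placeholders, and the PostgreSQL JDBC driver must be on the classpath (for example via --packages org.postgresql:postgresql:42.6.0):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("postgres-demo").getOrCreate()

# URL, table, and credentials are placeholders.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")
    .option("dbtable", "public.orders")
    .option("user", "spark_user")
    .option("password", "secret")
    .load()
)
df.show(5)

# The DataFrame API has no row-level UPDATE; changed rows are typically
# written to a staging table and merged inside PostgreSQL.
(
    df.write.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")
    .option("dbtable", "public.orders_staging")
    .option("user", "spark_user")
    .option("password", "secret")
    .mode("overwrite")
    .save()
)
```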