In PySpark, you can set various parameters to configure your Spark application. These parameters can be set in different ways depending on your use case. Here are a few examples:
Setting Spark properties programmatically:
You can set Spark properties directly in your PySpark code using the SparkConf object. Here’s an example:
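from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
# Collect the settings on a SparkConf object.
conf = SparkConf()
conf.setAppName("MyPySparkApp")
conf.setMaster("local[2]")  # run locally with 2 worker threads
conf.set("spark.executor.memory", "32g")
conf.set("spark.driver.memory", "32g")
# Build a session with these settings (or reuse one that already exists).
spark = SparkSession.builder.config(conf=conf).getOrCreate()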
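In this example, we create a SparkConf object and set the application name with setAppName, the master URL with setMaster, and the driver and executor memory with the generic set method. The master URL local[2] runs the application locally with 2 worker threads, and both memory settings are 32g. Finally, we pass the conf object to the SparkSession builder through the config method to create a Spark session. Note that spark.driver.memory only takes effect if it is set before the driver JVM starts; when the script is launched through spark-submit in client mode, set it with --driver-memory or in spark-defaults.conf instead.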
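The same properties can also be kept in a plain dictionary and applied to the builder one key at a time, which can be convenient when the values come from a file or from environment variables. The following is a minimal sketch under that assumption; the names SETTINGS and create_session are hypothetical, and the values simply mirror the example above.

```python
# Hypothetical settings dictionary: the keys are Spark property names,
# the values mirror the SparkConf example above.
SETTINGS = {
    "spark.app.name": "MyPySparkApp",
    "spark.master": "local[2]",
    "spark.executor.memory": "32g",
    "spark.driver.memory": "32g",
}


def create_session(settings):
    """Apply each key/value pair to the builder, then create the session."""
    # Imported lazily so the dictionary can be inspected without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in settings.items():
        # builder.config(key, value) sets a single property.
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling create_session(SETTINGS) should then behave like the SparkConf version, assuming PySpark and a Java runtime are installed and no session is already running.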
Setting Spark properties through spark-defaults.conf file:
You can also set Spark properties by creating a spark-defaults.conf file and placing it in the conf directory of your Spark installation. In this file, you can specify Spark properties and their values, one per line. For example:
spark.app.name=MyPySparkApp
spark.master=local[2]
spark.executor.memory=32g
spark.driver.memory=32g
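In this case, we set the same properties as in the previous example, but through a configuration file. By default, Spark looks for the file at $SPARK_HOME/conf/spark-defaults.conf (or in the directory named by SPARK_CONF_DIR, if that variable is set), and launch scripts such as spark-submit and pyspark read it at startup. Values set on a SparkConf in code take precedence over command-line flags, which in turn take precedence over spark-defaults.conf.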
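To make the file format concrete, here is a rough sketch of how such lines can be read into key/value pairs. It is a simplified approximation and not Spark's own parser; the function name parse_spark_defaults and the SAMPLE text are hypothetical. It assumes that blank lines and lines starting with # are skipped and that a key is separated from its value by = or by whitespace.

```python
import re

# Hypothetical file content, mixing "=" and whitespace separators.
SAMPLE = """\
# Example spark-defaults.conf content
spark.app.name=MyPySparkApp
spark.master=local[2]
spark.executor.memory 32g
spark.driver.memory   32g
"""


def parse_spark_defaults(text):
    """Return a dict of property names to values (simplified parsing)."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split once, on the first "=" or the first run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        props[key] = value
    return props


props = parse_spark_defaults(SAMPLE)
for name, value in props.items():
    print(f"{name} -> {value}")
```

A dictionary like this is handy for checking which defaults a file would supply before launching a job.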
Setting Spark properties through command-line options:
You can also set Spark properties through command-line options when you start the pyspark shell or submit an application with spark-submit. For example:
pyspark --conf spark.app.name=MyPySparkApp --conf "spark.master=local[2]" --conf spark.executor.memory=32g --conf spark.driver.memory=32g
In this case, we use the --conf option followed by a property=value pair to set the same properties as in the previous examples. The master value is quoted so that the shell does not try to interpret the square brackets.
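When the same job is launched from a script, the --conf flags can be generated from a dictionary instead of being typed by hand. The sketch below assumes the application is started with spark-submit; the helper name build_submit_command and the script name my_app.py are hypothetical.

```python
import shlex


def build_submit_command(script, properties, launcher="spark-submit"):
    """Build an argument list with one --conf flag per property."""
    command = [launcher]
    for key, value in properties.items():
        command += ["--conf", f"{key}={value}"]
    # The application script comes after all options.
    command.append(script)
    return command


properties = {
    "spark.app.name": "MyPySparkApp",
    "spark.master": "local[2]",
    "spark.executor.memory": "32g",
    "spark.driver.memory": "32g",
}

# my_app.py is a placeholder for your own application file.
command = build_submit_command("my_app.py", properties)

# shlex.join quotes arguments such as local[2] for safe copy-pasting.
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run(command), which avoids shell quoting issues altogether because no shell is involved.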
Important Spark URLs for reference