In this article, we will explore the use of subtractByKey in PySpark, a transformation that returns an RDD consisting of key-value pairs from one RDD by removing any pair that has a key present in another RDD. We will provide a detailed example using hardcoded values as input.
First, let’s create two PySpark RDDs of key-value pairs:
#Using subtractByKey in PySpark @Freshers.in
from pyspark import SparkContext
sc = SparkContext("local", "subtractByKey @ Freshers.in ")
data1 = [("America", 1), ("Botswana", 2), ("Costa Rica", 3), ("Denmark", 4), ("Egypt", 5)]
data2 = [("Botswana", 20), ("Denmark", 40), ("Finland", 60)]
rdd1 = sc.parallelize(data1)
rdd2 = sc.parallelize(data2)
Using subtractByKey
Now, let’s use the subtractByKey method to create a new RDD by removing key-value pairs from rdd1 that have keys present in rdd2:
result_rdd = rdd1.subtractByKey(rdd2)
result_data = result_rdd.collect()
print("Result of subtractByKey:")
for element in result_data:
    print(element)
In this example, we used the subtractByKey method on rdd1 and passed rdd2 as an argument. The method returns a new RDD containing key-value pairs from rdd1 after removing any pair with a key present in rdd2. The collect method is then used to retrieve the results.
Interpreting the Results
Result of subtractByKey:
('Costa Rica', 3)
('America', 1)
('Egypt', 5)
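The resulting RDD contains every key-value pair from rdd1 except those with the keys “Botswana” and “Denmark”, because both keys are also present in rdd2. The values stored in rdd2 play no role in the comparison, and “Finland”, which appears only in rdd2, has no effect on the output. Because subtractByKey shuffles data across partitions, the order of the collected pairs is not guaranteed and may differ from the order in rdd1.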
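As a rough illustration of the rule described above, the following plain-Python sketch models what subtractByKey computes for the sample data. The helper subtract_by_key is a hypothetical name introduced here and is not part of PySpark. It ignores partitioning and shuffling entirely, so treat it as a model of the semantics rather than of how Spark executes the transformation.

```python
def subtract_by_key(left, right):
    """Plain-Python model of rdd1.subtractByKey(rdd2) semantics (no Spark needed)."""
    # Only the keys of the right-hand side matter; its values are ignored.
    right_keys = {key for key, _ in right}
    # Keep every left pair whose key is absent from the right side.
    return [(key, value) for key, value in left if key not in right_keys]


data1 = [("America", 1), ("Botswana", 2), ("Costa Rica", 3), ("Denmark", 4), ("Egypt", 5)]
data2 = [("Botswana", 20), ("Denmark", 40), ("Finland", 60)]

remaining = subtract_by_key(data1, data2)
# Sort before printing so the output does not depend on element order.
print(sorted(remaining))
```

Sorting the result, as done here, is also a simple way to get stable output when printing pairs collected from a real RDD.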
In this article, we explored subtractByKey in PySpark, a transformation that returns the key-value pairs of one RDD whose keys are not present in another RDD. Using hardcoded values as input, we created two pair RDDs, applied subtractByKey, and interpreted the results. subtractByKey is useful for filtering out unwanted data based on keys and for performing set-like operations on key-value pair RDDs.