In PySpark, the map_values function extracts the values from a map column and returns them as an array. It is the DataFrame-level analogue of calling .values() on a Python dictionary, applied row by row to a single column.
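For intuition, this is the plain-Python behaviour being mirrored (the names and numbers are just the sample data used later in this post):
# Plain Python: .values() on a dict returns the values only, keys are dropped
ages = {"Sachin": 10, "India": 20}
print(list(ages.values()))  # [10, 20]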
Use map_values for:
Value Analysis: To understand the distribution or characteristics of the values stored in a map column.
Data Transformation: Before reshaping the extracted values into distinct columns or rows (see the explode sketch after the output below).
Filtering Data: To keep or drop rows based on the presence or absence of specific values in a map column (see the filtering sketch after the output below).
Advantages of map_values:
Performance: Thanks to Spark’s distributed execution, map_values scales to very large datasets.
Intuitive: It states the intent of the code explicitly, improving readability.
Flexibility: Seamless integration with other DataFrame operations allows for comprehensive data processing.
from pyspark.sql import SparkSession
from pyspark.sql.functions import map_values
# Setting up Spark Session
spark = SparkSession.builder.appName("map_values_demo Learning @ Freshers.in").getOrCreate()
# Crafting a DataFrame with a map column
data = [
    (1, {"Sachin": 10, "India": 20}),
    (2, {"Ramesh": 30, "USA": 40}),
    (3, {"Raju": 50, "Ireland": 60}),
]
df = spark.createDataFrame(data, ["id", "country"])
df.show(20, False)
# Deploying map_values to extract the values from the map column
df_values = df.select("id", map_values(df["country"]).alias("age"))
df_values.show(20, False)
Output
+---+---------------------------+
|id |country |
+---+---------------------------+
|1 |{India -> 20, Sachin -> 10}|
|2 |{USA -> 40, Ramesh -> 30} |
|3 |{Raju -> 50, Ireland -> 60}|
+---+---------------------------+
+---+--------+
|id |age |
+---+--------+
|1 |[20, 10]|
|2 |[40, 30]|
|3 |[50, 60]|
+---+--------+
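As promised in the use-case list, here is a minimal filtering sketch against the same df. It combines map_values with array_contains (both from pyspark.sql.functions); the value 30 is just an arbitrary example to match against.
from pyspark.sql.functions import array_contains
# Keep only rows whose map contains the value 30 somewhere among its values
df_filtered = df.filter(array_contains(map_values(df["country"]), 30))
df_filtered.show(20, False)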
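And for the data-transformation use case, a common next step is to reshape the extracted array into one row per value with explode. A sketch, again assuming the df built above:
from pyspark.sql.functions import explode
# Explode the array produced by map_values into one row per value
df_exploded = df.select("id", explode(map_values(df["country"])).alias("value"))
df_exploded.show(20, False)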