PySpark : Combine two or more arrays into a single array of tuples

user, January 18, 2023

pyspark.sql.functions.arrays_zip

In PySpark, the arrays_zip function (available since Spark 2.4) combines two or more array columns into a single array of structs, which play the role of tuples. It returns a merged array in which the N-th struct contains the N-th element of every input array. If the input arrays differ in length, the missing positions in the shorter arrays are filled with null.
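To make the element-wise pairing concrete, here is a plain-Python model of the behaviour described above. It is only an illustration under the assumption that Python's None stands in for Spark's null; arrays_zip_model is a hypothetical helper, not part of PySpark, and Spark itself does this work inside the JVM.

```python
from itertools import zip_longest
from typing import Any, List, Optional, Sequence, Tuple


def arrays_zip_model(*arrays: Sequence[Any]) -> List[Tuple[Optional[Any], ...]]:
    """Hypothetical plain-Python model of arrays_zip (illustration only).

    The N-th tuple holds the N-th element of every input array.
    Shorter arrays are padded with None, mirroring the nulls Spark produces.
    """
    # zip_longest runs to the end of the longest input and fills gaps with None
    return list(zip_longest(*arrays, fillvalue=None))


si_no = [1, 2, 3]
name = ["Sam John", "Perter Walter", "Johns Mike"]
print(arrays_zip_model(si_no, name))
# [(1, 'Sam John'), (2, 'Perter Walter'), (3, 'Johns Mike')]

# Unequal lengths: the missing name becomes None
print(arrays_zip_model([1, 2, 3], ["a", "b"]))
# [(1, 'a'), (2, 'b'), (3, None)]
```

zip_longest is used rather than the built-in zip because zip stops at the shortest input, which would silently drop elements instead of padding them.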

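from pyspark.sql import SparkSession
from pyspark.sql.functions import arrays_zip

spark = SparkSession.builder.getOrCreate()  # reuse the active session or start a new one
df = spark.createDataFrame([([1, 2, 3], ['Sam John', 'Perter Walter', 'Johns Mike'])], ['si_no', 'name'])
df.show(20, False)  # show up to 20 rows without truncating long values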

+---------+-------------------------------------+
|si_no    |name                                 |
+---------+-------------------------------------+
|[1, 2, 3]|[Sam John, Perter Walter, Johns Mike]|
+---------+-------------------------------------+

zipped_array = df.select(arrays_zip(df.si_no,df.name))
zipped_array.show(20,False)

Result

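+----------------------------------------------------+
|arrays_zip(si_no, name)                             |
+----------------------------------------------------+
|[[1, Sam John], [2, Perter Walter], [3, Johns Mike]]|
+----------------------------------------------------+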

You can also use arrays_zip with more than two arrays as input. For example:

from pyspark.sql.functions import arrays_zip
df = spark.createDataFrame([([1, 2, 3], ['Sam John', 'Perter Walter', 'Johns Mike'], [23, 43, 41])], ['si_no', 'name', 'age'])
zipped_array = df.select(arrays_zip(df.si_no, df.name, df.age))
zipped_array.show(20, False)

Result

+----------------------------------------------------------------+
|arrays_zip(si_no, name, age)                                    |
+----------------------------------------------------------------+
|[[1, Sam John, 23], [2, Perter Walter, 43], [3, Johns Mike, 41]]|
+----------------------------------------------------------------+
