PySpark : Creation of data series with customizable parameters

user March 12, 2024

Series() in the pandas API on Spark (pyspark.pandas.Series) lets users create data series much like its pandas counterpart. Let’s look at how it works and walk through practical examples that show its utility.

Understanding Series()

Series() in the pandas API on Spark creates a one-dimensional, labeled series of values, analogous to a pandas Series. Users can manipulate and analyze it with familiar pandas-style methods while Spark executes the work. Its parameters control the data, index labels, data type, and name of the series, which gives flexibility in how data is handled.

Syntax

Series([data, index, dtype, name, copy, ...])

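data (optional): The data to initialize the series, such as a list or other array-like object, a dict, a scalar value, or a pandas Series.
index (optional): The index labels for the series; they should have the same length as data.
dtype (optional): The data type for the series; if omitted, it is inferred from the data.
name (optional): The name of the series.
copy (optional): Whether to copy the input data (default False).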
Additional parameters for customizing the series.

Practical Examples

The following practical examples build pandas Series with these parameters and load the same data into Spark with spark.createDataFrame(), showing how versatile series are for data manipulation.

Example 1

from pyspark.sql import SparkSession
import pandas as pd
# Initialize Spark Session
spark = SparkSession.builder \
    .appName("series_example @ Freshers.in Learning") \
    .getOrCreate()
# Create a Pandas Series
data = [10, 20, 30, 40, 50]
index = ['A', 'B', 'C', 'D', 'E']
series = pd.Series(data, index=index)
# Convert the pandas Series to a Spark DataFrame with "data" and "index" columns
sdf = spark.createDataFrame(pd.DataFrame({'data': series.to_list(), 'index': series.index.to_list()}))
# Show the Spark DataFrame
sdf.show()

Output:

+----+-----+
|data|index|
+----+-----+
|  10|    A|
|  20|    B|
|  30|    C|
|  40|    D|
|  50|    E|
+----+-----+

Example 2: Customizing Series

# Create a Pandas Series with custom dtype and name
data = [10.5, 20.3, 30.7, 40.2, 50.9]
index = ['A', 'B', 'C', 'D', 'E']
dtype = 'float64'
name = 'MySeries'
series = pd.Series(data, index=index, dtype=dtype, name=name)
# Convert the pandas Series to a Spark DataFrame (the Series name is not carried over here)
sdf = spark.createDataFrame(pd.DataFrame({'data': series.to_list(), 'index': series.index.to_list()}))
# Show the Spark DataFrame
sdf.show()

Output:

+-----+-----+
| data|index|
+-----+-----+
| 10.5|    A|
| 20.3|    B|
| 30.7|    C|
| 40.2|    D|
| 50.9|    E|
+-----+-----+

Example 3: Copying Series

# Create a Pandas Series and copy it
data = {'A': 10, 'B': 20, 'C': 30, 'D': 40, 'E': 50}
series = pd.Series(data)
copy_series = series.copy()
# Convert the copied Series to a Spark DataFrame with "data" and "index" columns
sdf = spark.createDataFrame(pd.DataFrame({'data': copy_series.to_list(), 'index': copy_series.index.to_list()}))
# Show the Spark DataFrame
sdf.show()

Output:

+----+-----+
|data|index|
+----+-----+
|  10|    A|
|  20|    B|
|  30|    C|
|  40|    D|
|  50|    E|
+----+-----+
