Spark : How to reveal the underlying data’s dimensions – Series.shape

user February 14, 2024

When dealing with large datasets, the distributed computing power of Apache Spark becomes indispensable. The Pandas API on Spark pairs the familiar Pandas interface with Spark's distributed execution, so Pandas-style code can scale beyond a single machine. A basic step in any analysis is checking the shape of the dataset, and the Series.shape property reports exactly that.

Understanding `Series.shape`

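The Series.shape property in the Pandas API on Spark returns a tuple representing the dimensions of the underlying data. Because it is a property, it is accessed without parentheses. For a Series the tuple has a single entry, the number of elements, which is worth checking before filtering, joining, or otherwise manipulating the data.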

Example 1: Exploring Dataset Dimensions

Consider a scenario where we have a Pandas Series on Spark containing temperature data:

import pyspark.pandas as ps
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder \
    .appName("Pandas API On Spark : series Learning @ Freshers.in") \
    .getOrCreate()
# Sample temperature data
data = [28, 32, 25, 30, 27]
# Create a pandas-on-Spark Series (ps.Series, not the plain pandas pd.Series)
series = ps.Series(data)
# Get the shape of the Series (shape is a property, so no parentheses)
shape = series.shape
print("Shape of the Series:", shape)

Output

Shape of the Series: (5,)

In this example, the shape of the Series is (5,), indicating that it has one dimension with five elements.
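The tuple can also be read programmatically: its length is the number of dimensions and its first entry is the row count. The sketch below shows one way to unpack it. The helper name summarize_shape is hypothetical and is not part of pandas or PySpark; it works on any shape tuple, so it needs no Spark session to run.

```python
def summarize_shape(shape):
    """Describe a shape tuple such as (5,) or (3, 3).

    Hypothetical helper for illustration; not part of pandas or PySpark.
    """
    ndim = len(shape)                       # number of dimensions
    rows = shape[0]                         # first entry is the row count
    cols = shape[1] if ndim > 1 else None   # a Series has no column count
    return {"ndim": ndim, "rows": rows, "columns": cols}


print(summarize_shape((5,)))    # shape of the temperature Series
print(summarize_shape((3, 3)))  # shape of a 3x3 DataFrame
```

Passing series.shape from the example above to this helper would report one dimension and five rows.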

Example 2: Handling Multi-dimensional Data

Now, let’s examine a more complex scenario involving multi-dimensional data:

import pyspark.pandas as ps  # repeated so this example runs on its own
# Sample multi-dimensional data
multi_data = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# Create a pandas-on-Spark DataFrame
df = ps.DataFrame(multi_data)
# Select the first column, which yields a Series
series_from_df = df.iloc[:, 0]
# Get the shape of the Series
shape_df = series_from_df.shape
print("Shape of the Series from DataFrame:", shape_df)

Output

Shape of the Series from DataFrame: (3,)

In this example, we extracted the first column from a DataFrame, resulting in a Series with three elements, hence the shape (3,).
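To make the contrast explicit, it can help to compare the shape of the whole DataFrame with the shape of a single column. The sketch below uses plain pandas so that it runs locally without a Spark session; this is an assumption made for convenience, and swapping the import for pyspark.pandas is expected to give the same tuples.

```python
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas here

multi_data = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
df = pd.DataFrame(multi_data)

frame_shape = df.shape               # (rows, columns) for the whole DataFrame
column_shape = df.iloc[:, 0].shape   # (rows,) for a single column

print("DataFrame shape:", frame_shape)
print("Column shape:", column_shape)
```

The DataFrame reports two entries (rows and columns), while the extracted column reports only the row count, which matches the first entry of the DataFrame's shape.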
