When dealing with large datasets, the distributed computing power of Apache Spark becomes indispensable. The Pandas API on Spark offers the best of both worlds, pairing the familiar Pandas interface with Spark's scalability and performance. One basic but crucial step in data analysis is understanding the shape of the dataset, and the Series.shape attribute plays a pivotal role in this regard.
Understanding Series.shape
The Series.shape attribute in the Pandas API on Spark returns a tuple representing the dimensions of the underlying data. For a Series this is always a one-element tuple, (n,), where n is the number of elements. It provides insight into the structure of the dataset, which is useful for many data manipulation tasks.
Example 1: Exploring Dataset Dimensions
Consider a scenario where we have a Pandas Series on Spark containing temperature data:
import pyspark.pandas as ps
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder \
    .appName("Pandas API On Spark : series Learning @ Freshers.in") \
    .getOrCreate()
# Sample temperature data
data = [28, 32, 25, 30, 27]
# Create a pandas-on-Spark Series
series = ps.Series(data)
# Get the shape of the Series
shape = series.shape
print("Shape of the Series:", shape)
Output:
Shape of the Series: (5,)
In this example, the shape of the Series is (5,), indicating that it has one dimension with five elements.
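Because Series.shape is a one-element tuple, its first entry always matches both len() and the size attribute. Plain pandas exposes the same attribute with the same semantics, so this relationship can be sketched locally without a Spark cluster (the temperature values below are just the sample data from above):

```python
import pandas as pd

# Plain pandas mirrors the pandas-on-Spark behavior of Series.shape
temps = pd.Series([28, 32, 25, 30, 27])

# shape is a one-element tuple for a Series
print(temps.shape)  # (5,)

# Its first entry agrees with len() and the size attribute
print(temps.shape[0] == len(temps) == temps.size)  # True
```

This equivalence is handy when porting code: len() works on both plain pandas and pandas-on-Spark Series, while shape generalizes naturally to DataFrames.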
Example 2: Handling Multi-dimensional Data
Now, let’s examine a more complex scenario involving multi-dimensional data:
import pyspark.pandas as ps
# Sample multi-dimensional data
multi_data = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
# Create a pandas-on-Spark DataFrame
df = ps.DataFrame(multi_data)
# Select the first column as a Series
series_from_df = df.iloc[:, 0]
# Get the shape of the Series
shape_df = series_from_df.shape
print("Shape of the Series from DataFrame:", shape_df)
Output:
Shape of the Series from DataFrame: (3,)
In this example, we selected the first column of the DataFrame, resulting in a Series with three elements, hence the shape (3,).
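For contrast, the shape of the full DataFrame is two-dimensional, (rows, columns), while any single column is a one-dimensional Series. A quick check in plain pandas, which the pandas-on-Spark API mirrors, using the same sample data might look like:

```python
import pandas as pd

multi_data = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
df = pd.DataFrame(multi_data)

# The DataFrame has a two-element shape tuple: (rows, columns)
print(df.shape)             # (3, 3)

# A single column is a Series with a one-element shape tuple
print(df.iloc[:, 0].shape)  # (3,)
```

This distinction matters when writing code that accepts either type: checking len(obj.shape) is a simple way to tell a Series (1) from a DataFrame (2).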