The Series.values property provides a NumPy representation of the Series (DataFrame.values does the same for a DataFrame), a convenient format for analysis and processing. In this article, we’ll explore Series.values through examples.
Understanding Series.values
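The Series.values property is part of the pandas API on Spark (pyspark.pandas), which brings the pandas interface to Spark, a distributed computing framework. Its purpose is to return a NumPy representation of the Series or the DataFrame. Because the whole array is loaded into the driver's memory, it should only be used when the result is expected to be small; the pandas-on-Spark documentation recommends Series.to_numpy() or DataFrame.to_numpy() instead.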
Syntax:
Series.values
Series.values is a property, so it is accessed without parentheses. It returns a NumPy array representing the data in the Series.
Examples:
Let’s delve into examples to see how Series.values operates within the context of Spark.
Example 1: Extracting Values from a Series
Consider a scenario where we have a Series containing some data. Let’s use Series.values to extract its values.
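A minimal sketch of this step, consistent with the output that follows. It assumes the pandas API on Spark is imported as ps (PySpark 3.2 or later with a working Spark setup). As a convenience that is an assumption of this sketch, it falls back to plain pandas when PySpark is not installed, since Series.values is accessed the same way in both libraries.

```python
try:
    # pandas API on Spark (PySpark 3.2+)
    import pyspark.pandas as ps
except ImportError:
    # Assumption: fall back to plain pandas so the sketch can be tried
    # without Spark; .values is accessed the same way in both libraries
    import pandas as ps

# Create a Series holding five integers
series = ps.Series([1, 2, 3, 4, 5])

# .values is a property (no parentheses); it collects the data
# into a NumPy array held in local memory
series_values = series.values

print("Numpy representation of the Series:")
print(series_values)
```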
Output:
Numpy representation of the Series:
[1 2 3 4 5]
As observed, Series.values returns a NumPy array containing the values from the Series.
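Once the values are in a NumPy array, ordinary NumPy operations apply to them. The sketch below is illustrative only: it uses a hand-built array as a stand-in for the result of Series.values from Example 1, so it runs without Spark, and the variable names are hypothetical.

```python
import numpy as np

# Stand-in for the array returned by Series.values in Example 1
series_values = np.array([1, 2, 3, 4, 5])

total = series_values.sum()       # sum of all elements: 15
mean = series_values.mean()       # arithmetic mean: 3.0
doubled = series_values * 2       # element-wise multiplication
evens = series_values[series_values % 2 == 0]  # boolean-mask filtering

print(total, mean)
print(doubled)
print(evens)
```

All of these computations run locally on the driver, which is another reason to keep the extracted array small.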
Example 2: Extracting Values from a DataFrame
Let’s explore a scenario where we have a DataFrame, and we want to extract values from a specific column.
from pyspark.sql import SparkSession
# Start (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()
# Create a Spark DataFrame with multiple columns (schema given as a DDL string)
multi_column_data = [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]
df_multi_column = spark.createDataFrame(multi_column_data, schema="num_col INT, char_col STRING")
# Convert to a pandas-on-Spark DataFrame (Spark 3.2+) and select one column as a Series
series_from_df = df_multi_column.pandas_api()["num_col"]
# Extract the column's values as a NumPy array
df_values = series_from_df.values
print("Numpy representation of the DataFrame column:")
print(df_values)
Output:
Numpy representation of the DataFrame column:
[1 2 3 4 5]
In this example, Series.values enables us to extract values from a specific column in the DataFrame, providing a NumPy array representation.