Pandas API on Spark: The DataFrame.to_excel Function

The Pandas API on Spark combines the familiar pandas syntax with Spark’s distributed execution. One of its methods, DataFrame.to_excel, writes a pandas-on-Spark DataFrame to an Excel sheet. In this article, we’ll explore the usage of this function with a practical example and its output.

Understanding DataFrame.to_excel

The DataFrame.to_excel function in the Pandas API on Spark writes a DataFrame to an Excel file, which is convenient for sharing results or analyzing them further in a spreadsheet. The method loads all of the data into the driver’s memory before writing, so it is best suited to DataFrames that are expected to be small. Let’s dive into its usage with examples.

Example Usage

Suppose we have a Spark DataFrame that we want to export to an Excel file named output.xlsx. We can achieve this using DataFrame.to_excel.

from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder \
    .appName("Exporting Spark DataFrame to Excel") \
    .getOrCreate()

# Create a sample Spark DataFrame
data = [('Sachin', 30, 'Female'),
        ('Ram', 35, 'Male'),
        ('Charlie', 40, 'Male'),
        ('Dravid', 45, 'Male')]

columns = ['Name', 'Age', 'Gender']

df_spark = spark.createDataFrame(data, columns)

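# Convert to a pandas-on-Spark DataFrame (DataFrame.pandas_api requires Spark 3.2+)
psdf = df_spark.pandas_api()

# Export to Excel. to_excel loads all rows into the driver's memory, so keep the data small.
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed.
psdf.to_excel("output.xlsx", index=False)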

# Stop SparkSession
spark.stop()

Output

Upon executing the code, the Spark DataFrame will be written to an Excel file named output.xlsx with the following contents:

| Name    | Age | Gender |
|---------|-----|--------|
| Sachin  | 30  | Female |
| Ram     | 35  | Male   |
| Charlie | 40  | Male   |
| Dravid  | 45  | Male   |
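One way to confirm that an exported file really contains the table above is to read it back with pandas and compare. The sketch below is only an illustration: the read_back helper is a hypothetical name, and the file is produced here with plain pandas in a temporary directory as a stand-in for the Spark job’s output, so the check can run without a cluster. It assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd


def read_back(path, sheet_name=0):
    """Load one worksheet of an Excel file into a plain pandas DataFrame."""
    return pd.read_excel(path, sheet_name=sheet_name)


# Stand-in for the file produced by the Spark job: the same four rows,
# written with plain pandas so the sketch runs without Spark.
expected = pd.DataFrame(
    {
        "Name": ["Sachin", "Ram", "Charlie", "Dravid"],
        "Age": [30, 35, 40, 45],
        "Gender": ["Female", "Male", "Male", "Male"],
    }
)
demo_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
expected.to_excel(demo_path, index=False)

# Read the first worksheet back and run a few basic checks.
actual = read_back(demo_path)
print(actual)
print("Columns match:", list(actual.columns) == ["Name", "Age", "Gender"])
print("Row count:", len(actual))
```

To check a real export, point read_back at the path passed to to_excel instead of demo_path.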

DataFrame.to_excel in the Pandas API on Spark provides a convenient way to export Spark DataFrames to Excel files. Whether you’re generating reports, sharing data with colleagues, or conducting further analysis, it keeps the export step to a single call. For large datasets, filter or aggregate first, since the whole DataFrame is loaded into the driver’s memory before it is written.

Author: user