PySpark: how to format the number X to a format like ‘#,###,###.##’, rounded to d decimal places

PySpark @ Freshers.in

pyspark.sql.functions.format_number

The format_number function formats a numeric column as a string. It takes two arguments: the column to format and the number of decimal places to keep.

Here is an example of how to use the format_number function:

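from pyspark.sql import SparkSession
from pyspark.sql.functions import format_number

# Reuse the active Spark session, or start one if none exists
spark = SparkSession.builder.getOrCreate()

# Create a dataframe with a name column and a column of numbers
data = [
    ("Alice Wonderland", 3.14152332),
    ("Bob Miclel", 2.718223234),
    ("Charlie Wincent", 1.632123424),
]
df = spark.createDataFrame(data, ["name", "number"])
df.show()
df.printSchema()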

Input Data and its Schema

+----------------+-----------+
|            name|     number|
+----------------+-----------+
|Alice Wonderland| 3.14152332|
|      Bob Miclel|2.718223234|
| Charlie Wincent|1.632123424|
+----------------+-----------+
root
 |-- name: string (nullable = true)
 |-- number: double (nullable = true)

The schema above shows that the number column has the double datatype.

Use the format_number function to format the numbers with 2 decimal places:

formatted_df = df.select("name", format_number("number", 2).alias("formatted_number"))
formatted_df.show()
formatted_df.printSchema()

Formatted Data and its Schema

+----------------+----------------+
|            name|formatted_number|
+----------------+----------------+
|Alice Wonderland|            3.14|
|      Bob Miclel|            2.72|
| Charlie Wincent|            1.63|
+----------------+----------------+
root
 |-- name: string (nullable = true)
 |-- formatted_number: string (nullable = true)

The schema shows that the formatted_number column now has the string datatype.

In this example, we first create a dataframe with a column of numbers. We then use the format_number function to format the numbers with 2 decimal places and rename the resulting column to formatted_number. The output is a dataframe with two columns, name and formatted_number, where the second column contains the numbers rounded to 2 decimal places.

The format_number function formats the number X to a format like ‘#,###,###.##’, rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string.
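
The numbers in the example above are all below 1,000, so the thousands separator and the HALF_EVEN behaviour never show up. The plain-Python sketch below illustrates the same pattern outside Spark. The helper name format_number_sketch is hypothetical, and the code is only an approximation of the behaviour described above, not Spark's implementation: it rounds the decimal text of the value, whereas Spark works on the column's own datatype.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_number_sketch(x, d):
    """Hypothetical helper: mimic the '#,###,###.##' pattern in plain Python."""
    # Build the rounding step, e.g. d=2 -> Decimal('0.01'), d=0 -> Decimal('1')
    quantum = Decimal(1).scaleb(-d)
    # Round to d decimal places; ties go to the nearest even digit (HALF_EVEN)
    rounded = Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_EVEN)
    # The ',' format option inserts the thousands separators
    return format(rounded, ",f")


print(format_number_sketch(1234567.891, 2))  # thousands separators appear
print(format_number_sketch(0.125, 2))        # tie rounds down to the even digit
print(format_number_sketch(0.375, 2))        # tie rounds up to the even digit
```

As with format_number, the result is a string rather than a number, so it is meant for display and not for further arithmetic.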

Author: user