pyspark.sql.functions.format_number
The format_number function formats a numeric column as a string, rounded to a fixed number of decimal places. It takes two arguments: the column (or column name) to format and the number of decimal places to keep.
Here is an example of how to use the format_number function:
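from pyspark.sql import SparkSession
from pyspark.sql.functions import format_number

# Reuse the active session, or start one when running outside a notebook or shell
spark = SparkSession.builder.getOrCreate()

# Create a dataframe with a column of numbers
data = [
    ("Alice Wonderland", 3.14152332),
    ("Bob Miclel", 2.718223234),
    ("Charlie Wincent", 1.632123424),
]
df = spark.createDataFrame(data, ["name", "number"])
df.show()
df.printSchema()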
Input Data and its Schema
+----------------+-----------+
|            name|     number|
+----------------+-----------+
|Alice Wonderland| 3.14152332|
|      Bob Miclel|2.718223234|
| Charlie Wincent|1.632123424|
+----------------+-----------+
root
 |-- name: string (nullable = true)
 |-- number: double (nullable = true)
As the schema shows, the number column has the double data type.
Use the format_number function to format the numbers with 2 decimal places
formatted_df = df.select("name", format_number("number", 2).alias("formatted_number"))
formatted_df.show()
formatted_df.printSchema()
Formatted Data and its Schema
+----------------+----------------+
|            name|formatted_number|
+----------------+----------------+
|Alice Wonderland|            3.14|
|      Bob Miclel|            2.72|
| Charlie Wincent|            1.63|
+----------------+----------------+
root
 |-- name: string (nullable = true)
 |-- formatted_number: string (nullable = true)
The schema shows that formatted_number is now a string column rather than a double.
In this example, we first create a dataframe with a column of numbers. We then use the format_number function to format the numbers with 2 decimal places and alias the result as formatted_number. The output is a dataframe with two columns, name and formatted_number, where the second column contains the numbers rounded to 2 decimal places.
More precisely, format_number formats the number X to a pattern like '#,###,###.##', rounded to d decimal places using the HALF_EVEN rounding mode, and returns the result as a string. The sample values above are all below 1,000, so no thousands separators appear in the output.
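To make the pattern and the rounding mode concrete, here is a minimal pure-Python sketch that mimics the behavior described above using the standard decimal module. It is not Spark's implementation: the helper name format_number_sketch is hypothetical, and it takes the value as a decimal string so that binary floating-point representation does not affect the tie cases.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_number_sketch(value: str, d: int) -> str:
    """Hypothetical helper: '#,###,###.##'-style output with HALF_EVEN rounding."""
    # Build the rounding quantum, e.g. d=2 -> Decimal('0.01'), d=0 -> Decimal('1')
    quantum = Decimal(1).scaleb(-d)
    # Round ties to the nearest even digit (banker's rounding)
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)
    # Add thousands separators and keep exactly d decimal places
    return f"{rounded:,.{d}f}"


print(format_number_sketch("3.14152332", 2))   # 3.14
print(format_number_sketch("1234567.891", 2))  # 1,234,567.89
print(format_number_sketch("0.125", 2))        # 0.12 (tie rounds to even)
print(format_number_sketch("0.135", 2))        # 0.14 (tie rounds to even)
```

The last two lines show the effect of HALF_EVEN: a value exactly halfway between two candidates rounds toward the one whose last digit is even, so 0.125 becomes 0.12 while 0.135 becomes 0.14.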