PySpark: Converting a Unix timestamp to a string in a specific format

pyspark.sql.functions.from_unixtime

The “from_unixtime()” function is a PySpark function that converts a Unix timestamp (a long integer giving the number of seconds since the Unix epoch, 1970-01-01 00:00:00 UTC) to a string representing that moment in a specific format. The string is rendered in the Spark session time zone. The function takes one or two arguments: the first is the column holding the Unix timestamp, and the second is an optional format string that specifies the output format. If the format is omitted, 'yyyy-MM-dd HH:mm:ss' is used.

Syntax

pyspark.sql.functions.from_unixtime(timestamp, format='yyyy-MM-dd HH:mm:ss')

Example of how to use the from_unixtime() function:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_unixtime
# Create a SparkSession
spark = SparkSession.builder.appName("PySparkFromUnixTime").getOrCreate()
# Create a DataFrame with sample data
data = [(1609430380, "Diana Niel"), (1609430480, "Sam Peter"), (1609430360, "Wincent John")]
df = spark.createDataFrame(data, ["timestamp", "name"])
# Convert the timestamp to a string using from_unixtime
df = df.select("name", from_unixtime("timestamp").alias("timestamp"))
# Show the DataFrame
df.show()

In this example, we create a DataFrame with two columns: “timestamp” and “name”. The “timestamp” column contains Unix timestamps in seconds. We then use the from_unixtime() function to convert each timestamp to a string. The function itself returns a new column, not a DataFrame; it is select() that returns a new DataFrame containing that column, which we assign back to the variable df. Note that this overwrites the original df, so its “timestamp” column now holds strings rather than integers.

Output

+------------+-------------------+
|        name|          timestamp|
+------------+-------------------+
|  Diana Niel|2020-12-31 21:29:40|
|   Sam Peter|2020-12-31 21:31:20|
|Wincent John|2020-12-31 21:29:20|
+------------+-------------------+

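As you can see, the new column contains the timestamp in string format. The strings are rendered in the Spark session time zone (the output above corresponds to UTC+05:30), so the same code can print different times on a cluster configured for another time zone.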
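The values in the output can be cross-checked without Spark. The following is a minimal sketch using only the Python standard library; the fixed UTC+05:30 offset is an assumption inferred from the output shown above, and the strftime pattern is the Python counterpart of Spark's default 'yyyy-MM-dd HH:mm:ss'.

```python
from datetime import datetime, timedelta, timezone

unix_seconds = 1609430380  # first timestamp from the sample data

# Interpret the epoch value as an instant in UTC
utc_time = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Render the same instant at a fixed UTC+05:30 offset (assumed, see above)
offset_0530 = timezone(timedelta(hours=5, minutes=30))
local_time = utc_time.astimezone(offset_0530)

# strftime equivalent of Spark's default pattern 'yyyy-MM-dd HH:mm:ss'
fmt = "%Y-%m-%d %H:%M:%S"
utc_text = utc_time.strftime(fmt)
local_text = local_time.strftime(fmt)

print(utc_text)    # 2020-12-31 15:59:40
print(local_text)  # 2020-12-31 21:29:40
```

The second line matches the first row of the Spark output, which suggests the session that produced it was running at UTC+05:30.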

You can also pass an optional format string as a second argument to the function, in order to specify the desired output format of the timestamp.

Example:

# df now holds string timestamps, so rebuild the DataFrame from the raw data
raw_df = spark.createDataFrame(data, ["timestamp", "name"])
df2 = raw_df.select("name", from_unixtime("timestamp", "yyyy-MM-dd").alias("timestamp"))
df2.show()

This code changes the format so that only the date is shown.

Result

+------------+----------+
|        name| timestamp|
+------------+----------+
|  Diana Niel|2020-12-31|
|   Sam Peter|2020-12-31|
|Wincent John|2020-12-31|
+------------+----------+

This function is particularly useful when you want to convert a timestamp column from Unix format to a human-readable format, which makes the data in the DataFrame easier to query and analyze.

Author: user
