You can also use the row_number() function with an over() clause to generate a sequence number based on a specific ordering of the DataFrame. In PySpark, row_number() is a window function that assigns a unique, consecutive number, starting at 1, to each row within a window partition. Here is an example of how to use it:
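from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import row_number
# Reuse the active SparkSession, or create one when running as a standalone script
spark = SparkSession.builder.getOrCreate()
# Create a DataFrame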
data = [
("Peter Sam", 11),
("Twinkle John", 23),
("Marrie Bob", 33),
("Sharone Rode", 43),
("Baby Jonnah", 24),
("Bobby Robert", 53),
("Shakewille Jane", 39)
]
df = spark.createDataFrame(data, ["name", "age"])
df.show()
+---------------+---+
|           name|age|
+---------------+---+
|      Peter Sam| 11|
|   Twinkle John| 23|
|     Marrie Bob| 33|
|   Sharone Rode| 43|
|    Baby Jonnah| 24|
|   Bobby Robert| 53|
|Shakewille Jane| 39|
+---------------+---+
# Create a Window specification
windowSpec = Window.partitionBy().orderBy("age")
# Add a new column with the row number
df = df.withColumn("row_number", row_number().over(windowSpec))
# Show the DataFrame
df.show()
+---------------+---+----------+
|           name|age|row_number|
+---------------+---+----------+
|      Peter Sam| 11|         1|
|   Twinkle John| 23|         2|
|    Baby Jonnah| 24|         3|
|     Marrie Bob| 33|         4|
|Shakewille Jane| 39|         5|
|   Sharone Rode| 43|         6|
|   Bobby Robert| 53|         7|
+---------------+---+----------+
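Here, windowSpec is defined with an empty partitionBy(), so the whole DataFrame is treated as a single partition, ordered by the “age” column. The row_number() function is then applied over this window specification using the over() method, and the result is added as a new column called “row_number”. Because no partition columns are given, Spark moves all rows into a single partition to compute the numbering, which can be slow on large DataFrames.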
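As a mental model only, the numbering can be sketched in plain Python: sort the rows, then count from 1 within each partition. The helper add_row_numbers and its parameters are hypothetical names used for illustration; this is not how Spark executes a window function, just a small model of the result it produces.

```python
from itertools import groupby


def add_row_numbers(rows, order_key, partition_key=None):
    """Plain-Python model of row_number() over an ordered window (hypothetical helper)."""
    # With no partition key, every row falls into the same single partition.
    part = partition_key or (lambda row: None)
    # Sorting by (partition, order) makes each partition contiguous and ordered.
    ordered = sorted(rows, key=lambda row: (part(row), order_key(row)))
    numbered = []
    for _, group in groupby(ordered, key=part):
        # Numbering restarts at 1 for every partition.
        for n, row in enumerate(group, start=1):
            numbered.append(row + (n,))
    return numbered


people = [("Peter Sam", 11), ("Twinkle John", 23), ("Marrie Bob", 33), ("Baby Jonnah", 24)]

# No partitioning: one sequence over all rows, ordered by age.
by_age = add_row_numbers(people, order_key=lambda row: row[1])
# by_age == [("Peter Sam", 11, 1), ("Twinkle John", 23, 2),
#            ("Baby Jonnah", 24, 3), ("Marrie Bob", 33, 4)]
```

Passing a partition_key makes the count restart for each distinct key value, which mirrors what partitionBy() does in the window specification.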
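You can also partition the DataFrame by one or more columns, in which case the row numbers restart at 1 within each partition. In the example below, every name is unique, so each partition contains a single row and every row gets row number 1.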
windowSpec = Window.partitionBy("name").orderBy("age")
# withColumn replaces the existing "row_number" column with the new values
df = df.withColumn("row_number", row_number().over(windowSpec))
df.show()
+---------------+---+----------+
|           name|age|row_number|
+---------------+---+----------+
|    Baby Jonnah| 24|         1|
|      Peter Sam| 11|         1|
|Shakewille Jane| 39|         1|
|   Twinkle John| 23|         1|
|     Marrie Bob| 33|         1|
|   Bobby Robert| 53|         1|
|   Sharone Rode| 43|         1|
+---------------+---+----------+