## pyspark.sql.functions.array_max

The **array_max** function is a built-in PySpark **collection function** that returns the maximum value in an array column of a DataFrame.

Here is an example of how to use array_max:

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import array_max

spark = SparkSession.builder.getOrCreate()

# Create a DataFrame with an array column
data = [([1, 2, 3],), ([4, 5, 6],), ([7, 8, 9],)]
df = spark.createDataFrame(data, ["numbers"])

# Find the maximum value in each array
df.select(array_max("numbers").alias("max_number")).show()
```

**Result**

```
+----------+
|max_number|
+----------+
|         3|
|         6|
|         9|
+----------+
```

The advantages of using **array_max** are:

- It is built into PySpark, so it requires no third-party dependencies.
- It is easy to use: it takes a single argument, the array column (or its name).
- It evaluates natively in Spark's SQL engine, making it a simple and efficient way to find per-row maxima in a large dataset.
- It can be combined with other PySpark functions for more complex data processing tasks.

Note that **array_max** works only on array columns; it cannot be applied directly to scalar columns.
