Tag: Spark_Interview

PySpark @ Freshers.in

PySpark : How to Compute the cumulative distribution of a column in a DataFrame

pyspark.sql.functions.cume_dist: The cumulative distribution is a method used in probability and statistics to determine the distribution of a random variable,…
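
As a rough illustration of what the teaser above refers to, the sketch below pairs a plain-Python reference with a call to `pyspark.sql.functions.cume_dist` over a window. For each row, the result is the fraction of rows whose value is less than or equal to that row's value. The function names, the `score` column, and the sample values are assumptions made for this example, not taken from the post, and the Spark part needs a local Spark runtime.

```python
from bisect import bisect_right


def cume_dist_reference(values):
    """Plain-Python reference: for each value, the fraction of rows <= that value."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def cume_dist_with_spark(values):
    """Same idea using pyspark.sql.functions.cume_dist (requires PySpark installed)."""
    # Imported lazily so the reference function works without Spark.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("cume_dist_sketch")  # hypothetical app name
        .getOrCreate()
    )
    # "score" is a hypothetical column name used only for this sketch.
    df = spark.createDataFrame([(v,) for v in values], ["score"])
    # No partitionBy: the whole DataFrame is treated as a single window partition.
    w = Window.orderBy("score")
    rows = df.withColumn("cume_dist", F.cume_dist().over(w)).orderBy("score").collect()
    return [(r["score"], r["cume_dist"]) for r in rows]


print(cume_dist_reference([10, 20, 20, 30]))  # [0.25, 0.75, 0.75, 1.0]
```

Tied values share the same result, which is why both rows with 20 get 0.75.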

PySpark @ Freshers.in

PySpark : Truncate date and timestamp in PySpark [date_trunc and trunc]

pyspark.sql.functions.date_trunc(format, timestamp): The truncation function offered by the Spark DataFrame SQL functions is date_trunc(), which returns a timestamp in the format “yyyy-MM-dd HH:mm:ss.SSSS”…
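
A minimal sketch of the two functions named in the title, assuming a single timestamp column. `date_trunc(format, timestamp)` truncates a timestamp to the given unit and keeps the timestamp type, while `trunc(date, format)` truncates to a date. The helper names, the `event_time` column, and the sample timestamp are hypothetical. The plain-Python helper only mirrors the truncation idea with `datetime` and is not Spark's implementation.

```python
from datetime import datetime

# Fields from coarsest to finest, with the value each one resets to.
_ORDER = ["year", "month", "day", "hour", "minute", "second", "microsecond"]
_RESET = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}


def date_trunc_reference(unit, ts):
    """Plain-Python reference: reset every field finer than `unit`."""
    finer = _ORDER[_ORDER.index(unit) + 1:]
    return ts.replace(**{field: _RESET[field] for field in finer})


def truncate_with_spark(ts_string):
    """Truncate one timestamp string with date_trunc and trunc (requires PySpark)."""
    # Imported lazily so the reference function works without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("date_trunc_sketch")  # hypothetical app name
        .getOrCreate()
    )
    # "event_time" is a hypothetical column name used only for this sketch.
    df = spark.createDataFrame([(ts_string,)], ["event_time"]).withColumn(
        "event_time", F.to_timestamp("event_time")
    )
    return df.select(
        F.date_trunc("month", "event_time").alias("month_start_ts"),  # timestamp result
        F.trunc("event_time", "month").alias("month_start_date"),     # date result
    ).first()


print(date_trunc_reference("month", datetime(2023, 3, 15, 10, 45, 30)))  # 2023-03-01 00:00:00
```

Note the argument order differs between the two Spark functions: the unit comes first in `date_trunc` and second in `trunc`.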
