Tag: PySpark

Comparing PySpark with MapReduce programming

PySpark is the Python library for Spark programming. It allows developers to interface with RDDs (Resilient Distributed Datasets) and perform…

Continue Reading Comparing PySpark with MapReduce programming
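As a rough illustration of the teaser above, the classic MapReduce word-count flow collapses to a few RDD transformations in PySpark. This is a minimal sketch, not taken from the linked article; the input path and app name are placeholders.

```python
from pyspark.sql import SparkSession

# Placeholder session and input path, purely for illustration.
spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("data/input.txt")           # RDD of text lines

# The map and reduce phases of hand-written MapReduce, expressed as RDD operations.
counts = (
    lines.flatMap(lambda line: line.split())    # map: one record per word
         .map(lambda word: (word, 1))           # map: (key, value) pairs
         .reduceByKey(lambda a, b: a + b)       # reduce: sum counts per key
)

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```

Here reduceByKey takes care of the shuffle and aggregation that a MapReduce job would spell out in a separate reducer class.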

How to start a serverless Spark job on GCP

To start a serverless Spark job on Google Cloud Platform (GCP), you can use the Cloud Dataproc service. Cloud Dataproc…

Continue Reading How to start a serverless Spark job on GCP
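One way to submit such a serverless (Dataproc Serverless) batch from Python is the google-cloud-dataproc client library. This is a hedged sketch: the project ID, region, and GCS script path are placeholders, and the linked post may instead walk through the gcloud CLI or the Cloud Console.

```python
from google.cloud import dataproc_v1

# Placeholder values; substitute your own project, region, and script location.
project_id = "my-project"
region = "us-central1"

client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A serverless PySpark batch pointing at a script stored in Cloud Storage.
batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/wordcount.py"
    )
)

# create_batch returns a long-running operation; result() blocks until the batch finishes.
operation = client.create_batch(
    parent=f"projects/{project_id}/locations/{region}",
    batch=batch,
)
response = operation.result()
print(response.state)
```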

PySpark: How to read the date datatype from CSV?

We specify inferSchema = true when a CSV file is being read. Spark determines the data type of a column…

Continue Reading PySpark: How to read the date datatype from CSV?
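Schema inference can still leave date columns typed as strings, so a common approach is an explicit schema with DateType plus the dateFormat option. The sketch below assumes a hypothetical file and column names; the linked post may cover other options.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType

spark = SparkSession.builder.appName("csv-date-sketch").getOrCreate()

# Assumed columns and file path, for illustration only.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("joined_on", DateType(), True),
])

df = (
    spark.read
         .option("header", "true")
         .option("dateFormat", "yyyy-MM-dd")  # format the date column is stored in
         .schema(schema)                      # explicit schema instead of inferSchema
         .csv("data/employees.csv")
)

df.printSchema()   # joined_on should now show as date, not string
df.show()
```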