AWS Glue: Example of how to read a sample CSV file with PySpark

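Assume that your CSV data is in an AWS S3 bucket. The next step is to crawl that data with an AWS Glue crawler. Once the crawl is done, you will find that the crawler has created a metadata table for your CSV data. The example below reads the CSV file directly with the Spark CSV reader and prints the schema that Spark infers.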
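Since the crawler has already registered a metadata table, another option is to load the data through the Glue Data Catalog rather than by file path. The following is only a minimal sketch of that idea: the helper name read_crawled_table and the database and table names mentioned afterwards are hypothetical, and it assumes the code runs where a GlueContext is available.

```python
def read_crawled_table(glue_context, database, table_name):
    """Load a crawler-created catalog table and return it as a Spark DataFrame.

    glue_context is expected to be an awsglue.context.GlueContext;
    database and table_name are whatever names your crawler registered.
    """
    # create_dynamic_frame.from_catalog reads the table through the Glue Data Catalog.
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )
    # toDF() converts the Glue DynamicFrame into a regular Spark DataFrame.
    return dynamic_frame.toDF()
```

Inside a Glue job you might call it as read_crawled_table(glueContext, "freshers_db", "freshers_csv").printSchema(), where both names are placeholders for the ones your crawler actually created.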

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
glueContext = GlueContext(SparkContext.getOrCreate())
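# Read the CSV through the Spark session that GlueContext wraps.
freshers_data = glueContext.spark_session.read.format("com.databricks.spark.csv").option(
    "header", "true").option(      # first row holds the column names
    "inferSchema", "true").load(   # let Spark guess each column's type
    ...)  # S3 path of the CSV file goes here; it is not given in the original
freshers_data.printSchema()

root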
|-- Freshers def: string (nullable = true)
|-- student Id: string (nullable = true)
|-- student Name: string (nullable = true)
|-- student Street Address: string (nullable = true)
|-- student City: string (nullable = true)
|-- student State: string (nullable = true)
|-- student Zip Code: integer (nullable = true)

Author: user