Here, assume that your CSV data is already in an AWS S3 bucket. The next step is to crawl that data with an AWS Glue crawler. Once the crawl completes, you will find that the crawler has created a metadata table for your CSV data in the Glue Data Catalog. A crawler can also be set up from code, as shown in the sketch below.
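If you prefer to create the crawler programmatically rather than through the console, a minimal sketch using boto3 could look like the following. The crawler name, IAM role ARN, database name, and region here are placeholders for illustration; substitute your own values.

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # region is an assumption

# Hypothetical names -- replace with your own IAM role, database, and S3 path.
glue.create_crawler(
    Name="freshers_csv_crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="freshers_in_db",
    Targets={"S3Targets": [{"Path": "s3://freshers_in_datasets/training/students/"}]},
)

# Run the crawler; it scans the CSV files and writes a metadata table
# to the Glue Data Catalog.
glue.start_crawler(Name="freshers_csv_crawler")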
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session  # needed: `spark` is not defined automatically in a Glue script

# Read the CSV directly from S3, inferring the schema from the data.
freshers_data = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("s3://freshers_in_datasets/training/students/final_year.csv")

freshers_data.printSchema()
Result
root
 |-- Freshers def: string (nullable = true)
 |-- student Id: string (nullable = true)
 |-- student Name: string (nullable = true)
 |-- student Street Address: string (nullable = true)
 |-- student City: string (nullable = true)
 |-- student State: string (nullable = true)
 |-- student Zip Code: integer (nullable = true)
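Once the crawler has registered the table, you can also read the same data through the Glue Data Catalog instead of the raw S3 path. A short sketch, assuming the crawler wrote the table into a database named freshers_in_db with the table name final_year (both names are illustrative):

# Read via the Data Catalog table the crawler created.
# "freshers_in_db" and "final_year" are assumed names -- check your catalog.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="freshers_in_db",
    table_name="final_year",
)

# Convert the DynamicFrame to a Spark DataFrame to confirm the schema
# matches what inferSchema produced above.
dyf.toDF().printSchema()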