How to create an AWS Glue table where partitions have different columns?


AWS Glue is a serverless data integration service. Sometimes new columns appear regularly in the incoming JSON files, so the partitions of a table end up with different schemas. When you query such a table from Amazon Athena, you get the schema mismatch error HIVE_PARTITION_SCHEMA_MISMATCH. When there are multiple files, the table schema is obtained by reading all the source files in the S3 location together, while the schema of each individual partition can vary. To handle this, the AWS Glue crawler has a specific configuration option: “Update all new and existing partitions with metadata from the table.” Select this check box so that the partitions take their metadata from the table.


Configure the crawler as shown below.

[Image: AWS Glue table - Partitions have different columns]
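The same check box can also be set outside the console. The following is a minimal sketch, assuming the crawler already exists and that you use boto3; the crawler name `my-json-crawler` and the region are hypothetical placeholders, not values from this post. It builds the crawler configuration JSON in which partitions inherit their metadata from the table (`AddOrUpdateBehavior` set to `InheritFromTable`) and passes it to `update_crawler`.

```python
import json


def build_partition_inherit_config() -> str:
    """Return crawler Configuration JSON that makes partitions
    inherit their metadata from the table."""
    config = {
        "Version": 1.0,
        "CrawlerOutput": {
            # Console equivalent: "Update all new and existing partitions
            # with metadata from the table."
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return json.dumps(config)


def apply_to_crawler(crawler_name: str, region_name: str = "us-east-1") -> None:
    """Apply the configuration to an existing crawler.

    crawler_name and region_name are assumptions for illustration.
    """
    # Imported here so the config builder above can be used without boto3.
    import boto3

    glue = boto3.client("glue", region_name=region_name)
    glue.update_crawler(
        Name=crawler_name,
        Configuration=build_partition_inherit_config(),
    )


if __name__ == "__main__":
    # Print the JSON that would be sent to the crawler.
    print(build_partition_inherit_config())
    # Uncomment to apply it to a real crawler (hypothetical name):
    # apply_to_crawler("my-json-crawler")
```

After the configuration is updated, the crawler has to run again before the partitions pick up the table's metadata.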

Author: user