How to create an AWS Glue table where partitions have different columns


New columns may appear regularly in incoming JSON files, so the partitions of a table can end up with different schemas. When you query such a table from Amazon Athena, you get a schema mismatch error, HIVE_PARTITION_SCHEMA_MISMATCH. When there are multiple files, the table schema is obtained by reading all the source files in the S3 location together, while the schema of each individual partition can vary. To handle this, the AWS Glue crawler has a specific configuration option: “Update all new and existing partitions with metadata from the table.” Select this check box so that partitions take their metadata from the table.

Configure the crawler as shown below.

[Figure: AWS Glue table - Partitions have different columns]
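The same check box can also be set outside the console. The following is a minimal sketch, assuming the crawler already exists and that the console option corresponds to the crawler configuration setting `CrawlerOutput.Partitions.AddOrUpdateBehavior = InheritFromTable`, as described in the AWS Glue crawler configuration documentation. The crawler name `my-json-crawler` and the helper function names are hypothetical, used only for illustration.

```python
import json


def build_partition_inherit_config() -> str:
    """Build the crawler Configuration JSON string.

    Assumption: "InheritFromTable" is the setting behind the console option
    "Update all new and existing partitions with metadata from the table".
    """
    config = {
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return json.dumps(config)


def apply_to_crawler(crawler_name: str) -> None:
    """Apply the configuration to an existing crawler (needs AWS credentials)."""
    # Imported here so the config builder above can be used without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,
        Configuration=build_partition_inherit_config(),
    )
```

Calling `apply_to_crawler("my-json-crawler")` and then re-running the crawler should update the partitions with the table's metadata, after which the Athena query can be retried.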

Author: user
