Amazon Athena quick reference and cheat sheet

AWS Athena @ Freshers.in

1. Amazon Athena is an interactive query service to analyze data in Amazon S3 using standard SQL.
2. Athena is serverless, so there is no infrastructure to set up or manage.
3. For Athena you pay only for the queries that you run.
4. Amazon Athena's query engine is based on Presto (newer engine versions are based on Trino, a fork of Presto) and supports ANSI SQL.
5. Athena works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet and Avro.
6. Athena is optimized for fast performance with Amazon S3.
7. Athena executes queries in parallel automatically, with no action needed from the developer.
8. As a result, you typically get query results within seconds, even on large datasets.
9. There is no need for complex ETL jobs to prepare data for analysis.
10. Amazon Athena uses Hive only for DDL, that is, for creating, modifying, and deleting tables and partitions.
11. Athena integrates with the AWS Glue Data Catalog out of the box.
12. The AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats.
13. In Athena, you point to your data in Amazon S3, define the schema, and start querying using standard SQL.
14. Amazon Athena uses a managed Data Catalog to store information and schemas about the databases and tables.
15. You do not need to manage or tune clusters to get fast performance; Athena handles this itself.
16. An AWS Glue crawler can automatically scan your data sources, identify data formats, and infer schema.
17. You can also use Glue’s fully managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.
18. Athena provides one of the easiest ways to run ad-hoc queries on data in S3 without the need to set up or manage any servers.
19. In Athena, you pay only for the queries that you run. You are charged based on the amount of data scanned by each query.
20. You can save money and improve performance by compressing, partitioning, or converting your data to a columnar format, because each of those operations reduces the amount of data that Athena needs to scan to execute a query.
21. Athena can also handle complex analysis, including large joins, window functions, and arrays.
22. Athena is great if you just need to run a quick query on some web logs to troubleshoot a performance issue on your site.
23. Athena’s data catalog is Hive metastore compatible. If you’re using EMR and already have a Hive metastore, you simply execute your DDL statements on Amazon Athena, and then you can start querying your data right away without impacting your Amazon EMR jobs.
24. Federated query in Athena allows you to run SQL queries across a variety of relational, non-relational, and custom data sources. You get a unified way to run SQL queries across various data stores. (In general, a federated query is a way to send a query statement to an external database and get the result back as a temporary table.)
25. Amazon Athena supports a wide variety of data formats such as CSV, TSV, JSON, and text files, and also supports open source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, LZO, and GZIP formats. By compressing, partitioning, and using columnar formats you can improve performance and reduce your costs.
26. Amazon Athena supports both simple data types such as INTEGER, DOUBLE, and VARCHAR and complex data types such as MAP, ARRAY, and STRUCT.
27. Athena uses Presto when you run SQL queries on Amazon S3.
28. Amazon Athena allows you to partition your data on any column. Partitions allow you to limit the amount of data each query scans, leading to cost savings and faster performance.
29. Athena queries data directly from Amazon S3 so there’s no data movement or loading required.
30. Amazon Athena integrates with Amazon QuickSight, allowing you to easily visualize your data stored in Amazon S3.
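
Points 13 and 28 describe pointing Athena at an S3 location, defining a schema, and partitioning the data. As a rough illustration, the Python sketch below assembles the kind of Hive-style DDL statement and partition-filtered query this involves. The helper name, the table name `web_logs`, the column names, and the bucket path are all hypothetical, and the sketch only builds SQL strings; it does not connect to Athena.

```python
def build_create_table_ddl(table, columns, partition_columns, location,
                           stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement as a string.

    columns and partition_columns are lists of (name, type) pairs.
    Hypothetical helper for illustration; it does no validation or escaping.
    """
    column_lines = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    lines = [f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (", column_lines, ")"]
    if partition_columns:
        parts = ", ".join(f"{name} {dtype}" for name, dtype in partition_columns)
        lines.append(f"PARTITIONED BY ({parts})")
    lines.append(f"STORED AS {stored_as}")
    lines.append(f"LOCATION '{location}'")
    return "\n".join(lines)


# Assumed example: web logs stored as Parquet, partitioned by a date string.
ddl = build_create_table_ddl(
    table="web_logs",
    columns=[("request_time", "string"), ("status", "int"), ("url", "string")],
    partition_columns=[("dt", "string")],
    location="s3://example-bucket/web-logs/",  # hypothetical bucket
)

# Filtering on the partition column lets Athena skip the other partitions,
# so the query scans (and bills for) less data.
query = (
    "SELECT status, count(*) AS hits "
    "FROM web_logs "
    "WHERE dt = '2024-01-15' "
    "GROUP BY status"
)

print(ddl)
print(query)
```

The partition column (`dt` here) is declared in `PARTITIONED BY` rather than in the main column list, which is the usual Hive DDL convention. Partitions also have to be registered in the catalog before they can be queried, for example through a Glue crawler as mentioned in point 16.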

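Points 19 and 20 say that cost follows the amount of data scanned. The arithmetic can be sketched in Python as below. The rate of 5 USD per TB and the 10 MB minimum per query are assumptions for illustration only (actual prices vary by region and over time, so check the current Athena pricing page), and the function name and the data sizes in the example are hypothetical.

```python
# Assumed values for illustration; not taken from the cheat sheet above.
PRICE_PER_TB_USD = 5.0             # assumed on-demand rate per TB scanned
BYTES_PER_TB = 1024 ** 4
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2   # assumed 10 MB billing minimum


def estimate_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * price_per_tb


# Hypothetical dataset: 1 TB of raw data split into 30 daily partitions.
full_scan_bytes = 1024 ** 4
one_partition_bytes = full_scan_bytes // 30

full_scan_cost = estimate_query_cost(full_scan_bytes)
one_partition_cost = estimate_query_cost(one_partition_bytes)

print(f"Full scan:     ${full_scan_cost:.2f}")
print(f"One partition: ${one_partition_cost:.2f}")
```

Under these assumptions, a query restricted to a single daily partition costs roughly one thirtieth of a full scan. Compression and columnar formats reduce the scanned bytes in the same way, by shrinking the files or letting Athena read only the columns a query needs.
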
Author: user
