FAQ: AWS-Athena

16 November 2020

AWS-Athena

Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

Athena automatically executes queries in parallel, so you get query results in seconds, even on large datasets.

It is serverless.

It uses Presto, an open source, distributed SQL query engine optimized for low-latency, ad hoc analysis of data.

Athena supports a wide variety of data formats, such as CSV, JSON, ORC, Avro and Parquet.

Athena integrates out-of-the-box with AWS Glue.

Athena integrates with Amazon QuickSight for easy data visualization.

Users can use it to generate reports or to explore data with business intelligence tools or SQL clients connected via an ODBC or JDBC driver.

It can be accessed via the AWS Management Console, an API or an ODBC or JDBC driver.
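
As a rough illustration of the API route, the sketch below assembles the arguments for the StartQueryExecution call as exposed by boto3's Athena client. The database name, bucket and query are hypothetical placeholders, and the actual call sits in a separate function so that nothing contacts AWS unless you invoke it with boto3 installed and valid credentials.

```python
def build_query_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution API."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the request with boto3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-athena-results/queries/",
)
```

Calling run_query(request) would return a query execution ID, which can then be used to poll for status and fetch results.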

It can handle complex analysis, including large joins, window functions and arrays.

Users can invoke their SageMaker machine learning models in an Athena SQL query to run inference.

With User-Defined Functions (UDFs), users can write their own functions in Java and invoke them in an Athena SQL query.

Users can connect Athena to their external Apache Hive Metastore.

It uses Apache Hive DDL to define tables.

It supports compressed data in Snappy, Zlib, LZO and GZIP formats.

It supports both simple data types such as INTEGER, DOUBLE and VARCHAR, and complex data types such as MAP, ARRAY and STRUCT.

Users can run ANSI-compliant SQL SELECT statements to query their data in Amazon S3.

It uses SerDes (Serializer/Deserializer) to interpret the data read from Amazon S3.
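
To make the table-definition points concrete, here is a minimal sketch that renders a Hive-style CREATE EXTERNAL TABLE statement with an explicit SerDe. The table name, columns and bucket are hypothetical. The SerDe class shown is the OpenX JSON SerDe, chosen here only as an example for JSON data.

```python
def create_table_ddl(table, columns, location, serde):
    """Render a Hive-style CREATE EXTERNAL TABLE statement as a string."""
    column_lines = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT SERDE '{serde}'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table over JSON files; the SerDe tells Athena how to
# parse each record it reads from the S3 location.
ddl = create_table_ddl(
    table="example_db.events",
    columns=[
        ("event_id", "string"),
        ("tags", "array<string>"),
        ("attributes", "map<string,string>"),
        ("device", "struct<os:string,version:string>"),
    ],
    location="s3://example-data-bucket/events/",
    serde="org.openx.data.jsonserde.JsonSerDe",
)
print(ddl)
```

The statement only registers metadata. The data itself stays in S3 and is interpreted by the SerDe at query time.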

Parquet and ORC files created via Spark can be read in Athena.

It allows users to partition their data on any column.
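
Partitioning can be sketched as follows, assuming a hypothetical table that was declared with PARTITIONED BY (year string, month string) and data laid out under Hive-style key=value prefixes in S3. The helpers only build strings. The resulting statement would still need to be run in Athena.

```python
def partition_location(base, partition):
    """Build a Hive-style S3 prefix such as .../year=2020/month=11/."""
    parts = "/".join(f"{key}={value}" for key, value in partition.items())
    return f"{base.rstrip('/')}/{parts}/"


def add_partition_ddl(table, base, partition):
    """Render an ALTER TABLE ... ADD PARTITION statement for one partition."""
    spec = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
    location = partition_location(base, partition)
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Hypothetical table, bucket and partition values.
stmt = add_partition_ddl(
    "example_db.events",
    "s3://example-data-bucket/events/",
    {"year": "2020", "month": "11"},
)
print(stmt)
```

A query that filters on year and month then reads only the matching prefix, which reduces the amount of data scanned.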

It integrates with Amazon QuickSight, allowing users to easily visualize their data stored in Amazon S3.

It has open source data source connectors for Apache HBase, Amazon DocumentDB, Amazon DynamoDB, Amazon CloudWatch Logs and Amazon CloudWatch Metrics.

All Athena query results are stored in an Amazon S3 location that the user specifies.

It allows users to control access to their data by using AWS IAM policies, Access Control Lists (ACLs) and Amazon S3 bucket policies.

It integrates with AWS KMS and gives users the option to encrypt their result sets.
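
The result location and the encryption option are both passed in the ResultConfiguration argument of the StartQueryExecution API. The sketch below builds that structure as a plain dictionary. The bucket and key ARN are hypothetical placeholders, and the validation rules are a simplified reading of the API rather than an exhaustive check.

```python
VALID_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def result_configuration(output_location, encryption_option=None, kms_key=None):
    """Build the ResultConfiguration argument for StartQueryExecution."""
    config = {"OutputLocation": output_location}
    if encryption_option is None:
        return config  # results are written without an explicit setting
    if encryption_option not in VALID_OPTIONS:
        raise ValueError(f"unknown encryption option: {encryption_option}")
    encryption = {"EncryptionOption": encryption_option}
    if encryption_option in ("SSE_KMS", "CSE_KMS"):
        if not kms_key:
            raise ValueError("a KMS key is required for KMS-based encryption")
        encryption["KmsKey"] = kms_key
    config["EncryptionConfiguration"] = encryption
    return config


# Hypothetical bucket and key ARN, for illustration only.
config = result_configuration(
    "s3://example-athena-results/queries/",
    encryption_option="SSE_KMS",
    kms_key="arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)
```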

It is priced per query and charges based on the amount of data scanned by the query.

Users can save 30%-90% on their query costs and get better performance by compressing and partitioning their data and converting it into columnar formats.

Users are not charged for failed queries.

If users cancel a query manually, they are charged for the amount of data scanned up to the point at which they cancelled the query.
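
The pricing rules above can be turned into a back-of-the-envelope estimate. The sketch below assumes a price of 5 USD per TB scanned and a 10 MB minimum per query. Both figures are assumptions for illustration and are not taken from this post, so check the current Athena pricing page before relying on them. The 10% scan ratio for the columnar case is likewise hypothetical.

```python
TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2


def query_cost(bytes_scanned, price_per_tb=5.0, min_bytes=10 * MB, failed=False):
    """Estimate the charge in USD for one query from the bytes it scanned.

    price_per_tb and min_bytes are assumed values, not official figures.
    """
    if failed:
        return 0.0  # failed queries are not charged
    billable = max(bytes_scanned, min_bytes)
    return billable / TB * price_per_tb


raw_csv = query_cost(1 * TB)       # full scan of 1 TB of uncompressed CSV
columnar = query_cost(TB // 10)    # hypothetical: columnar + partitions scan ~10%
cancelled = query_cost(200 * GB)   # cancelled after 200 GB had been scanned
print(f"{raw_csv:.2f} {columnar:.2f} {cancelled:.2f}")
```

Under these assumptions, scanning a tenth of the data costs a tenth as much, which is why compression, partitioning and columnar formats have such a direct effect on the bill.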
