- Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
- Athena automatically executes queries in parallel, so most results come back within seconds, even on large datasets.
- It is serverless.
- It uses Presto, an open-source, distributed SQL query engine optimized for low-latency, ad hoc analysis of data.
- Athena supports a wide variety of data formats, such as CSV, JSON, ORC, Avro and Parquet.
- Athena integrates out-of-the-box with AWS Glue.
- Athena integrates with Amazon QuickSight for easy data visualization.
- Users can use it to generate reports or to explore data with business intelligence tools or SQL clients connected via an ODBC or JDBC driver.
- It can be accessed via the AWS Management Console, an API, or an ODBC or JDBC driver.
- It can handle complex analysis, including large joins, window functions and arrays.
- Users can invoke their SageMaker machine learning models in an Athena SQL query to run inference.
- With User-Defined Functions (UDFs), users can write their own functions in Java and invoke them in an Athena SQL query.
- Users can connect Athena to their external Apache Hive Metastore.
- It uses Apache Hive DDL to define tables.
- It supports compressed data in Snappy, Zlib, LZO and GZIP formats.
- It supports both simple data types such as INTEGER, DOUBLE and VARCHAR, and complex data types such as MAP, ARRAY and STRUCT.
- Users can run ANSI-compliant SQL SELECT statements to query their data in Amazon S3.
- It uses SerDes (Serializer/Deserializer) to interpret the data read from Amazon S3.
- Parquet and ORC files created via Spark can be read in Athena.
- It allows users to partition their data on any column.
- It integrates with Amazon QuickSight, allowing users to easily visualize their data stored in Amazon S3.
- It has open-source data source connectors for Apache HBase, Amazon DocumentDB, Amazon DynamoDB, Amazon CloudWatch Logs and CloudWatch Metrics.
- All Athena query results are stored in an Amazon S3 location that the user specifies.
- It allows users to control access to their data by using AWS IAM policies, Access Control Lists (ACLs) and Amazon S3 bucket policies.
- It integrates with AWS KMS and gives users the option to encrypt their result sets.
- It is priced per query and charges based on the amount of data scanned by the query.
- Users can save 30%-90% on their query costs and get better performance by compressing, partitioning and converting their data into columnar formats.
- Users are not charged for failed queries.
- If users cancel a query manually, they are charged for the amount of data scanned up to the point at which they cancelled it.
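
The pricing points above come down to simple arithmetic on bytes scanned. The sketch below is only an illustration: the rate of $5 per TB and the 10 MB per-query minimum are assumptions that do not come from these notes (actual pricing varies by region), and `estimate_query_cost` is a hypothetical helper, not part of any AWS SDK.

```python
TB = 1024 ** 4  # bytes per terabyte (assumed billing unit)
MB = 1024 ** 2  # bytes per megabyte


def estimate_query_cost(bytes_scanned, failed=False,
                        price_per_tb=5.0, min_bytes=10 * MB):
    """Estimate the cost in USD of a single query.

    price_per_tb and min_bytes are assumed values for illustration;
    check the current pricing for your region before relying on them.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    if failed:
        # Failed queries are not charged.
        return 0.0
    # Completed and manually cancelled queries are billed on the data
    # scanned so far, subject to the assumed per-query minimum.
    billable = max(bytes_scanned, min_bytes)
    return billable / TB * price_per_tb


# Hypothetical comparison: a full scan of 1 TB of raw data versus the same
# query after partitioning and columnar conversion cut the scan to a quarter.
full_scan = estimate_query_cost(1 * TB)
pruned_scan = estimate_query_cost(TB // 4)
savings_pct = (1 - pruned_scan / full_scan) * 100
print(f"full: ${full_scan:.2f}, pruned: ${pruned_scan:.2f}, saved: {savings_pct:.0f}%")
```

Because the charge scales with bytes scanned, anything that lets a query read less data (compression, partition pruning, reading only the needed columns) lowers the bill in direct proportion.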
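
To make the points about Hive DDL, partitioning and columnar formats more concrete, here is a minimal sketch that assembles a `CREATE EXTERNAL TABLE` statement as a string. The helper `build_create_table_ddl`, the table name `access_logs`, its columns and the bucket `s3://example-bucket/...` are all hypothetical, chosen only for illustration. The sketch builds text only and does not submit anything to Athena.

```python
def build_create_table_ddl(table, columns, partition_columns, location,
                           stored_as="PARQUET"):
    """Build a Hive-style DDL statement for an external, partitioned table.

    columns and partition_columns are lists of (name, hive_type) pairs.
    This is a hypothetical helper for illustration, not an Athena API.
    """
    def render(cols):
        # One backtick-quoted column definition per line.
        return ",\n".join(f"  `{name}` {dtype}" for name, dtype in cols)

    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{render(columns)}\n"
        f")\n"
        f"PARTITIONED BY (\n"
        f"{render(partition_columns)}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table mixing simple types with a complex ARRAY type,
# partitioned by a date string column.
ddl = build_create_table_ddl(
    table="access_logs",
    columns=[("request_id", "string"), ("status", "int"),
             ("tags", "array<string>")],
    partition_columns=[("dt", "string")],
    location="s3://example-bucket/access-logs/",
)

# Filtering on the partition column lets the engine skip other partitions,
# which reduces the amount of data scanned.
query = (
    "SELECT status, COUNT(*) AS hits "
    "FROM access_logs "
    "WHERE dt = '2020-01-01' "
    "GROUP BY status"
)

print(ddl)
print(query)
```

After a partitioned table is created, its partitions still need to be registered in the catalog (for example with `MSCK REPAIR TABLE` or `ALTER TABLE ... ADD PARTITION`) before queries will see the data.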