16 November 2020

AWS-Athena

  • Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
  • Athena automatically executes queries in parallel, so results typically come back in seconds, even on large datasets.
  • It uses Presto, an open source, distributed SQL query engine optimized for low-latency, ad hoc analysis of data.
  • Athena supports a wide variety of data formats such as CSV, JSON, ORC, Avro, or Parquet.
  • Athena integrates out-of-the-box with AWS Glue.
  • Users can use it to generate reports or to explore data with business intelligence tools or SQL clients connected via an ODBC or JDBC driver.
  • It can be accessed via the AWS Management Console, an API or an ODBC or JDBC driver.
  • It can handle complex analysis, including large joins, window functions and arrays.
  • Users can invoke their SageMaker machine learning models in an Athena SQL query to run inference.
  • With User-Defined Functions (UDFs), users can write their own functions in Java and invoke them in an Athena SQL query.
  • Users can connect Athena to their external Apache Hive Metastore.
  • It uses Apache Hive DDL to define tables.
  • It supports compressed data in Snappy, Zlib, LZO and GZIP formats.
  • It supports both simple data types such as INTEGER, DOUBLE and VARCHAR, and complex data types such as MAP, ARRAY and STRUCT.
  • Users can run ANSI-compliant SQL SELECT statements to query their data in Amazon S3.
  • It uses SerDes (Serializer/Deserializer) to interpret the data read from Amazon S3.
  • Parquet and ORC files created via Spark can be read in Athena.
  • It allows users to partition their data on any column.
  • All Athena query results are stored in an Amazon S3 location that the user specifies.
  • It allows users to control access to their data by using AWS IAM policies, Access Control Lists (ACLs) and Amazon S3 bucket policies.
  • It integrates with AWS KMS and gives users the option to encrypt their result sets.
  • It is priced per query, and the charge is based on the amount of data the query scans.
  • Users can save 30%-90% on their query costs and get better performance by compressing and partitioning their data and converting it into columnar formats.
  • Users are not charged for failed queries.
  • If users cancel a query manually, they are charged for the amount of data scanned up to the point at which they cancelled the query.
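
The pricing points above can be sketched as a small calculation. This is a minimal illustration, not an official calculator: the rate of USD 5 per TB scanned and the 10 MB minimum per query are assumptions based on commonly published Athena list pricing and may differ by region or over time, and the function names and the tenfold reduction in scanned data are hypothetical. Rounding scanned bytes up to the nearest megabyte is ignored for simplicity.

```python
# Assumed list price and minimum; check the current pricing for your region.
PRICE_PER_TB_USD = 5.0
BYTES_PER_MB = 1024**2
BYTES_PER_TB = 1024**4
MIN_BILLABLE_BYTES = 10 * BYTES_PER_MB


def estimate_query_cost(bytes_scanned: int, failed: bool = False) -> float:
    """Estimate the cost in USD of a single query.

    Failed queries are not charged. A cancelled query is charged for the
    bytes scanned up to the cancellation, so pass that byte count here.
    """
    if failed:
        return 0.0
    # Apply the assumed per-query minimum before pricing.
    billable = max(bytes_scanned, MIN_BILLABLE_BYTES)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


def savings_percent(baseline_bytes: int, optimized_bytes: int) -> float:
    """Percentage saved when the same query scans fewer bytes."""
    baseline = estimate_query_cost(baseline_bytes)
    optimized = estimate_query_cost(optimized_bytes)
    return (baseline - optimized) / baseline * 100


if __name__ == "__main__":
    raw_csv_bytes = 1 * BYTES_PER_TB
    # Hypothetical: compression plus a columnar format cuts the scan tenfold.
    parquet_bytes = raw_csv_bytes // 10
    print(f"Raw CSV scan:  ${estimate_query_cost(raw_csv_bytes):.2f}")
    print(f"Parquet scan:  ${estimate_query_cost(parquet_bytes):.2f}")
    print(f"Saving:        {savings_percent(raw_csv_bytes, parquet_bytes):.1f}%")
```

Because the charge depends only on bytes scanned, anything that lets a query read less data (partition pruning, compression, columnar formats) lowers the bill directly.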
