FAQ: AWS-Redshift

17 November 2020


Redshift is a fast, fully managed data warehouse that makes it simple & cost-effective to analyze all data using standard SQL & existing BI tools.

It lets users run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks & massively parallel query execution.

It gives fast querying capabilities over structured data through familiar SQL-based clients & BI tools that connect over standard ODBC and JDBC connections.

It supports VPC, SSL, AES-256 encryption & Hardware Security Modules to protect data in transit and at rest.

Redshift Spectrum is a feature of Redshift that lets users run queries against exabytes of unstructured data in S3, with no loading or ETL required.

With Redshift Spectrum, Redshift manages all the computing infrastructure, load balancing, planning, scheduling & execution of queries on data stored in S3.

Users can load data into Redshift from a range of data sources including S3, DynamoDB, EMR, Glue, Data Pipeline or any SSH-enabled host on EC2 or on-premises.
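As an illustration of loading from S3, the sketch below builds a COPY statement as a string. The function name, table name, bucket and IAM role ARN are hypothetical placeholders, and only a few file formats are handled; it is a minimal sketch, not a complete loader.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         file_format: str = "PARQUET") -> str:
    """Return a Redshift COPY statement that loads `table` from files under `s3_uri`."""
    # Only formats that need no extra options are accepted in this sketch.
    allowed_formats = {"PARQUET", "ORC", "CSV"}
    fmt = file_format.upper()
    if fmt not in allowed_formats:
        raise ValueError(f"unsupported format in this sketch: {file_format}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS {fmt};"
    )


# Hypothetical table, bucket and role, for illustration only.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRole",
)
print(statement)
```

The resulting string would be run through any SQL client connected to the cluster.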

Redshift automatically detects and replaces a failed node in a user's data warehouse cluster.

Redshift replicates all of a user's data within the data warehouse cluster when it is loaded and also continuously backs up the data to S3.

It always attempts to maintain at least three copies of data (the original and replica on the compute nodes and a backup in S3).

It can also asynchronously replicate snapshots to S3 in another region for disaster recovery.

By default, Redshift enables automated backups of the data warehouse cluster with a 1-day retention period.

Concurrency scaling is a feature in Redshift that provides consistently fast query performance, even with thousands of concurrent queries.

Elastic Resize adds or removes nodes from a single Redshift cluster within minutes to manage its query throughput.

Redshift Spectrum supports many open source data formats, including Avro, CSV, Grok, Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile & TSV.

Redshift Spectrum supports Gzip and Snappy compression.

The CREATE EXTERNAL SCHEMA command supports Hive Metastores. AWS does not currently support DDL against the Hive Metastore.
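A minimal sketch of what a CREATE EXTERNAL SCHEMA statement for a Hive Metastore could look like, built as a string. The schema name, Hive database, host address and IAM role ARN are hypothetical; 9083 is assumed here as the metastore port.

```python
def build_hive_external_schema(schema: str, hive_database: str,
                               metastore_host: str, iam_role_arn: str,
                               port: int = 9083) -> str:
    """Return a CREATE EXTERNAL SCHEMA statement that points at a Hive Metastore."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM HIVE METASTORE\n"
        f"DATABASE '{hive_database}'\n"
        f"URI '{metastore_host}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


# Hypothetical names and addresses, for illustration only.
ddl = build_hive_external_schema(
    schema="hive_sales",
    hive_database="sales_db",
    metastore_host="10.0.0.12",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)
print(ddl)
```

Tables already defined in the metastore can then be queried through the external schema; since DDL against the Hive Metastore is not supported, table definitions would have to be created on the Hive side.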

Redshift Spectrum queries run using per-query scale-out resources against data in S3.

Redshift periodically performs maintenance to apply fixes, enhancements and new features to the cluster.

For Redshift, users are billed based on compute node hours, backup storage, data transfer and data scanned by Redshift Spectrum queries.
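The four billing dimensions can be combined into a rough estimate as sketched below. Every rate and usage figure here is a made-up placeholder, not an AWS price; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    node_count: int      # compute nodes in the cluster
    hours: float         # hours the cluster ran
    backup_gb: float     # billable backup storage in GB
    transfer_gb: float   # billable data transfer in GB
    scanned_tb: float    # data scanned in S3 by Spectrum queries, in TB


@dataclass
class Rates:
    # Placeholder rates; real prices vary by node type and region.
    per_node_hour: float
    per_backup_gb: float
    per_transfer_gb: float
    per_scanned_tb: float


def estimate_bill(usage: Usage, rates: Rates) -> dict:
    """Return a cost breakdown per billing dimension plus the total."""
    bill = {
        "compute": usage.node_count * usage.hours * rates.per_node_hour,
        "backup": usage.backup_gb * rates.per_backup_gb,
        "transfer": usage.transfer_gb * rates.per_transfer_gb,
        "scanned": usage.scanned_tb * rates.per_scanned_tb,
    }
    bill["total"] = sum(bill.values())
    return bill


example = estimate_bill(
    Usage(node_count=2, hours=100, backup_gb=100, transfer_gb=10, scanned_tb=3),
    Rates(per_node_hour=0.25, per_backup_gb=0.02, per_transfer_gb=0.5, per_scanned_tb=5.0),
)
print(example)
```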

Users can also use Redshift Spectrum together with EMR. Redshift Spectrum uses the same approach to store table definitions as Amazon EMR.

In Redshift, a leader node receives queries from client applications, parses the queries and develops execution plans, which are an ordered set of steps to process these queries.

In Redshift, compute nodes execute the steps specified in the execution plans and transmit data among themselves to serve these queries.
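The division of work between the leader node and the compute nodes can be pictured with a toy model. This is not how Redshift is implemented: the "plan" is a single filter-and-aggregate step, each list stands in for the data held by one compute node, threads stand in for nodes, and data exchange between nodes is not modelled. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_step(local_amounts, min_amount):
    """Step run on one compute node: filter its local data, return a partial (sum, count)."""
    matching = [amount for amount in local_amounts if amount >= min_amount]
    return sum(matching), len(matching)


def leader_average(slices, min_amount):
    """Leader: send the same step to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(lambda s: compute_node_step(s, min_amount), slices))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None


# Three lists play the role of three compute nodes.
node_slices = [[10, 20, 5], [30, 1], [40]]
print(leader_average(node_slices, min_amount=10))
```

The merge step works because an average can be rebuilt from per-node sums and counts, which is why each node returns both.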

It uses a variety of innovations to achieve up to ten times higher performance than traditional databases for data warehousing and analytics workloads: Columnar Data Storage, Advanced Compression, Massively Parallel Processing (MPP), Redshift Spectrum.
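To see why columnar storage and compression go together, the toy sketch below pivots rows into columns and run-length encodes one column. It only illustrates the idea; it does not reproduce Redshift's storage format, and the function names and sample rows are made up.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


rows = [
    {"region": "EU", "amount": 10},
    {"region": "EU", "amount": 12},
    {"region": "US", "amount": 7},
]
columns = to_columns(rows)

# A query that sums amounts touches only the "amount" column.
total_amount = sum(columns["amount"])

# Values of one column sit together, so repeated values compress well.
encoded_region = run_length_encode(columns["region"])
print(total_amount, encoded_region)
```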

It automatically distributes data and query load across all nodes.
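One common way to spread rows across nodes is to hash a key column, sketched below as a toy model. The hash function used by Redshift is internal; CRC32 is used here only as a stand-in, and the function names and sample rows are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for_key(key: str, node_count: int) -> int:
    """Map a key to a node index with a deterministic hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_field, node_count):
    """Group rows by the node that their key hashes to."""
    placement = defaultdict(list)
    for row in rows:
        node = node_for_key(str(row[key_field]), node_count)
        placement[node].append(row)
    return dict(placement)


orders = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 25},
    {"customer_id": "c1", "amount": 7},
    {"customer_id": "c3", "amount": 3},
]
placement = distribute(orders, "customer_id", node_count=4)
print(placement)
```

Rows that share a key always land on the same node, so work grouped on that key can be done locally on each node.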
