17 November 2020

AWS-Redshift

  • Redshift is a fast, fully managed data warehouse that makes it simple & cost-effective to analyze all data using standard SQL & existing BI tools.
  • It allows to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks & massively parallel query execution.
  • It gives fast querying capabilities over structured data using familiar SQL-based clients & BI tools using standard ODBC and JDBC connections.
  • It supports VPC, SSL, AES-256 encryption & Hardware Security Modules to protect data in transit and at rest.
  • It Spectrum is a feature of It that enables to run queries against exabytes of unstructured data in S3, with no loading or ETL required.
  • It manages all the computing infrastructure, load balancing, planning, scheduling & execution of queries on data stored in S3.
  • Users can load data into c from a range of data sources including S3, DynamoDB, EMR, Glue, Data Pipeline and or any SSH-enabled host on EC2 or on-premises.
  • It will automatically detect and replace a failed node in user's data warehouse cluster.
  • It replicates all user's data within their data warehouse cluster when it is loaded and also continuously backs up their data to S3.
  • It always attempts to maintain at least three copies of data (the original and replica on the compute nodes and a backup in S3).
  • It can also asynchronously replicate snapshots to S3 in another region for disaster recovery.
  • It enables automated backups of data warehouse cluster with a 1-day retention period.
  • Concurrency scaling is a feature in Redshift that provides consistently fast query performance, even with thousands of concurrent queries.
  • Elastic Resize adds or removes nodes from a single Redshift cluster within minutes to manage its query throughput.
  • It Spectrum supports many open source data formats, including Avro, CSV, Grok, Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile & TSV.
  • It Spectrum supports Gzip and Snappy compression.
  • The CREATE EXTERNAL SCHEMA command supports Hive Metastores. AWS do not currently support DDL against the Hive Metastore.
  • It Spectrum queries run using per-query scale-out resources against data in S3.
  • It periodically performs maintenance to apply fixes, enhancements and new features to cluster.
  • For It, users are billed based on: Compute node hours, Backup Storage, Data transfer, Data scanned.
  • Users can also use Redshift Spectrum together with EMR. Redshift Spectrum uses the same approach to store table definitions as Amazon EMR.
  • In Redshift, a leader node receives queries from client applications, parses the queries and develops execution plans, which are an ordered set of steps to process these queries.
  • In Redshift, compute nodes execute the steps specified in the execution plans and transmit data among themselves to serve these queries.
  • It uses a variety of innovations to achieve up to ten times higher performance than traditional databases for data warehousing and analytics workloads: Columnar Data Storage, Advanced Compression, Massively Parallel Processing (MPP), Redshift Spectrum.
  • It automatically distributes data and query load across all nodes.

No comments:

Post a Comment

Most views on this month