Amazon Redshift is a fully managed data warehouse service provided by Amazon Web Services (AWS), designed to handle large-scale data analytics workloads. Its architecture is built for fast analytical queries and for scaling as data grows. Here are the key components of the Amazon Redshift architecture:
Clusters:
- The fundamental unit of computation and storage in Amazon Redshift is the cluster. A cluster consists of a leader node and one or more compute nodes.
- Leader Node: Manages communication with client applications, receives queries, creates execution plans, and coordinates the parallel execution of queries across the compute nodes.
- Compute Nodes: Store data and perform computations and transformations. Each compute node runs an instance of the Amazon Redshift engine and manages a portion of the overall data.
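To make "manages a portion of the overall data" concrete, here is a minimal Python sketch of spreading rows across compute nodes by hashing a key. It illustrates the general idea only and is not Redshift's actual distribution algorithm; the names `assign_node`, `distribute`, and the `customer_id` field are hypothetical.

```python
import zlib
from collections import defaultdict


def assign_node(key: str, node_count: int) -> int:
    """Map a distribution key to a compute node index (hypothetical scheme)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_field, node_count):
    """Group rows by the node index that would hold them."""
    nodes = defaultdict(list)
    for row in rows:
        node = assign_node(str(row[key_field]), node_count)
        nodes[node].append(row)
    return dict(nodes)


# Eight sample rows spread over an assumed four compute nodes.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
placement = distribute(rows, "customer_id", node_count=4)

for node, held in sorted(placement.items()):
    print(node, [r["customer_id"] for r in held])
```

Because the same key always hashes to the same node, rows that share a key end up together, and each node only needs to work on the rows it holds.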
Columnar Storage:
- Amazon Redshift stores data in a columnar format rather than row-based. This is optimized for analytical queries that typically involve scanning large volumes of data but retrieving only a subset of columns.
- Columnar storage reduces I/O overhead and improves query performance by minimizing the amount of data read from disk.
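The I/O saving can be sketched with a toy comparison in Python. The table, the column names, and the "values read" counters below are made up for illustration; they model the idea of touching fewer values, not Redshift's real storage engine.

```python
# A tiny table held row by row (sample data, invented for this sketch).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "note": "gift"},
    {"order_id": 2, "region": "US", "amount": 80.0, "note": ""},
    {"order_id": 3, "region": "EU", "amount": 45.5, "note": "rush"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def values_read_row_layout(rows, wanted):
    """A row layout touches every field of every row, whatever is wanted."""
    return sum(len(row) for row in rows)


def values_read_column_layout(columns, wanted):
    """A column layout touches only the columns the query asks for."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)
wanted = ["amount"]

# Equivalent of: SELECT SUM(amount) FROM orders
total = sum(columns["amount"])

print(total)                                       # 245.5
print(values_read_row_layout(rows, wanted))        # 12
print(values_read_column_layout(columns, wanted))  # 3
```

With four columns and one of them queried, the column layout reads a quarter of the values. The gap widens as tables get wider and queries stay narrow.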
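Massively Parallel Processing (MPP):
- Divide the work into small, similar tasks.
- Individual teams work in silos, each completing its own task.
- A director collates the finished tasks back into one result.
- In Redshift, the leader node plays the director and the compute nodes are the teams.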
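The divide, work-in-silos, and collate steps can be sketched in Python as a scatter-gather sum. This is a minimal sketch of the pattern, not Redshift code: the helper names are hypothetical, and a thread pool stands in for separate compute nodes, so it shows the structure of the work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, parts):
    """Divide the work into `parts` similar slices."""
    return [values[i::parts] for i in range(parts)]


def partial_sum(chunk):
    """The task each worker completes on its own slice, in isolation."""
    return sum(chunk)


def parallel_sum(values, workers=4):
    """Scatter the slices to workers, then collate the partial results."""
    chunks = split(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # The "director" step: combine the partial results into one answer.
    return sum(partials)


result = parallel_sum(list(range(1, 101)))
print(result)  # 5050
```

Summation works well here because partial sums can be combined in any order. Operations such as an exact median need more coordination between the workers.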
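Columnar Database:
- Values of the same column are stored in the same or adjacent blocks.
- Reads are efficient when a query needs only a few columns.
- Compression works better at the column level, because the values in one column share a type and are often similar.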
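One way to see why a column compresses well is run-length encoding, a general technique in which repeated adjacent values collapse into (value, count) pairs. The Python sketch below is illustrative only; the sample `region_column` and the function names are assumptions, and it does not describe how Redshift chooses or applies its encodings.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original list of values."""
    return [value for value, count in pairs for _ in range(count)]


# A single column stored contiguously tends to contain long runs.
region_column = ["EU", "EU", "EU", "US", "US", "EU"]

encoded = rle_encode(region_column)
print(encoded)  # [('EU', 3), ('US', 2), ('EU', 1)]

# Decoding restores the column exactly, so no information is lost.
restored = rle_decode(encoded)
print(restored == region_column)  # True
```

In a row layout the same values would be interleaved with order IDs, amounts, and notes, so the runs would be broken up and this kind of encoding would save far less.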