Databases
Aurora
- Supports multi-master or single-master clusters
- Consists of a cluster volume that spans multiple AZs
- Single-master: one primary instance with R/W access to data, plus up to 15 read-only replicas
- Supports automatic failover by promoting one of the replicas to primary
- Multi-master clusters
- can cause write-conflict errors which are reported as deadlock errors
- BP: well suited to sharded application architectures
- Each cluster has one cluster endpoint and one reader endpoint; 0+ custom endpoints; plus an instance endpoint per instance
- Reader and custom endpoints load balance across all replicas
- Load balancing is at the connection level (not the query level)
- 2 copies of data in each of 3 AZs = 6 copies of data
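The 6-copy layout works because Aurora uses quorum writes and reads over the copies (4/6 to write, 3/6 to read). A minimal sketch of that arithmetic, assuming the standard quorum sizes:

```python
# Aurora storage quorum math: 2 copies per AZ x 3 AZs = 6 copies.
# Writes need 4/6 acknowledgements, reads need 3/6.
COPIES_PER_AZ = 2
AZS = 3
TOTAL_COPIES = COPIES_PER_AZ * AZS  # 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def can_write(available_copies: int) -> bool:
    return available_copies >= WRITE_QUORUM

def can_read(available_copies: int) -> bool:
    return available_copies >= READ_QUORUM

# Losing a whole AZ leaves 4 copies: both reads and writes survive.
assert can_write(TOTAL_COPIES - COPIES_PER_AZ)
# Losing an AZ plus one more copy leaves 3: reads survive, writes do not.
assert can_read(3) and not can_write(3)
```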
RDS
- Multi-AZ: For DR, Synchronous
- Read replica: For read-heavy workloads, Asynchronous
- Up to 5 read-only replicas per source
- Requires automatic backups to be turned on
- Supported by: MySQL, MariaDB and PostgreSQL
- Can be promoted to a standalone DB by breaking the replication connection
- Cross-region replicas are supported for MySQL and MariaDB only
DynamoDB
- Stored on SSD, millisecond response time
- Spread across 3 geographically distinct data centers
- Consistency models:
- Eventually consistent (default): reads reflect a write typically within a second
- Strongly consistent: once a write succeeds, subsequent reads return the new value
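The consistency model is chosen per read via the `ConsistentRead` flag on `get_item`. A sketch, with a made-up table and key (the helper function is illustrative, not a boto3 API):

```python
# Sketch: per-read consistency choice for DynamoDB get_item.
# Table/key names are made up; the flag name is the real boto3 parameter.
def get_item_request(table: str, key: dict, strong: bool = False) -> dict:
    req = {"TableName": table, "Key": key}
    if strong:
        req["ConsistentRead"] = True  # default (omitted) = eventually consistent
    return req

eventual = get_item_request("scores", {"user": {"S": "alice"}})
strong = get_item_request("scores", {"user": {"S": "alice"}}, strong=True)
# dynamodb.get_item(**strong)  # would issue the strongly consistent read
```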
Athena
- Based on Presto, on top of S3, serverless SQL
- Can use AWS Glue (Hive compatible catalog) as metadata repository for storing schemas
- Glue provides crawlers to automate inferring schema from data stores
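Putting the pieces together: an Athena query names a Glue database for its schemas and an S3 location for results. A sketch of the request for boto3's `athena.start_query_execution`; the database, bucket, and SQL are placeholders:

```python
# Sketch: the three parts an Athena query execution needs.
# Database/bucket names and the SQL are illustrative.
def athena_query_request(sql: str, database: str, output_s3: str) -> dict:
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},  # Glue catalog DB
        "ResultConfiguration": {"OutputLocation": output_s3},  # results land in S3
    }

req = athena_query_request(
    "SELECT status, COUNT(*) FROM weblogs GROUP BY status",
    database="logs_db",
    output_s3="s3://my-athena-results/",
)
# athena.start_query_execution(**req)  # would return a QueryExecutionId
```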
Redshift
- Redshift Spectrum: allows running queries over S3
- Unlike Athena, which uses pooled compute resources, Redshift Spectrum relies on the Redshift cluster for compute
Cluster
- Shared-nothing architecture consisting of multiple compute nodes (CN), each containing multiple slices that own data
- uses columnar data storage
- One leader node (LN) that receives queries, distributes execution plans to CN, aggregates results from CN
- Result Caching is memory reserved on leader node that stores results of prior queries
- Node types can be dense storage (DS2) or dense compute (DC2), which use HDD and SSD respectively
- Resize: the old cluster restarts in read-only mode, data is copied to a new cluster, then the cluster endpoint is updated
- During a resize, ANALYZE is run but VACUUM is not
- Can create multiple subnet groups, but cluster is created in one subnet
- Backups using automatic or manual snapshots to S3. Restores are done to a new cluster
- Users are cluster-wide; one masteruser is created by default
- Based on PostgreSQL 8
- Not supported: UNIQUE and FK constraints are accepted but not enforced
- Features: MPP, [Columnar, Compression, 1MB blocks](https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html), WLM
- WLM: Consists of 1 system and multiple user queues (created by service class)
- Two default queues: user (concurrency 5) and superuser (concurrency 1).
- Max concurrency is 50 for each and all non-superuser queues combined.
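WLM queues are defined as a JSON array via the cluster's `wlm_json_configuration` parameter. A sketch of such a value, assuming illustrative queue/group names and concurrency levels:

```python
import json

# Sketch of a wlm_json_configuration value: two user queues plus the
# default queue. Group names and concurrency values are illustrative.
wlm = [
    {"query_group": ["etl"], "query_concurrency": 3},
    {"user_group": ["analysts"], "query_concurrency": 10},
    {"query_concurrency": 5},  # default queue (last entry)
]

# All non-superuser queues combined must stay within the 50-slot limit.
assert sum(q["query_concurrency"] for q in wlm) <= 50

wlm_json = json.dumps(wlm)  # the string passed as the parameter value
```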
- Monitoring
- Audit logging: authentication, connections, user changes, queries
- Table design. Follow SET DW principles
- Sort key: can only be set on an empty table. Compound: any predicate that uses the key or a prefix of it will use the sort order
- Encoding: sets compression
- Table maintenance: regular VACUUM and ANALYZE
- Distribution key: KEY (user specified), EVEN, ALL (copies on each node)
- Workload Management:
- ROT (rules of thumb):
- Automatic compression analysis adds overhead each time data is loaded into an empty table, so load into staging tables and save the encodings chosen after the first load
- Define PK and FK. Constraints aren't enforced.
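The table-design notes above (distribution key, compound sort key, explicit encodings, informational constraints) can be combined in one DDL sketch. Table, columns, and encoding choices below are illustrative, not a prescription:

```python
# Sketch of Redshift table DDL applying the design notes above.
# Names and encodings are illustrative.
DDL = """
CREATE TABLE sales (
    sale_id     BIGINT        ENCODE lzo,
    customer_id BIGINT        ENCODE lzo,
    sale_date   DATE          ENCODE raw,  -- leading sort key column left uncompressed
    amount      DECIMAL(12,2) ENCODE lzo,
    PRIMARY KEY (sale_id)                  -- informational only; not enforced
)
DISTSTYLE KEY
DISTKEY (customer_id)                      -- co-locates rows joined on customer_id
COMPOUND SORTKEY (sale_date, customer_id);
"""
```

Explicit `ENCODE` clauses follow the staging-table tip above: once encodings have been chosen on a first load, baking them into the DDL avoids re-running compression analysis on later loads.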
ElastiCache
- Fully managed Redis or Memcached
- Redis can persist the cache while online and supports saving/restoring backups
- Only one node per cluster (non-clustered mode)
- Can form a replication group with up to 5 read-only replicas for offloading reads
- ElastiCache Redis provides native functions to simplify “Leaderboard”
- MemCached:
- Up to 20 nodes, with data partitioned across them
- Supports Auto Discovery via the client library
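The Redis "leaderboard" functions are the sorted-set commands (ZADD/ZINCRBY/ZREVRANGE). A minimal pure-Python stand-in for those semantics, to show what the commands compute (the function names mirror Redis commands but this is not redis-py):

```python
# In-memory sketch of Redis sorted-set leaderboard semantics.
scores: dict[str, float] = {}

def zincrby(member: str, delta: float) -> float:
    """Like Redis ZINCRBY: bump a member's score, creating it if absent."""
    scores[member] = scores.get(member, 0.0) + delta
    return scores[member]

def zrevrange(n: int) -> list[str]:
    """Like Redis ZREVRANGE 0 n-1: top-n members by descending score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

zincrby("alice", 50)
zincrby("bob", 30)
zincrby("alice", 10)
assert zrevrange(2) == ["alice", "bob"]
# With redis-py against ElastiCache, the equivalent calls would be
# r.zincrby("leaderboard", 10, "alice") and r.zrevrange("leaderboard", 0, 1).
```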