Databases¶

Aurora¶

Supports multi-master or single-master clusters
consists of cluster volume that spans multiple AZ
Single-master has primary instance has R/W access to data, with up to 15 R/O replicas
Supports automatic failover by promoting one of the replica as master
Multi-master clusters
can cause write-conflict errors which are reported as deadlock errors
BP: good for sharded application architecture
Has one endpoint for each cluster and reader; 0+ custom endpoints; all instance endpoints
Reader and customer endpoints load balance between all replicas
load balancing is at connection level (not query level)
2 copies of data in 3 AZ = 6 copies of data

RDS¶

Multi-AZ: For DR, Synchronous
Read replica: For read-heavy workloads, Asynchronous
Up to 5 replicas
Needs automatic backups turned on, up to 5 RO copies
Supported by: MySQL, MariaDB and PostgreSQL
Can be promoted to a DB on it’s own by breaking replication connection
Only for MySQL and MariaDB, replicas can be on diff regions

DynamoDB¶

Stored on SSD, millisecond response time
Spread across 3 geographical data centers
Consistency models:
Eventual: Within a second
Strong: As long as write was successful, can be reread back

Athena¶

Based on Presto, on top of S3, serverless SQL
Can use AWS Glue (Hive compatible catalog) as metadata repository for storing schemas
Glue provides crawlers to automate inferring schema from data stores

Redshift¶

Redshift Spectrum: allows running queries over S3
Unlike Athena which uses pooled compute resources, Redshift Spectrum relies on the redshift cluster for compute

Cluster¶

Shared-nothing architecture, consisting of multiple compute nodes, each node consisting of multiple slices that own data
uses columnar data storage
One leader node (LN) that receives queries, distributes execution plans to CN, aggregates results from CN
Result Caching is memory reserved on leader node that stores results of prior queries
Node types can be dense storage (DS2) or dense compute (DC2) use HDD and SSD respectively
resize requires, restart old cluster in RO mode, copy to new cluster, update cluster endpoint
during resize, ANALYZE is run, but not VACUUM
Can create multiple subnet groups, but cluster is created in one subnet
Backups using automatic or manual snapshots to S3. Restores are done to a new cluster
users are cluster wide. One masteruser created by default.
Based on PostgreSQL 8
not supported: Unique, FK constraints
Features: MPP, Columnar, Compression, 1MB blocks, WLM
WLM: Consists of 1 system and multiple user queues (created by service class)
Two default queues: user (concurrency 5) and superuser (concurrency 1).
Max concurrency is 50 for each and all non-superuser queues combined.
Monitoring
Audit logging: authentication, connections, user changes, queries
Table design. Follow SET DW principles
Sort key: Used only for empty table. Compound: Any predicate that uses key or it's prefix of they will use it
Encoding: sets compression
Table maintenance: regular VACUUM and ANALYZE
Distribution key: KEY (user specified), EVEN, ALL (copies on each node)
Workload Management:
ROT:
Automatic compression has overhead for empty tables, so loading data into stage tables, save the encoding after first load
Define PK and FK. Constraints aren't enforced.

ElastiCache¶

fully managed instance of Redis or MemCached
Redis can persist cache online
Only one node/cluster
Can form replication group with 5 read-only replicas for offloading reads
Can save cache
ElastiCache Redis provides native functions to simplify “Leaderboard”
MemCached:
Up to 20, partitioned nodes.
Supports auto-discovery client library