Skip to content

Databases

Aurora

  • Supports multi-master or single-master clusters
  • consists of cluster volume that spans multiple AZ
  • Single-master has primary instance has R/W access to data, with up to 15 R/O replicas
    • Supports automatic failover by promoting one of the replica as master
  • Multi-master clusters
    • can cause write-conflict errors which are reported as deadlock errors
    • BP: good for sharded application architecture
  • Has one endpoint for each cluster and reader; 0+ custom endpoints; all instance endpoints
    • Reader and customer endpoints load balance between all replicas
    • load balancing is at connection level (not query level)
    • 2 copies of data in 3 AZ = 6 copies of data

RDS

  • Multi-AZ: For DR, Synchronous
  • Read replica: For read-heavy workloads, Asynchronous
    • Up to 5 replicas
    • Needs automatic backups turned on, up to 5 RO copies
    • Supported by: MySQL, MariaDB and PostgreSQL
    • Can be promoted to a DB on it’s own by breaking replication connection
    • Only for MySQL and MariaDB, replicas can be on diff regions

DynamoDB

  • Stored on SSD, millisecond response time
  • Spread across 3 geographical data centers
  • Consistency models:
    • Eventual: Within a second
    • Strong: As long as write was successful, can be reread back

Athena

  • Based on Presto, on top of S3, serverless SQL
  • Can use AWS Glue (Hive compatible catalog) as metadata repository for storing schemas
    • Glue provides crawlers to automate inferring schema from data stores

Redshift

  • Redshift Spectrum: allows running queries over S3
    • Unlike Athena which uses pooled compute resources, Redshift Spectrum relies on the redshift cluster for compute

Cluster

  • Shared-nothing architecture, consisting of multiple compute nodes, each node consisting of multiple slices that own data
    • uses columnar data storage
  • One leader node (LN) that receives queries, distributes execution plans to CN, aggregates results from CN
    • Result Caching is memory reserved on leader node that stores results of prior queries
  • Node types can be dense storage (DS2) or dense compute (DC2) use HDD and SSD respectively
  • resize requires, restart old cluster in RO mode, copy to new cluster, update cluster endpoint
    • during resize, ANALYZE is run, but not VACUUM
  • Can create multiple subnet groups, but cluster is created in one subnet
  • Backups using automatic or manual snapshots to S3. Restores are done to a new cluster
  • users are cluster wide. One masteruser created by default.
  • Based on PostgreSQL 8
    • not supported: Unique, FK constraints
  • Features: MPP, (Columnar, Compression, 1MB blocks)[https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html], WLM
  • WLM: Consists of 1 system and multiple user queues (created by service class)
    • Two default queues: user (concurrency 5) and superuser (concurrency 1).
    • Max concurrency is 50 for each and all non-superuser queues combined.
  • Monitoring
    • Audit logging: authentication, connections, user changes, queries
  • Table design. Follow SET DW principles
    • Sort key: Used only for empty table. Compound: Any predicate that uses key or it's prefix of they will use it
    • Encoding: sets compression
    • Table maintenance: regular VACUUM and ANALYZE
    • Distribution key: KEY (user specified), EVEN, ALL (copies on each node)
    • Workload Management:
  • ROT:
    • Automatic compression has overhead for empty tables, so loading data into stage tables, save the encoding after first load
    • Define PK and FK. Constraints aren't enforced.

ElastiCache

  • fully managed instance of Redis or MemCached
  • Redis can persist cache online
    • Only one node/cluster
    • Can form replication group with 5 read-only replicas for offloading reads
    • Can save cache
    • ElastiCache Redis provides native functions to simplify “Leaderboard”
  • MemCached:
    • Up to 20, partitioned nodes.
    • Supports auto-discovery client library