
Storage

S3

  • Storage classes
Class                | Availability (%) | Min Size | Min Days | Fees | AZs | Restore?
Standard             | 99.99            | None     | -        | -    | 3+  | -
Standard_IA          | 99.9             | 128 KB   | 30       | /GB  | 3+  | -
OneZone_IA           | 99.5             | 128 KB   | 30       | /GB  | 1   | -
Intelligent_Tiering  | 99.9             | None     | 30       | /Obj | 3+  | -
Glacier              | 99.99            | None     | 90       | /GB  | 3+  | Yes
Glacier Deep Archive | 99.99            | None     | 180      | /GB  | 3+  | Yes
RRS (Deprecated)     | 99.99            | -        | -        | -    | 3+  | -
  • Format: http://<bucket>.s3.amazonaws.com/<key>
    • Bucket name is globally unique (two AWS users cannot have the same bucket name)
  • S3 data operations are strongly consistent; S3 configuration changes are eventually consistent
    • Deleting a bucket and then immediately listing buckets may still show the deleted bucket
    • Writes are atomic
    • No locking, the latest timestamp wins
  • Object: Bucket, (Key, Value), Version, Metadata, Sub-resources (ACL, Torrent)
  • Server Side Encryption: SSE-S3 (S3 managed keys), SSE-KMS, SSE-C (customer key)
  • Set up MFA (Multi-Factor Authentication) Delete on a versioned bucket to guard against accidental deletes
  • Cross-region replication requires versioning and an IAM role that lets S3 replicate objects
    • Deletions of a specific version or of a delete marker are not replicated
    • Objects that existed before replication was enabled aren’t replicated automatically
  • Auto life cycle: S3 to Glacier to Permanent deletion
  • Bucket Lifecycle Rules apply to bucket or a prefix.
    • For current version or previous versions
    • Migrate to IA S3 and then to Glacier
    • Previous versions can be set up to be permanently deleted
  • The object owner can differ from the bucket owner; the bucket owner pays the bill and can delete objects it does not own
  • Static website: when served under a custom domain, the bucket name must match the domain name.
    • Must have public read access to the bucket
  • Multipart upload is required for S3 objects >5 GB and for Glacier archives >4 GB; AWS recommends it above 100 MB for both
  • Events are configured at bucket level
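  • Performance: parallelism is at the prefix level (at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix), so throughput increases linearly with the number of prefixes.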
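
The lifecycle flow described above (S3 to IA to Glacier to permanent deletion, scoped to a prefix, with separate handling of previous versions) can be written down as a single rule document. The following is a minimal sketch, assuming the dict shape that boto3's `put_bucket_lifecycle_configuration` accepts; the function name, the `logs/` prefix, and all day counts are hypothetical values chosen for illustration.

```python
import json


def build_lifecycle_config(prefix, ia_days=30, glacier_days=90,
                           expire_days=365, noncurrent_days=30):
    """Build a lifecycle configuration: Standard -> Standard_IA -> Glacier -> delete.

    All day counts are illustrative defaults, not recommendations.
    """
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    rule = {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        # An empty prefix applies the rule to the whole bucket.
        "Filter": {"Prefix": prefix},
        # Current versions: move to cheaper storage classes over time.
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        # Current versions: expire after expire_days.
        "Expiration": {"Days": expire_days},
        # Previous versions: permanently delete once noncurrent for this long.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
    }
    return {"Rules": [rule]}


config = build_lifecycle_config("logs/")
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, the dict could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="<bucket>", LifecycleConfiguration=config)
```

Building the rule as plain data keeps it easy to review and to check before anything is sent to AWS.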

Security and Access

  • Security:
    • In transit: SSL/TLS
    • At Rest:
      • SSE-S3 Server Side Encryption within S3 itself
      • SSE-KMS, Key Management Service, enables audit log
      • SSE-C Customer provided
    • Client side
    • Pre-signed URLs: grant someone time-limited permission to upload or download an object
  • Access control is resource-based (bucket policies and ACLs), user-based (IAM policies), or a combination of the two
  • Policies: can apply across all objects; can grant or deny; can target an account or a user
  • ACLs: XML documents attached to buckets and/or objects
    • Can only grant (never deny), and only to AWS accounts or predefined groups, not individual IAM users
  • S3 access requests can be authenticated or unauthenticated (anonymous or public)
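
The resource-based policy model above (a policy attached to the bucket that can grant or deny) can be illustrated with a policy that enforces encryption in transit. This is a minimal sketch, assuming the standard IAM policy JSON layout and the `aws:SecureTransport` condition key; the function name and `example-bucket` are placeholders.

```python
import json


def deny_insecure_transport_policy(bucket):
    """Return a bucket policy (as a dict) that denies any non-TLS request."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",      # an explicit deny overrides any grant
                "Principal": "*",      # applies to every caller
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # Matches requests that did not arrive over SSL/TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = deny_insecure_transport_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

Because the statement is a deny, it complements any grants made elsewhere by user-based policies or ACLs instead of replacing them.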

Elastic Block Store EBS

  • EBS is automatically replicated within AZ
  • Snapshots
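    • Unencrypted snapshots, or snapshots encrypted with a customer managed KMS key, can be shared; snapshots encrypted with the default AWS managed key cannot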
    • Snapshots are incremental
    • Can’t delete a snapshot if used as root device of an AMI, until AMI is deregistered
  • RAID
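    • RAID 0: striping; RAID 1: redundancy (mirroring); RAID 10: striping + redundancy; RAID 5: parity
      • Avoid RAID 5 on EBS: parity writes consume part of the volumes' IOPS
    • To take an application-consistent snapshot of a RAID array, stop writes first using one of:
      • Freeze the filesystem
      • Unmount the RAID array
      • Shut down the EC2 instance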
  • Types:
Feature      | io1/io2/express  | gp2/gp3         | st1/sc1
Type         | Provisioned IOPS | General Purpose | Low/Lowest cost
Medium       | SSD              | SSD             | HDD
Max IOPS/Vol | 64K/64K/256K     | 16K/16K         | 500/250
MB/s/Vol     | 1000/1000/4000   | 250/1000        | 500/250
Latency (ms) | <9/<9/<1         | <9              | ?
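
The RAID levels listed above trade capacity for redundancy in different ways. The sketch below computes usable capacity for an array of equally sized volumes; the function name and the 4 x 100 GiB example are illustrative, and it models capacity only, not IOPS or throughput.

```python
def raid_usable_gib(level, volumes, size_gib):
    """Usable capacity in GiB for `volumes` equal-size volumes at a RAID level."""
    if level == 0:        # striping only: all raw capacity is usable
        minimum, usable = 2, volumes * size_gib
    elif level == 1:      # mirroring: every volume holds the same data
        minimum, usable = 2, size_gib
    elif level == 5:      # striping with one volume's worth of parity
        minimum, usable = 3, (volumes - 1) * size_gib
    elif level == 10:     # striped mirrors: half the raw capacity
        if volumes % 2:
            raise ValueError("RAID 10 needs an even number of volumes")
        minimum, usable = 4, volumes // 2 * size_gib
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if volumes < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} volumes")
    return usable


for level in (0, 1, 5, 10):
    print(f"RAID {level}: {raid_usable_gib(level, 4, 100)} GiB usable from 4 x 100 GiB")
```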

EFS Elastic File System

  • Regional service that provides file-based, NFS v4 compatible storage
  • Supports EFS IA Infrequent Access for lower cost
  • No provisioning required; pay only for what you use
  • Storage Comparison
Feature         | S3                  | EBS                    | EFS
Access via      | HTTP, REST API, SDK | Mounted disk           | Mounted directory
Availability    | 99.99%              | 99.99%                 | ?
Max AZ failures | 2                   | None                   | 1
Durability      | 11 9’s              | 20x a regular HDD      | ?
Encryption      | SSE and client      | SSE-KMS                | SSE-KMS
Concurrency     | Multiple requests   | Single instance        | Multiple instances
Pricing         | Cheapest            | > S3                   | ~10x EBS
Provisioning    | Auto                | Fixed, resize with API | Auto

Cloud Data Transfer

  • DataSync: runs an agent that syncs data bi-directionally between on-prem NFS/SMB and S3/EFS/FSx
  • AWS Transfer Family: use existing SFTP, FTPS (FTP over SSL), or FTP workflows to move files into S3
  • S3 Transfer Acceleration: Upload to an edge location (CloudFront) instead of S3 directly
  • Kinesis Data Firehose: load streaming data into S3 or Redshift
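
S3 Transfer Acceleration is used by pointing the client at a different hostname, not by changing the upload itself. The sketch below builds both URL forms, assuming the `<bucket>.s3-accelerate.amazonaws.com` accelerate endpoint and that acceleration has already been enabled on the bucket; the function name, bucket, and key are placeholders.

```python
from urllib.parse import quote


def object_url(bucket, key, accelerate=False):
    """Build an HTTPS URL for an object, optionally via the accelerate endpoint."""
    if accelerate and "." in bucket:
        # Transfer Acceleration does not support bucket names containing periods.
        raise ValueError("accelerate endpoint requires a bucket name without '.'")
    host = "s3-accelerate.amazonaws.com" if accelerate else "s3.amazonaws.com"
    # Percent-encode the key but keep '/' so the prefix structure stays readable.
    return f"https://{bucket}.{host}/{quote(key, safe='/')}"


print(object_url("example-bucket", "photos/2024/cat 1.jpg"))
print(object_url("example-bucket", "photos/2024/cat 1.jpg", accelerate=True))
```

Only the hostname differs between the two URLs; requests to the accelerate hostname enter AWS at a nearby edge location and travel to the bucket from there.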

Storage Gateway

  • allows on-prem access to S3
  • File Gateway: S3 as NFS or SMB
  • Tape Gateway: Virtual Tape Library VTL over S3 Glacier
  • Volume Gateway: stores block volumes locally with point-in-time (PIT) backups as EBS snapshots

Snowball

  • For data transfer or edge computing
  • Snowcone: 8 TB, 2 CPUs, 4 GB RAM, backpack-sized
  • Snowball: storage optimized (80 TB, 40 vCPUs) or compute optimized (42 TB, 52 vCPUs)
  • Snowmobile: 100 PB, shipping container
  • Lab
    • Download and install the Snowball client
    • Get manifest file and activation code
    • Start the Snowball client with the manifest file and activation code
    • Use snowball cp <file> s3://<bucket>

CloudFront CDN

  • Distributions: Web or RTMP
  • Origin: S3 bucket, EC2 instance, load balancer, Route 53, or any non-AWS server
  • Edge locations can also be written to
  • Distribution can have multiple origins