
Storage

  • types of services: BigTable, Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore, Memorystore (managed Redis)
  • choosing the right storage:
    • Unstructured: Cloud Storage (scale to petabytes)
    • Structured:
      • Transactional:
        • SQL: (scales to Gigabytes)
          • Single node: Cloud SQL
          • scale-out: Cloud Spanner
        • NoSQL: Cloud Datastore, Key-Value (Terabytes)
      • Analytical: (Petabytes)
        • Millisecond Latency: Cloud BigTable
        • Seconds Latency: BigQuery
  • Memorystore is a Redis-compatible in-memory store with two tiers: Standard (replica + automatic failover) and Basic
  • SQL Server high availability options:
    • replication: object level, data copied using replication agents
    • log shipping: database level, transaction log copied and replayed on another server
    • mirroring: database level, data copied to two network endpoints
    • clustering: instance level, storage is shared between primary and secondary using windows clustering
    • always on availability group: like mirroring but uses windows clustering and provides up to 8 replicas

Compare

Feature | Firestore | BigTable | Storage | SQL | Spanner | BigQuery | MemoryStore
Type | NoSQL | Wide col (HBase) | Immutable Blob | SQL OLTP | SQL OLTP | SQL OLAP | In Memory (Redis)
Transaction | Yes | Single Row | - | Yes | Yes | - |
Consistency | Strong | Eventual (Replicas) | Strong | Strong | Strong | | Eventual (Replicas)
Latency | 10k/Sec | Low | High | | | | Low
Cost | $$ + R/W | $$ for small | $ | $$$ | $$ for small | $ + Queries |
Capacity | TB+ | PB+ | PB+ | TB | PB | PB+ |
Unit Size | 1 MB/Entity | 10MB Cell, 100MB/Row | 5TB | 10GB/Row | 10MB/Row | |
Scale out | RW | RW | | RO | RW | |
Scale down | 0 | Yes, min 3 nodes | | | Yes | |
Avail Regional | 99.99 | | 99.9, 99.0 (cold) | | 99.99 | |
Avail multi-Reg | 99.999 | | 99.95+ | | 99.999 | |

Cloud Storage

  • Buckets are containers with globally unique names
  • individual objects (aka files) are immutable and effectively read-only
    • writing to an existing object name replaces it with a new object (the old one survives only if versioning is enabled)
    • composite objects (compose) are useful for simulating an append to an existing object
  • all operations are strongly consistent except the following, which are eventually consistent:
    • revoking access from objects
    • accessing publicly readable cached objects
  • Buckets have a default storage class, but objects within can use different classes (regional and multi-regional can't be mixed)
  • storage location
    • regional: lowest latency, high performance local, high-frequency access
    • dual-region: low latency
    • multi-regional (large areas, USA, Europe or Asia): highly available, low end-user latency, high-frequency access
  • All storage classes offer 11 9's of durability
  • Storage classes (associated with buckets)
    • standard: no retrieval costs
    • nearline: durable, low frequency ~ once/month, retrieval costs
    • coldline: durable, low frequency - once/quarter, higher retrieval costs
    • archive: once/year
  • charges: /GB/Month + egress + data transfer; nearline $/GB read
    • if the objects are accessed from a different region, egress charges apply
  • Cloud Storage automatically creates a service account service-[PROJECT_NUMBER]@gs-project-accounts.iam.gserviceaccount.com
    • used for services like pubsub notification, customer-managed encryption keys
  • object change notifications through pubsub
  • Q: How to optimally distribute load? A: Use randomized object names instead of sequential ones, and reuse a bucket that is already primed rather than a new one
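The randomized-name idea can be sketched as follows; distributed_object_name and the 6-character hash prefix are illustrative choices, not a Cloud Storage API:

```python
import hashlib

def distributed_object_name(sequential_name: str) -> str:
    """Prefix a sequential object name with a short hash so writes
    spread across Cloud Storage's key range instead of hotspotting."""
    prefix = hashlib.md5(sequential_name.encode()).hexdigest()[:6]
    return f"{prefix}_{sequential_name}"

# consecutive uploads land in unrelated parts of the namespace
print(distributed_object_name("log_000001"))
print(distributed_object_name("log_000002"))
```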

Large data transfer

  • storage transfer service: scheduled, managed batch transfer at PoP, or from other cloud providers
    • scales to billions of files and TB of data
    • Runs on-prem as a docker container, min 300 Mbps, charged
    • Automatic retries, secure, logged, can be tracked via Cloud Console
  • Transfer Appliance: rack-able appliance
    • up to 1PB
    • customer supplied encryption

Data Security

  • Permissions flow from Project -> Bucket -> Object
  • Retention policy applies at bucket level, enforces minimum duration for which objects cannot be deleted or modified
  • Bucket Locks and Object Holds
    • prevent data from being modified or deleted
  • BP: Enable Domain Restricted Access (limit access within Org) and Uniform bucket-level Access
  • Q: How do you protect PII data on Cloud Storage? A: Use granular ACLs, Signed URLs can be leaked.

Access Control

  • Uniform: IAM only, applied at the bucket level (uniform bucket-level access is the recommended option)
  • Fine-grained: IAM+ACL
    • ACL is legacy, but allows access at object level
      • ACL: Scope (who) and Permission (what)
      • scope: who can access, user or groups
      • permission: read, write
  • when both are used, access is granted if either one allows it
  • Signed URL: signed, time-limited, HTTP-method-restricted URL generated with a cryptographic key, e.g. gsutil signurl -d 30m -u gs://bucket/file
    • cryptographic key can be either the service account (-u) or private key
    • method cannot be POST
  • Signed policy document: can control what can be uploaded, e.g. size limits, contents types
  • allUsers and allAuthenticatedUsers are special identifiers for anyone (public) and any authenticated user, respectively
  • Alternatives to ACLs:
    • Signed URLs: temporary read or write access to specific resources. Anyone with the URL will have access
    • Signed policy documents: control what can be uploaded, e.g. size limits and content types
    • Separate buckets: when multiple subsets of objects share same access patterns, move them to different buckets
    • IAM conditions: use resource name prefix, e.g. users have access to objects with dir-name/ prefix
  • Encryption: Server side encryption involves Key Encryption Key KEK -> Data Encryption Key -> encrypted data.
    • Client side encryption still goes through server side encryption
    • KEK can be
      • GMEK: Google Managed Encryption Key
      • CMEK: Customer Managed Encryption Key; still resides in Cloud KMS
      • CSEK: Customer supplied Encryption Key; not stored in cloud
    • Device level encryption is in addition to above storage level encryption
      • A small number of legacy HDDs support only AES128 bit, for guaranteed AES256 use SSDs
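The IAM-conditions alternative above (access by resource-name prefix) can be expressed as a condition on a role binding; the bucket name, prefix, and group are hypothetical:

```json
{
  "role": "roles/storage.objectViewer",
  "members": ["group:team@example.com"],
  "condition": {
    "title": "dir-name-prefix-only",
    "expression": "resource.name.startsWith('projects/_/buckets/my-bucket/objects/dir-name/')"
  }
}
```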

Features

  • sync a directory: gsutil rsync
  • Objects/buckets support special options:
    • compressed (e.g. gsutil cp -z): faster upload, lower storage cost, but served decompressed
    • requester-pays: the requester pays any egress charges
  • Cloud Storage provides XML and JSON APIs. Prefer XML for RESTful API/boto framework, JSON for flexibility and lower cost
    • XML API needs an MD5 sum; for composite objects this is expensive due to double the bandwidth consumed and elapsed time
    • life-cycle management APIs are supported only by JSON APIs

Lifecycle and Versioning

  • A lifecycle rule has 1+ conditions and 1 action
    • action can be Delete or SetStorageClass. For buckets with versioning, deleting current version makes it a non-current version
    • conditions examples are: age, number of versions (if versioned)
  • Versioning: gsutil versioning {get|set}
    • cannot be enabled for buckets that have retention policy
    • consists of generation which identifies content version, and metageneration which identifies metadata version within a specific generation
      • each new generation resets metageneration to 1
    • a non-current version has its own ACL
    • each object has a generation even if versioning is not enabled
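A lifecycle configuration combining both action types might look like this (the ages and target storage class are illustrative); it would be applied with gsutil lifecycle set rules.json gs://bucket:

```json
{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 365, "isLive": false}
      }
    ]
  }
}
```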

Cloud SQL

  • Vertical scaling for RW and horizontal scaling for RO
  • backups are disk-level snapshots taken after issuing the FLUSH TABLES WITH READ LOCK MySQL command
    • Free 7 daily backups
    • restoring a backup requires replicas to be deleted first
  • allows import/export using mysqldump to cloud storage bucket
  • Replication support includes replicating from external instances (e.g. backing up on-prem)
    • two types Read and Failover
    • failover replicas are restricted to the same region (but can be in a different zone than primary)
    • read replicas can have an instance size different from the primary
  • Failover supported with cross-zone replication (same region)
    • applications don't need to change anything except reestablishing connection
  • my.cnf can't be modified like on-prem, use Database Options or REST APIs
  • ROT (rule of thumb): allocate a larger disk than necessary to get higher IOPS for a small but high-throughput database
    • by default Cloud SQL disks are auto-sized; the allocation must be manually overridden
  • Cloud SQL has higher CPU cost than corresponding VM. For large databases, it's cost effective to use VM
    • storage cost is same for both Cloud SQL and native VM
  • cannot decrease data disk size once allocated; create new instance and copy data
  • connecting to Cloud SQL instance
    • instance with private IP: can optionally require the Cloud SQL Proxy or self-managed SSL certificates
    • instance with public IP: must be authorized using
      • Cloud SQL Proxy: secured and authorized using IAM permission
      • authorized networks; strongly recommended to use self-managed SSL certificates
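Connecting through the Cloud SQL Proxy typically looks like this (the instance connection name is hypothetical); the proxy authenticates via IAM, so no authorized networks or SSL certificates are needed:

```shell
# start a local proxy listening on 3306, forwarding to the instance
./cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:3306

# then connect as if the database were local
mysql -h 127.0.0.1 -P 3306 -u root -p
```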

Cloud Firestore

  • Document/key-value store, replaces old Cloud Datastore
  • built on top of BigTable (?)
  • Entity Group ensures locality of related entities, that have the same parent key
  • Entity is a record and is identified by a key
  • key consists of namespace, kind (type or ~ table name) and identifier: (an integer or a string)
    • entities can store other keys, but no RI enforced or maintained
  • supported operations are: get, put and delete
  • query language is called GQL (no joins)
  • unlike RDBMS, Indexes aren't just for optimization, but make certain advanced queries possible
    • Indexes are defined using yaml
  • replicas are maintained using two-phase commit; the second phase (commit) is async
  • no backup/restore per se, but export/import
    • import overwrites existing entities, but entities created after the export are not erased
  • pricing: per GB and per operation (get, put delete)
    • key-only operations are free, i.e. no properties are retrieved
  • v/s Cloud SQL:
    • about the same latency
    • higher throughput because of consistent latency when data size is large and high concurrency
    • can not only scale reads, but writes too
    • data replicated transparently and always
    • both offer fully ACID compliant transactions
  • can span App Engine and Compute Engine applications (Coursera Fundamentals Course)
  • Performance
    • avoid high reads or writes rates to keys that are lexicographically close (doesn't use sharding)
    • gradually ramp up traffic to new Datastore Kinds (allow BigTable to perform tablet splits)
    • avoid deleting large # of entities across a small range of keys (BigTable's compactions runs slower)
    • use manual sharding to increase throughput
    • index on individual properties are automatically created, but composite indexes can be manually created to enable queries
    • Use batch for multiple calls instead of separate individual calls
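A composite index of the kind described above is defined in YAML; the kind and property names here are hypothetical:

```yaml
indexes:
- kind: Task
  properties:
  - name: done
  - name: created
    direction: desc
```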
Feature | Native | Datastore mode
writes scaling | 10k/sec | millions/sec
Listen for real-time updates | Yes | No
Offline persistence (web/mobile) | Yes | No
Data model | document + collection | kinds + entity groups
namespaces support | No | Yes
Best suited for | Web + Mobile | Server projects

Spanner

  • suitable as HTAP Hybrid Transactional Analytical Processing
  • data is always sharded and replicated (at least 3 copies)
  • concepts:
    • instance: A container that holds 1 or more databases, is automatically replicated
      • instead of replicating to a specific zone, a configuration of multiple zones is used
    • nodes make up an instance
    • databases contain tables, grouping for permissions, maintenance
    • Tables have schema, maximum cell size 10MB
  • DDL is similar
  • Instead of INSERT and UPDATE, one uses APIs to pass a JSON list containing objects to store data
    • e.g. employees.insert([{id: 1, name: "Steve Jobs"}, {id: 2, name: "Bill Gates"}])
  • In addition to SQL, a single table can be read using API
    • e.g employees.read({columns: ["id", "name"], key: ["1"]})
  • Using SQL: database.run("SELECT id, name FROM employees")
    • parameterized: database.run({sql: "SELECT id, name FROM employees WHERE id = @id", params: {id: 2}})
  • To ensure child rows are collocated with parent, use INTERLEAVE
  • Data is locked at cell (row and column) level, thus two transaction can update separate columns of the same row concurrently

BP

  • avoid creating hotspots by storing data with PK which is random instead of ordered
  • avoid using surrogate keys because they tend to be sequential
  • data from parent and child tables that is read and written together frequently, use root and interleaved tables pattern e.g. CREATE TABLE employee(id ...) PRIMARY KEY(id); and CREATE TABLE paychecks(id..., paycheck_id... ) PRIMARY KEY(id, paycheck_id), INTERLEAVE IN PARENT employee ON DELETE CASCADE;
  • split points refer to which rows can be sharded to different nodes.
    • rows with same primary key of root tables and all interleaved tables will be on the same node
  • To use index-only scans, store non-search columns with the index
    • CREATE INDEX nameix on employees(name) STORING(start_date)
  • Use interleaved indexes to ensure index entries are collocated with the root table
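Continuing the hypothetical employee/paychecks schema above, an interleaved index could be declared as (paycheck_date is an assumed column; the index key must be prefixed by the parent key):

```sql
CREATE INDEX paychecks_by_date
  ON paychecks(id, paycheck_date),
  INTERLEAVE IN employee
```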

BigTable

  • high throughput, low latency, TB+, no-SQL, Key-Value store that uses Colossus as native file-system
  • managed, unlike HBase, scale by increasing machine count without downtime

Physical components

  • A BigTable Instance consists of 1 to 4 clusters, and each cluster consists of 1+ nodes
    • Instance can be either HDD or SSD
    • Instances that have replication enabled have application profiles that govern how requests are routed (single-cluster routing or multi-cluster routing to the nearest cluster)
  • A cluster is a zonal resource and represents BigTable service in a specific location
  • A node belongs to a cluster and performance scales linearly with number of nodes
  • Each Table is split into Tablets, each tablet is managed by only one node

Concepts

  • Row-key: a (derived) column by which all rows will be sorted
  • Column Families: columnar like support. Segregate frequently and rarely used columns in different families
  • Cell: Value at Row+Column intersection; multiple cells represent multiple versions at different timestamps
  • updated and deleted data is logical, requires compaction
  • new rows are appended
  • BigTable constantly balances data based on usage patterns
  • tables are sparsely populated

performance

  • group related data for more efficient reads => use column families
  • distribute data evenly for efficient writes
  • place identical values in the same row or adjoining rows for efficient compression (appropriate row-keys)
  • ensure client and BigTable are in same zone
  • minimize skew
  • use replication, for example, to isolate serving workloads from batch workloads
  • Use Key Visualizer's heat maps to identify hot row-key ranges
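A common row-key pattern that follows these rules combines even write distribution with newest-first reads per device; the names and the timestamp bound below are illustrative, not a BigTable API:

```python
MAX_TS_MS = 10**13  # illustrative upper bound for millisecond timestamps

def row_key(device_id: str, ts_ms: int) -> str:
    """Device id first spreads writes across devices; the reversed
    timestamp makes a device's newest rows sort lexicographically first."""
    return f"{device_id}#{MAX_TS_MS - ts_ms:013d}"

# a newer event sorts before an older one for the same device
print(row_key("sensor-42", 1700000000000) < row_key("sensor-42", 1600000000000))
```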

partitioning (BigQuery)

  • BigQuery can store each partition on a different shard
  • partitioning types
    • ingestion time: --time_partitioning_type=DAY
    • DATE or TIMESTAMP column: --time_partitioning_field tmcol
    • range partitioning: --range_partitioning=customer_id,0,100,10
  • in a predicate, the partitioning column must appear by itself on the left side of the comparison
  • table can be made to require partitioning filter by specifying bq query ... --require_partition_filter

clustering (BigQuery)

  • only supported on partitioned tables with time or ingest type partitioning
  • allows rows to be sorted within partition
  • sort order is not enforced, so it degrades over time as more data is added
  • BigQuery automatically re-clusters data in the background
  • up to 4 columns (no nested columns)
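Putting partitioning and clustering together, a table could be created like this (the dataset, table, and column names are hypothetical):

```shell
bq mk --table \
  --time_partitioning_field tmcol \
  --time_partitioning_type DAY \
  --require_partition_filter \
  --clustering_fields customer_id,region \
  mydataset.mytable \
  tmcol:TIMESTAMP,customer_id:INT64,region:STRING
```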