Storage¶
- types of services: BigTable, Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore, Cloud MemoryStore (managed Redis)
- choose the right storage:
    - Unstructured: Cloud Storage (scales to petabytes)
    - Structured:
        - Transactional:
            - SQL (scales to gigabytes):
                - single node: Cloud SQL
                - scale-out: Cloud Spanner
            - NoSQL: Cloud Datastore, key-value (terabytes)
        - Analytical (petabytes):
            - millisecond latency: Cloud BigTable
            - seconds latency: BigQuery
- Memorystore is a Redis-compatible in-memory store with two offerings: Basic, and Standard (adds a replica with failover)
- SQL Server high-availability options:
    - replication: object level; data is copied by replication agents
    - log shipping: database level; the transaction log is copied and replayed on another server
    - mirroring: database level; data is copied to two network endpoints
    - clustering: instance level; storage is shared between primary and secondary using Windows clustering
    - Always On availability groups: like mirroring, but uses Windows clustering and supports up to 8 replicas
Compare¶
| Feature | Firestore | BigTable | Storage | SQL | Spanner | BigQuery | MemoryStore |
|---|---|---|---|---|---|---|---|
| Type | NoSQL | Wide column (HBase) | Immutable blob | SQL OLTP | SQL OLTP | SQL OLAP | In-memory (Redis) |
| Transactions | Yes | Single row | - | Yes | Yes | - | |
| Consistency | Strong | Eventual (replicas) | Strong | Strong | Strong | Eventual (replicas) | |
| Latency | 10k/sec | Low | High | Low | | | |
| Cost | $$ + R/W | $$ for small | $ | $$$ | $$ for small | $ + queries | |
| Capacity | TB+ | PB+ | PB+ | TB | PB | PB+ | |
| Unit size | 1 MB/entity | 10 MB/cell, 100 MB/row | 5 TB | 10 GB/row | 10 MB/row | | |
| Scale out | RW | RW | RO | RW | | | |
| Scale down to 0 | Yes | min 3 nodes | Yes | | | | |
| Avail regional | 99.99 | 99.9, 99.0 (cold) | 99.99 | | | | |
| Avail multi-region | 99.999 | 99.95+ | 99.999 | | | | |
Cloud Storage¶
- Buckets are containers with globally unique names
- individual objects (aka files) are immutable
    - objects are read-only; an upload with the same name overwrites the existing object unless versioning is enabled
    - composite objects are useful for appending to an existing object
- all operations are strongly consistent, except the following, which are eventually consistent:
    - revoking access from objects
    - accessing publicly readable cached objects
- Buckets have default storage class, but objects can be of different classes (except regional and multi-regional can't be mixed)
- storage location
- regional: lowest latency, high performance local, high-frequency access
- dual-region: low latency
- multi-regional (large areas, USA, Europe or Asia): highly available, low end-user latency, high-frequency access
- All storage classes offer 11 9's of durability
- Storage classes (associated with buckets)
- standard: no retrieval costs
- nearline: durable, low frequency ~ once/month, retrieval costs
- coldline: durable, lower frequency ~ once/quarter, higher retrieval costs
- archive: ~ once/year, highest retrieval costs
- charges: $/GB/month + egress + data transfer; Nearline adds $/GB read
- if the objects are accessed from a different region, egress charges apply
- Cloud Storage automatically creates a service account `service-[PROJECT_NUMBER]@gs-project-accounts.iam.gserviceaccount.com`, used for features like Pub/Sub notifications and customer-managed encryption keys
- object change notifications through pubsub
- Q: How to optimally distribute load? A: Use randomized names instead of sequential ones, and reuse a bucket that is already primed
    - if the randomness comes after a prefix, it is effective only within that prefix
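A minimal sketch of the randomized-naming idea (the hash prefix and helper name are illustrative, not a GCS API):

```python
import hashlib

def randomized_name(sequential_name: str, prefix: str = "") -> str:
    """Prepend a short hash so lexicographically-adjacent names spread
    across Cloud Storage's index ranges. The hash goes *after* any fixed
    prefix you still want to list by, since randomness is only effective
    within that prefix."""
    digest = hashlib.md5(sequential_name.encode()).hexdigest()[:6]
    return f"{prefix}{digest}_{sequential_name}"

# Sequential uploads like image_0001.jpg, image_0002.jpg no longer sort together:
names = [randomized_name(f"image_{i:04d}.jpg", prefix="photos/") for i in range(3)]
```

Because the hash is deterministic, the same logical name always maps to the same object name, so lookups remain possible without a separate index.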
Large data transfer¶
- Storage Transfer Service: scheduled, managed batch transfers at a PoP, or from other cloud providers
    - scales to billions of files and TBs of data
    - the on-prem variant runs as a Docker container; min 300 Mbps; charged
    - automatic retries; secure, logged, and trackable via the Cloud Console
- Transfer Appliance: rack-able appliance
- up to 1PB
- customer supplied encryption
Data Security¶
- Permissions flow from Project -> Bucket -> Object
- Retention policy applies at the bucket level and enforces a minimum duration during which objects cannot be deleted or modified
- Bucket Locks and Object Holds
    - prevent data from being modified or deleted
- BP: Enable Domain Restricted Access (limit access within Org) and Uniform bucket-level Access
- Q: How do you protect PII data on Cloud Storage? A: Use granular ACLs, Signed URLs can be leaked.
Access Control¶
- Uniform: IAM only, at the bucket level (uniform bucket-level access is recommended)
- Fine-grained: IAM + ACLs
    - ACLs are legacy, but allow access control at the object level
    - an ACL consists of a scope (who) and a permission (what)
        - scope: the user or group that can access
        - permission: read, write
    - when both IAM and ACLs are used, access is granted if either allows it
- Signed URL: time-limited, HTTP-method-restricted URL signed with a cryptographic key
    - `gsutil signurl -d 30m -u gs://bucket/file`
    - the cryptographic key can be either the service account (`-u`) or a private key
    - the method cannot be POST
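To illustrate the principle only (this is NOT Cloud Storage's actual V4 signing algorithm, and the host and parameter names are made up), a signed URL is essentially an HMAC over the method, path, and expiry:

```python
import base64
import hashlib
import hmac
import time

def toy_signed_url(path: str, key: bytes, method: str = "GET",
                   lifetime_s: int = 1800) -> str:
    """Toy signed URL: the issuer signs method + path + expiry with a
    secret key; the server recomputes the HMAC before honoring the
    request, so the URL cannot be altered or used after expiry."""
    expires = int(time.time()) + lifetime_s
    to_sign = f"{method}\n{path}\n{expires}".encode()
    sig = base64.urlsafe_b64encode(
        hmac.new(key, to_sign, hashlib.sha256).digest()).decode()
    return f"https://storage.example.com{path}?Expires={expires}&Signature={sig}"

url = toy_signed_url("/bucket/file", key=b"secret", lifetime_s=1800)
```

This also makes the exam point above concrete: possession of the URL is the only credential, which is why a leaked signed URL grants access to anyone.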
- Signed policy document: controls what can be uploaded, e.g. size limits, content types
- `allUsers` and `allAuthenticatedUsers` are special members for public access and any authenticated user, respectively
- Alternatives to ACLs:
    - Signed URLs: temporary read or write access to specific resources; anyone with the URL has access
    - Signed policy documents: constrain what can be uploaded
    - separate buckets: when multiple subsets of objects share the same access patterns, move them to different buckets
    - IAM conditions: use a resource-name prefix, e.g. users have access to objects with the `dir-name/` prefix
- Encryption: server-side encryption layers a Key Encryption Key (KEK) over a Data Encryption Key (DEK): the KEK wraps the DEK, and the DEK encrypts the data
    - client-side encrypted data still goes through server-side encryption
- KEK can be
- GMEK: Google Managed Encryption Key
- CMEK: Customer Managed Encryption Key; still resides in Cloud KMS
- CSEK: Customer supplied Encryption Key; not stored in cloud
- Device level encryption is in addition to above storage level encryption
- A small number of legacy HDDs support only AES128 bit, for guaranteed AES256 use SSDs
Features¶
- sync a directory: `gsutil rsync`
- Buckets can be tagged with:
    - `compressed`: faster upload, lower storage cost, but served decompressed
    - `requester-pays`: the requester pays any egress charges
- Cloud Storage provides XML and JSON APIs
    - prefer XML for a RESTful API / the `boto` framework; JSON for flexibility and lower cost
    - the XML API needs an MD5 sum; for composite objects this is expensive (roughly double the bandwidth consumed and elapsed time)
    - lifecycle-management APIs are supported only by the JSON API
Lifecycle and Versioning¶
- A lifecycle rule has 1+ conditions and 1 action
    - the action can be `Delete` or `SetStorageClass`; for versioned buckets, deleting the current version makes it a non-current version
    - example conditions: age, number of versions (if versioned)
- Versioning: `gsutil versioning {get|set}`
    - cannot be enabled for buckets that have a retention policy
    - a version consists of a `generation`, which identifies the content version, and a `metageneration`, which identifies the metadata version within a specific `generation`
        - each new `generation` resets `metageneration` to 1
    - a non-current version has its own ACL
    - each object has a `generation` even if versioning is not enabled
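For example, a lifecycle config in the JSON shape accepted by `gsutil lifecycle set <file> gs://bucket` might look like this (built in Python here; the 30-day and 3-version values are arbitrary examples):

```python
import json

# Lifecycle config following the JSON API schema: each rule pairs one
# action (Delete or SetStorageClass) with one or more conditions.
lifecycle = {
    "rule": [
        {   # move objects older than 30 days to Nearline
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},
        },
        {   # in a versioned bucket, keep at most 3 newer versions of an object
            "action": {"type": "Delete"},
            "condition": {"numNewerVersions": 3},
        },
    ]
}
config_json = json.dumps(lifecycle, indent=2)
```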
Cloud SQL¶
- Vertical scaling for RW and horizontal scaling for RO
- backups are disk-level snapshots taken after issuing the `FLUSH TABLES WITH READ LOCK` MySQL command
    - 7 daily backups are free
    - restoring a backup requires replicas to be deleted first
- allows import/export using `mysqldump` to a Cloud Storage bucket
import/exportusingmysqldumpto cloud storage bucket - Replication support includes replicating from external instances (e.g. backing up on-prem)
- two types: read and failover
    - failover replicas are restricted to the same region (but can be in a different zone than the primary)
    - a read replica can have an instance size different from the primary
- Failover supported with cross-zone replication (same region)
- applications don't need to change anything except reestablishing connection
- `my.cnf` can't be modified as on-prem; use database flags or the REST APIs
- Rule of thumb: allocate a larger disk than necessary to get higher IOPS for a small but high-throughput database
    - by default Cloud SQL disks auto-grow; the allocation must be overridden manually
- Cloud SQL has higher CPU cost than corresponding VM. For large databases, it's cost effective to use VM
- storage cost is same for both Cloud SQL and native VM
- cannot decrease data disk size once allocated; create new instance and copy data
- connecting to a Cloud SQL instance:
    - instance with private IP: can optionally require the SQL Proxy or self-managed SSL certificates
    - instance with public IP: must be authorized using either:
        - Cloud SQL Proxy: secured and authorized using IAM permissions
        - authorized networks: strongly recommended together with self-managed SSL certificates
Cloud Firestore¶
- Document/key-value store, replaces old Cloud Datastore
- built on top of BigTable (?)
- Entity is a record, identified by a key
    - a key consists of namespace, kind (type, ~ table name), and identifier (an integer or a string)
- Entity groups ensure locality of related entities that share the same parent key
- entities can store other keys, but no referential integrity is enforced or maintained
- supported operations are: get, put and delete
- query language is called GQL (no joins)
- unlike RDBMS, Indexes aren't just for optimization, but make certain advanced queries possible
- Indexes are defined using yaml
- replicas are maintained using two-phase commit; the second phase (commit) is async
- no backup/restore per se, but export/import
    - import overwrites existing entities, but entities not present in the export aren't erased
- pricing: per GB and per operation (get, put, delete)
- key-only operations are free, i.e. no properties are retrieved
- v/s Cloud SQL:
- about the same latency
- higher throughput because of consistent latency when data size is large and high concurrency
- can not only scale reads, but writes too
- data replicated transparently and always
- both offer fully ACID compliant transactions
- can span App Engine and Compute Engine applications (Coursera Fundamentals Course)
- Performance
- avoid high read or write rates to keys that are lexicographically close (load isn't sharded across them)
- gradually ramp up traffic to new Datastore Kinds (allow BigTable to perform tablet splits)
- avoid deleting large # of entities across a small range of keys (BigTable's compactions runs slower)
- use manual sharding to increase throughput
- index on individual properties are automatically created, but composite indexes can be manually created to enable queries
- Use batch for multiple calls instead of separate individual calls
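The manual-sharding tip above is commonly applied to write-hot counters. A sketch of the idea, using an in-memory dict as a stand-in for Datastore entities (names like `NUM_SHARDS` are illustrative):

```python
import random
from collections import defaultdict

NUM_SHARDS = 10

# Stand-in for Datastore: each (counter, shard_id) pair would be a
# separate entity, so increments spread across NUM_SHARDS keys instead
# of hammering one hot entity.
shards: dict = defaultdict(int)

def increment(counter_name: str) -> None:
    shard_id = random.randrange(NUM_SHARDS)   # pick a shard at random
    shards[(counter_name, shard_id)] += 1     # one small write per increment

def value(counter_name: str) -> int:
    # Reads fan out over all shards and sum their values.
    return sum(v for (name, _), v in shards.items() if name == counter_name)

for _ in range(1000):
    increment("page-views")
```

Write throughput scales roughly with the shard count, at the cost of a slightly more expensive (multi-get) read.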
| Features | Native | Datastore mode |
|---|---|---|
| writes scaling | 10k/sec | millions/sec |
| Listen for real-time updates | Yes | No |
| Offline persistence web/mobile | Yes | No |
| Data model | document+collection | kinds+entity groups |
| namespaces support | No | Yes |
| Best suited for | Web+Mobile | Server projects |
Spanner¶
- suitable for HTAP (Hybrid Transactional/Analytical Processing)
- data is always sharded and replicated (at least 3 copies)
- concepts:
    - instance: a container that holds 1 or more databases, automatically replicated
        - instead of replicating to a specific zone, a configuration spanning multiple zones is used
        - nodes make up an instance
    - databases contain tables; also a grouping for permissions and maintenance
        - tables have a schema; maximum cell size is 10 MB
- DDL is similar to standard SQL
- Instead of `INSERT` and `UPDATE`, the API takes a JSON list of objects to store data
    - e.g. `employees.insert([{id: 1, name: "Steve Jobs"}, {id: 2, name: "Bill Gates"}])`
- In addition to SQL, a single table can be read using the API
    - e.g. `employees.read({columns: ["id", "name"], key: ["1"]})`
- Using SQL: `database.run("SELECT id, name FROM employees")`
    - parameterized: `database.run({sql: "SELECT id, name FROM employees WHERE id = @id", params: {id: 2}})`
- To ensure child rows are collocated with the parent, use `INTERLEAVE`
- Data is locked at the cell (row + column) level, so two transactions can update separate columns of the same row concurrently
BP¶
- avoid creating hotspots: store data with a primary key that is random instead of ordered
- avoid surrogate keys because they tend to be sequential
- when data from parent and child tables is read and written together frequently, use the root-and-interleaved-tables pattern
    - e.g. `CREATE TABLE employee(id ...) PRIMARY KEY(id);` and `CREATE TABLE paychecks(id ..., paycheck_id ...) PRIMARY KEY(id, paycheck_id), INTERLEAVE IN PARENT employee ON DELETE CASCADE;`
- split points refer to the rows at which data can be sharded to different nodes
- rows with same primary key of root tables and all interleaved tables will be on the same node
- To enable index-only scans, store non-search columns with the index: `CREATE INDEX nameix ON employees(name) STORING (start_date)`
- Use interleaved indexes to ensure index entries are collocated with the root table
BigTable¶
- high throughput, low latency, TB+, no-SQL, Key-Value store that uses Colossus as native file-system
- managed, unlike HBase, scale by increasing machine count without downtime
Physical components¶
- A BigTable Instance consists of 1 to 4 clusters, and each cluster consists of 1+ nodes
- Instance can be either HDD or SSD
- Instances that have replication enabled have application profiles that govern how requests are routed (single-cluster routing, or multi-cluster routing to the nearest cluster)
- A cluster is a zonal resource and represents BigTable service in a specific location
- A node belongs to a cluster and performance scales linearly with number of nodes
- Each Table is split into Tablets, each tablet is managed by only one node
Concepts¶
- Row-key: a (derived) column by which all rows will be sorted
- Column Families: columnar like support. Segregate frequently and rarely used columns in different families
- Cell: Value at Row+Column intersection; multiple cells represent multiple versions at different timestamps
- updated and deleted data is logical, requires compaction
- new rows are appended
- BigTable constantly balances data based on usage patterns
- tables are sparsely populated
performance¶
- group related data for more efficient reads => use column families
- distribute data evenly for efficient writes
- place identical values in the same row or adjoining rows for efficient compression (appropriate row-keys)
- ensure client and BigTable are in same zone
- minimize skew
- use replication, for example, to isolate serving workloads from batch workloads
- Use Key Visualizer's heat maps to identify hot row-key ranges
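Two common row-key techniques behind the "distribute writes, keep related data adjacent" advice above can be sketched as follows (the `sensor-42` naming and `#` separator are illustrative conventions, not an API):

```python
def row_key(device_id: str, ts_ms: int) -> str:
    """Sketch of two Bigtable row-key techniques:
    - field promotion: lead with a high-cardinality field (device_id) so
      writes from many devices spread across tablets instead of hotspotting
      on a timestamp prefix;
    - reversed timestamp: store (max - ts), zero-padded, so a prefix scan
      on the device returns newest rows first."""
    reversed_ts = (2**63 - 1) - ts_ms
    return f"{device_id}#{reversed_ts:019d}"

k_new = row_key("sensor-42", ts_ms=1_700_000_100_000)
k_old = row_key("sensor-42", ts_ms=1_700_000_000_000)
# The newer event sorts lexicographically before the older one
# under the same device prefix.
```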
partitioning (BigQuery)¶
- stores each partition on a different shard
- partitioning types:
    - ingestion time: `--time_partitioning_type=DAY`
    - `DATE` or `TIMESTAMP` column: `--time_partitioning_field tmcol`
    - range partitioning: `--range_partitioning=customer_id,0,100,10`
- in a predicate, the partitioning column must appear on the left side of the comparison
- a table can be made to require a partition filter by specifying `bq query ... --require_partition_filter`
clustering (BigQuery)¶
- only supported on partitioned tables with time or ingest type partitioning
- allows rows to be sorted within partition
- not enforced, so it degrades over time as more data is added
    - BigQuery automatically re-clusters data in the background
- up to 4 columns (no nested columns)