Storage¶
- types of services: BigTable, Cloud Storage, Cloud SQL, Cloud Spanner, Cloud Datastore, Cloud MemoryStore (managed Redis)
- choose the right storage:
    - Unstructured: Cloud Storage (scales to petabytes)
    - Structured:
        - Transactional:
            - SQL (scales to gigabytes):
                - single node: Cloud SQL
                - scale-out: Cloud Spanner
            - NoSQL: Cloud Datastore, key-value (terabytes)
        - Analytical (petabytes):
            - millisecond latency: Cloud BigTable
            - seconds latency: BigQuery
- Memorystore is a Redis-compatible in-memory store with two offerings: Basic, and Standard (adds a replica with failover)
- SQL Server high-availability options:
    - replication: object level; data is copied by replication agents
    - log shipping: database level; the transaction log is copied and replayed on another server
    - mirroring: database level; data is copied to two network endpoints
    - clustering: instance level; storage is shared between primary and secondary using Windows clustering
    - Always On availability groups: like mirroring, but uses Windows clustering and supports up to 8 replicas
Compare¶
| Feature | Firestore | BigTable | Storage | SQL | Spanner | BigQuery | MemoryStore |
|---|---|---|---|---|---|---|---|
| Type | NoSQL | Wide column (HBase) | Immutable blob | SQL OLTP | SQL OLTP | SQL OLAP | In-memory (Redis) |
| Transactions | Yes | Single row | - | Yes | Yes | - | |
| Consistency | Strong | Eventual (replicas) | Strong | Strong | Strong | Eventual (replicas) | |
| Latency | 10k/sec | Low | High | Low | | | |
| Cost | $$ + R/W | $$ for small | $ | $$$ | $$ for small | $ + queries | |
| Capacity | TB+ | PB+ | PB+ | TB | PB | PB+ | |
| Unit size | 1 MB/entity | 10 MB/cell, 100 MB/row | 5 TB | 10 GB/row | 10 MB/row | | |
| Scale out | RW | RW | RO | RW | | | |
| Scale down to 0 | Yes | min 3 nodes | Yes | | | | |
| Avail regional | 99.99 | 99.9, 99.0 (cold) | 99.99 | | | | |
| Avail multi-region | 99.999 | 99.95+ | 99.999 | | | | |
Cloud Storage¶
- Buckets are containers with globally unique names
- individual objects (aka files) are immutable
    - objects are read-only; an upload with the same name overwrites the existing object unless versioning is enabled
    - composite objects are useful for appending to an existing object
- all operations are strongly consistent, except the following, which are eventually consistent:
    - revoking access from objects
    - accessing publicly readable cached objects
- Buckets have default storage class, but objects can be of different classes (except regional and multi-regional can't be mixed)
- storage location
- regional: lowest latency, high performance local, high-frequency access
- dual-region: low latency
- multi-regional (large areas, USA, Europe or Asia): highly available, low end-user latency, high-frequency access
- All storage classes offer 11 9's of durability
- Storage classes (associated with buckets)
- standard: no retrieval costs
- nearline: durable, low frequency ~ once/month, retrieval costs
- coldline: durable, lower frequency ~ once/quarter, higher retrieval costs
- archive: ~ once/year, highest retrieval costs
- charges: $/GB/month + egress + data transfer; Nearline adds $/GB read
- if the objects are accessed from a different region, egress charges apply
- Cloud Storage automatically creates a service account `service-[PROJECT_NUMBER]@gs-project-accounts.iam.gserviceaccount.com`, used for features like Pub/Sub notifications and customer-managed encryption keys
- object change notifications through pubsub
- Q: How to optimally distribute load? A: Use randomized names instead of sequential ones, and reuse a bucket that is already primed
    - if the randomness comes after a prefix, it is effective only within that prefix
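A minimal sketch of the randomized-naming idea (the hash prefix and helper name are illustrative, not a GCS API):

```python
import hashlib

def randomized_name(sequential_name: str, prefix: str = "") -> str:
    """Prepend a short hash so lexicographically-adjacent names spread
    across Cloud Storage's index ranges. The hash goes *after* any fixed
    prefix you still want to list by, since randomness is only effective
    within that prefix."""
    digest = hashlib.md5(sequential_name.encode()).hexdigest()[:6]
    return f"{prefix}{digest}_{sequential_name}"

# Sequential uploads like image_0001.jpg, image_0002.jpg no longer sort together:
names = [randomized_name(f"image_{i:04d}.jpg", prefix="photos/") for i in range(3)]
```

Because the hash is deterministic, the same logical name always maps to the same object name, so lookups remain possible without a separate index.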
Large data transfer¶
- Storage Transfer Service: scheduled, managed batch transfers at a PoP, or from other cloud providers
    - scales to billions of files and TBs of data
    - the on-prem variant runs as a Docker container; min 300 Mbps; charged
    - automatic retries; secure, logged, and trackable via the Cloud Console
- Transfer Appliance: rack-able appliance
- up to 1PB
- customer supplied encryption
Data Security¶
- Permissions flow from Project -> Bucket -> Object
- Retention policy applies at the bucket level and enforces a minimum duration during which objects cannot be deleted or modified
- Bucket Locks and Object Holds
    - prevent data from being modified or deleted
- BP: Enable Domain Restricted Access (limit access within Org) and Uniform bucket-level Access
- Q: How do you protect PII data on Cloud Storage? A: Use granular ACLs, Signed URLs can be leaked.
Access Control¶
- Uniform: IAM only, at the bucket level (uniform bucket-level access is recommended)
- Fine-grained: IAM + ACLs
    - ACLs are legacy, but allow access control at the object level
    - an ACL consists of a scope (who) and a permission (what)
        - scope: the user or group that can access
        - permission: read, write
    - when both IAM and ACLs are used, access is granted if either allows it
- Signed URL: time-limited, HTTP-method-restricted URL signed with a cryptographic key
    - `gsutil signurl -d 30m -u gs://bucket/file`
    - the cryptographic key can be either the service account (`-u`) or a private key
    - the method cannot be POST
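To illustrate the principle only (this is NOT Cloud Storage's actual V4 signing algorithm, and the host and parameter names are made up), a signed URL is essentially an HMAC over the method, path, and expiry:

```python
import base64
import hashlib
import hmac
import time

def toy_signed_url(path: str, key: bytes, method: str = "GET",
                   lifetime_s: int = 1800) -> str:
    """Toy signed URL: the issuer signs method + path + expiry with a
    secret key; the server recomputes the HMAC before honoring the
    request, so the URL cannot be altered or used after expiry."""
    expires = int(time.time()) + lifetime_s
    to_sign = f"{method}\n{path}\n{expires}".encode()
    sig = base64.urlsafe_b64encode(
        hmac.new(key, to_sign, hashlib.sha256).digest()).decode()
    return f"https://storage.example.com{path}?Expires={expires}&Signature={sig}"

url = toy_signed_url("/bucket/file", key=b"secret", lifetime_s=1800)
```

This also makes the exam point above concrete: possession of the URL is the only credential, which is why a leaked signed URL grants access to anyone.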
- Signed policy document: controls what can be uploaded, e.g. size limits, content types
- `allUsers` and `allAuthenticatedUsers` are special members for public access and any authenticated user, respectively
- Alternatives to ACLs:
    - Signed URLs: temporary read or write access to specific resources; anyone with the URL has access
    - Signed policy documents: constrain what can be uploaded
    - separate buckets: when multiple subsets of objects share the same access patterns, move them to different buckets
    - IAM conditions: use a resource-name prefix, e.g. users have access to objects with the `dir-name/` prefix
- Encryption: server-side encryption layers a Key Encryption Key (KEK) over a Data Encryption Key (DEK): the KEK wraps the DEK, and the DEK encrypts the data
    - client-side encrypted data still goes through server-side encryption
- KEK can be
- GMEK: Google Managed Encryption Key
- CMEK: Customer Managed Encryption Key; still resides in Cloud KMS
- CSEK: Customer supplied Encryption Key; not stored in cloud
- Device level encryption is in addition to above storage level encryption
- A small number of legacy HDDs support only AES128 bit, for guaranteed AES256 use SSDs
Features¶
- sync a directory: `gsutil rsync`
- Buckets can be tagged with:
    - `compressed`: faster upload, lower storage cost, but served decompressed
    - `requester-pays`: the requester pays any egress charges
- Cloud Storage provides XML and JSON APIs
    - prefer XML for a RESTful API / the `boto` framework; JSON for flexibility and lower cost
    - the XML API needs an MD5 sum; for composite objects this is expensive (roughly double the bandwidth consumed and elapsed time)
    - lifecycle-management APIs are supported only by the JSON API
Lifecycle and Versioning¶
- A lifecycle rule has 1+ conditions and 1 action
    - the action can be `Delete` or `SetStorageClass`; for versioned buckets, deleting the current version makes it a non-current version
    - example conditions: age, number of versions (if versioned)
- Versioning: `gsutil versioning {get|set}`
    - cannot be enabled for buckets that have a retention policy
    - a version consists of a `generation`, which identifies the content version, and a `metageneration`, which identifies the metadata version within a specific `generation`
        - each new `generation` resets `metageneration` to 1
    - a non-current version has its own ACL
    - each object has a `generation` even if versioning is not enabled
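For example, a lifecycle config in the JSON shape accepted by `gsutil lifecycle set <file> gs://bucket` might look like this (built in Python here; the 30-day and 3-version values are arbitrary examples):

```python
import json

# Lifecycle config following the JSON API schema: each rule pairs one
# action (Delete or SetStorageClass) with one or more conditions.
lifecycle = {
    "rule": [
        {   # move objects older than 30 days to Nearline
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},
        },
        {   # in a versioned bucket, keep at most 3 newer versions of an object
            "action": {"type": "Delete"},
            "condition": {"numNewerVersions": 3},
        },
    ]
}
config_json = json.dumps(lifecycle, indent=2)
```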
Cloud SQL¶
- Vertical scaling for RW and horizontal scaling for RO
- backups are disk-level snapshots taken after issuing the `FLUSH TABLES WITH READ LOCK` MySQL command
    - 7 daily backups are free
    - restoring a backup requires replicas to be deleted first
- allows import/export using `mysqldump` to a Cloud Storage bucket
import/exportusingmysqldumpto cloud storage bucket - Replication support includes replicating from external instances (e.g. backing up on-prem)
- two types: read and failover
    - failover replicas are restricted to the same region (but can be in a different zone than the primary)
    - a read replica can have an instance size different from the primary
- Failover supported with cross-zone replication (same region)
- applications don't need to change anything except reestablishing connection
- `my.cnf` can't be modified as on-prem; use database flags or the REST APIs
- Rule of thumb: allocate a larger disk than necessary to get higher IOPS for a small but high-throughput database
    - by default Cloud SQL disks auto-grow; the allocation must be overridden manually
- Cloud SQL has higher CPU cost than corresponding VM. For large databases, it's cost effective to use VM
- storage cost is same for both Cloud SQL and native VM
- cannot decrease data disk size once allocated; create new instance and copy data
- connecting to a Cloud SQL instance:
    - instance with private IP: can optionally require the SQL Proxy or self-managed SSL certificates
    - instance with public IP: must be authorized using either:
        - Cloud SQL Proxy: secured and authorized using IAM permissions
        - authorized networks: strongly recommended together with self-managed SSL certificates
Cloud Firestore¶
- Document/key-value store, replaces old Cloud Datastore
- built on top of BigTable (?)
- Entity is a record, identified by a key
    - a key consists of namespace, kind (type, ~ table name), and identifier (an integer or a string)
- Entity groups ensure locality of related entities that share the same parent key
- entities can store other keys, but no referential integrity is enforced or maintained
- supported operations are: get, put and delete
- query language is called GQL (no joins)
- unlike RDBMS, Indexes aren't just for optimization, but make certain advanced queries possible
- Indexes are defined using yaml
- replicas are maintained using two-phase commit; the second phase (commit) is async
- no backup/restore per se, but export/import
    - import overwrites existing entities, but entities not present in the export aren't erased
- pricing: per GB and per operation (get, put, delete)
- key-only operations are free, i.e. no properties are retrieved
- v/s Cloud SQL:
- about the same latency
- higher throughput because of consistent latency when data size is large and high concurrency
- can not only scale reads, but writes too
- data replicated transparently and always
- both offer fully ACID compliant transactions
- can span App Engine and Compute Engine applications (Coursera Fundamentals Course)
- Performance
- avoid high read or write rates to keys that are lexicographically close (load isn't sharded across them)
- gradually ramp up traffic to new Datastore Kinds (allow BigTable to perform tablet splits)
- avoid deleting large # of entities across a small range of keys (BigTable's compactions runs slower)
- use manual sharding to increase throughput
- index on individual properties are automatically created, but composite indexes can be manually created to enable queries
- Use batch for multiple calls instead of separate individual calls
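The manual-sharding tip above is commonly applied to write-hot counters. A sketch of the idea, using an in-memory dict as a stand-in for Datastore entities (names like `NUM_SHARDS` are illustrative):

```python
import random
from collections import defaultdict

NUM_SHARDS = 10

# Stand-in for Datastore: each (counter, shard_id) pair would be a
# separate entity, so increments spread across NUM_SHARDS keys instead
# of hammering one hot entity.
shards: dict = defaultdict(int)

def increment(counter_name: str) -> None:
    shard_id = random.randrange(NUM_SHARDS)   # pick a shard at random
    shards[(counter_name, shard_id)] += 1     # one small write per increment

def value(counter_name: str) -> int:
    # Reads fan out over all shards and sum their values.
    return sum(v for (name, _), v in shards.items() if name == counter_name)

for _ in range(1000):
    increment("page-views")
```

Write throughput scales roughly with the shard count, at the cost of a slightly more expensive (multi-get) read.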
| Features | Native | Datastore mode |
|---|---|---|
| writes scaling | 10k/sec | millions/sec |
| Listen for real-time updates | Yes | No |
| Offline persistence web/mobile | Yes | No |
| Data model | document+collection | kinds+entity groups |
| namespaces support | No | Yes |
| Best suited for | Web+Mobile | Server projects |
Spanner¶
- suitable for HTAP (Hybrid Transactional/Analytical Processing)
- data is always sharded and replicated (at least 3 copies)
- concepts:
    - instance: a container that holds 1 or more databases, automatically replicated
        - instead of replicating to a specific zone, a configuration spanning multiple zones is used
        - nodes make up an instance
    - databases contain tables; also a grouping for permissions and maintenance
        - tables have a schema; maximum cell size is 10 MB
- DDL is similar to standard SQL
- Instead of `INSERT` and `UPDATE`, the API takes a JSON list of objects to store data
    - e.g. `employees.insert([{id: 1, name: "Steve Jobs"}, {id: 2, name: "Bill Gates"}])`
- In addition to SQL, a single table can be read using the API
    - e.g. `employees.read({columns: ["id", "name"], key: ["1"]})`
- Using SQL: `database.run("SELECT id, name FROM employees")`
    - parameterized: `database.run({sql: "SELECT id, name FROM employees WHERE id = @id", params: {id: 2}})`
- To ensure child rows are collocated with the parent, use `INTERLEAVE`
- Data is locked at the cell (row + column) level, so two transactions can update separate columns of the same row concurrently
BP¶
- avoid creating hotspots: store data with a primary key that is random instead of ordered
- avoid surrogate keys because they tend to be sequential
- when data from parent and child tables is read and written together frequently, use the root-and-interleaved-tables pattern
    - e.g. `CREATE TABLE employee(id ...) PRIMARY KEY(id);` and `CREATE TABLE paychecks(id ..., paycheck_id ...) PRIMARY KEY(id, paycheck_id), INTERLEAVE IN PARENT employee ON DELETE CASCADE;`
- split points refer to the rows at which data can be sharded to different nodes
- rows with same primary key of root tables and all interleaved tables will be on the same node
- To enable index-only scans, store non-search columns with the index: `CREATE INDEX nameix ON employees(name) STORING (start_date)`
- Use interleaved indexes to ensure index entries are collocated with the root table
BigTable¶
- high throughput, low latency, TB+, no-SQL, Key-Value store that uses Colossus as native file-system
- managed, unlike HBase, scale by increasing machine count without downtime
Physical components¶
- A BigTable Instance consists of 1 to 4 clusters, and each cluster consists of 1+ nodes
- Instance can be either HDD or SSD
- Instances that have replication enabled have application profiles that govern how requests are routed (single-cluster routing, or multi-cluster routing to the nearest cluster)
- A cluster is a zonal resource and represents BigTable service in a specific location
- A node belongs to a cluster and performance scales linearly with number of nodes
- Each Table is split into Tablets, each tablet is managed by only one node
Concepts¶
- Row-key: a (derived) column by which all rows will be sorted
- Column Families: columnar like support. Segregate frequently and rarely used columns in different families
- Cell: Value at Row+Column intersection; multiple cells represent multiple versions at different timestamps
- updated and deleted data is logical, requires compaction
- new rows are appended
- BigTable constantly balances data based on usage patterns
- tables are sparsely populated
performance¶
- group related data for more efficient reads => use column families
- distribute data evenly for efficient writes
- place identical values in the same row or adjoining rows for efficient compression (appropriate row-keys)
- ensure client and BigTable are in same zone
- minimize skew
- use replication, for example, to isolate serving workloads from batch workloads
- Use Key Visualizer's heat maps to identify hot row-key ranges
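Two common row-key techniques behind the "distribute writes, keep related data adjacent" advice above can be sketched as follows (the `sensor-42` naming and `#` separator are illustrative conventions, not an API):

```python
def row_key(device_id: str, ts_ms: int) -> str:
    """Sketch of two Bigtable row-key techniques:
    - field promotion: lead with a high-cardinality field (device_id) so
      writes from many devices spread across tablets instead of hotspotting
      on a timestamp prefix;
    - reversed timestamp: store (max - ts), zero-padded, so a prefix scan
      on the device returns newest rows first."""
    reversed_ts = (2**63 - 1) - ts_ms
    return f"{device_id}#{reversed_ts:019d}"

k_new = row_key("sensor-42", ts_ms=1_700_000_100_000)
k_old = row_key("sensor-42", ts_ms=1_700_000_000_000)
# The newer event sorts lexicographically before the older one
# under the same device prefix.
```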
partitioning (BigQuery)¶
- stores each partition on a different shard
- partitioning types:
    - ingestion time: `--time_partitioning_type=DAY`
    - `DATE` or `TIMESTAMP` column: `--time_partitioning_field tmcol`
    - range partitioning: `--range_partitioning=customer_id,0,100,10`
- in a predicate, the partitioning column must appear on the left side of the comparison
- a table can be made to require a partition filter by specifying `bq query ... --require_partition_filter`
clustering (BigQuery)¶
- only supported on partitioned tables with time or ingest type partitioning
- allows rows to be sorted within partition
- not enforced, so it degrades over time as more data is added
    - BigQuery automatically re-clusters data in the background
- up to 4 columns (no nested columns)