Table¶
- types: permanent, transient, temporary, external
- no failsafe period for transient and temporary
- only permanent tables (enterprise+) have up to 90 days of time-travel, other tables have 0-1 days
- temporary: exists only for the session, can have the same name as permanent tables but will shadow them
- The only enforced constraint is `NOT NULL`; referential integrity (primary/foreign/unique keys) is not enforced, but may be used by the optimizer for join elimination
- copying tables:
  - `create table t2 like t1 [copy grants]`: copy only structure (and grants)
  - `create table t2 as select * from t1 [copy grants]`: copy structure and data (and grants)
- Consists of micro-partitions: each holds 50-500MB of uncompressed data, stored compressed (max ~16MB) as an object-storage file
- re-clustering is automatic and non-blocking
- can be suspended/resumed using `ALTER TABLE t1 SUSPEND/RESUME RECLUSTER`
- charges available in `information_schema.automatic_clustering_history`
- Statistics maintained by cloud-services layer:
- table level: Row count, table size (bytes)
- micro-partition level: Counts: Row, NULLs, Distinct Values; Values: MIN, MAX (only for integer and date data types?)
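The copy styles and the unenforced-constraint behavior above can be sketched as (table names illustrative):

```sql
-- Structure (and optionally grants) only: no data is copied
CREATE TABLE t2 LIKE t1 COPY GRANTS;

-- Structure and data (column definitions inferred from the SELECT)
CREATE TABLE t3 AS SELECT * FROM t1;

-- NOT NULL is the only enforced constraint; the PK here is metadata-only
CREATE TABLE t4 (id INT PRIMARY KEY, name STRING NOT NULL);
INSERT INTO t4 VALUES (1, 'a'), (1, 'b');  -- duplicate PK values are accepted
```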
Dynamic Table¶
- Supported in incremental refresh:
  - `WITH`
  - `SELECT`: expressions must use columns, deterministic functions, or immutable UDFs
  - `FROM`: tables, views, dynamic tables, Iceberg tables
  - `OVER`: all window functions
  - `JOIN`: inner, outer-equi, cross, lateral flatten (non-static `FLATTEN`)
- lateral joins are unsupported except `lateral flatten`
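A minimal dynamic table whose query stays within the incremental-refresh rules above (names and lag are illustrative; `TARGET_LAG` and `WAREHOUSE` are required):

```sql
CREATE DYNAMIC TABLE dt_orders
  TARGET_LAG = '5 minutes'
  WAREHOUSE = wh1
AS
  SELECT o.id, o.amount, c.region          -- plain columns only
  FROM orders o
  JOIN customers c ON o.customer_id = c.id; -- inner equi-join: incremental-eligible
```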
External Table¶
- Can define partition columns from `metadata$filename` so Snowflake can leverage partition pruning:

```sql
SELECT metadata$filename FROM @s1/;
-- METADATA$FILENAME
-- ---------------------------------------
-- files/logs/2018/08/05/0524/log.parquet
-- files/logs/2018/08/27/1408/log.parquet

CREATE EXTERNAL TABLE et1(
  date_part date AS TO_DATE(
    SPLIT_PART(metadata$filename, '/', 3) || '/' ||
    SPLIT_PART(metadata$filename, '/', 4) || '/' ||
    SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
  timestamp bigint AS (value:timestamp::bigint),
  col2 varchar AS (value:col2::varchar)
)
PARTITION BY (date_part)
LOCATION = @s1/logs/;
```

- Allows row access policy
- Workflow for creating:
- create named stage: create storage integration using IAM role and S3 prefix, edit IAM policy trust relation using SI IAM user and external ID
- create external table
- verify using `alter ... refresh`
- configure S3 notification: two options depending on whether a notification already exists for the S3 prefix:
  - no existing notification: configure the S3 notification for `ObjectCreate` and `ObjectRemoved` events using the external table's SQS ARN (`notification_channel`)
  - already existing notification:
    - create an SNS topic
    - subscribe the existing SQS queue to the SNS topic
    - grant Snowflake's SQS permission to subscribe to the SNS topic; use `system$get_aws_sns_iam_policy('<sns_topic_arn>')` to get the policy
    - create the external table with the SNS topic
- alter refresh
- Use `ALTER EXTERNAL TABLE <tb> REMOVE FILES ('file1', ...)` to remove old files
- Use table function `EXTERNAL_TABLE_FILES(TABLE_NAME => '<TB>')` to list all files
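The creation workflow above can be sketched end to end (bucket, role ARN, and object names are placeholders):

```sql
-- 1. Storage integration using an IAM role scoped to the S3 prefix
CREATE STORAGE INTEGRATION si1
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/logs/');

-- 2. Named stage on top of the integration
CREATE STAGE s1
  URL = 's3://my-bucket/logs/'
  STORAGE_INTEGRATION = si1;

-- 3. External table; AUTO_REFRESH exposes the SQS ARN as notification_channel
CREATE EXTERNAL TABLE et1
  LOCATION = @s1
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = TRUE;

-- 4. Verify with a manual refresh before wiring up the S3 notification
ALTER EXTERNAL TABLE et1 REFRESH;
```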
Event Tables¶
- One active event table per account, used for logging
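A minimal sketch of creating the account's event table and making it active (names are illustrative):

```sql
CREATE EVENT TABLE my_db.my_schema.my_events;
ALTER ACCOUNT SET EVENT_TABLE = my_db.my_schema.my_events;
```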
Hybrid Tables¶
- Supports and enforces primary-key, foreign-key, unique constraints and indexes; must have a primary-key
- Row-level locking
- Bulk-loading: supported only if the table is empty. Use:
- initial bulk-load as a single transaction
- CTAS (does not support foreign key)
- quotas and throttles
- 2TB per database
- 8,000 operations per second per database
- 100 databases that can contain hybrid tables
- unsupported: failsafe, data sharing, streams, snowpipe, dynamic tables, materialized views, replication, query acceleration (QAS), search optimization (SO)
- limited support:
| feature | support |
|---|---|
| time-travel | only timestamp option |
| cloning | only at database level |
| undrop table | only by cloning database |
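The constraint/index behavior above in DDL form (a sketch; table and index names are illustrative):

```sql
CREATE HYBRID TABLE orders (
  id INT PRIMARY KEY,                                   -- required and enforced
  customer_id INT,
  amount NUMBER(10, 2),
  FOREIGN KEY (customer_id) REFERENCES customers (id),  -- enforced, unlike standard tables
  INDEX idx_customer (customer_id)                      -- secondary index
);
```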
BP¶
- `x-small` warehouse is sufficient for most queries
- for concurrency use MCW with `standard` scaling policy
- use a separate WH for hybrid tables, since initial warm-up can take 2-3 minutes
- Supported but may offer sub-optimal performance:
- stored procedures (prefer multi-statement transactions and autocommit)
- serverless tasks
- use `aggregate_query_history`, which aggregates repeated SQL statements every minute
  - several metric columns are of `object` type, and contain sum, avg, p99, etc. values
Hybrid Tables¶
- Queries that run under 500ms don't appear in `QUERY_HISTORY`, but `AGGREGATE_QUERY_HISTORY` will be updated
- hybrid-table queries run on a separate compute cluster, but must be submitted through a standard WH
- quotas:
- storage ~ 2.5TB
- compute: throttling is activated when the compute cluster is being dominated by a single customer
- cost:
- storage cost can be 8-10× that of regular tables
- VWH cost + "request" costs
- supports: time-travel
- does not support:
- result cache
- streams
- replication (not yet)
Performance BP¶
- loading into empty table is much faster
- Single and multi-statement transaction offer better performance
- order of key columns matters
- query-profile: look for and prefer "row scan mode" over column scans (object storage)
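The transaction guidance above as a sketch: batching DML for a hybrid table into one explicit transaction amortizes per-statement commit overhead (table and values are illustrative):

```sql
BEGIN;
INSERT INTO orders (id, customer_id, amount) VALUES (1, 100, 25.00);
UPDATE orders SET amount = 30.00 WHERE id = 1;
COMMIT;
```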
Interactive Tables¶
- must be added to an interactive warehouse
- max of 10 tables can be added to an interactive warehouse
- Interactive WH cache data locally