
Table

  • types: permanent, transient, temporary, external
  • no failsafe period for transient and temporary
  • only permanent tables have up to 90 days of time-travel (Enterprise edition and above); all other table types have 0-1 days
  • temporary: exists only for the session, can have the same name as permanent tables but will shadow them
  • The only constraint enforced is NOT NULL
  • copying tables
    • create table t2 like t1 [copy grants]: copy only structure (and grants)
    • create table t2 as select * from t1 [copy grants]: copy structure and data (and grants)
  • Consists of micro-partitions, each holding 50-500MB of uncompressed data, stored compressed (max ~16MB) as objects in cloud storage
  • re-clustering is automatic and non-blocking
    • can be suspended/resumed using ALTER TABLE t1 SUSPEND/RESUME RECLUSTER
    • charges available in information_schema.automatic_clustering_history
  • Statistics maintained by cloud-services layer:
    • table level: Row count, table size (bytes)
    • micro-partition level: Counts: Row, NULLs, Distinct Values; Values: MIN, MAX (only for integer and date data types?)
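  • Sketch of the copy and re-clustering commands above (t1 is a hypothetical clustered table):
    -- copy only structure (and grants)
    CREATE TABLE t2 LIKE t1 COPY GRANTS;
    -- copy structure and data (and grants)
    CREATE TABLE t3 COPY GRANTS AS SELECT * FROM t1;
    -- pause/resume automatic re-clustering
    ALTER TABLE t1 SUSPEND RECLUSTER;
    ALTER TABLE t1 RESUME RECLUSTER;
    -- inspect re-clustering charges for the last 7 days
    SELECT * FROM TABLE(information_schema.automatic_clustering_history(
        date_range_start => DATEADD('day', -7, CURRENT_DATE),
        table_name       => 'T1'));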

Dynamic Table

  • Supported in incremental refresh:
    • WITH,
    • SELECT: expressions must use columns or deterministic functions, including immutable UDFs
    • FROM: tables, views, dynamic tables, Iceberg tables
    • OVER: all window functions
    • JOIN: inner, outer equi-joins, cross, LATERAL FLATTEN (non-static FLATTEN)
  • lateral joins other than LATERAL FLATTEN are unsupported
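  • Minimal sketch, assuming a base table src and a warehouse wh1 exist:
    CREATE DYNAMIC TABLE dt1
        TARGET_LAG = '5 minutes'
        WAREHOUSE  = wh1
        AS SELECT id, SUM(amount) AS total
           FROM src
           GROUP BY id;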

External Table

  • Define partition columns from metadata$filename so Snowflake can leverage partition pruning
    SELECT metadata$filename FROM @s1/;
    --  METADATA$FILENAME
    --  ---------------------------------------
    --  files/logs/2018/08/05/0524/log.parquet
    --  files/logs/2018/08/27/1408/log.parquet
    
    CREATE EXTERNAL TABLE et1(
        date_part      date   AS TO_DATE(SPLIT_PART(metadata$filename, '/', 3) || '/' ||
                                         SPLIT_PART(metadata$filename, '/', 4) || '/' ||
                                         SPLIT_PART(metadata$filename, '/', 5), 'YYYY/MM/DD'),
        timestamp     bigint  AS (value:timestamp::bigint),
        col2          varchar AS (value:col2::varchar)
    ) PARTITION BY (date_part)
      LOCATION = @s1/logs/
      FILE_FORMAT = (TYPE = PARQUET);  -- FILE_FORMAT is required
    
  • Allows row access policy
  • Workflow for creating:
    1. create named stage: create a storage integration using the IAM role and S3 prefix, then edit the IAM role's trust policy with the storage integration's IAM user and external ID
    2. create external table
    3. verify using ALTER EXTERNAL TABLE ... REFRESH
    4. configure s3 notification: two options depending on if there is already an existing notification configured for the S3 prefix:
      • no existing notification: configure an S3 notification for s3:ObjectCreated:* and s3:ObjectRemoved:* events using the external table's SQS ARN (the notification_channel column)
      • already existing notification:
        1. Create an SNS topic
        2. Subscribe the existing SQS queue to the SNS topic
        3. Grant Snowflake's SQS queue permission to subscribe to the SNS topic; use system$get_aws_sns_iam_policy('<sns_topic_arn>') to get the required policy statement
        4. Recreate the external table with AWS_SNS_TOPIC = '<sns_topic_arn>'
    5. alter refresh
  • Use ALTER EXTERNAL TABLE <tb> REMOVE FILES ('file1', ...) to remove old files
  • Use table function EXTERNAL_TABLE_FILES(TABLE_NAME => '<TB>') to list all files
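  • Sketch of refreshing and pruning the file metadata (et1 from the example above):
    ALTER EXTERNAL TABLE et1 REFRESH;
    -- list all files currently registered
    SELECT * FROM TABLE(information_schema.external_table_files(table_name => 'ET1'));
    -- drop a stale file from the metadata (path relative to the table's location)
    ALTER EXTERNAL TABLE et1 REMOVE FILES ('2018/08/05/0524/log.parquet');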

Event Tables

  • One active event table per account, used for logging and tracing
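  • Minimal sketch (my_db.my_sch.my_events is a hypothetical name):
    CREATE EVENT TABLE my_db.my_sch.my_events;
    -- make it the account's active event table
    ALTER ACCOUNT SET EVENT_TABLE = my_db.my_sch.my_events;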

Hybrid Tables

  • Supports and enforces primary-key, foreign-key, and unique constraints, plus indexes; a primary key is required
  • Row-level locking
  • Bulk loading: supported only if the table is empty. Use:
    • initial bulk-load as a single transaction
    • CTAS (does not support foreign key)
  • quotas and throttles
    • 2TB per database
    • 8,000 operations per second per database
    • 100 databases that can contain hybrid tables
  • unsupported: failsafe, data sharing, streams, snowpipe, dynamic tables, materialized views, replication, query acceleration service (QAS), search optimization (SO)
  • limited support:
    • time-travel: only the timestamp option
    • cloning: only at database level
    • undrop table: only by cloning the database
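  • Minimal sketch of a hybrid table with an enforced primary key, a foreign key, and a secondary index (customers is a hypothetical parent table):
    CREATE HYBRID TABLE orders (
        order_id   INT PRIMARY KEY,
        cust_id    INT NOT NULL,
        created_at TIMESTAMP,
        FOREIGN KEY (cust_id) REFERENCES customers (cust_id),
        INDEX idx_created_at (created_at)
    );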

BP

  • x-small warehouse is sufficient for most queries
  • for concurrency use MCW with standard scaling policy
  • use a separate warehouse for hybrid tables, since initial warm-up can take 2-3 minutes
  • Supported but may offer sub-optimal performance:
    • stored procedures (prefer multi-statement transactions and autocommit)
    • serverless tasks
  • use aggregate_query_history, which aggregates repeated SQL statements every minute
    • several metric columns are of OBJECT type and contain sum, avg, p99, etc. values
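  • Sketch of reading a p99 value out of an OBJECT-typed metric column (column names assumed from the ACCOUNT_USAGE view):
    SELECT interval_start_time,
           query_parameterized_hash,
           total_elapsed_time:p99::NUMBER AS p99_elapsed_ms
    FROM snowflake.account_usage.aggregate_query_history
    WHERE interval_start_time > DATEADD('hour', -1, CURRENT_TIMESTAMP);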

Hybrid Tables

  • Queries that run under 500ms don't appear in QUERY_HISTORY, but AGGREGATE_QUERY_HISTORY will be updated
  • queries on hybrid tables run on a separate compute cluster, but must be submitted through a standard warehouse
  • quotas:
    • storage ~ 2.5TB
    • compute: throttling is activated when the compute cluster is being dominated by a single customer
  • cost:
    • storage cost can be as high as 8 to 10 times that of regular tables
    • VWH cost + "request" costs
  • supports: time-travel
  • does not support:
    • result cache
    • streams
    • replication (not yet)

Performance BP

  • loading into empty table is much faster
  • single- and multi-statement transactions offer better performance
  • order of key columns matters
  • query-profile: look for and prefer "row scan mode" over column scan mode (object storage)

Interactive Tables

  • must be added to an interactive warehouse
  • max of 10 tables can be added to an interactive warehouse
  • Interactive warehouses cache data locally