Features

Time-travel

  • data retention can be specified at account, database, schema or table level, the most granular wins
  • Account parameter MIN_DATA_RETENTION_TIME_IN_DAYS determines the minimum effective time-travel period
  • Increasing/decreasing the time-travel period affects data currently in time-travel, not data already in fail-safe
  • reducing the time-travel period does not take effect immediately; it waits until a background process picks up the change
  • not supported on shared data
  • Specified using AT|BEFORE
    • Timestamp values are specified as strings and, unless explicitly cast, are treated as TIMESTAMP_NTZ (interpreted as UTC)
    • FROM clause can also specify CHANGES, in which case changes made since the AT|BEFORE point are returned
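
A minimal sketch of AT|BEFORE and CHANGES usage; table name, timestamp and the `<query-id>` placeholder are hypothetical, and CHANGES assumes change tracking is enabled on the table:

```sql
-- query a table as of a point in time (string without a cast would be TIMESTAMP_NTZ)
select * from orders at(timestamp => '2024-01-15 08:00:00'::timestamp_ntz);

-- or relative to now / before a specific statement
select * from orders at(offset => -60*30);            -- 30 minutes ago
select * from orders before(statement => '<query-id>');

-- changes made since a point in time (requires change_tracking = true on the table)
select *
from orders
  changes(information => default)
  at(timestamp => '2024-01-15 08:00:00'::timestamp_ntz);
```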

Data Clean Room

  • Provider needs Enterprise edition or higher; the consumer can be a non-Snowflake user
  • One environment per account that supports multiple clean rooms
  • Uses differential privacy: mathematical tools to add noise to hide the originating data
  • can be managed using Web UI or API
  • steps:
    1. Provider creates a clean room
    2. imports tables/views, SQL templates, or Python code
      • join policies: which columns are allowed for joining
      • segmentation columns: analyst can create segments only from the specified columns (e.g. income_bracket, age_band)
      • activation columns: results sent out for enrichment/activation may include only these columns
    3. invites a consumer, who imports their own tables/views (collaborator tables)
    4. consumer applies filters, segments
    5. consumer activates the result, i.e. sends data to the provider for enrichment
    6. consumer can download the enriched results

Setup

  1. Sign up for DCR
  2. Add IP address of the DCR Web App to network policy
  3. Configure Snowflake account:
    1. create a service account
    2. Install Snowflake native app (SAMOOHA_BY_SNOWFLAKE)
    3. Register databases that need to be shared
      • only tables are supported

Unstructured data

  • stored in internal/external stages, using directory tables
  • Access using URLs
  • for URLs to work on internal stages, enable server-side encryption (SSE)
  • Can use REST API GET /api/files/ to download data from an internal/external stage
    • clients must use OAuth2
  • Java, Scala and Python UDFs/SPs can read from URLs dynamically

  Property       Scoped URL              File URL               Presigned URL
  ------------   ---------------------   --------------------   --------------------
  Expiry         result cache (24 hr)    permanent              expiry-time argument
  URL type       encoded                 path                   encoded
  Generate       BUILD_SCOPED_FILE_URL   BUILD_STAGE_FILE_URL   GET_PRESIGNED_URL
  Permission     user who generated it   access to stage        anyone
  Usage          REST API/Snowsight      REST API/Snowsight     within browser
  Data sharing   yes                     no                     yes
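
The three URL types above are generated with built-in functions; the stage and file names below are hypothetical:

```sql
-- scoped URL: usable only by the user who generated it, expires with the result cache
select build_scoped_file_url(@docs_stage, 'reports/q1.pdf');

-- file URL: permanent, but the caller needs privileges on the stage
select build_stage_file_url(@docs_stage, 'reports/q1.pdf');

-- presigned URL: works for anyone (e.g. pasted into a browser) until it expires
select get_presigned_url(@docs_stage, 'reports/q1.pdf', 3600);  -- 1-hour expiry
```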

Schema Detection/Evolution

Schema detection

  • table function infer_schema(location => <stage>, file_format => 'myfmt') gives a list of columns
  • generate_column_description(<array col-desc>, '<type>') takes array_agg(object_construct(infer_schema(...))) to return column DDLs
  • create table ... using template (<query-expr>) uses string from the above query to create table
  • for JSON and CSV files only, entire file is scanned by default to detect schema
  • also works at folder level by returning all columns across all files
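
The detection steps above chain together as in this sketch (stage, file format and table names are hypothetical):

```sql
-- list the columns inferred from staged files
select *
from table(infer_schema(location => '@my_stage', file_format => 'my_parquet_fmt'));

-- create a table directly from the inferred schema
create table my_table using template (
  select array_agg(object_construct(*))
  from table(infer_schema(location => '@my_stage', file_format => 'my_parquet_fmt'))
);
```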

Schema evolution

  • supported when loading avro, parquet, orc, json and csv data
  • stand-alone feature, but can be used with schema detection
  • To enable:
    1. table must be created with enable_schema_evolution = true
    2. COPY INTO must specify match_by_column_name (for CSV set parse_header and set error_on_column_count_mismatch to false)
    3. role that loads data must include evolve schema or ownership on table
  • limitations
    • does not drop columns, but makes columns missing from newly loaded files nullable (the NOT NULL constraint is dropped)
    • per run: max 10 new columns, 1 schema change per copy into
    • Not supported by Snowpipe streaming SDK; but supported by Kafka Connector with Snowpipe Streaming
  • create/alter table set enable_schema_evolution = true
  • when loading multiple files with different schemas, COPY INTO evolves the schema one file at a time
    • re-run COPY INTO until it succeeds, i.e. until the schema has evolved to cover all files
    • schema changes are not rolled back even if copy into fails
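
The enablement steps above look roughly like this; table, stage and column names are hypothetical:

```sql
-- 1. table must opt in to schema evolution
create or replace table sales (id number)
  enable_schema_evolution = true;

-- 2. COPY INTO must map columns by name for evolution to kick in
copy into sales
  from @sales_stage
  file_format = (type = parquet)
  match_by_column_name = case_insensitive;

-- for CSV, use instead:
--   file_format = (type = csv parse_header = true error_on_column_count_mismatch = false)
```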

Native Apps

  • A native application consists of a manifest (manifest.yml, YAML format), setup scripts (SQL), and business logic (resource files) that gets installed on consumer accounts
    • business logic can be written using Streamlit, stored procs, UDFs, the Snowflake API, JavaScript and SQL
  • setup script installs in a local schema: tables, views, UDFs, procs, app roles
    • can use SQL, Python or Java
  • can bundle proprietary data (tables and stages)
    • consumers do not have access to internal data
  • apps can access external data
  • versioned
  • can be included in a listing (free/paid, private/marketplace)
  • Provider creates application package that includes data, application logic, metadata, setup script and versions/patch levels
  • Consumer installs the application from the listing
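
A setup script creating the local objects described above might look like this sketch (schema, procedure and application-role names are hypothetical):

```sql
-- runs in the consumer account when the app is installed or upgraded
create application role if not exists app_user;
create or alter versioned schema core;

create or replace procedure core.hello()
  returns varchar
  language sql
  as
  $$
  begin
    return 'hello from the app';
  end;
  $$;

grant usage on schema core to application role app_user;
grant usage on procedure core.hello() to application role app_user;
```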

Query Acceleration

  • Works for queries that are deterministic, scan large amount of data, and return a small result set. E.g.
    • large scans with highly selective filters, or aggregations whose results are expected to be low cardinality
  • alter warehouse foo_wh set enable_query_acceleration = true, query_acceleration_max_scale_factor = 10
    • scale factor is omitted or 0 => unbounded, queries can lease as many resources as needed and available
    • regardless of scale factor, query uses only the resources needed (if available)
    • e.g. queries running on a Medium WH (4 nodes) can use up to 4 × 10 = 40 nodes
  • common reasons for ineligibility:
    • low partition count
    • not selective enough (e.g. group by has high cardinality)
    • non-deterministic functions, or LIMIT without an ORDER BY
  • supports: select, insert, ctas and copy into <table>
  • use SYSTEM$ESTIMATE_QUERY_ACCELERATION(<qid>) or QUERY_ACCELERATION_ELIGIBLE view
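
Putting the above together; the warehouse name, scale factor and the `<query-id>` placeholder are hypothetical:

```sql
-- enable query acceleration on a warehouse
alter warehouse foo_wh set
  enable_query_acceleration = true
  query_acceleration_max_scale_factor = 10;

-- estimate whether a past query would have benefited
select system$estimate_query_acceleration('<query-id>');

-- or look for eligible queries account-wide
select query_id, eligible_query_acceleration_time
from snowflake.account_usage.query_acceleration_eligible
order by eligible_query_acceleration_time desc;
```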