Features¶
Time-travel¶
- data retention can be specified at account, database, schema or table level, the most granular wins
- Account parameter
MIN_DATA_RETENTION_TIME_IN_DAYSdetermines the minimum effective time-travel period - Increasing/decreasing time-travel period affects data that is already in time-travel and not which is in fail-safe.
- reducing time-travel period is not effective immediately until a background process picks up the change
- not supported on shared data
- Specified using
AT|BEFORE - Timestamp values are specified as strings and unless using explicit cast, they are treated as
timestamp_ntz(same as UTC) FROMclause can also specifyCHANGESin which case changes made sinceAT|BEFOREare returned
Data Clean Room¶
- Provider needs to have a Enterprise+ edition, consumer can be non-Snowflake user
- One environment per account that supports multiple clean rooms
- Uses differential privacy: mathematical tools to add noise to hide the originating data
- can be managed using Web UI or API
-
steps:
-
Provider creates a clean room
-
imports tables/views, and SQL templates, or Python code
- join policies: which columns are allowed for joining
- segmentation columns: analyst can create segments only from the specified columns (e.g.
income_bracket,age_band) - activation columns: analyst cannot enrich results unless it comes from the activation columns
-
invites a consumer, who import their own tables/view (collaborator tables)
- consumer applies filters, segments
- consumer activates the result, i.e. sends data to the provider for enrichment
- consumer can download the enriched results
Setup¶
- Sign up for DCR
- Add IP address of the DCR Web App to network policy
- Configure Snowflake account:
- create a service account
- Install Snowflake native app (
SAMOOHA_BY_SNOWFLAKE) -
Register databases that need to be shared
- only tables are supported
Unstructured data¶
- stored in internal/external stages, using directory tables
- Access using URLs
- for URLs to work, choose server side encryption for stages
- Can use REST API
GET /api/files/to download data from an internal/external stage - clients must use OAuth2
- Java, Scala and Python UDFs/SPs can read from URLs dynamically
- Enable SSE encryption for an internal stage to allow access to unstructured data
| Property | Scoped | File | Presigned |
|---|---|---|---|
| Expiry | result cache (24 hr) | permanent | Expiry time argument |
| URL type | encoded | path | encoded |
| Generate | BUILD_SCOPED_FILE_URL |
BUILD_STAGE_FILE_URL |
GET_PRESIGNED_URL |
| Permission | user who generated it | access to stage | anyone |
| Usage | REST API/Snowsight | REST API/Snowsight | Within browser |
| Data Sharing | Yes | No | Yes |
Schema Detection/Evolution¶
Schema detection¶
- table function
infer_schema(location => <stage>, file_format => 'myfmt')gives a list of columns generate_column_description(<array col-desc>, '<type>')takesarray_agg(object_construct(infer_schema(...)))to return column DDLscreate table ... using template (<query-expr>)uses string from the above query to create table- for JSON and CSV files only, entire file is scanned by default to detect schema
- also works at folder level by returning all columns across all files
Schema evolution¶
- supported when loading
avro,parquet,orc,jsonandcsvdata - stand-alone feature, but can be used with schema detection
-
To enable:
-
table must be created with
enable_schema_evolution = true COPY INTOmust specifymatch_by_column_name(for CSV setparse_headerand seterror_on_column_count_mismatchtofalse)-
role that loads data must include
evolve schemaorownershipon table - does not drop columns, but will make required columns as nullable
- per run: max 10 new columns, 1 schema change per
copy into - Not supported by Snowpipe streaming SDK; but supported by Kafka Connector with Snowpipe Streaming
create/alter table set enable_schema_evolution = true- when loading multiple files with different schemas,
copy intocan evolve schema using one file at a time - keep running until
copy intosucceeds when it evolves schema for all files - schema changes are not rolled back even if
copy intofails
Native Apps¶
- A native application consists of manifest (JSON format), setup scripts (SQL), and business logic (resource files) that gets installed on consumer accounts
- business logic can be written using Streamlit, stored-procs, UDFs, Snowflake API, javascript and SQL
- setup script installs in a local schema: tables, views, UDFs, procs, app roles
- can use SQL, Python or Java
- can bundle proprietary data (tables and stages)
- consumers do not have access to internal data
- apps can access external data
- versioned
- can be included in a listing (free/paid, private/marketplace)
- Provider creates application package that includes data, application logic, metadata, setup script and versions/patch levels
- Consumer installs the application from the listing
Query Acceleration¶
- Works for queries that are deterministic, scan large amount of data, and return a small result set. E.g.
- large scans and highly selective filters or aggregated result is expected to be low cardinality
alter warehouse foo_wh set enable_query_acceleration = true, query_acceleration_max_scale_factor = 10- scale factor is omitted or
0=> unbounded, queries can lease as many resources as needed and available - regardless of scale factor, query uses only the resources needed (if available)
- e.g. queries running on a Medium WH can use up to (\(4 * 10\)) 40 nodes
- common reasons for ineligibility:
- low partition count
- not selective enough (e.g.
group byhas high cardinality) - non deterministic functions or
limitwithout anorder by - supports:
select,insert,ctasandcopy into <table> - use
SYSTEM$ESTIMATE_QUERY_ACCELERATION(<qid>)orQUERY_ACCELERATION_ELIGIBLEview