
Snowpipe

  • PIPE object for continuous loading
  • BP: aim for 100-250 MB (compressed) file sizes; if waiting to reach that size adds too much latency, stage files about once per minute (staging more often than once per minute is not guaranteed to reduce latency)
  • can auto-ingest or manually using REST API
  • each pipe's load-history metadata stores the file path and name only, so even if a modified version of a file (new eTag) is uploaded under the same name, it won't be loaded again
  • charged only for the compute actually used (serverless, per-core) rather than an entire warehouse; prefer Snowpipe over COPY INTO for loading individual files
  • ALTER PIPE … REFRESH checks both the pipe's load history and the target table's COPY history to skip previously loaded files
  • serverless premium: 0.06 credits per 1,000 files queued + compute billed at 1.25x the equivalent warehouse time
  • Monitoring:
    • SYSTEM$PIPE_STATUS to check status of a pipe (e.g. running/stopped, number of pending files)
      • executionState: e.g. RUNNING, PAUSED, STOPPED_CLONED (the pipe was created by cloning)
      • pendingFileCount: number of files queued
      • other fields: lastIngestedFilePath, lastIngestedTimestamp, notificationChannelName, error
    • ACCOUNT_USAGE.PIPE_USAGE_HISTORY: credits_used, bytes_inserted, files_inserted by PIPE_NAME, USAGE_DATE
  • Pipes can be paused and resumed.
  • A stale pipe (one paused for more than 14 days), when resumed, ignores event notifications older than 14 days.
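SYSTEM$PIPE_STATUS returns its fields as a JSON string, so monitoring scripts typically parse it client-side. A minimal Python sketch of that parsing; the sample JSON values and the `pipe_is_healthy` helper are illustrative assumptions, not output from a real account:

```python
import json

# Illustrative SYSTEM$PIPE_STATUS output; the key names match the notes
# above, but the values are made up for this example.
raw = """{
  "executionState": "RUNNING",
  "pendingFileCount": 3,
  "lastIngestedFilePath": "data/2023/03/part-0001.csv.gz",
  "lastIngestedTimestamp": "2023-03-30T12:34:56.000Z",
  "notificationChannelName": "arn:aws:sqs:..."
}"""

status = json.loads(raw)

def pipe_is_healthy(status: dict) -> bool:
    """Hypothetical health check: the pipe is running and no
    error field is present in the status JSON."""
    return status.get("executionState") == "RUNNING" and "error" not in status
```

A monitoring job could alert when `pipe_is_healthy` is false or `pendingFileCount` keeps growing.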

REST API

  • REST endpoints:
    • insertFiles (POST): adds files for consumption
    • insertReport (GET): retrieves results of previously submitted insertFiles
      • Events are kept only for 10 minutes or the most recent 10,000
    • loadHistoryScan (GET): retrieves results from a range of time; expensive; prefer insertReport over this
  • REST authentication via key-pair auth: each request carries a JWT signed with the caller's private key
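Putting the endpoint and JWT pieces together, a hedged sketch of how an insertFiles request could be assembled client-side. The account and pipe names are placeholders, JWT signing (which needs the user's RSA private key) is omitted, and the request is only built here, not sent:

```python
import json
import uuid

def build_insert_files_request(account: str, pipe_fqn: str,
                               file_paths: list, jwt: str) -> dict:
    """Build (but do not send) a Snowpipe insertFiles POST request.

    pipe_fqn is the fully qualified pipe name (db.schema.pipe);
    jwt is a key-pair-signed token (signing not shown here).
    """
    request_id = str(uuid.uuid4())  # identifies this submission
    return {
        "method": "POST",
        "url": (f"https://{account}.snowflakecomputing.com"
                f"/v1/data/pipes/{pipe_fqn}/insertFiles"
                f"?requestId={request_id}"),
        "headers": {
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        # body lists staged file paths for the pipe to consume
        "body": json.dumps({"files": [{"path": p} for p in file_paths]}),
    }

req = build_insert_files_request(
    "myorg-myaccount", "mydb.public.mypipe",
    ["data/2023/03/part-0001.csv.gz"], jwt="<signed-jwt>")
```

The dict can then be handed to any HTTP client; insertReport is polled afterwards for the load outcome.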

Snowpipe Streaming

  • Charged per active client, irrespective of number of channels
  • Exactly-once and per-channel ordering guarantees
  • Java SDK or Kafka connector
  • Supports Iceberg tables starting with Kafka connector 3.0.0
  • Throughput in GB/sec (Feature Friday 3/30/23)
  • monitoring:
    • SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY: By table, migration activity period
    • SNOWPIPE_STREAMING_CLIENT_HISTORY: By client, event timestamp
  • No more boundaries between streaming and batch (Feature Friday 3/30/23 4:00)
  • BP:
    • fewer clients, more channels
    • use insertRows (vs. many single insertRow calls) with batch size < 16 MB
    • use supported java types for DATE, TIME and TIMESTAMP
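The insertRows best practice above can be sketched as a client-side batcher that flushes before a batch's serialized size would exceed 16 MB. This is an illustrative Python sketch of the batching idea only; the real SDK is Java, and the JSON-based size estimate and flush callback are assumptions for the example:

```python
import json

MAX_BATCH_BYTES = 16 * 1024 * 1024  # stay under the 16 MB batch limit

class RowBatcher:
    """Accumulate rows and hand them off as one batch (one insertRows
    call) before the estimated serialized size exceeds the limit."""

    def __init__(self, flush, limit=MAX_BATCH_BYTES):
        self.flush = flush          # callback receiving a list of rows
        self.limit = limit
        self.rows, self.size = [], 0

    def add(self, row: dict) -> None:
        row_bytes = len(json.dumps(row).encode())  # rough size estimate
        if self.rows and self.size + row_bytes > self.limit:
            self.close()            # flush the current batch first
        self.rows.append(row)
        self.size += row_bytes

    def close(self) -> None:
        if self.rows:
            self.flush(self.rows)
            self.rows, self.size = [], 0

# Demo with an artificially tiny limit so batching is visible.
batches = []
b = RowBatcher(batches.append, limit=64)
for i in range(10):
    b.add({"id": i, "v": "x" * 10})
b.close()
```

The same idea applies regardless of language: group rows per channel and send few, large insertRows calls rather than many small ones.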

Bulk v/s Continuous

Feature                                     Bulk load (COPY INTO)        Snowpipe
transactions                                single                       multiple
compute                                     user warehouse               Snowflake-managed (serverless)
default ON_ERROR                            ABORT_STATEMENT              SKIP_FILE
TRUNCATE resets load metadata?              yes                          no
load metadata history (days)                64                           14
de-dup scope                                table                        pipe
de-dup logic                                filename + MD5               filename
path segments from location w/ PATTERN (1)  honored                      trimmed
history views                               LOAD_HISTORY + COPY_HISTORY  COPY_HISTORY + PIPE_USAGE_HISTORY