
Snowpipe

  • PIPE object for continuous loading
  • BP: aim for 100-250 MB (compressed) file sizes; if waiting to reach that size adds too much latency, stage files about once per minute (staging more often than once per minute is not guaranteed to reduce latency)
  • can auto-ingest or manually using REST API
  • each pipe's load-history metadata stores the file path and name only, so even if a modified version of a file (new eTag) is uploaded under the same name, it won't be loaded again
  • charged only for the compute actually used (serverless, per-core) rather than an entire warehouse; prefer Snowpipe over COPY INTO for loading individual files
  • ALTER PIPE … REFRESH checks both the pipe's load history and the target table's COPY history to skip previously loaded files
  • serverless premium: 0.06 credits per 1,000 files queued + compute billed at 1.25x the equivalent warehouse time
  • Monitoring:
    • SYSTEM$PIPE_STATUS to check status of a pipe (e.g. running/stopped, number of pending files)
      • executionState: e.g. RUNNING, PAUSED, STOPPED_CLONED (the pipe was created by cloning)
      • pendingFileCount: number of files queued
      • other fields: lastIngestedFilePath, lastIngestedTimestamp, notificationChannelName, error
    • ACCOUNT_USAGE.PIPE_USAGE_HISTORY: credits_used, bytes_inserted, files_inserted by PIPE_NAME, USAGE_DATE
  • Pipes can be paused and resumed.
  • A stale pipe (one paused for more than 14 days), when resumed, ignores event notifications older than 14 days.
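SYSTEM$PIPE_STATUS returns its fields as a JSON string, so monitoring scripts typically parse it client-side. A minimal Python sketch of that parsing; the sample JSON values and the `pipe_is_healthy` helper are illustrative assumptions, not output from a real account:

```python
import json

# Illustrative SYSTEM$PIPE_STATUS output; the key names match the notes
# above, but the values are made up for this example.
raw = """{
  "executionState": "RUNNING",
  "pendingFileCount": 3,
  "lastIngestedFilePath": "data/2023/03/part-0001.csv.gz",
  "lastIngestedTimestamp": "2023-03-30T12:34:56.000Z",
  "notificationChannelName": "arn:aws:sqs:..."
}"""

status = json.loads(raw)

def pipe_is_healthy(status: dict) -> bool:
    """Hypothetical health check: the pipe is running and no
    error field is present in the status JSON."""
    return status.get("executionState") == "RUNNING" and "error" not in status
```

A monitoring job could alert when `pipe_is_healthy` is false or `pendingFileCount` keeps growing.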

REST API

  • REST endpoints:
    • insertFiles (POST): adds files for consumption
    • insertReport (GET): retrieves results of previously submitted insertFiles
      • Events are kept only for 10 minutes or the most recent 10,000
    • loadHistoryScan (GET): retrieves results from a range of time; expensive; prefer insertReport over this
  • REST authentication via key-pair auth: each request carries a JWT signed with the caller's private key
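Putting the endpoint and JWT pieces together, a hedged sketch of how an insertFiles request could be assembled client-side. The account and pipe names are placeholders, JWT signing (which needs the user's RSA private key) is omitted, and the request is only built here, not sent:

```python
import json
import uuid

def build_insert_files_request(account: str, pipe_fqn: str,
                               file_paths: list, jwt: str) -> dict:
    """Build (but do not send) a Snowpipe insertFiles POST request.

    pipe_fqn is the fully qualified pipe name (db.schema.pipe);
    jwt is a key-pair-signed token (signing not shown here).
    """
    request_id = str(uuid.uuid4())  # identifies this submission
    return {
        "method": "POST",
        "url": (f"https://{account}.snowflakecomputing.com"
                f"/v1/data/pipes/{pipe_fqn}/insertFiles"
                f"?requestId={request_id}"),
        "headers": {
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        # body lists staged file paths for the pipe to consume
        "body": json.dumps({"files": [{"path": p} for p in file_paths]}),
    }

req = build_insert_files_request(
    "myorg-myaccount", "mydb.public.mypipe",
    ["data/2023/03/part-0001.csv.gz"], jwt="<signed-jwt>")
```

The dict can then be handed to any HTTP client; insertReport is polled afterwards for the load outcome.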

Snowpipe Streaming

  • Charged per active client, irrespective of number of channels
  • Exactly-once and per-channel ordering guarantees
  • Java SDK or Kafka connector
  • Supports Iceberg tables starting with Kafka connector 3.0.0
  • Throughput in GB/sec (Feature Friday 3/30/23)
  • monitoring:
    • SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY: By table, migration activity period
    • SNOWPIPE_STREAMING_CLIENT_HISTORY: By client, event timestamp
  • No more boundaries between streaming and batch (Feature Friday 3/30/23 4:00)
  • BP:
    • fewer clients, more channels
    • use insertRows (vs. many single insertRow calls) with batch size < 16 MB
    • use supported java types for DATE, TIME and TIMESTAMP
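The insertRows best practice above can be sketched as a client-side batcher that flushes before a batch's serialized size would exceed 16 MB. This is an illustrative Python sketch of the batching idea only; the real SDK is Java, and the JSON-based size estimate and flush callback are assumptions for the example:

```python
import json

MAX_BATCH_BYTES = 16 * 1024 * 1024  # stay under the 16 MB batch limit

class RowBatcher:
    """Accumulate rows and hand them off as one batch (one insertRows
    call) before the estimated serialized size exceeds the limit."""

    def __init__(self, flush, limit=MAX_BATCH_BYTES):
        self.flush = flush          # callback receiving a list of rows
        self.limit = limit
        self.rows, self.size = [], 0

    def add(self, row: dict) -> None:
        row_bytes = len(json.dumps(row).encode())  # rough size estimate
        if self.rows and self.size + row_bytes > self.limit:
            self.close()            # flush the current batch first
        self.rows.append(row)
        self.size += row_bytes

    def close(self) -> None:
        if self.rows:
            self.flush(self.rows)
            self.rows, self.size = [], 0

# Demo with an artificially tiny limit so batching is visible.
batches = []
b = RowBatcher(batches.append, limit=64)
for i in range(10):
    b.add({"id": i, "v": "x" * 10})
b.close()
```

The same idea applies regardless of language: group rows per channel and send few, large insertRows calls rather than many small ones.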

Bulk v/s Continuous

Feature                                     Bulk load (COPY INTO)        Snowpipe
transactions                                single                       multiple
compute                                     user warehouse               Snowflake-managed (serverless)
default ON_ERROR                            ABORT_STATEMENT              SKIP_FILE
TRUNCATE resets load metadata?              yes                          no
load metadata history (days)                64                           14
de-dup scope                                table                        pipe
de-dup logic                                filename + MD5               filename
path segments from location w/ PATTERN (1)  honored                      trimmed
history views                               LOAD_HISTORY + COPY_HISTORY  COPY_HISTORY + PIPE_USAGE_HISTORY