Snowpipe¶
- `PIPE` object for continuous loading
- BP: aim for 100-250 MB file sizes; if batching to that size adds too much latency, aim for roughly one file per minute (< 1 min is not guaranteed to help)
- can auto-ingest or manually using REST API
- each pipe object's load metadata stores only the file path and name. So even if a modified version (different eTag) of a file is uploaded again, it won't be loaded again
- only charged for the cores actually used vs. an entire warehouse; prefer Snowpipe over `COPY INTO` for loading single files
- `ALTER PIPE ... REFRESH` checks both pipe and `COPY` load history to skip previously loaded files
- serverless premium of 0.06 credits per 1,000 files + 1.25 × warehouse time
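The bullets above can be sketched in SQL; the database, schema, stage, and pipe names here are illustrative placeholders, not from the source:

```sql
-- Hypothetical names throughout: mydb.raw.orders_pipe, mydb.raw.orders_stage
CREATE PIPE mydb.raw.orders_pipe
  AUTO_INGEST = TRUE          -- load on cloud-storage event notifications
  AS
  COPY INTO mydb.raw.orders
  FROM @mydb.raw.orders_stage
  FILE_FORMAT = (TYPE = 'CSV');

-- Queue files already sitting in the stage; previously loaded files are
-- skipped because both pipe and COPY load history are checked
ALTER PIPE mydb.raw.orders_pipe REFRESH;
```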
- Monitoring:
  - `SYSTEM$PIPE_STATUS` to check the status of a pipe (e.g. running/stopped, number of pending files):
    - `executionStatus`: `RUNNING`, `PAUSED`, `STOPPED_CLONED` (pipe was cloned)
    - `pendingFileCount`: number of files queued
    - `lastIngestedFilePath`, `lastIngestedTimestamp`, `notificationChannelName`, `error`
  - `ACCOUNT_USAGE.PIPE_USAGE_HISTORY`: `credits_used`, `bytes_inserted`, `files_inserted` by `PIPE_NAME`, `USAGE_DATE`
- Pipes can be paused and resumed.
- A stale pipe (paused for more than 14 days), when resumed, ignores event notifications older than 14 days.
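Typical monitoring and pause/resume statements, using a placeholder pipe name (`SYSTEM$PIPE_FORCE_RESUME` is the separate function required for pipes that have gone stale):

```sql
-- Pipe state as a JSON string: executionStatus, pendingFileCount, etc.
SELECT SYSTEM$PIPE_STATUS('mydb.raw.orders_pipe');

-- Daily credits, bytes, and files per pipe over the last week
SELECT pipe_name, usage_date, credits_used, bytes_inserted, files_inserted
FROM snowflake.account_usage.pipe_usage_history
WHERE usage_date > DATEADD('day', -7, CURRENT_DATE());

-- Pause / resume
ALTER PIPE mydb.raw.orders_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
ALTER PIPE mydb.raw.orders_pipe SET PIPE_EXECUTION_PAUSED = FALSE;

-- A stale pipe (paused > 14 days) needs a forced resume instead
SELECT SYSTEM$PIPE_FORCE_RESUME('mydb.raw.orders_pipe');
```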
REST API¶
- REST endpoints:
  - `insertFiles` (POST): submits a list of files for ingestion
  - `insertReport` (GET): retrieves results of previously submitted `insertFiles` requests; events are kept only for 10 minutes or the most recent 10,000
  - `loadHistoryScan` (GET): retrieves results for a time range; expensive; prefer `insertReport` over this
- REST authentication via JWT
Snowpipe Streaming¶
- Charged per active client, irrespective of number of channels
- Exactly-once and per-channel ordering guarantees
- Java SDK or Kafka connector
- Supports Iceberg tables starting with 3.0.0
- Throughput in GB/sec (Feature Friday 3/30/23)
- Monitoring:
  - `SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY`: by table, migration activity period
  - `SNOWPIPE_STREAMING_CLIENT_HISTORY`: by client, event timestamp
- No more boundaries between streaming and batch (Feature Friday 3/30/23 4:00)
- BP:
  - fewer clients, more channels
  - use `insertRows` (vs. multiple `insertRow` calls) with batch size < 16 MB
  - use supported Java types for `DATE`, `TIME`, and `TIMESTAMP` columns
Bulk v/s Continuous¶
| Feature | Bulk load | Snowpipe |
|---|---|---|
| transactions | single | multiple |
| compute | warehouse | Snowflake-managed |
| error default | `ABORT_STATEMENT` | `SKIP_FILE` |
| truncate resets load metadata? | Yes | No |
| load metadata history | 64 days | 14 days |
| de-dup scope | table | pipe |
| de-dup logic | filename + MD5 | filename |
| path segments from location when using `PATTERN` | honored | trimmed |
| load history views | `LOAD_HISTORY` + `COPY_HISTORY` | `COPY_HISTORY` + `PIPE_USAGE_HISTORY` |
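Regardless of the load method, per-file load results can be inspected with the `COPY_HISTORY` table function (the table name below is a placeholder):

```sql
-- Files loaded into one table over the last hour, by bulk COPY or Snowpipe
SELECT file_name, pipe_name, last_load_time, status, row_count
FROM TABLE(information_schema.copy_history(
       TABLE_NAME => 'mydb.raw.orders',
       START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())));
```

For bulk loads `pipe_name` comes back `NULL`, which is a quick way to tell the two load paths apart in one query.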