OpenFlow
Components
Each account can have one or more OpenFlow (OF) deployments; each deployment can have one or more runtimes
Each runtime has a specific compute size
NiFi Terminology
FlowFile: a piece of data plus its attributes (GUID, size, name, etc.) that Processors act on
Processor: the core unit of work; acts on FlowFiles
Connection: a path between two Processors that holds a queue of FlowFiles waiting to be processed. Each Connection carries one or more Relationships
Relationship: a named output of a Processor to which a FlowFile is routed; a Processor defines zero or more. Common Relationships are Success, Failure, and Retry
Controller Service: shared configuration such as database connections, authentication, and schema management
Process Group: a group of Processors and Connections; an organizational unit that can be exported to and imported from a VCS
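To make the terminology concrete, here is a toy model in Python. It is only a sketch of how the terms relate, not NiFi's actual API: the class names mirror the terms above, while `parse_int`, `ok_queue`, and `error_queue` are made-up names for illustration.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class FlowFile:
    """Content plus attributes; uuid and size are filled in if missing."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
        self.attributes.setdefault("size", str(len(self.content)))


@dataclass
class Connection:
    """A queue of FlowFiles, fed by one or more named Relationships."""
    relationships: Tuple[str, ...]
    queue: Deque[FlowFile] = field(default_factory=deque)


@dataclass
class Processor:
    """Unit of work: returns (relationship name, resulting FlowFile)."""
    name: str
    work: Callable[[FlowFile], Tuple[str, FlowFile]]
    outputs: List[Connection] = field(default_factory=list)

    def process(self, flowfile: FlowFile) -> str:
        relationship, result = self.work(flowfile)
        # Route the result to every Connection that carries this Relationship.
        for connection in self.outputs:
            if relationship in connection.relationships:
                connection.queue.append(result)
        return relationship


def parse_int(flowfile: FlowFile) -> Tuple[str, FlowFile]:
    """Hypothetical work function: succeed only if the content is an integer."""
    try:
        int(flowfile.content.decode())
        return "success", flowfile
    except ValueError:
        return "failure", flowfile


ok_queue = Connection(relationships=("success",))
error_queue = Connection(relationships=("failure", "retry"))
parser = Processor("ParseInt", parse_int, outputs=[ok_queue, error_queue])

parser.process(FlowFile(b"42"))
parser.process(FlowFile(b"not a number"))
```

Note how one Connection (`error_queue`) carries two Relationships, which matches "each connection consists of one or more Relationships".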
Setup
Two flavors: BYO-C (bring your own cloud) and SPCS (Snowpark Container Services)
BYO-C comes in two flavors: managed VPC and BYO-VPC
BYO-VPC:
Private subnets are required and must be tagged
A NAT or internet gateway is required and must allow egress
The NAT gateway's IP address must be allowlisted
BYO-C: Snowflake creates an EKS cluster; an EC2 instance runs the Data Plane Agent
Create inbound security group rules that allow access from the EKS cluster and attach them to the data source (e.g. port 5432 for RDS Postgres)
If data sources are in a different VPC, additional networking is needed: VPC peering, a VPC endpoint, or a transit gateway
EKS can use PrivateLink, but the agents currently pull images and runtimes over the internet
Governance
Observability
OpenFlow event table: captures logs and metrics
Each deployment can optionally specify an event table (default: snowflake.telemetry.events)
Log volume is large; typical filters are time, runtime key, and log level
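The three typical filters can be sketched as a small query builder. This assumes the standard event table columns (`timestamp`, `record_type`, `record`, `resource_attributes`, `value`) and `%s` bind placeholders as used by the Snowflake Python connector's default parameter style. The function name `build_log_query` is made up, and the resource attribute that holds the runtime key is not given in these notes, so it is passed in as a parameter; the value used in the example is a placeholder, not a confirmed attribute name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List, Sequence, Tuple


def build_log_query(
    event_table: str,
    runtime_attr: str,
    runtime_key: str,
    levels: Sequence[str],
    since: datetime,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) filtering logs by time, runtime key, and level.

    event_table is interpolated directly, so it must come from trusted
    configuration, never from user input.
    """
    level_placeholders = ", ".join(["%s"] * len(levels))
    sql = (
        "SELECT timestamp, record:severity_text::string AS level, value\n"
        f"FROM {event_table}\n"
        "WHERE record_type = 'LOG'\n"
        "  AND timestamp >= %s\n"
        "  AND GET(resource_attributes, %s)::string = %s\n"
        f"  AND record:severity_text::string IN ({level_placeholders})\n"
        "ORDER BY timestamp DESC"
    )
    params: List[Any] = [since, runtime_attr, runtime_key, *levels]
    return sql, params


sql, params = build_log_query(
    event_table="snowflake.telemetry.events",
    runtime_attr="runtime.key",          # placeholder attribute name (assumption)
    runtime_key="my-runtime",            # hypothetical runtime key
    levels=["WARN", "ERROR"],
    since=datetime.now(timezone.utc) - timedelta(hours=1),
)
```

The pair can then be handed to a connector cursor as `cursor.execute(sql, params)`. Filtering on time first keeps the scan small given the log volume.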
Cost monitoring
AWS deployment costs:
EC2 costs:
Always running: the deployment Agent node and the EKS management node (~$6/day)
Per runtime: EKS worker nodes
VPC/NAT Gateway data transfer
EKS cluster, auto-scaling groups
ELB
Other (negligible): S3 bucket (system journal), Secrets Manager, EBS volumes
Snowflake costs:
BYO-C compute: billed per second with a 60-second minimum, at 0.0225 credits/hour of active runtime
Ingestion: Snowpipe or Snowpipe Streaming
Telemetry ingestion (logging): 0.0212 credits/GB
OF runtime costs (BYO-C) available in
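As a rough worked example of the Snowflake-side arithmetic, the sketch below applies the two rates quoted above at face value. The function names are made up, and the notes do not say whether the hourly rate applies per runtime or per vCPU, so treat the result as an estimate for a single billing unit. Snowpipe ingestion and all AWS costs are left out.

```python
BYOC_CREDITS_PER_HOUR = 0.0225      # rate quoted in the notes above
TELEMETRY_CREDITS_PER_GB = 0.0212   # rate quoted in the notes above
MIN_BILLED_SECONDS = 60             # per-second billing with a 60-second minimum


def runtime_credits(active_seconds: float) -> float:
    """Credits for one active period of a runtime."""
    if active_seconds <= 0:
        return 0.0
    # Anything shorter than the minimum is billed as the minimum.
    billed_seconds = max(active_seconds, MIN_BILLED_SECONDS)
    return billed_seconds / 3600 * BYOC_CREDITS_PER_HOUR


def telemetry_credits(ingested_gb: float) -> float:
    """Credits for telemetry (log) ingestion."""
    return ingested_gb * TELEMETRY_CREDITS_PER_GB


def estimate_credits(active_seconds: float, ingested_gb: float) -> float:
    """Runtime plus telemetry credits for one period."""
    return runtime_credits(active_seconds) + telemetry_credits(ingested_gb)


# Example: a runtime active for a full day that emits 5 GB of logs.
daily_estimate = estimate_credits(active_seconds=24 * 3600, ingested_gb=5)
```

Because of the 60-second minimum, a runtime that is active for 10 seconds costs the same as one that is active for a full minute.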