
OpenFlow

Components

  • Each account can have one or more OpenFlow (OF) deployments; each deployment can have one or more runtimes
  • Each runtime has a specific compute size

NiFi Terminology

  • FlowFile: a unit of data (content) plus its attributes (UUID, size, filename, etc.), processed by Processors
  • Processor: the core unit of work; acts on FlowFiles
  • Connection: a path between two Processors that holds a queue of FlowFiles waiting to be processed. Each connection carries one or more Relationships of its source Processor
  • Relationship: a named output of a Processor (a Processor defines zero or more) to which FlowFiles are routed. Common relationships are Success, Failure, and Retry
  • Controller Service: a shared service that Processors reference for common configuration, such as database connections, authentication, and schema management
  • Process Group: a group of Processors and Connections; an organizational unit that can be exported to and imported from version control (VCS)
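The terms above fit together as a small pipeline: a Processor takes a FlowFile, picks a Relationship, and every Connection bound to that Relationship queues the result. The Python sketch below models only that idea; it is not NiFi's actual (Java) API. The names `FlowFile`, `Connection`, `uppercase_processor`, and `route` are hypothetical, and a Relationship with no Connection simply drops the FlowFile here, roughly like an auto-terminated relationship.

```python
from collections import deque
from dataclasses import dataclass, field
import uuid


@dataclass
class FlowFile:
    """Conceptual FlowFile: content bytes plus a map of attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Give every FlowFile a unique ID and record its size, if not already set.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
        self.attributes.setdefault("size", str(len(self.content)))


@dataclass
class Connection:
    """A queue of FlowFiles fed by one or more relationships of a source processor."""
    relationships: set
    queue: deque = field(default_factory=deque)


def uppercase_processor(ff: FlowFile) -> tuple:
    """Hypothetical processor that uppercases UTF-8 text.

    Returns (relationship, FlowFile): 'success' with the new content,
    or 'failure' with the original FlowFile if decoding fails.
    """
    try:
        text = ff.content.decode("utf-8")
    except UnicodeDecodeError:
        return "failure", ff
    new_content = text.upper().encode("utf-8")
    attrs = dict(ff.attributes)            # keep existing attributes (including uuid)
    attrs["size"] = str(len(new_content))  # but refresh the size
    return "success", FlowFile(new_content, attrs)


def route(relationship: str, ff: FlowFile, connections: list) -> int:
    """Enqueue ff on every connection bound to the relationship.

    Returns how many connections received it; 0 means the FlowFile is
    dropped. (NiFi clones a FlowFile per connection; this sketch shares it.)
    """
    targets = [c for c in connections if relationship in c.relationships]
    for conn in targets:
        conn.queue.append(ff)
    return len(targets)


if __name__ == "__main__":
    to_sink = Connection({"success"})
    errors = Connection({"failure", "retry"})
    rel, out = uppercase_processor(FlowFile(b"hello", {"filename": "a.txt"}))
    route(rel, out, [to_sink, errors])
    print(rel, to_sink.queue[0].content, len(errors.queue))  # success b'HELLO' 0
```

Keeping routing separate from the processor mirrors the NiFi split between what a Processor decides (the Relationship) and where the data goes (the Connections drawn on the canvas).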

Setup

  • Two flavors: BYO-Compute (BYO-C) and SPCS (Snowpark Container Services)
  • BYO-C comes in two flavors: managed VPC and BYO-VPC
  • BYO-VPC requirements:
    • Private subnets, which must be tagged
    • A NAT/internet gateway that allows egress (outbound) traffic
    • The NAT gateway's IP address must be allowlisted
  • BYO-C: Snowflake creates an EKS cluster; an EC2 instance runs the Data Plane Agent
  • Attach a security group to the data source with inbound rules that allow access from the EKS cluster (e.g., port 5432 for RDS Postgres)
  • If data sources are in a different VPC, additional networking is needed: VPC peering, VPC endpoints, or a transit gateway
  • EKS can use PrivateLink, but the agents currently use the internet to pull images and runtimes
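For the security-group step, a minimal sketch assuming AWS and boto3: the helper builds an inbound TCP rule for the data source's security group, with either the EKS cluster's security group ID or a CIDR block (for example, the EKS subnets' range) as the source. The function names and example IDs are hypothetical; `authorize_security_group_ingress` is the standard boto3 EC2 client call.

```python
def build_ingress_permissions(source: str, port: int = 5432) -> list:
    """Build an IpPermissions list that opens one inbound TCP port.

    `source` is either a security group ID (e.g. the EKS cluster's SG,
    'sg-...') or a CIDR block (e.g. the EKS subnets' address range).
    """
    rule = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
    if source.startswith("sg-"):
        rule["UserIdGroupPairs"] = [{"GroupId": source}]
    else:
        rule["IpRanges"] = [{"CidrIp": source}]
    return [rule]


def allow_access_to_data_source(data_source_sg_id: str, source: str,
                                port: int = 5432, region=None):
    """Add the inbound rule to the security group attached to the data source."""
    import boto3  # imported here so the builder above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.authorize_security_group_ingress(
        GroupId=data_source_sg_id,
        IpPermissions=build_ingress_permissions(source, port),
    )


if __name__ == "__main__":
    # Hypothetical IDs: replace with the RDS instance's SG and the EKS cluster's SG.
    print(build_ingress_permissions("sg-0eks00000000example"))
    # allow_access_to_data_source("sg-0rds00000000example", "sg-0eks00000000example")
```

Referencing the EKS security group instead of a CIDR keeps the rule valid as worker nodes come and go; the CIDR form is the fallback when the source sits somewhere a security group cannot be referenced.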

Governance

Observability

  • OpenFlow event table: captures logs and metrics
  • Each deployment can optionally specify its own event table (default: snowflake.telemetry.events)
  • Log volume is large; typical filters are time range, runtime key, and log level
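The three typical filters combine as shown in this in-memory sketch. The `LogRecord` fields and level names are hypothetical stand-ins, not the event table's actual schema; against the real event table the same predicates would go into a SQL `WHERE` clause so the filtering happens in Snowflake rather than after the rows are fetched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Severity order, lowest to highest (hypothetical level names).
LEVEL_RANK = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}


@dataclass
class LogRecord:
    timestamp: datetime
    runtime_key: str
    level: str
    message: str


def filter_logs(records, start, end, runtime_key=None, min_level="INFO"):
    """Keep records in [start, end), from one runtime (if given), at or above min_level."""
    threshold = LEVEL_RANK[min_level]
    return [
        r for r in records
        if start <= r.timestamp < end
        and (runtime_key is None or r.runtime_key == runtime_key)
        and LEVEL_RANK.get(r.level.upper(), 0) >= threshold  # unknown levels rank lowest
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    logs = [
        LogRecord(now - timedelta(minutes=5), "rt-a", "ERROR", "connection refused"),
        LogRecord(now - timedelta(minutes=5), "rt-b", "ERROR", "timeout"),
        LogRecord(now - timedelta(minutes=2), "rt-a", "DEBUG", "polling"),
        LogRecord(now - timedelta(hours=3), "rt-a", "WARN", "old warning"),
    ]
    last_hour = filter_logs(logs, now - timedelta(hours=1), now, "rt-a", "WARN")
    print([r.message for r in last_hour])  # ['connection refused']
```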

Cost Monitoring

  • AWS deployment costs:
    • EC2 costs:
      • Always on for each deployment: the Agent node and the EKS management node (~$6/day)
      • Per runtime: EKS worker nodes
    • VPC/NAT gateway data transfer
    • EKS cluster, auto-scaling groups
    • Elastic Load Balancing (ELB)
    • Other (negligible): S3 bucket (system journal), AWS Secrets Manager, EBS volumes
  • Snowflake costs:
    • BYO-C compute: billed per second with a 60-second minimum, at 0.0225 credits/hour of active runtime
    • Ingestion: Snowpipe or Snowpipe Streaming
    • Telemetry ingestion (logging): 0.0212 credits/GB
  • OF runtime costs (BYO-C) available in
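The rates listed above support a quick back-of-the-envelope estimate. This sketch assumes the 60-second minimum applies to each period a runtime is active and rounds partial seconds up; it leaves out Snowpipe ingestion and all AWS charges other than the ~$6/day always-on baseline. All names are hypothetical.

```python
import math

# Rates taken from the notes above.
BYOC_CREDITS_PER_HOUR = 0.0225
TELEMETRY_CREDITS_PER_GB = 0.0212
MIN_BILLED_SECONDS = 60
AWS_ALWAYS_ON_USD_PER_DAY = 6.0  # approximate: Agent node + EKS management node


def runtime_credits(active_seconds_per_session):
    """Credits for one runtime, given how long each active session lasted.

    Assumes per-second billing with the 60 s minimum applied per session.
    """
    billed = sum(max(math.ceil(s), MIN_BILLED_SECONDS) for s in active_seconds_per_session)
    return billed / 3600 * BYOC_CREDITS_PER_HOUR


def telemetry_credits(gb_ingested):
    """Credits for ingesting telemetry (logs) into the event table."""
    return gb_ingested * TELEMETRY_CREDITS_PER_GB


def aws_baseline_usd(days):
    """Approximate always-on AWS EC2 cost for one deployment, in USD."""
    return days * AWS_ALWAYS_ON_USD_PER_DAY


if __name__ == "__main__":
    print(round(runtime_credits([24 * 3600]), 4))  # 0.54 credits for a runtime active all day
    print(round(runtime_credits([30, 600]), 6))    # 0.004125: the 30 s session is billed as 60 s
    print(round(telemetry_credits(5), 4))          # 0.106
    print(aws_baseline_usd(30))                    # 180.0
```

Credits and USD are kept in separate functions because they are billed by different parties (Snowflake and AWS) and convert at account-specific prices.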