Skip to content

dbt

Components

  • profile.yml: describes database connections, connections details by environment. Not required for dbt cloud
  • dbt_project.yml: settings for specific dbt project
    • default paths for project folders
    • config: e.g. model materialization using table v/s view
  • models: transformation logic
  • macros: jinja macros for code generation
  • snapshots: maintain SCD
  • tests: Generic or Singular (one-off) tests
    • Generic: unique, not_null
    • Singular: consists of a SQL; it's successful if it doesn't return any data
  • analyses:
  • seeds:

Models

  • models -> 1+ folders -> 1+ .sql files
    • each sub-folder under models may contain schema.yml that describe schema, contain docs and tests
  • source refers to data (table/view) that is outside of dbt project and not created during the dbt pipeline
  • {{ config(.....) }} for setting model specific settings:
    • e.g. {{ config(pre_hook=["ALTER SESSION SET QUERY_TAG='MODEL_A'"]
    • e.g. {{ config(materialized = 'ephemeral')
    • transient=False
    • cluster_by=[]
  • a key prefixed with + in the project yaml file implies the settings affect the key and everything that and below
  • incremental runs are supported by using:
    • materialized="incremental" and unique_key(in config)
    • in conjunction with jinja if is_incremental (in SQL text)
  • supported jinja expressions:
    • source refers to an external table/view
    • ref refers to another model, and creates a dependency
    • this (the current model being created)
  • materialization strategies: View, Table, Incremental, Ephemeral (CTE based, no objects are added)

Snapshots

  • must have models enclosed jinja2 {% snapshot DIM_CUSTOMER_SCD %}
  • define a strategy, unique_key and check_cols in config
  • strategies: timestamp and check
  • Examples:
    {% snapshot orders_snp %}
    {{ config(strategy='timestamp', unique_key='id', updated_at='last_upd_ts') }}
    
    select * from {{ source('jaffle_shop', 'orders') }}
    
    (% endsnapshot %)
    
    {% snapshot orders_snp %}
    {{ config(strategy='check', unique_key='id', check_cols=['status', 'is_cancelled']) }}
    
    select * from {{ source('jaffle_shop', 'orders') }}
    
    (% endsnapshot %)