dbt¶
Components¶
- profiles.yml: describes database connections and connection details by environment. Not required for dbt Cloud
- dbt_project.yml: settings for a specific dbt project
  - default paths for project folders
  - config: e.g. model materialization as table vs. view
- models: transformation logic
- macros: Jinja macros for code generation
- snapshots: maintain SCD (slowly changing dimensions)
- tests: Generic or Singular (one-off) tests
  - Generic: unique, not_null
  - Singular: consists of a SQL query; it's successful if it doesn't return any rows
- analyses:
- seeds:
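A minimal sketch of a singular test as described above: a SQL query that selects the rows violating a rule, so the test passes only when nothing comes back. The file name, the `orders` model and its columns are hypothetical and used only for illustration.

```sql
-- tests/assert_no_negative_amounts.sql (hypothetical file name)
-- Singular test: every row returned counts as a failure.
-- Assumes a model named orders with columns order_id and amount.
select
    order_id,
    amount
from {{ ref('orders') }}
where amount < 0
```

Generic tests such as unique and not_null are instead attached to columns in a schema.yml file rather than written as standalone queries.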
Models¶
- models -> 1+ folders -> 1+ .sql files
  - each sub-folder under models may contain a schema.yml that describes the schema and contains docs and tests
- source refers to data (table/view) that is outside of the dbt project and not created during the dbt pipeline
- {{ config(.....) }} for setting model-specific settings:
  - e.g. {{ config(pre_hook=["ALTER SESSION SET QUERY_TAG='MODEL_A'"]) }}
  - e.g. {{ config(materialized = 'ephemeral') }}
  - transient=False
  - cluster_by=[]
- a key prefixed with + in the project yaml file implies the setting affects that key and everything below it
- incremental runs are supported by using:
  - materialized="incremental" and unique_key (in config)
  - in conjunction with jinja if is_incremental() (in SQL text)
- supported jinja expressions:
  - source refers to an external table/view
  - ref refers to another model, and creates a dependency
  - this (the current model being created)
- materialization strategies: View, Table, Incremental, Ephemeral (CTE based, no objects are added)
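The incremental pieces listed above (materialized="incremental", unique_key, is_incremental(), source and this) can be combined roughly as in the sketch below. The source name `raw`, the table `orders` and all column names are assumptions made for illustration, not taken from a real project.

```sql
-- models/staging/stg_orders.sql (hypothetical path)
-- Assumes a source raw.orders declared in a schema.yml, with an updated_at column.
{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

select
    order_id,
    customer_id,
    amount,
    updated_at
from {{ source('raw', 'orders') }}

{% if is_incremental() %}
-- Applied only on incremental runs: pick up rows newer than what the
-- existing target table (referenced through this) already holds.
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run, or with a full refresh, is_incremental() evaluates to false and the whole source is loaded; on later runs only the filtered rows are processed and matched on unique_key.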
Snapshots¶
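- must have models enclosed in a jinja2 block: {% snapshot DIM_CUSTOMER_SCD %} ... {% endsnapshot %}
- define a strategy, unique_key and check_cols in config (check_cols applies to the check strategy; the timestamp strategy uses updated_at instead)
- strategies: timestamp and check
- Examples: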