Unity Catalog

  • Unified governance for all data and AI assets
  • supports both the Hive Metastore API and the Iceberg REST API

Hierarchy

  • Metastore: top-level logical container in Unity Catalog, usually one per region
  • Catalog: top-most container for data objects, contains one or more schemas
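
To make the hierarchy concrete: objects below a catalog are addressed with a three-level name, `catalog.schema.object`. The following is a minimal sketch in plain Python, not part of any Databricks library; the helper names (`ObjectName`, `quote`, `full_name`, `parse_name`) and the sample names `main`, `sales`, and `orders` are hypothetical.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    """Three-level Unity Catalog name: catalog.schema.object."""
    catalog: str
    schema: str
    obj: str


def quote(identifier: str) -> str:
    # Backtick-quote an identifier, doubling any embedded backticks.
    return "`" + identifier.replace("`", "``") + "`"


def full_name(name: ObjectName) -> str:
    # Join the three levels into a fully qualified, quoted name.
    return ".".join(quote(part) for part in name)


def parse_name(text: str) -> ObjectName:
    # Split an unquoted dotted name; anything but three non-empty parts is rejected.
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.object, got {text!r}")
    return ObjectName(*parts)
```

For example, `full_name(parse_name("main.sales.orders"))` yields a quoted name that can be pasted into a SQL statement. The parser deliberately handles only simple unquoted names.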

Federation

  • can federate an external or internal Hive metastore, or an AWS Glue catalog
  • foreign catalogs, except the internal Hive metastore, are read-only
    • for the internal Hive metastore, writes update both it and Unity Catalog
  • the internal (legacy) Hive metastore is per workspace
  • Unity Catalog still provides data governance functions, such as access control and auditing, for federated catalogs

Schema

  • also called a database
  • namespace container for schema-level objects such as tables, views, and volumes

Tables

  • Managed: registered with Unity catalog and data is stored in managed locations
  • Foreign: managed by an external system or catalog service
  • External: data is stored in an external location
  • Streaming: Delta tables used for processing incremental data in DLT
  • Feature: Delta tables that have a primary key
  • ~~Live~~: deprecated, replaced by materialized view
  • Permissions: SELECT, MODIFY, MANAGE (required for DROP), CREATE TABLE (granted on the schema)
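
The difference between managed and external tables, and the way the permissions above are granted, can be sketched by generating the corresponding SQL. This is an illustrative helper only: the function names, the table `main.sales.orders`, the group `analysts`, and the storage path are assumptions, and the generated statements would still need to be run on a Unity Catalog-enabled workspace.

```python
from typing import Optional


def create_table_sql(full_name: str, columns: str, location: Optional[str] = None) -> str:
    # Without LOCATION the table is managed; with LOCATION it is external.
    sql = f"CREATE TABLE {full_name} ({columns})"
    if location is not None:
        sql += f" LOCATION '{location}'"
    return sql


def grant_sql(privilege: str, securable: str, full_name: str, principal: str) -> str:
    # securable is the object type, e.g. TABLE or SCHEMA.
    return f"GRANT {privilege} ON {securable} {full_name} TO `{principal}`"


# Hypothetical names, for illustration only.
managed = create_table_sql("main.sales.orders", "id INT, region STRING")
external = create_table_sql(
    "main.sales.orders_ext", "id INT, region STRING",
    location="s3://example-bucket/orders",
)
read_access = grant_sql("SELECT", "TABLE", "main.sales.orders", "analysts")
create_access = grant_sql("CREATE TABLE", "SCHEMA", "main.sales", "analysts")
```

Note that `CREATE TABLE` is granted on the schema rather than on a table, which matches the permission list above.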

Views

  • Materialized: incrementally calculate and update results
  • Temporary: not registered in a catalog. Scope depends on the environment:
    • notebooks and jobs: scope of the notebook or script
    • DBSQL: all statements within the same query
  • Dynamic: allow row- and column-level access control
  • Hive metastore views: legacy, can be defined and registered with hive metastore
  • Hive metastore Global Temp views:
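
As a rough illustration of a dynamic view, the sketch below builds a `CREATE VIEW` statement that masks one column and optionally filters rows based on group membership, using the Databricks SQL function `is_account_group_member`. The helper `dynamic_view_sql`, the view and table names, the column names, and the group `auditors` are all hypothetical.

```python
from typing import Optional, Sequence


def dynamic_view_sql(
    view: str,
    source: str,
    plain_columns: Sequence[str],
    masked_column: str,
    privileged_group: str,
    row_filter: Optional[str] = None,
) -> str:
    # Column-level control: only members of privileged_group see the real value.
    masked = (
        f"CASE WHEN is_account_group_member('{privileged_group}') "
        f"THEN {masked_column} ELSE 'REDACTED' END AS {masked_column}"
    )
    select_list = ", ".join([*plain_columns, masked])
    sql = f"CREATE VIEW {view} AS SELECT {select_list} FROM {source}"
    # Row-level control: privileged users see all rows, others only rows matching the filter.
    if row_filter is not None:
        sql += f" WHERE is_account_group_member('{privileged_group}') OR ({row_filter})"
    return sql


# Hypothetical names, for illustration only.
view_sql = dynamic_view_sql(
    "main.sales.orders_redacted",
    "main.sales.orders",
    ["order_id", "region"],
    "email",
    "auditors",
    row_filter="region = 'EU'",
)
```

The strings are interpolated without escaping, so this sketch is only suitable for trusted, hard-coded inputs.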

Volumes

  • allow governance over non-tabular datasets
  • managed: located within a managed storage location
  • external: located in an external location, have a user-controlled lifecycle, and require storage credentials
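
Files in a volume are addressed with paths of the form `/Volumes/<catalog>/<schema>/<volume>/<path>`. A minimal sketch of a path builder follows; the function name `volume_path` and the sample names are hypothetical.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    # Build /Volumes/<catalog>/<schema>/<volume>/<optional sub-path>.
    # Sub-path parts must be relative: posixpath.join restarts at any part
    # that begins with "/".
    return posixpath.join("/Volumes", catalog, schema, volume, *parts)


# Hypothetical names, for illustration only.
landing_file = volume_path("main", "raw", "landing", "2024", "events.json")
```

The same path form applies to managed and external volumes; only the underlying storage location differs.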

Besides catalogs, the metastore contains:

  • Service and storage credentials: used to authenticate against cloud services and cloud storage containers, such as managed storage
  • External location: pairs a storage path with a storage credential to provide access control at the file level
  • Share, Recipient: Delta sharing
  • Provider, Connection, Clean Room
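
The relationship between storage credentials and external locations can be sketched by generating the SQL that ties them together. The helper names, the location `landing_zone`, the credential `my_cred`, the bucket URL, and the group `engineers` are assumptions made for illustration.

```python
def external_location_sql(name: str, url: str, credential: str) -> str:
    # An external location pairs a cloud storage path with a storage credential.
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def grant_read_files_sql(location: str, principal: str) -> str:
    # File-level access is granted on the external location itself.
    return f"GRANT READ FILES ON EXTERNAL LOCATION `{location}` TO `{principal}`"


# Hypothetical names, for illustration only.
location_ddl = external_location_sql("landing_zone", "s3://example-bucket/landing", "my_cred")
location_grant = grant_read_files_sql("landing_zone", "engineers")
```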