Unity Catalog
- Unified governance for all data and AI assets
- supports both the Hive metastore API and the Iceberg REST catalog API

- Metastore: the top-level logical container in Unity Catalog; usually one per region
- Catalog: top-most container for data objects and the first level of the three-level namespace (`catalog.schema.object`); consists of one or more schemas
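Every object below the metastore is addressed by a three-level name. The following is a minimal Python sketch of that naming scheme, assuming plain dot-separated identifiers (backtick-quoted names that contain dots are not handled). `FullName`, `parse_full_name` and `group_by_container` are hypothetical helpers for illustration, not part of any Databricks API.

```python
from typing import Dict, Iterable, List, NamedTuple


class FullName(NamedTuple):
    """Hypothetical record for a three-level name: catalog.schema.object."""
    catalog: str
    schema: str
    name: str


def parse_full_name(full_name: str) -> FullName:
    """Split 'catalog.schema.object'; quoted identifiers are not supported."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.object, got {full_name!r}")
    return FullName(*parts)


def group_by_container(names: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Nest object names under their catalog and schema (the metastore is implied)."""
    tree: Dict[str, Dict[str, List[str]]] = {}
    for n in map(parse_full_name, names):
        tree.setdefault(n.catalog, {}).setdefault(n.schema, []).append(n.name)
    return tree


print(group_by_container(["main.sales.orders", "main.sales.customers", "dev.tmp.t1"]))
# {'main': {'sales': ['orders', 'customers']}, 'dev': {'tmp': ['t1']}}
```

The grouping mirrors the containment described above: catalogs hold schemas, and schemas hold the individual objects.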
Federation
- can federate an external or internal Hive metastore, or an AWS Glue catalog
- foreign catalogs are read-only, except for the internal Hive metastore
- writes to a federated internal Hive metastore update both the Hive metastore and Unity Catalog
- the internal (legacy) Hive metastore is per workspace
- Unity Catalog still provides data governance functions such as access control and auditing for federated catalogs
Schema
- also called a database
- namespace container for schema-level objects such as tables, views, volumes and functions
Tables
- Managed: registered with Unity Catalog, which also manages the data; files are stored in managed storage locations
- Foreign: managed by an external system or catalog service
- External: data is stored in an external location; Unity Catalog manages only the metadata
- Streaming: Delta tables used for processing incremental data in DLT
- Feature: Delta tables that have a primary key
- ~~Live~~: deprecated, replaced by materialized views
- Permissions: SELECT, MODIFY, MANAGE (for DROP), CREATE TABLE (on the schema)
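The table privileges above only help if the principal can also reach the containing catalog and schema, which requires USE CATALOG and USE SCHEMA. The sketch below generates the corresponding Databricks SQL `GRANT` statements as strings. `grants_for`, the action names and the catalog, schema, table and principal names are hypothetical, and pairing SELECT with MODIFY for writes is an assumption (most UPDATE, DELETE and MERGE statements also read the table).

```python
from typing import Dict, List, Optional

# Table-level privileges per action, following the permissions listed above.
# Assumption: writes are paired with SELECT because UPDATE/DELETE/MERGE also read.
TABLE_PRIVILEGES: Dict[str, List[str]] = {
    "read": ["SELECT"],
    "write": ["SELECT", "MODIFY"],
    "drop": ["MANAGE"],
}


def grants_for(action: str, catalog: str, schema: str,
               table: Optional[str], principal: str) -> List[str]:
    """Build GRANT statements for one action (hypothetical helper)."""
    who = f"`{principal}`"  # principals are quoted with backticks
    # Reaching any object requires USE CATALOG and USE SCHEMA on its parents.
    stmts = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
    ]
    if action == "create":
        # CREATE TABLE is granted on the schema, not on a table.
        stmts.append(f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO {who}")
    elif action in TABLE_PRIVILEGES:
        if table is None:
            raise ValueError(f"action {action!r} needs a table name")
        for priv in TABLE_PRIVILEGES[action]:
            stmts.append(f"GRANT {priv} ON TABLE {catalog}.{schema}.{table} TO {who}")
    else:
        raise ValueError(f"unknown action {action!r}")
    return stmts


for stmt in grants_for("read", "main", "sales", "orders", "analysts"):
    print(stmt)
```

Generating the parent grants alongside the table grant avoids the common situation where SELECT is granted but the principal still cannot see the table.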
Views
- Materialized: results are precomputed and updated incrementally
- Temporary: not registered in the catalog; scope depends on where they are created:
  - notebooks and jobs: the notebook or script
  - DBSQL: all statements within the same query
- Dynamic: allow row- and column-level access control
- Hive metastore views: legacy; defined and registered in the Hive metastore
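- Hive metastore global temp views: legacy; visible to all workloads on the same compute resource and registered in the `global_temp` schema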
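A dynamic view expresses its rules in SQL, typically with `is_account_group_member()` inside a `CASE` expression (column masking) or a `WHERE` clause (row filtering). The Python sketch below only emulates that logic on in-memory rows to show the effect; the groups `admins` and `hr`, the `region` and `salary` columns and the `dynamic_view` function are hypothetical.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]

# Roughly equivalent dynamic view (hypothetical table and groups):
#   SELECT id, region,
#          CASE WHEN is_account_group_member('hr') THEN salary ELSE NULL END AS salary
#   FROM main.hr.employees
#   WHERE is_account_group_member('admins') OR region = 'EU'


def dynamic_view(rows: List[Row], user_groups: Set[str]) -> List[Row]:
    """Apply a row filter and a column mask based on the caller's groups."""
    visible: List[Row] = []
    for row in rows:
        # Row-level control: non-admins only see EU rows.
        if "admins" not in user_groups and row["region"] != "EU":
            continue
        out = dict(row)  # copy so the underlying "table" is not modified
        # Column-level control: only hr members see salaries.
        if "hr" not in user_groups:
            out["salary"] = None
        visible.append(out)
    return visible


employees = [
    {"id": 1, "region": "EU", "salary": 100},
    {"id": 2, "region": "US", "salary": 200},
]
print(dynamic_view(employees, {"hr"}))      # EU row only, salary visible
print(dynamic_view(employees, {"admins"}))  # both rows, salaries masked
```

The same underlying data yields different results per caller, which is what makes the view "dynamic".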
Volumes
- allow governance over non-tabular datasets
- Managed: stored within a managed storage location
- External: registered on a directory in an external location; the data lifecycle is user-controlled and access requires a storage credential
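Files in both managed and external volumes are addressed through paths of the form `/Volumes/<catalog>/<schema>/<volume>/<path>`. A small sketch that builds such a path, assuming simple object names; `volume_path` is a hypothetical helper, and the catalog, schema and volume names are made up.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path (hypothetical helper)."""
    for name in (catalog, schema, volume):
        if not name or "/" in name:
            raise ValueError(f"invalid object name {name!r}")
    rel = PurePosixPath(*parts)
    # Keep the result inside the volume: no absolute paths, no '..' segments.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("path must stay inside the volume")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *rel.parts))


print(volume_path("main", "landing", "raw_files", "2024", "orders.csv"))
# /Volumes/main/landing/raw_files/2024/orders.csv
```

In a Databricks notebook the resulting string could then be passed to file-oriented APIs such as `spark.read.csv` or `dbutils.fs.ls`.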
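Besides catalogs, the metastore contains:
- Service and storage credentials: storage credentials authenticate access to cloud storage (such as managed storage); service credentials authenticate to external cloud services
- External location: a cloud storage path combined with a storage credential; used for access control at the file level
- Share, Recipient: Delta Sharing objects
- Provider (Delta Sharing), Connection (Lakehouse Federation), Clean Room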
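File privileges such as READ FILES and WRITE FILES are granted on an external location, which then governs the paths under its URL. The sketch below is a toy model of that lookup, not how Databricks implements it: it assumes the governing location is the one whose URL is a prefix of the requested path at a directory boundary (longest match wins), and the bucket names, principals, `covering_location` and `can_access` are all hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical external locations: name -> cloud storage URL.
locations: Dict[str, str] = {
    "landing": "s3://acme-landing/raw",
    "archive": "s3://acme-archive",
}
# Hypothetical grants: location -> principal -> file privileges.
grants: Dict[str, Dict[str, Set[str]]] = {
    "landing": {"etl": {"READ FILES", "WRITE FILES"}},
    "archive": {"analysts": {"READ FILES"}},
}


def covering_location(path: str, locs: Dict[str, str]) -> Optional[str]:
    """Return the location whose URL contains path (longest match), if any."""
    best: Optional[str] = None
    best_len = -1
    for name, url in locs.items():
        base = url.rstrip("/")
        # Match only at a directory boundary, so ".../raw" does not cover ".../rawdata".
        if (path == base or path.startswith(base + "/")) and len(base) > best_len:
            best, best_len = name, len(base)
    return best


def can_access(path: str, privilege: str, principal: str) -> bool:
    """Check a file privilege via the covering external location."""
    loc = covering_location(path, locations)
    if loc is None:
        return False  # path is not governed by any external location in this model
    return privilege in grants.get(loc, {}).get(principal, set())


print(can_access("s3://acme-archive/2023/b.parquet", "READ FILES", "analysts"))   # True
print(can_access("s3://acme-archive/2023/b.parquet", "WRITE FILES", "analysts"))  # False
```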