Data Mesh
- Strives to close the gap between operational data and analytical data by treating data as a product, rather than moving it between systems through ETL pipelines
- Comparison with traditional analytical data:
| Traditional Analytical Data | Data Mesh |
|---|---|
| Centralized Ownership & Governance | Decentralized Ownership & Federated Governance |
| Monolithic | Distributed |
| Pipeline as first-class concern | Domain as first-class concern |
| Data as a by-product | Data as a product |
| Language: ingesting | Language: serving |
| Language: extract, load, onboard | Language: discover, consume, link |
| Language: data flows through pipelines | Language: publish data via ports |
| Language: data lake/warehouse/platform | Language: ecosystem of data as products |
Data Mesh Pillars
- Domain-driven Data Ownership and Architecture Decomposition. Domains fall into three groups:
- Domains aligned with the origin of data: facts and realities of the business, permanently captured as immutable timed events and historical snapshots; change less frequently
- Domains aligned with consumption: fit for consuming, built by aggregation/projection/transformation; change often and can be recreated
- Domains that straddle the above two
- Domain Data Product: data served as a product that is shareable, discoverable, understandable, addressable, trustworthy, interoperable, and secure
- it is an architectural quantum: the smallest unit that can be deployed independently
- provides historical and read-only access to data
- consists of Input and Output Data Ports
- Self-serve Data Infra as a Platform
- Autonomy does not mean duplicating technical infrastructure or resources; abstract the technical complexity into self-serve data infrastructure through:
- blueprints, unified access patterns, discoverability, SLOs and monitoring, pipeline orchestration, CI/CD, and automated governance
- Federated Governance: build an ecosystem that governs all data products across all domains
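
The pillars above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a reference implementation: the class names (`DataProduct`, `OutputPort`, `Catalog`), the `mesh://` address scheme, and the specific governance rules are all hypothetical. It models a domain-owned data product with input and output ports, and a catalog that applies the same global rules to every product before it becomes discoverable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPort:
    """A read-only interface through which consumers receive data."""
    name: str
    schema: dict  # column name -> type name; makes the port understandable


@dataclass(frozen=True)
class DataProduct:
    """A data product owned by one domain (hypothetical model)."""
    domain: str
    name: str
    owner: str
    description: str
    input_ports: tuple = ()   # names of upstream sources or products
    output_ports: tuple = ()  # OutputPort instances

    @property
    def address(self) -> str:
        # Assumed addressing scheme: one stable name per product.
        return f"mesh://{self.domain}/{self.name}"


def governance_violations(product: DataProduct) -> list:
    """Return the global rules (assumed for this sketch) the product breaks."""
    problems = []
    if not product.owner:
        problems.append("missing owner")
    if not product.description:
        problems.append("missing description")
    if not product.output_ports:
        problems.append("no output port")
    for port in product.output_ports:
        if not port.schema:
            problems.append(f"output port '{port.name}' has no schema")
    return problems


class Catalog:
    """Minimal registry that makes compliant products discoverable."""

    def __init__(self):
        self._products = {}

    def publish(self, product: DataProduct) -> None:
        # Federated governance: the same checks apply to every domain.
        problems = governance_violations(product)
        if problems:
            raise ValueError(f"{product.address}: {', '.join(problems)}")
        self._products[product.address] = product

    def discover(self, domain: str) -> list:
        return sorted(a for a, p in self._products.items() if p.domain == domain)


orders = DataProduct(
    domain="sales",
    name="orders",
    owner="sales-team",
    description="Order events captured at checkout",
    input_ports=("orders-service",),
    output_ports=(
        OutputPort("daily_orders", {"order_id": "string", "placed_at": "timestamp"}),
    ),
)

catalog = Catalog()
catalog.publish(orders)
print(catalog.discover("sales"))  # ['mesh://sales/orders']
```

In this sketch the domain team owns the product definition, while the catalog stands in for the self-serve platform and enforces the shared rules automatically at publish time. A real platform would also cover access control, SLOs, and monitoring, which are left out here.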