Data Cloud Deployment Framework
Layers
- Source
- Cloud
  - data is kept indefinitely
  - well-defined directory structure organized by source system, database, schema, table, and data extraction period
- Raw: ingestion layer with minimal transformations; data as it exists on the source systems
  - consists of multiple databases, one for each source system
  - maintains current and historical versions using the primary key
  - versions are maintained using separate tables and a merge pattern
- Integration:
  - applies data transformations
  - enforces business rules
  - manages surrogate keys
- Presentation: consumption layer that provides custom views of data models to end users
  - conformed dimensions, fact tables, aggregates
- Share: allows sharing data with different business entities, external partners, and customers
  - normally built on top of the Presentation layer, but can be built on other layers
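The Raw layer's "current and historical versions" idea can be illustrated with a small in-memory sketch. This is not the framework's actual implementation: the `RawTable` class, the `id` key column, and the sample rows are all hypothetical, and a real deployment would more likely express the same logic as a SQL merge against separate current and history tables.

```python
from dataclasses import dataclass, field


@dataclass
class RawTable:
    """Hypothetical in-memory stand-in for a Raw-layer current/history table pair."""
    current: dict = field(default_factory=dict)  # primary key -> latest row
    history: list = field(default_factory=list)  # superseded row versions

    def merge(self, batch, key="id"):
        """Upsert each incoming row by primary key, archiving replaced versions."""
        for row in batch:
            pk = row[key]
            existing = self.current.get(pk)
            if existing is not None and existing != row:
                # The row changed: keep the old version in history.
                self.history.append(existing)
            self.current[pk] = row


table = RawTable()
table.merge([{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}])
table.merge([{"id": 1, "name": "Anne"}, {"id": 3, "name": "Cy"}])
```

After the second merge, `table.current` holds the latest rows for keys 1, 2, and 3, while `table.history` holds the superseded version of key 1.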
Additional Databases
- Common: for sharing UDFs, stored procedures (SPs), named stages, and file formats
- Workspace: work areas for different teams, e.g. data scientists and analysts
  - can persist data back to production
Business Entities
- facilitates corporate separateness; typically a business unit or line of business (LOB)
- A business entity environment is an isolated environment consisting of
  - Warehouses, Databases, Schemas
  - Resource Monitors, Integrations, Policies, etc.
- A business entity can contain multiple environment types: DEV, TEST, PROD
- Service Models
  - Platform as a Service (PaaS): central team provisions the environment; other teams build out and manage operations
  - Data as a Service (DaaS): central team handles layers up to Raw; other teams handle the remaining layers
  - Analytics as a Service (AaaS): central team handles layers up to Presentation
- Single vs. multiple accounts
  - separating business entities and applying consistent naming conventions makes it easier to consolidate multiple accounts into a single account
  - enterprise data assets are stored in a central place
- Multi-region business entities
  - business entities may be prohibited from sharing all data
  - sharing the Raw layer is easier when business entities follow different integration approaches
  - use multiple databases for the Raw layer to share selectively and reduce replication costs
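The split of responsibilities under the three service models can be sketched as a lookup over the ordered layers. This is only an illustration: the `split_layers` helper is hypothetical, and reading "up to" as inclusive and as starting from the Source layer is an assumption.

```python
# Layer order as listed in the Layers section.
LAYERS = ["Source", "Cloud", "Raw", "Integration", "Presentation", "Share"]

# Last layer handled by the central team under each service model (assumed inclusive).
# None means the central team only provisions the environment (PaaS).
CENTRAL_TEAM_LIMIT = {"PaaS": None, "DaaS": "Raw", "AaaS": "Presentation"}


def split_layers(service_model):
    """Return (central_team_layers, other_team_layers) for a service model."""
    limit = CENTRAL_TEAM_LIMIT[service_model]
    if limit is None:
        return [], list(LAYERS)
    cut = LAYERS.index(limit) + 1
    return LAYERS[:cut], LAYERS[cut:]


central, others = split_layers("DaaS")
```

Here `central` contains Source, Cloud, and Raw, and `others` contains Integration, Presentation, and Share.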
Incremental processing
- Raw, Integration and Presentation layers should be adjusted for incremental processing
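The notes do not say how incremental processing is achieved. One common approach, shown here purely as an assumption, is a high-watermark filter: each run processes only rows changed since the previous run and then advances the watermark. The `incremental_batch` function, the `updated_at` column, and the sample rows are hypothetical.

```python
def incremental_batch(rows, watermark, ts_field="updated_at"):
    """Return the rows newer than the watermark, plus the advanced watermark."""
    new_rows = [r for r in rows if r[ts_field] > watermark]
    if new_rows:
        watermark = max(r[ts_field] for r in new_rows)
    return new_rows, watermark


# ISO-8601 date strings compare correctly as plain strings.
source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
    {"id": 3, "updated_at": "2024-01-05"},
]
batch, wm = incremental_batch(source, "2024-01-02")
```

With a starting watermark of `2024-01-02`, only rows 2 and 3 are selected, and a second run with the returned watermark selects nothing.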
Naming Conventions
- applies to
  - Privileges
  - Object Types: Databases, Warehouses, Roles, etc.
  - Raw layer
- utilizes
  - Business Entity
  - Environment Type
  - Layer / Type
  - Team / Function
  - Modifier: may be needed to clarify a name or make it unique
- Best practice: include the environment type in names even in a multi-account setup, so that the accounts can later be consolidated into a single account
- Best practice: give the Cloud layer and the Raw layer closely aligned names
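A naming helper built from the components listed above might look like the following sketch. The component order, the underscore separator, upper-casing, and the example values (`fin`, `crm`, and so on) are assumptions made for illustration; the notes name the components but not the exact format. The second helper builds a Cloud-layer path from the directory hierarchy described under Layers, reusing the same source-system token so that Cloud and Raw names stay aligned.

```python
def build_name(business_entity, environment, layer, team=None, modifier=None):
    """Join the non-empty naming components into an upper-case identifier."""
    parts = [business_entity, environment, layer, team, modifier]
    return "_".join(p.upper() for p in parts if p)


def cloud_path(source_system, database, schema, table, period):
    """Build a Cloud-layer directory path from the documented hierarchy."""
    return "/".join([source_system, database, schema, table, period])


# Hypothetical example: the Raw database for the "crm" source system
# and the matching Cloud-layer location for one of its tables.
raw_db = build_name("fin", "prod", "raw", modifier="crm")
path = cloud_path("crm", "sales", "public", "orders", "2024-01")
```

Keeping the environment type in `raw_db` even when each environment lives in its own account means the name remains unique if the accounts are later merged.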