Data Cloud Deployment Framework

  • Best practices

Layers

  1. Source
  2. Cloud
    • data is kept indefinitely
    • well-defined directory structure by Source System, Database, Schema, Table, Data Extraction Period
  3. Raw: Layer for ingestion, minimal transformations, data as it exists on source systems
    • consists of multiple databases, one for each source system
    • maintains current and historical versions of each record, identified by primary key
    • versions are maintained in separate tables using a merge pattern
  4. Integration:
    • data transformations
    • enforces business rules
    • surrogate key management
  5. Presentation: provides custom views of data models to end-users, consumption layer
    • conformed dimensions, fact tables, aggregates
  6. Share: allows sharing data with other business entities, external partners and customers
    • normally built on top of presentation layer, but can be on different layers
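The Raw layer's "current plus history, merged by primary key" idea can be sketched as follows. This is a minimal in-memory illustration, not a definitive implementation: the function name `merge_batch`, the `superseded_at` column and the use of a dict and a list in place of real tables are all assumptions made for the example.

```python
from datetime import datetime, timezone


def merge_batch(current, history, batch, key="id", loaded_at=None):
    """Merge extracted rows into `current`, archiving replaced versions.

    current: dict mapping primary key -> latest row (the "current" table)
    history: list of superseded rows (the "history" table)
    batch:   iterable of row dicts extracted from the source system
    """
    # Hypothetical load marker; defaults to the current UTC time.
    loaded_at = loaded_at or datetime.now(timezone.utc)
    for row in batch:
        pk = row[key]
        previous = current.get(pk)
        if previous == row:
            continue  # unchanged row: nothing to version
        if previous is not None:
            # Keep the old version, stamped with when it was replaced.
            history.append({**previous, "superseded_at": loaded_at})
        current[pk] = dict(row)  # insert new key or overwrite changed row
    return current, history
```

Re-running the same batch leaves both structures unchanged, which is the property a merge pattern is usually chosen for.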

Additional Databases

  • Common: for shared UDFs, stored procedures (SP), named stages, file formats
  • Workspace: work areas for different teams, e.g. Data Scientists, Analysts
    • can persist data back to production

Business Entities

  • facilitates corporate separateness; typically a business unit or line of business (LOB)
  • A business entity environment is an isolated environment consisting of
    • Warehouses, Databases, Schemas
    • Resource Monitors, Integrations, Policies, etc.
  • A business entity can contain multiple environment types: DEV, TEST, PROD
  • Service Models
    • PaaS: Central team provisions environment, other teams build out and manage operations
    • Data as a Service (DaaS): Central team handles up to Raw layer, and other teams handle remaining layers
    • Analytics as a Service (AaaS): Central team handles up to Presentation layer
  • Single vs. multiple accounts
    • separating business entities and applying proper naming conventions makes it easier to consolidate multiple accounts into a single account
  • Enterprise data assets are stored in a central place
  • Multi-region business entities
    • business entities may be prohibited from sharing all data
    • easier to share Raw layer if business entities follow different integration approaches
    • have multiple databases for Raw layer to selectively share and reduce replication costs
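The service models above differ mainly in how far up the layer stack the central team's responsibility reaches. A rough sketch of that split, assuming the layer order from the Layers section; the names `LAYERS`, `CENTRAL_UP_TO` and `responsible_team` are hypothetical, and PaaS is modelled as "central team provisions the environment but owns no layer":

```python
# Layer order taken from the Layers section (Source is outside the platform).
LAYERS = ["cloud", "raw", "integration", "presentation", "share"]

# Last layer handled by the central team for each service model.
# None means the central team only provisions the environment (PaaS).
CENTRAL_UP_TO = {"PaaS": None, "DaaS": "raw", "AaaS": "presentation"}


def responsible_team(service_model, layer):
    """Return which side operates `layer` under the given service model."""
    last_central_layer = CENTRAL_UP_TO[service_model]
    if last_central_layer is None:
        return "other teams"
    if LAYERS.index(layer) <= LAYERS.index(last_central_layer):
        return "central team"
    return "other teams"
```

For example, under DaaS the Raw layer falls to the central team while Integration falls to the other teams.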

Incremental processing

  • Raw, Integration and Presentation layers should be adjusted for incremental processing
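One common way to make a layer incremental is a high-water mark: each run processes only rows newer than the last value seen. The notes do not prescribe this technique, so treat the sketch below as one possible approach; `select_increment` and the `extracted_at` field are assumed names.

```python
def select_increment(rows, last_watermark, ts_field="extracted_at"):
    """Return rows newer than `last_watermark` and the updated watermark.

    A watermark of None means nothing has been processed yet (full load).
    """
    new_rows = [
        r for r in rows
        if last_watermark is None or r[ts_field] > last_watermark
    ]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r[ts_field] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark
```

The returned watermark would be persisted between runs, and the same pattern can be repeated per layer (Raw, Integration, Presentation), each with its own watermark.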

Naming Conventions

  • applies to
    • Privileges
    • Object Types: Databases, Warehouses, Roles etc
    • Raw layer
  • utilizes
    • Business Entity
    • Environment Type
    • Layer / Type
    • Team / Function
    • Modifier: may be needed for clarification or to make a name unique
  • Best practice (BP): include the environment type in names even in a multi-account setup, so that accounts can later be consolidated into a single account
  • BP: Cloud layer and Raw layer should have names that align closely
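A naming convention built from the components above can be captured in a small helper. The component order, the underscore separator, the upper-casing and the function names `object_name` and `cloud_path` are assumptions for illustration; the notes only list which components are used.

```python
def object_name(business_entity, env_type, layer, team=None, modifier=None):
    """Build an object name from the convention's components.

    Optional components (team, modifier) are skipped when not given.
    """
    parts = [business_entity, env_type, layer, team, modifier]
    return "_".join(p.upper() for p in parts if p)


def cloud_path(source_system, database, schema, table, period):
    """Build a Cloud layer directory path from the structure listed under Layers.

    Reusing the same source system, database, schema and table names
    for the Raw layer objects keeps the two layers aligned.
    """
    return "/".join([source_system, database, schema, table, period]).lower()
```

Under these assumptions, `object_name("fin", "prod", "raw")` gives `FIN_PROD_RAW`, and the environment type stays in the name regardless of how many accounts are used.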