Data Governance

  • Focuses on the availability, security, usability, and integrity of data
    • know, protect, and share your data
  • Data Governance determines the focus of data quality improvements based on business value
    • data stewards provide business understanding of their assigned data domains
    • Management is the decisions you make; Governance is the structure for making them
  • Several types of Governance: BI, IT, Data, DW
    1. Data Governance: Getting the data right (Info supply)
    2. BI Governance: Using the data right (Info demand)
  • Master Data Management: Single source for master data that is standardized and conformed
  • MDM collation and distribution strategies:
    • Data consolidation: capture data from various data sources and maintain single hub (ODS)
    • Data federation: provide single virtual view of distributed master data
    • Data propagation: copying data between different systems
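
The three MDM strategies above can be contrasted in a small Python sketch. Everything here is hypothetical and for illustration only: the Customer record, the two in-memory "source systems" (CRM, BILLING), and the "first source wins" conflict rule are assumptions, not part of any MDM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    source: str  # hypothetical field: which system supplied the record


# Two hypothetical source systems holding overlapping master data.
CRM = {
    "c1": Customer("c1", "Acme Corp", "crm"),
    "c2": Customer("c2", "Globex", "crm"),
}
BILLING = {
    "c2": Customer("c2", "Globex Inc.", "billing"),
    "c3": Customer("c3", "Initech", "billing"),
}


def consolidate(*sources):
    """Consolidation: copy records into one physical hub.

    Assumed conflict rule: the first source that supplies a key wins.
    """
    hub = {}
    for source in sources:
        for key, record in source.items():
            hub.setdefault(key, record)
    return hub


class FederatedView:
    """Federation: nothing is copied; lookups go to the sources on demand."""

    def __init__(self, *sources):
        self._sources = sources

    def get(self, key):
        for source in self._sources:
            if key in source:
                return source[key]
        return None


def propagate(source, target):
    """Propagation: copy records from one system into another."""
    target.update(source)
```

The practical difference is where the data lives: consolidate() produces a separate copy that must be refreshed, FederatedView always reads the current state of the sources, and propagate() pushes changes from one system into another.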

Data Governance Types

  • Data Quality:
    • validity: out-of-range, empty data (technical DQ)
    • accuracy: look-up values do not exist (business DQ)
    • completeness: missing values (e.g. no zip code, data enrichment needed)
    • consistency: duplicate data, concurrency issues
    • uniformity: different units of measurement
  • Data Security and Privacy
    • Auditing & Monitoring
    • Vulnerability Management (OS/Database Hardening)
    • Database Security (Authentication, Access rights, Encryption)
  • Metadata
    • Data Lineage
    • Business and transformation rules
    • Source system
    • Data freshness
    • 3 types of Metadata
      • Business: deals with the data contents, e.g. calculations
      • Technical: deals with data objects, e.g. database catalog, referential integrity (RI)
      • Operational: deals with data movement
        • when, from where and how data was brought over
        • Load/Create/Update Date, Session data
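
The data quality dimensions listed under Data Quality above can be sketched as simple record-level checks. This is a minimal illustration, not a standard rule set: the field names (age, country, zip_code, weight_unit, customer_id), the look-up set, and the age range are all assumptions.

```python
from collections import Counter

# Hypothetical reference data and rules, for illustration only.
VALID_COUNTRIES = {"US", "CA"}
AGE_RANGE = (0, 120)


def check_record(record):
    """Return a list of data quality issues found in a single record."""
    issues = []
    age = record.get("age")
    if age is None or not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        issues.append("validity: age empty or out of range")
    if record.get("country") not in VALID_COUNTRIES:
        issues.append("accuracy: country not in look-up table")
    if not record.get("zip_code"):
        issues.append("completeness: zip code missing")
    if record.get("weight_unit") != "kg":
        issues.append("uniformity: weight not in kg")
    return issues


def find_duplicates(records, key="customer_id"):
    """Consistency: return key values that occur more than once."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)
```

Validity, accuracy, completeness, and uniformity can be checked one record at a time, while consistency (duplicates) needs the whole data set, which is why it is a separate function here.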

People

  • Data Owners: Accountable and responsible for data generated and consumed.
  • Data Stewards: Responsible for data quality on a day-to-day basis
    • Usually subject-matter experts (SMEs) within their data domain
    • ensure/maintain data quality, definitions, and metadata, and document the source
  • Data Custodians: implement physical environment (DBAs, ETL developers etc)
    • ensures authorized data access, data {integrity, auditability, safeguard}

Personas

  • BI vs. BA vs. DA
  • Business Intelligence: process of collecting, storing, and analyzing data from business operations. Descriptive analysis answering What and How questions
  • Business Analytics: practice of using a company's data to anticipate trends and outcomes. Predictive analysis answering Why questions
    • consists of: statistical analysis, predictive modeling, data mining
    • goals: determining KPIs, surfacing insights
  • Data Analytics: technical process of mining data and building systems to manage it. Somewhat overlaps with BA.
    • BA uses data with the goal of improving business processes, whereas DA analyzes and gathers data for the business
    • data is a means to an end for BA, but data is an end in itself for DA
  • Data Scientist: devises scientific algorithms, such as machine learning, to create a model from which business analysts can make predictions
    • tools: machine learning, feature engineering
    • feature engineering: developing new data from existing data, e.g. is_weekday from sale_date
  • Data Engineer: implements efficient engineering processes. Develops data pipelines for
    • use by Data Scientists
    • data ingestion/ETL/ELT
  • ETL Developer: focused on developing transformation jobs using an ETL tool such as Informatica
  • types of analysis
    • Analyst + BI:
      • Descriptive: What happened
      • Diagnostic: Why it happened
    • Data Scientist + ML:
      • Predictive: What will happen
      • Prescriptive: How can we make it happen
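
The feature engineering example mentioned above (deriving is_weekday from sale_date) can be sketched in a few lines of Python. The row layout (a list of dicts with a sale_date key holding a datetime.date) and the function names are assumptions made for illustration.

```python
from datetime import date


def is_weekday(sale_date: date) -> bool:
    """True for Monday to Friday; date.weekday() is 0 for Monday, 6 for Sunday."""
    return sale_date.weekday() < 5


def add_is_weekday(sales):
    """Return new rows with an is_weekday feature; input rows are left unchanged."""
    return [{**row, "is_weekday": is_weekday(row["sale_date"])} for row in sales]
```

The derived column carries no new information, but it exposes a pattern (weekday versus weekend sales) in a form a model can use directly.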