Data Governance

  • Focuses on the availability, security, usability, and integrity of data
  • Know, protect, and share your data
  • Data Governance determines the focus of data quality improvements based on business value
  • data stewards provide business understanding of assigned data domains
  • Management is the decisions you make; governance is the structure for making them
  • Several types of governance: BI, IT, Data, DW
  • Data Governance: Getting the data right (Info supply)
  • BI Governance: Using the data right (Info demand)
  • Master Data Management: Single source for master data that is standardized and conformed
  • MDM collation and distribution strategies:
    • Data consolidation: capture data from various data sources and maintain single hub (ODS)
    • Data federation: provide single virtual view of distributed master data
    • Data propagation: copying data between different systems
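
The three MDM strategies above can be contrasted with a minimal sketch. It assumes two hypothetical source systems (`crm` and `billing`) modeled as in-memory dictionaries; the function names and the "earlier source wins" conflict rule are illustrative assumptions, not part of any MDM product.

```python
# Hypothetical source systems, each holding part of the customer master data.
crm = {"C1": {"name": "Acme Corp", "phone": "555-0100"}}
billing = {
    "C1": {"name": "ACME Corporation", "tax_id": "TX-42"},
    "C2": {"name": "Globex", "tax_id": "TX-77"},
}


def consolidate(*sources):
    """Consolidation: copy records from every source into one physical hub."""
    hub = {}
    for source in sources:
        for key, record in source.items():
            merged = dict(record)
            # Assumed conflict rule: attributes from earlier sources win.
            merged.update(hub.get(key, {}))
            hub[key] = merged
    return hub


def federated_view(key, *sources):
    """Federation: assemble the record at query time; nothing is stored centrally."""
    view = {}
    for source in sources:
        for attr, value in source.get(key, {}).items():
            view.setdefault(attr, value)
    return view


def propagate(key, origin, target):
    """Propagation: copy one record from one system into another."""
    target[key] = dict(origin[key])


hub = consolidate(crm, billing)
```

The practical difference shows up after a source changes: the hub holds a copy and must be refreshed, while `federated_view` always reads the current source values.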

Data Governance Types

  • Data Quality:
    • validity: out-of-range, empty data (technical DQ)
    • accuracy: look-up values do not exist (business DQ)
    • completeness: missing values (e.g. no zip code, data enrichment needed)
    • consistency: duplicate data, concurrency issues
    • uniformity: different units of measurements
  • Data Security and Privacy
    • Auditing & Monitoring
    • Vulnerability Management (OS/Database Hardening)
    • Database Security (Authentication, Access rights, Encryption)
  • Metadata
    • Data Lineage
    • Business and transformation rules
    • Source system
    • Data freshness
    • 3 types of Metadata
      • Business: deals with the data contents; e.g. calculations
      • Technical: deals with data objects; database catalog, RI etc
      • Operational: deals with data movement
        • when, from where and how data was brought over
        • Load/Create/Update Date, Session data
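
The data quality dimensions listed above (validity, accuracy, completeness, consistency, uniformity) can each be expressed as a simple rule. The following is a minimal sketch under stated assumptions: the rows, field names, age range, look-up table, and expected unit are all hypothetical and chosen only to trigger one rule per dimension.

```python
# Hypothetical customer rows; field names and rules are illustrative only.
rows = [
    {"id": 1, "age": 34, "country": "US", "zip": "10001", "weight": "70 kg"},
    {"id": 2, "age": -5, "country": "US", "zip": "", "weight": "154 lb"},
    {"id": 3, "age": 51, "country": "XX", "zip": "94105", "weight": "82 kg"},
    {"id": 3, "age": 51, "country": "XX", "zip": "94105", "weight": "82 kg"},
]
KNOWN_COUNTRIES = {"US", "CA", "DE"}  # assumed look-up table


def check_quality(rows):
    """Return a dict mapping each DQ dimension to the ids of rows that fail it."""
    issues = {
        "validity": [],
        "accuracy": [],
        "completeness": [],
        "consistency": [],
        "uniformity": [],
    }
    seen_ids = set()
    for row in rows:
        if not 0 <= row["age"] <= 120:  # out-of-range value
            issues["validity"].append(row["id"])
        if row["country"] not in KNOWN_COUNTRIES:  # look-up value does not exist
            issues["accuracy"].append(row["id"])
        if not row["zip"]:  # missing value
            issues["completeness"].append(row["id"])
        if row["id"] in seen_ids:  # duplicate record
            issues["consistency"].append(row["id"])
        seen_ids.add(row["id"])
        if not row["weight"].endswith("kg"):  # unit differs from the assumed standard
            issues["uniformity"].append(row["id"])
    return issues


report = check_quality(rows)
```

A report of this shape gives data stewards a per-dimension list of failing records to review.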

People

  • Data Owners: Accountable and responsible for data generated and consumed.
  • Data Stewards: Responsible for data quality on a day-to-day basis
    • Usually SMEs within their data domain
    • ensure/maintain data {quality, definition, metadata, documents source}
  • Data Custodians: implement the physical environment (DBAs, ETL developers etc)
    • ensure authorized data access, data {integrity, auditability, safeguard}

Personas

  • BI, BA, DA
    • Business Intelligence: process of collecting, storing, and analyzing data from business operations. Descriptive analysis answering What and How questions
    • Business Analytics: practice of using a company's data to anticipate trends and outcomes. Predictive analysis answering Why questions
      • consists of: statistical analysis, predictive modeling, data mining
      • goals: determining KPIs, surfacing insights
    • Data Analytics: technical process of mining data and building systems to manage it. Somewhat overlaps with BA.
      • BA uses data with the goal of improving business processes, whereas DA gathers and analyzes data for the business
      • data is a means to an end for BA, but data is an end in itself for DA
  • Data Scientist: devises scientific algorithms, such as machine learning, to create a model from which business analysts can make predictions
    • tools: machine learning, feature engineering
    • feature engineering: developing new data from existing data, e.g. is_weekday from sale_date
  • Data Engineer: builds an efficient engineering process to implement the models devised by Data Scientists. Develops data pipelines for data ingestion/ETL/ELT
  • ETL Developer: focused on developing transformation jobs using an ETL tool such as Informatica
  • Types of analysis
    • Analyst + BI:
      • Descriptive: What happened
      • Diagnostic: Why it happened
    • Data Scientist + ML:
      • Predictive: What will happen
      • Prescriptive: How can we make it happen
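
The feature engineering example mentioned above (deriving is_weekday from sale_date) can be sketched in a few lines. The record layout and the `add_is_weekday` helper are hypothetical, and weekdays are assumed to mean Monday through Friday.

```python
from datetime import date


def add_is_weekday(sale):
    """Return a copy of the sale with an is_weekday feature derived from sale_date."""
    enriched = dict(sale)
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    enriched["is_weekday"] = sale["sale_date"].weekday() < 5
    return enriched


# Hypothetical sales records.
sales = [
    {"sale_id": 1, "sale_date": date(2024, 1, 5)},  # a Friday
    {"sale_id": 2, "sale_date": date(2024, 1, 6)},  # a Saturday
]
features = [add_is_weekday(sale) for sale in sales]
```

The derived column adds no new information to the data set, but it exposes a pattern (weekday versus weekend sales) in a form a model can use directly.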