Data Governance
- Focuses on availability, security, usability, integrity
- know, protect, share your data
- Data Governance determines the focus of data quality improvements based on business value
- data stewards provide business understanding of assigned data domains
- Management is the decisions you make; governance is the structure for making them
- Several types of governance: BI, IT, Data, DW
- Data Governance: Getting the data right (Info supply)
- BI Governance: Using the data right (Info demand)
- Master Data Management: Single source for master data that is standardized and conformed
- MDM collation and distribution strategies:
- Data consolidation: capture data from various data sources and maintain a single hub (ODS)
- Data federation: provide a single virtual view of distributed master data
- Data propagation: copy data between different systems
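The three strategies above can be contrasted in a minimal sketch. The `crm` and `billing` dictionaries, the field names, and the "first source wins" conflict rule are all assumptions made for illustration; a real MDM hub would apply its own matching and survivorship rules.

```python
# Hypothetical source systems holding customer master data, keyed by customer id.
crm = {"C1": {"name": "Acme Corp", "city": "Boston"}}
billing = {"C1": {"name": "ACME Corporation", "vat": "US123"},
           "C2": {"name": "Globex", "vat": "US456"}}


def merge_fields(target, record):
    """Copy fields into target; the first source to supply a field wins (assumed rule)."""
    for field, value in record.items():
        target.setdefault(field, value)


def consolidate(*sources):
    """Consolidation: physically copy records from every source into one hub."""
    hub = {}
    for source in sources:
        for key, record in source.items():
            merge_fields(hub.setdefault(key, {}), record)
    return hub


class FederatedView:
    """Federation: nothing is stored; each lookup reads the sources at query time."""

    def __init__(self, *sources):
        self.sources = sources

    def get(self, key):
        merged = {}
        for source in self.sources:
            merge_fields(merged, source.get(key, {}))
        return merged


def propagate(source, target, key):
    """Propagation: copy one record from one system into another."""
    target[key] = dict(source[key])


hub = consolidate(crm, billing)
view = FederatedView(crm, billing)
crm["C1"]["city"] = "Chicago"    # change made after the hub was built
propagate(billing, crm, "C2")    # crm now holds its own copy of C2
```

In this sketch the hub keeps the value it captured ("Boston") until the next consolidation run, while the federated view reflects the change immediately because it reads the sources on every lookup.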
Data Governance Types
- Data Quality:
- validity: out-of-range, empty data (technical DQ)
- accuracy: look-up values do not exist (business DQ)
- completeness: missing values (eg no zip code, data enrichment needed)
- consistency: duplicate data, concurrency issues
- uniformity: different units of measurement
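As a rough sketch of how the five dimensions above might translate into checks, the snippet below runs one rule per dimension over a few records. The records, field names, the 0 to 120 age range, the `VALID_COUNTRIES` look-up set, and the "weights must be in kg" rule are all hypothetical.

```python
# Hypothetical customer records; fields and rules are assumptions for illustration.
rows = [
    {"id": 1, "age": 34, "country": "US", "zip": "02110", "weight": "70 kg"},
    {"id": 2, "age": -5, "country": "US", "zip": None, "weight": "154 lb"},
    {"id": 3, "age": 51, "country": "XX", "zip": "10001", "weight": "82 kg"},
    {"id": 3, "age": 51, "country": "XX", "zip": "10001", "weight": "82 kg"},
]
VALID_COUNTRIES = {"US", "CA", "MX"}  # stand-in for a look-up table


def check_quality(rows):
    """Return the ids of the rows that fail each data-quality dimension."""
    issues = {"validity": [], "accuracy": [], "completeness": [],
              "consistency": [], "uniformity": []}
    seen = set()
    for row in rows:
        if row["age"] is None or not 0 <= row["age"] <= 120:
            issues["validity"].append(row["id"])      # empty or out of range
        if row["country"] not in VALID_COUNTRIES:
            issues["accuracy"].append(row["id"])      # look-up value does not exist
        if not row["zip"]:
            issues["completeness"].append(row["id"])  # missing value
        if row["id"] in seen:
            issues["consistency"].append(row["id"])   # duplicate record
        seen.add(row["id"])
        if not row["weight"].endswith(" kg"):
            issues["uniformity"].append(row["id"])    # unexpected unit of measurement
    return issues


issues = check_quality(rows)
```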
- Data Security and Privacy
- Auditing & Monitoring
- Vulnerability Management (OS/Database Hardening)
- Database Security (Authentication, Access rights, Encryption)
- Metadata
- Data Lineage
- Business and transformation rules
- Source system
- Data freshness
- 3 types of Metadata
- Business: deals with the data contents; e.g. calculations
- Technical: deals with data objects; database catalog, RI etc
- Operational: deals with data movement
- when, from where and how data was brought over
- Load/Create/Update Date, Session data
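Operational metadata is often captured by stamping each row as it is loaded. The sketch below is one possible way to do that; the function name `load_with_audit` and the column names `source_system`, `load_date`, and `load_session_id` are hypothetical, not a standard.

```python
import uuid
from datetime import datetime, timezone


def load_with_audit(rows, source_system, session_id=None):
    """Attach operational metadata (when, from where, how) to each loaded row."""
    session_id = session_id or str(uuid.uuid4())  # identifies this load run
    loaded_at = datetime.now(timezone.utc)        # one timestamp for the whole batch
    return [
        {**row,
         "source_system": source_system,   # from where the data was brought over
         "load_date": loaded_at,           # when it was loaded
         "load_session_id": session_id}    # how: which job run loaded it
        for row in rows
    ]


staged = load_with_audit([{"order_id": 10}, {"order_id": 11}], source_system="crm")
```

Sharing one session id and timestamp across the batch makes it possible to trace, or roll back, everything a single load run brought in.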
People
- Data Owners: Accountable and responsible for data generated and consumed.
- Data Stewards: Responsible for data quality on a day-to-day basis
- Usually an SME (subject-matter expert) within their data domain
- ensures/maintains data {quality, definition, metadata, documents source}
- Data Custodians: implement physical environment (DBAs, ETL developers etc)
- ensures authorized data access, data {integrity, auditability, safeguard}
Personas
- BI, BA, DA
- Business Intelligence: process of collecting, storing and analyzing data from business operations. Descriptive Analysis answering What and How questions
- Business Analytics: practice of using a company's data to anticipate trends and outcomes. Predictive Analysis answering Why questions
- consists of: statistical analysis, predictive modeling, data mining
- goals: determining KPIs, surfacing insights
- Data Analytics: technical process of mining data and building systems to manage it. Somewhat overlaps with BA.
- BA uses data with the goal of improving business processes, whereas DA analyzes and gathers data for the business
- data is a means to an end for BA, but data is an end in itself for DA
- Data Scientist: devises scientific algorithms, such as machine learning, to create a model from which business analysts can make predictions
- tools: machine learning, feature engineering
- feature engineering: developing new data from existing data, e.g. is_weekday from sale_date
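The is_weekday example can be sketched with the standard library alone. The `sales` records and the `amount` field are made up for illustration.

```python
from datetime import date


def is_weekday(sale_date: date) -> bool:
    """Derive a boolean feature: True for Monday to Friday, False for the weekend."""
    return sale_date.weekday() < 5  # date.weekday(): Monday is 0, Sunday is 6


# Hypothetical sales records; the new feature is derived purely from sale_date.
sales = [{"sale_date": date(2024, 1, 5), "amount": 120.0},  # a Friday
         {"sale_date": date(2024, 1, 6), "amount": 80.0}]   # a Saturday
for sale in sales:
    sale["is_weekday"] = is_weekday(sale["sale_date"])
```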
- Data Engineer: builds an efficient engineering process to implement the models devised by Data Scientists. Develops data pipelines for data ingestion/ETL/ELT
- ETL Developer: focused on developing transformation jobs using an ETL tool such as Informatica
- types of analysis
- Analyst + BI:
- Descriptive: What happened
- Diagnostic: Why it happened
- Data Scientist + ML:
- Predictive: What will happen
- Prescriptive: How can we make it happen