Teradata Analytic Ecosystem
- Data movement: Ingest, prepare and consume
- Governance and security
- Metadata: operational, technical and business
- Sources: Business, Human, Machine, External (Social)
- Reference Information Architecture: Acquisition, Integration, Access
- Consumers: General, Analysts, Statisticians, Autonomous Applications, Data Scientists
- Analytical methods: Reporting, Visualization, Statistical Analysis, Data Mining, Simulation, Optimization, NLP, Machine Learning
Data Movement
- Acquisition
  - Typical sources: OLTP, enterprise applications, IoT, web/logs
  - Landing: raw, unprocessed data at the lowest granularity
  - Staging: data that is ready to be ingested and processed, usually in-database
  - Standardization: conversion to a consumable format
    - light standardization, such as gender codes and medical codes
    - physical layout optimization, such as partitioning and indexing
- Integration
  - Common keys: standardized keys that connect the various subject areas
  - Derived values: KPIs that are derived and calculated automatically
  - Common summaries: standard aggregations (not just for performance)
- Access
  - Optimized structures: partitioning and indexing to optimize resource usage and query speed
  - Shared views and services: make navigation easier and provide metadata
- Data pipeline: the process that turns raw data into consumption-ready data. It includes ingestion, transformation, and aggregation
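
The pipeline steps above (ingestion, transformation, aggregation) can be sketched in a few lines of Python. This is only an illustration, not a Teradata implementation: the sample orders, the field names, the gender code mapping and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive in the landing area.
RAW_ORDERS = [
    {"order_id": "1", "cust": " C001 ", "gender": "M", "amount": "120.50"},
    {"order_id": "2", "cust": "c002", "gender": "female", "amount": "80.00"},
    {"order_id": "3", "cust": "C001", "gender": "male", "amount": "19.50"},
]

# Assumed standard codes; real code sets would come from reference data.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def ingest(rows):
    """Acquisition: copy the raw rows unchanged (landing/staging)."""
    return [dict(row) for row in rows]


def standardize(rows):
    """Standardization: consistent keys, codes and types."""
    return [
        {
            "order_id": int(row["order_id"]),
            "customer_key": row["cust"].strip().upper(),
            "gender": GENDER_CODES.get(row["gender"].strip().lower(), "U"),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


def summarize(rows):
    """Integration: a common summary (total amount per customer key)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_key"]] += row["amount"]
    return dict(totals)


def run_pipeline(rows):
    """Chain the three steps: ingest, transform, aggregate."""
    return summarize(standardize(ingest(rows)))


if __name__ == "__main__":
    print(run_pipeline(RAW_ORDERS))
```

Note that the two spellings of customer C001 only add up to one total because the key was standardized first, which is the point of the "common keys" step.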
Data Organization
- Data Lake: long-term, low-cost storage, optionally with a light level of data integration
  - applicability: Acquisition layer; may replace parts of the staging layer in a traditional DW
- Data Warehouse: integrated data from one or more disparate sources, usually in 3NF
  - applicability: Integration layer
- MDM: consistent, enterprise-wide reference data
  - applicability: Standardization tier of the Acquisition layer, and the Integration layer
- Data projections: purpose-built data structures that solve specific business problems
  - applicability: Access layer
  - Physical and/or virtual data marts
  - A data mart (DM) can be Dependent (single source of truth) or Independent (multiple sources of truth)
- Metadata store: business, technical, operational
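
To make the physical vs. virtual data mart distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the warehouse. The table and view names (dw_sales, mart_sales_by_region_v, mart_sales_by_region_p) and the sample rows are hypothetical. Both marts here are dependent ones, since they are built from the same warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Integration layer: a hypothetical warehouse table.
    CREATE TABLE dw_sales (
        sale_id INTEGER PRIMARY KEY,
        region  TEXT,
        amount  REAL
    );
    INSERT INTO dw_sales (region, amount) VALUES
        ('EMEA', 100.0), ('EMEA', 50.0), ('APAC', 75.0);

    -- Virtual data mart: a view, computed from the warehouse on each query.
    CREATE VIEW mart_sales_by_region_v AS
        SELECT region, SUM(amount) AS total FROM dw_sales GROUP BY region;

    -- Physical data mart: a stored copy that must be refreshed explicitly.
    CREATE TABLE mart_sales_by_region_p AS
        SELECT region, SUM(amount) AS total FROM dw_sales GROUP BY region;
""")

# New data reaches the warehouse after both marts were created.
conn.execute("INSERT INTO dw_sales (region, amount) VALUES ('APAC', 25.0)")


def totals(mart_name):
    """Return {region: total} for the given mart (table or view)."""
    query = f"SELECT region, total FROM {mart_name} ORDER BY region"
    return dict(conn.execute(query).fetchall())


virtual = totals("mart_sales_by_region_v")    # reflects the new row
physical = totals("mart_sales_by_region_p")   # stale until refreshed
print(virtual)
print(physical)
```

The view picks up the late APAC row immediately, while the physical copy stays at its last refresh. That is the usual trade-off: freshness versus query cost.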
Data Zones
- Ingestion source:
  - Source systems, enterprise applications (e.g. Salesforce), logs, IoT
  - Landing zone: bulk storage, Kafka topics
- Operational zone:
  - Raw: raw data, no transformations
  - Conformed: Raw + de-duplication + standardization
  - Modeled: integrated, cleaned, modeled
  - Presentation (Semantic): optimized for analytics and BI queries
  - Share (Snowflake): data sharing for monetization
- Exploration zone: experimental data and data discovery, before the data becomes part of the Operational zone
  - quick to set up
  - limited life span
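
The progression through the Operational zone (Raw, Conformed, Modeled, Presentation) can be sketched as a chain of small functions. Everything below is assumed for illustration: the sensor readings, the choice of lower-case device ids and Celsius as the "standard", and the function names.

```python
# Raw zone: data exactly as received (hypothetical sensor readings).
RAW = [
    {"device": "dev-1", "ts": "2024-01-01T00:00:00", "temp_f": "68.0"},
    {"device": "dev-1", "ts": "2024-01-01T00:00:00", "temp_f": "68.0"},  # duplicate
    {"device": "DEV-2", "ts": "2024-01-01T00:00:00", "temp_f": "50.0"},
]


def to_conformed(raw_rows):
    """Conformed zone: raw + de-duplication + standardization."""
    seen = set()
    conformed = []
    for row in raw_rows:
        record = {
            "device": row["device"].lower(),  # assumed standard: lower case
            "ts": row["ts"],
            # assumed standard unit: Celsius, one decimal place
            "temp_c": round((float(row["temp_f"]) - 32) * 5 / 9, 1),
        }
        key = (record["device"], record["ts"])
        if key not in seen:  # keep only the first reading per device and time
            seen.add(key)
            conformed.append(record)
    return conformed


def to_modeled(conformed_rows):
    """Modeled zone: readings grouped under one entry per device."""
    modeled = {}
    for row in conformed_rows:
        modeled.setdefault(row["device"], []).append((row["ts"], row["temp_c"]))
    return modeled


def to_presentation(modeled):
    """Presentation zone: a query-friendly summary (max temperature per device)."""
    return {
        device: max(temp for _, temp in readings)
        for device, readings in modeled.items()
    }


if __name__ == "__main__":
    conformed = to_conformed(RAW)
    print(to_presentation(to_modeled(conformed)))
```

Each function only reads from the previous zone and never mutates it, so the Raw data stays available for reprocessing.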