Teradata Analytic Ecosystem

Data movement: Ingest, prepare and consume
Governance and security
Metadata: operational, technical and business
Sources: Business, Human, Machine, External (Social)
Reference Information Architecture: Acquisition, Integration, Access
Consumers: General, Analysts, Statisticians, Autonomous Applications, Data Scientists
Analytical methods: Reporting, Visualization, Statistical Analysis, Data Mining, Simulation, Optimization, NLP, Machine Learning

Data Movement

Acquisition
typical sources: OLTP, Enterprise applications, IoT, web/log
Landing: Raw, unprocessed, lowest granularity
Staging: Data that is ready to be ingested and processed. Usually in-database
Standardization: Consumable format
- light standardization, such as gender codes, medical codes, etc.
- physical layout optimization, such as partitioning, indexing etc
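
The light standardization step above (for example, gender codes) can be sketched as a simple lookup. This is a minimal illustration, not a Teradata feature: the mapping `GENDER_CODE_MAP`, the function `standardize_gender`, the fallback code `"U"`, and the sample rows are all hypothetical.

```python
# Hypothetical mapping from source-specific gender codes to one standard code set.
GENDER_CODE_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}


def standardize_gender(raw_value):
    """Return a standardized gender code, or 'U' (unknown) when unmapped."""
    if raw_value is None:
        return "U"
    key = str(raw_value).strip().lower()
    return GENDER_CODE_MAP.get(key, "U")


# Sample landed rows (made up) with inconsistent source encodings.
landed_rows = [
    {"customer_id": 1, "gender": "Male"},
    {"customer_id": 2, "gender": " f "},
    {"customer_id": 3, "gender": 1},
    {"customer_id": 4, "gender": None},
]

# Standardized copies; the landed rows themselves are left untouched.
standardized_rows = [
    {**row, "gender": standardize_gender(row["gender"])} for row in landed_rows
]
print(standardized_rows)
```

Keeping the landed rows unchanged and writing standardized copies matches the split between Landing (raw) and Standardization (consumable format).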
Integration
Common keys: standardize common keys to connect various subject areas
Derived values: derive and automate KPIs
Common summaries: standard aggregation (not just for performance)
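
The three Integration ideas above (common keys, derived values, common summaries) can be sketched together. This is only an illustration under assumptions: the two subject areas, the key format produced by `common_customer_key`, and the `avg_order_value` KPI are all hypothetical.

```python
from collections import defaultdict


def common_customer_key(source_key):
    """Build a common key: keep only the digits and zero-pad to 6 characters."""
    digits = "".join(ch for ch in str(source_key) if ch.isdigit())
    return digits.zfill(6)


# Two hypothetical subject areas that identify customers differently.
orders = [
    {"cust": "C-42", "amount": 120.0},
    {"cust": "c-42", "amount": 80.0},
    {"cust": "C-7", "amount": 50.0},
]
support_tickets = [
    {"customer_no": 42, "tickets": 3},
    {"customer_no": 7, "tickets": 0},
]

# Common summary: one standard aggregate per customer, shared by both areas.
summary = defaultdict(lambda: {"revenue": 0.0, "orders": 0, "tickets": 0})
for order in orders:
    key = common_customer_key(order["cust"])
    summary[key]["revenue"] += order["amount"]
    summary[key]["orders"] += 1
for ticket in support_tickets:
    summary[common_customer_key(ticket["customer_no"])]["tickets"] += ticket["tickets"]

# Derived value: a KPI computed once, centrally, from the summary.
for row in summary.values():
    row["avg_order_value"] = row["revenue"] / row["orders"] if row["orders"] else 0.0

print(dict(summary))
```

Computing the KPI in one place means every consumer sees the same definition, which is why common summaries are useful beyond performance.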
Access
Optimized structures: partitioning, indexing for optimizing resource and query speed
Shared views and services: provide metadata and make navigation easier
Data pipeline: the process that turns raw data into consumption-friendly data. It includes ingestion, transformation, and aggregation
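
A data pipeline with the three stages named above can be sketched as three small functions. This is a minimal sketch using only the Python standard library; the log format, the column names, and the function names `ingest`, `transform`, and `aggregate` are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical raw device log, as it might arrive in a landing area.
RAW_LOG = """timestamp,device_id,status
2024-01-01T00:00:00,dev-1,OK
2024-01-01T00:01:00,dev-2,error
2024-01-01T00:02:00,dev-1,ERROR
,dev-3,ok
"""


def ingest(text):
    """Ingestion: parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: drop rows without a timestamp, normalize status codes."""
    cleaned = []
    for row in rows:
        if not row["timestamp"]:
            continue  # unusable row: no event time
        cleaned.append({**row, "status": row["status"].strip().upper()})
    return cleaned


def aggregate(rows):
    """Aggregation: count events per status."""
    return dict(Counter(row["status"] for row in rows))


result = aggregate(transform(ingest(RAW_LOG)))
print(result)
```

Keeping each stage as a separate function makes it possible to test or replace one stage without touching the others.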

Data Lake: long term, low cost, optionally light level of data integration
applicability: Acquisition layer, may replace parts of staging layer in traditional DW
Data Warehouse: Integrated data from one or more disparate sources, usually 3NF
applicability: Integration layer
MDM: consistent enterprise wide reference data
applicability: Standardization tier of acquisition layer, and integration layer
Data projections: Purpose built data structure to solve specific business problems.
applicability: Access layer
Physical and/or Virtual Data Marts
A data mart (DM) can be Dependent (single source of truth) or Independent (multiple sources of truth)
Metadata store: business, technical, operational
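
A virtual, dependent data mart from the Data projections entry above can be sketched as a view over a warehouse table. Here the standard-library `sqlite3` module stands in for the warehouse database purely for illustration; the table `warehouse_sales`, the view `mart_sales_by_region`, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
      ('EMEA', 'widget', 100.0),
      ('EMEA', 'gadget', 50.0),
      ('APAC', 'widget', 70.0);

    -- Virtual, dependent data mart: a view that reads from the warehouse
    -- table, so the warehouse stays the single source of truth.
    CREATE VIEW mart_sales_by_region AS
      SELECT region, SUM(amount) AS total_amount
      FROM warehouse_sales
      GROUP BY region;
    """
)

mart_rows = conn.execute(
    "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
).fetchall()
print(mart_rows)
```

Because the mart is a view, it stores no data of its own. A physical mart would instead materialize the same query into a table and need a refresh process.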

Ingestion Source:
Source systems, Enterprise Applications (e.g. Salesforce), Logs, IoT
Landing zone: bulk storage, Kafka topics
Operational zone:
Raw: raw data, no transformations
Conformed: Raw + de-duped + standardization
Modeled: Integrated, cleaned, modeled
Presentation (Semantic): Optimized for analytics and BI queries
Share (Snowflake): Sharing for monetization
Exploration zone: experimental data and data discovery, before the data becomes part of the operational zone
quick to set up
limited life span
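
The operational zone progression above (Raw, Conformed, Modeled, Presentation) can be sketched as a chain of small functions. This is a sketch under assumptions: the order records, the country-code standardization, and the function names `to_conformed`, `to_modeled`, and `to_presentation` are hypothetical.

```python
# Raw zone: data exactly as received, no transformations (sample rows made up).
raw_zone = [
    {"order_id": "1", "country": "us", "amount": "10.50"},
    {"order_id": "1", "country": "us", "amount": "10.50"},  # exact duplicate
    {"order_id": "2", "country": "DE ", "amount": "20.00"},
    {"order_id": "3", "country": "Us", "amount": "5.25"},
]


def to_conformed(rows):
    """Conformed: raw + standardization (trimmed, upper-case country) + de-dupe."""
    seen = set()
    conformed = []
    for row in rows:
        standardized = {**row, "country": row["country"].strip().upper()}
        key = tuple(sorted(standardized.items()))
        if key not in seen:
            seen.add(key)
            conformed.append(standardized)
    return conformed


def to_modeled(rows):
    """Modeled: cleaned records with proper types."""
    return [
        {
            "order_id": int(row["order_id"]),
            "country": row["country"],
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


def to_presentation(rows):
    """Presentation: a structure shaped for BI queries (total amount per country)."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


conformed_zone = to_conformed(raw_zone)
modeled_zone = to_modeled(conformed_zone)
presentation_zone = to_presentation(modeled_zone)
print(presentation_zone)
```

Each zone is derived from the previous one and `raw_zone` is never modified, so any later zone can be rebuilt from the raw data.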