Teradata Analytic Ecosystem

  • Data movement: Ingest, prepare and consume
  • Governance and security
  • Metadata: operational, technical and business
  • Sources: Business, Human, Machine, External (Social)
  • Reference Information Architecture: Acquisition, Integration, Access
  • Consumers: General, Analysts, Statisticians, Autonomous Applications, Data Scientists
  • Analytical methods: Reporting, Visualization, Statistical Analysis, Data Mining, Simulation, Optimization, NLP, Machine Learning

Data Movement

  • Acquisition
    • Typical sources: OLTP, enterprise applications, IoT, web/logs
    • Landing: Raw, unprocessed, lowest granularity
    • Staging: Data that is ready to be ingested and processed. Usually in-database
    • Standardization: Consumable format
      • Light standardization, such as gender codes, medical codes, etc.
      • Physical layout optimization, such as partitioning and indexing
  • Integration
    • Common keys: standardize common keys to connect various subject areas
    • Derived values: derive and automate KPIs
    • Common summaries: standard aggregation (not just for performance)
  • Access
    • Optimized structures: partitioning and indexing to optimize resource usage and query speed
    • Shared views and services: easier navigation, with metadata provided
  • Data pipeline: the process that turns raw data into consumption-friendly data. It includes ingestion, transformation, and aggregation
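
The three pipeline stages named above (ingestion, transformation, aggregation) can be sketched in a few lines of Python. This is a minimal illustration, not a Teradata implementation: the sample rows, the gender-code mapping, and every function name are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive in the landing area.
RAW_ROWS = [
    {"customer_id": " 001", "gender": "M", "amount": "10.50"},
    {"customer_id": "002", "gender": "female", "amount": "4.00"},
    {"customer_id": "001 ", "gender": "male", "amount": "2.25"},
]

# Assumed mapping used for light standardization of gender codes.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def ingest(rows):
    """Ingestion: copy the raw rows unchanged (lowest granularity)."""
    return [dict(row) for row in rows]


def transform(rows):
    """Transformation: trim keys, normalize codes, cast amounts to float."""
    standardized = []
    for row in rows:
        standardized.append({
            "customer_id": row["customer_id"].strip(),
            # Unknown codes fall back to "U" (an assumption of this sketch).
            "gender": GENDER_CODES.get(row["gender"].strip().lower(), "U"),
            "amount": float(row["amount"]),
        })
    return standardized


def aggregate(rows):
    """Aggregation: a common summary, here total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


def run_pipeline(rows):
    """Chain the three stages in order."""
    return aggregate(transform(ingest(rows)))
```

Calling run_pipeline(RAW_ROWS) merges the two differently padded "001" keys into one customer, which is the point of standardizing before aggregating.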

Data Organization

  • Data Lake: long term, low cost, optionally light level of data integration
    • applicability: Acquisition layer, may replace parts of staging layer in traditional DW
  • Data Warehouse: Integrated data from one or more disparate sources, usually 3NF
    • applicability: Integration layer
  • MDM: consistent enterprise wide reference data
    • applicability: Standardization tier of acquisition layer, and integration layer
  • Data projections: Purpose built data structure to solve specific business problems.
    • applicability: Access layer
    • Physical and/or Virtual Data Marts
    • A data mart (DM) can be dependent (single source of truth) or independent (multiple sources of truth)
  • Metadata store: business, technical, operational
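
A virtual, dependent data mart can be illustrated as a view over an integrated table: the mart stores nothing itself, so the underlying table stays the single source of truth. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a real warehouse; the table name, view name, and sample rows are hypothetical.

```python
import sqlite3


def build_virtual_mart():
    """Create an in-memory database with one integrated table and one view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Integration layer: hypothetical integrated sales table.
        CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
        INSERT INTO sales VALUES
            ('EMEA', 'A', 100.0),
            ('EMEA', 'B', 50.0),
            ('APAC', 'A', 70.0);

        -- Access layer: virtual data mart defined as a view.
        CREATE VIEW mart_sales_by_region AS
            SELECT region, SUM(amount) AS total_amount
            FROM sales
            GROUP BY region;
    """)
    return conn


def region_totals(conn):
    """Query the mart; results always reflect the current base table."""
    query = (
        "SELECT region, total_amount "
        "FROM mart_sales_by_region ORDER BY region"
    )
    return conn.execute(query).fetchall()
```

Because the mart is a view, a row inserted into sales shows up in region_totals on the next query without any reload step. A physical mart would trade that freshness for faster reads.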

Data Zones

  • Ingestion Source:
    • Source systems, enterprise applications (e.g. Salesforce), logs, IoT
    • Landing zone: bulk storage, Kafka topics
  • Operational zone:
    • Raw: raw data, no transformations
    • Conformed: Raw + de-duplicated + standardized
    • Modeled: Integrated, cleaned, modeled
    • Presentation (Semantic): Optimized for analytics and BI queries
    • Share (Snowflake): Sharing for monetization
  • Exploration zone: experimental work and data discovery, before the data becomes part of the operational zone
    • quick to set up
    • limited life span
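
The promotion from Raw to Conformed in the operational zone (raw data, then de-duplication and standardization) might look like the sketch below. The standardization rule (trim and uppercase string values) and the "id" key are assumptions chosen for illustration; real rules depend on the data.

```python
def conform(raw_rows, key="id"):
    """Return a conformed copy of raw_rows; the raw rows are not modified.

    Standardization here means trimming and uppercasing string values.
    De-duplication keeps the first row seen for each standardized key.
    """
    seen_keys = set()
    conformed = []
    for row in raw_rows:
        # Standardize first, so that "a1 " and "A1" count as the same key.
        cleaned = {
            name: value.strip().upper() if isinstance(value, str) else value
            for name, value in row.items()
        }
        if cleaned[key] in seen_keys:
            continue  # duplicate of a row already kept
        seen_keys.add(cleaned[key])
        conformed.append(cleaned)
    return conformed
```

Standardizing before de-duplicating matters: comparing raw keys would treat differently formatted copies of the same record as distinct rows. Keeping the raw rows untouched preserves the Raw zone as a faithful copy of the source.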