Skip to content

Data Movement

  • ROT: Use ELT for complex transformation, use ETL for simple
  • LSN: A unique Logon Seq # is associated with all session used by FLOAD/MLOAD.
  • Fast path inserts: The inserted data aren’t TJed
    1. Table must be empty at the beginning of the transaction
    2. For MSR, All statements must be INSERT with same target table
  • if Fast path insert uses same PI, then 127KB block copy is done instead of row-by-row
  • Fast path Delete: When deletes involve only CI and MI without TJ.
    1. end transaction/commit must be part of the same request that has delete
    2. All rows must be deleted.
    3. if part of MSR, must be implicit transaction and last statement of MSR
    4. No JI/HI or RI allowed
  • Max # of utilities: 30 <= FLOAD+MLOAD; 60 <= FLOAD+MLOAD+FEXP
  • NPARC doesn’t copy RAS, so recollect SUMMARY/PARTITION stats on new system.
  • Retry: Server failure ie If TD restart occurs, the utilities are automatically retried.
  • Restart: Client failure, ie if a failed utility job is restarted after a fix.
  • ROT: Bulk data insert/delete into PPI tables benefit if data arrival pattern matches partitioning expr. All arriving rows will be focused on fewer data blocks leading to less I/O. It may lead to contention for TPUMP or BTEQ multisession inserts.
  • ARC preserves MVC but not BLC, DSA preservs both MVC and BLC
  • IWM doesn’t restrict FEXP (spool) or ARC for AWTs (usually 60%) like other utilities.
  • FLOAD,MLOAD,FEXP and TPT utilities use 2 SQL sessions, control and auxiliary.
    1. Auxiliary SQL session is used for maintaining restart log.
    2. TD13.10+ with TTU 13.10+: All utility work (except auxiliary session) is considered one UOW and assigned to the same workload as initial SQL control session. Throttles are checked only at beginning of the UOW

BTEQ

  • . prefix and ending semicolon both are optional
  • .SET SESSION (TRANSACTION/SQLFLAG) commands must be issued prior to logon
  • SQLFLAG just warns doesn’t disable teradata extensions
  • .SHOW CONTROL shows current settings
  • .RUN in nested script, replaces the script. It won’t return to the calling script
  • Import/Export types: DATA(Record), INDICDATA, REPORT(Field), VARTEXT(Import), DIF(Export)
  • BTEQ retries request, never a transaction.
  • ROT: FTS => use single session, fewer than All-AMP: use > 1 session
  • Field mode (EXPORT REPORT): formatted data, Record mode (EXPORT DATA): binary data

TPT

  • MaxSession are shared by all op instances. but only 1 session to an AMP by any instance
  • Producers load balance across all instances. Consumers use only needed instances
    • Override with -C option to feed consumer instances in round-robin
  • Vars priority: cmd-line (-u) > local var-file (-v) > global var-file > in-script vars
  • Consumer schema can be deferred by coding as SCHEMA *
  • Except FILTER operator, which uses 2 schemas, all operators use 1 schema
  • Multi-session utility/SQL operators: Load, Update, Export and Stream
  • Multi-instance only operators: DataConnector, SQL Inserter
  • INMOD/OUTMOD user written modules for modifying input/output data.
    1. For FLOAD INMODs are producer and can be used by any consumer operator
    2. for MLOAD INMODs are filter (uses VIA OPERATOR) or producer
  • Access modules work with DATA CONNECTOR ops (AccessModuleName) to provide device independent IO capability to TPT (eg read tapes, CD, MQ, OLE DB etc)
  • Multiple Data-streams exist when there are >1 APPLY TO ops and 1 APPLY FROM op.
  • Data connector producer (read) operator can specify wildcard using DirectoryPath
  • Optional Vigil{Start,Stop}Time provide continuous scanning for files
  • LOB: Must use SELECTOR or INSERTER Ops. Either inline or deferred (xLOB AS DEFERRED BY NAME)
  • Data conversion limited to converting value to/from NULL
  • QUERY_BAND UtilityDataSize can be set to LARGE|SMALL

LOAD

  • INSERT using LOAD operator supports wildcard
  • LOAD op counts reports dup rows but not logged in error tables
  • PauseAcq attr in LOAD op specifies if load should be paused after acquisition
  • Run standalone TPT LOAD to reset previous failure

STREAM

  • can load up to 128 tables
  • RATE is stmts/min. PERIODICITY: Frequency per min of sending stmts to DBMS.
    1. RATE and PERIODICITY limits are per instance.
  • LATENCY is max time in seconds or minutes after which the buffer is flushed
  • UPSERT uses SERIALIZE ON by default
  • MARK EXTRA [UPDATE|DELETE] ROWS: if >1 rows affected
  • SERIALIZE ON => same hash go to the same AMP => same order as arrival
  • PACK with ARRAY SUPPORT ON indicates # of records rather than # of statements
  • NOPI tables are much faster to insert into.

EXPORT

  • EXPORT SELECT statement cannot be 1 or 2 AMP operation
  • EXPORT takes checkpoint only after all data to pipe => restart with all or no read
  • ErrorLimit is per instance
  • LOB and CLOB are supported only with SQL inserter, SQL selector and data connector
  • Use Selector op instead of Export op if data needed in field (character) mode.
  • SpoolMode=NoSpool will fallback to spool mode. Use NoSpoolOnly for strict requirement

UPDATE

  • Manages checkpoints using two files.
  • 2 phases:
    1. Acquistion: Get locks, deblock, redist, sort, write to work table
    2. Application: Read target DB only once, and apply changes from work table
  • DELETE task cannot delete from UPI with an equality condition
  • DELETE task using UPDATE op will allow using PI if it’s not UPI equality
  • DELETE task can be standalone or consumer type operator (reads at most 1 row)
  • Unlike load, mload protocol can invoke UDF

Commands

  • tbuild: Define and submit TPT jobs
    • -l latency (stream op only)
    • -n continue even if error
    • -s start at specific step
    • -z checkpoint interval (has precedence over TPT script)
  • tdload: Can load data from delimited file or another table
  • tdlog (same as tlogview -j job -f "*") / tlogview
    • -w filter
    • -f logname | "*" read specific or all public and private log
  • twcbcmd: Pause, Resume, Terminate, Checkpoint job. Get job status
    • For stream jobs: set statement rate, periodicity
  • twbertbl: Extract data from acquisition phase error table
  • twbkill: Kills all tasks of a job
  • twbrmcp: Remove checkpoint files to restart the job from the begining
  • twbstat: show currently active jobs

Utilities Matrix

Behavior BTEQ FLOAD MLOAD TPUMP
Lock write exclusive acq: no DDL except drop
apply: only sel access lock and drop
write
Formats Data
IndicData
Report
Vartext
DIF
formatted(LIN)
unformatted(I)
binary(LI)
text
fastload(LN)
binary(L)
text(N)
unformat
vartext(N)
format spec set record
begin loading … indicators
Indicators .layout
Layout using define .layout
sessions 2 SQL + n Load
AWTs 3 Acq + 1 EndL 2 Acq + 1 Appl
file spec define … file=
table spec insert begin loading .begin mload tables
Data appl insert insert into
restart del + resub acq: resub script
appl: sub beg
load+end load
abort drop tables acq: release mload
appl: drop tables
Utility AWT
FLOAD 3 Acq/1 Endl
MLOAD 2 Acq/1 Apply
TPUMP pack*session for base upd+any reqd for index up
FEXP query+2 vert sort/2 horz sort+1 export phase
ARC 1/AMP

Hadoop

  • TDCH supports 3 ways of loading data from Hadoop in to Teradata.
    • internal.fastload uses one fastload job, one session per mapper
    • multiple.fastload uses multiple fastload jobs, one per mapper, multiple sessions for each mapper. Each fastload job loads to separate tables, which are automatically consolidated into a single table using insert... select.. at the end.
    • batch.insert uses JDBC SQL interface
  • A sample sqoop command that uses multiple fastload jobs (one for each mapper)
$SQOOP1/bin/sqoop export -libjars $XLIB_JVJARS \
    -Dteradata.db.output.method=multiple.fastload \
    -Dteradata.db.output.job.type=hdfs \
    -Dteradata.db.output.file.format=textfile \
    --connect jdbc:teradata://TDPROD/database=sqoop_db \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --username SqoopTest \
    --password xxxxxxxx  \
    --table sqoop_db.sqoop_sly_t099a \
    --export-dir /tmp/t011H4 \
    --num-mappers 5