Data Movement¶
- ROT: use ELT for complex transformations, ETL for simple ones
- LSN: a unique Logon Sequence Number is associated with all sessions used by FLOAD/MLOAD.
- Fast path inserts: the inserted data isn't transient-journaled (TJed)
- Table must be empty at the beginning of the transaction
- For MSR, all statements must be INSERTs with the same target table
- If a fast path insert uses the same PI, a 127 KB block copy is done instead of row-by-row inserts
- Fast path delete: when deletes involve only the CI and MI, without TJ.
- END TRANSACTION/commit must be part of the same request as the DELETE
- All rows must be deleted.
- If part of an MSR, it must be an implicit transaction and the last statement of the MSR
- No JI/HI or RI allowed
- Max # of utilities: FLOAD+MLOAD <= 30; FLOAD+MLOAD+FEXP <= 60
- NPARC doesn’t copy RAS, so recollect SUMMARY/PARTITION stats on new system.
- Retry: server failure, i.e., if a TD restart occurs, the utilities are automatically retried.
- Restart: client failure, i.e., a failed utility job is restarted after a fix.
- ROT: bulk data inserts/deletes into PPI tables benefit if the data arrival pattern matches the partitioning expression: all arriving rows are focused on fewer data blocks, leading to less I/O. It may cause contention for TPUMP or BTEQ multi-session inserts.
- ARC preserves MVC but not BLC; DSA preserves both MVC and BLC
- IWM doesn't restrict FEXP (spool-based) or ARC to the AWT limit (usually 60%) the way it does other utilities.
- FLOAD, MLOAD, FEXP, and TPT utilities use 2 SQL sessions: control and auxiliary.
- Auxiliary SQL session is used for maintaining restart log.
- TD 13.10+ with TTU 13.10+: all utility work (except the auxiliary session) is considered one UOW and assigned to the same workload as the initial SQL control session. Throttles are checked only at the beginning of the UOW
BTEQ¶
- The `.` prefix and ending semicolon are both optional.
- `SET SESSION (TRANSACTION/SQLFLAG)` commands must be issued prior to logon. `SQLFLAG` just warns; it doesn't disable Teradata extensions.
- `SHOW CONTROL` shows current settings.
- `RUN` in a nested script replaces the script; it won't return to the calling script
- Import/Export types: DATA (Record), INDICDATA, REPORT (Field), VARTEXT (Import), DIF (Export)
- BTEQ retries a request, never a transaction.
- ROT: FTS => use a single session; fewer than all-AMP => use > 1 session
- Field mode (`EXPORT REPORT`): formatted data; Record mode (`EXPORT DATA`): binary data
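A minimal sketch of the two export modes, as I understand them (logon values, file names, and the table are hypothetical):

```
.LOGON tdpid/user,password

/* Field mode: formatted, human-readable output */
.EXPORT REPORT FILE = sales_report.txt
SELECT * FROM sales_db.daily_sales;
.EXPORT RESET

/* Record mode: binary output, suitable for re-import with .IMPORT DATA */
.EXPORT DATA FILE = sales.dat
SELECT * FROM sales_db.daily_sales;
.EXPORT RESET

.LOGOFF
.QUIT
```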
TPT¶
- `MaxSessions` is shared by all operator instances, but any instance opens at most 1 session to an AMP
- Producers load-balance across all instances; consumers use only as many instances as needed
- Override with the `-C` option to feed consumer instances in round-robin
- Vars priority: cmd-line (`-u`) > local var-file (`-v`) > global var-file > in-script vars
- Consumer schema can be deferred by coding it as `SCHEMA *`
- All operators use 1 schema, except the FILTER operator, which uses 2 (input and output)
- Multi-session utility/SQL operators: Load, Update, Export and Stream
- Multi-instance only operators: DataConnector, SQL Inserter
- INMOD/OUTMOD: user-written modules for modifying input/output data.
- For FLOAD, INMODs are producers and can feed any consumer operator
- For MLOAD, INMODs are filters (using VIA OPERATOR) or producers
- Access modules work with DataConnector ops (`AccessModuleName`) to provide device-independent I/O to TPT (e.g., tapes, CD, MQ, OLE DB)
- Multiple Data-streams exist when there are >1 APPLY TO ops and 1 APPLY FROM op.
- DataConnector producer (read) operator can pick up multiple files via `DirectoryPath` plus a wildcard file name
- Optional `VigilStartTime`/`VigilStopTime` provide continuous scanning for new files
- LOB: must use `SELECTOR` or `INSERTER` ops, either inline or deferred (`xLOB AS DEFERRED BY NAME`)
- Data conversion is limited to converting a value to/from NULL
- QUERY_BAND `UtilityDataSize` can be set to `LARGE|SMALL`
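A hedged sketch of a DataConnector producer using the directory/wildcard and Vigil attributes above; the schema name, paths, and the exact Vigil timestamp format are illustrative assumptions, not verified syntax:

```
DEFINE OPERATOR file_reader
TYPE DATACONNECTOR PRODUCER
SCHEMA source_schema              /* hypothetical schema defined elsewhere */
ATTRIBUTES
(
  VARCHAR DirectoryPath  = '/data/incoming/',
  VARCHAR FileName       = 'sales_*.csv',    /* wildcard file pick-up */
  VARCHAR Format         = 'Delimited',
  VARCHAR TextDelimiter  = ',',
  VARCHAR OpenMode       = 'Read',
  /* optional: keep scanning the directory for new files in this window */
  VARCHAR VigilStartTime = '20240101 00:00:00',
  VARCHAR VigilStopTime  = '20240101 06:00:00'
);
```

Job variables referenced in such a script can be supplied at `tbuild` time; per the precedence list above, a `-u` command-line value overrides the same variable in a `-v` var-file.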
LOAD¶
- INSERT using LOAD operator supports wildcard
- LOAD op counts and reports duplicate rows, but they are not logged in error tables
- The `PauseAcq` attribute in the LOAD op specifies whether the load should pause after acquisition
- Run a standalone TPT LOAD job to reset a previous failure
STREAM¶
- can load up to 128 tables
- `RATE` is statements/min; `PERIODICITY` is the frequency per minute of sending statements to the DBMS
- RATE and PERIODICITY limits are per instance
- `LATENCY` is the max time in seconds or minutes after which the buffer is flushed
- `UPSERT` uses `SERIALIZE ON` by default
- `MARK EXTRA [UPDATE|DELETE] ROWS`: if >1 rows are affected
- `SERIALIZE ON` => same-hash rows go to the same AMP => applied in arrival order
- `PACK` with `ARRAY SUPPORT ON` indicates # of records rather than # of statements
- `NOPI` tables are much faster to insert into
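A sketch of how these knobs might appear as Stream operator attributes; the attribute values and the `@`-variable names are hypothetical, and the exact attribute spellings should be checked against the TPT reference:

```
DEFINE OPERATOR stream_writer
TYPE STREAM
SCHEMA *                          /* deferred consumer schema */
ATTRIBUTES
(
  VARCHAR TdpId        = @TdpId,
  VARCHAR UserName     = @User,
  VARCHAR UserPassword = @Pwd,
  INTEGER Pack         = 600,     /* # of records per request (ARRAY SUPPORT ON) */
  VARCHAR ArraySupport = 'On',
  INTEGER Rate         = 12000,   /* statements per minute, per instance */
  INTEGER Periodicity  = 15,      /* sends per minute */
  VARCHAR Serialize    = 'On'     /* same-hash rows to same AMP, arrival order */
);
```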
EXPORT¶
- `EXPORT SELECT` statement cannot be a 1- or 2-AMP operation
- `EXPORT` takes a checkpoint only after all data has gone to the pipe => restart rereads all or nothing
- ErrorLimit is per instance
- BLOBs and CLOBs are supported only with SQL Inserter, SQL Selector, and DataConnector
- Use Selector op instead of Export op if data needed in field (character) mode.
- `SpoolMode=NoSpool` falls back to spool mode if no-spool isn't possible; use `NoSpoolOnly` for a strict requirement
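A minimal Export operator sketch showing where `SpoolMode` sits; the table, the `@`-variables, and the surrounding values are hypothetical:

```
DEFINE OPERATOR bulk_export
TYPE EXPORT
SCHEMA export_schema              /* hypothetical schema defined elsewhere */
ATTRIBUTES
(
  VARCHAR TdpId        = @TdpId,
  VARCHAR UserName     = @User,
  VARCHAR UserPassword = @Pwd,
  VARCHAR SelectStmt   = 'SELECT * FROM sales_db.daily_sales;',
  VARCHAR SpoolMode    = 'NoSpool'  /* falls back to spool; 'NoSpoolOnly' is strict */
);
```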
UPDATE¶
- Manages checkpoints using two files.
- 2 phases:
- Acquisition: get locks, deblock, redistribute, sort, write to the work table
- Application: Read target DB only once, and apply changes from work table
- DELETE task cannot delete from a UPI with an equality condition
- DELETE task using the UPDATE op can use the PI as long as it isn't a UPI equality
- DELETE task can be standalone or a consumer-type operator (reads at most 1 row)
- Unlike `load`, the `mload` protocol can invoke UDFs
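A sketch of a standalone delete task via the Update operator, assuming its `DeleteTask` attribute; the table, predicate, and `@`-variables are hypothetical:

```
DEFINE OPERATOR bulk_delete
TYPE UPDATE
SCHEMA *
ATTRIBUTES
(
  VARCHAR TdpId        = @TdpId,
  VARCHAR UserName     = @User,
  VARCHAR UserPassword = @Pwd,
  VARCHAR TargetTable  = 'sales_db.old_sales',
  VARCHAR DeleteTask   = 'Y'      /* delete task: no UPI equality predicate allowed */
);
```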
Commands¶
- `tbuild`: define and submit TPT jobs
  - `-l` latency (Stream op only)
  - `-n` continue even on error
  - `-s` start at a specific step
  - `-z` checkpoint interval (takes precedence over the TPT script)
- `tdload`: can load data from a delimited file or another table
- `tdlog` (same as `tlogview -j job -f "*"`) / `tlogview`: `-w` filter; `-f logname | "*"` read a specific log or all public and private logs
- `twbcmd`: pause, resume, terminate, or checkpoint a job; get job status. For Stream jobs: set statement rate, periodicity
- `twbertbl`: extract data from the acquisition-phase error table
- `twbkill`: kills all tasks of a job
- `twbrmcp`: remove checkpoint files to restart the job from the beginning
- `twbstat`: show currently active jobs
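A command-line sketch tying the flags above together; the script, var-file, variable, step, and job names are all hypothetical:

```
# submit a job, starting at a named step, with a 600-second checkpoint
# interval (-z overrides any interval coded in the script)
tbuild -f load_job.tpt -v jobvars.txt -u "TargetTable='sales_stg'" \
       -s setup_step -z 600 my_job

# list active jobs, then read all public and private logs for one of them
twbstat
tlogview -j my_job -f "*"

# remove checkpoint files so the job restarts from the beginning
twbrmcp my_job
```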
Utilities Matrix¶
| Behavior | BTEQ | FLOAD | MLOAD | TPUMP |
|---|---|---|---|---|
| Lock | write | exclusive | acq: no DDL except drop; appl: only sel with access lock, and drop | write |
| Formats | Data, IndicData, Report, Vartext, DIF | formatted (LIN), unformatted (I), binary (LI), text | fastload (LN), binary (L), text (N), unformat, vartext (N) | |
| Format spec | | set record | | |
| Indicators | | begin loading … indicators | .layout | |
| Layout | using | define | .layout | |
| Sessions | | 2 SQL + n load | | |
| AWTs | | 3 acq + 1 end loading | 2 acq + 1 appl | |
| File spec | | define … file= | | |
| Table spec | insert | begin loading | .begin mload tables | |
| Data appl | insert | insert into | | |
| Restart | del + resub | acq: resub script; appl: submit begin loading + end loading | | |
| Abort | | drop tables | acq: release mload; appl: drop tables | |
| Utility | AWT |
|---|---|
| FLOAD | 3 acq + 1 end loading |
| MLOAD | 2 acq + 1 appl |
| TPUMP | pack × sessions for base updates + any required for index updates |
| FEXP | query + 2 (vert sort) / 2 (horz sort) + 1 (export phase) |
| ARC | 1/AMP |
Hadoop¶
- TDCH supports 3 ways of loading data from Hadoop into Teradata:
  - `internal.fastload`: one FastLoad job, one session per mapper
  - `multiple.fastload`: multiple FastLoad jobs, one per mapper, with multiple sessions for each mapper. Each FastLoad job loads a separate table; the tables are automatically consolidated into a single table with an `INSERT ... SELECT` at the end
  - `batch.insert`: uses the JDBC SQL interface
- A sample `sqoop` command that uses multiple FastLoad jobs (one for each mapper):
```
$SQOOP1/bin/sqoop export -libjars $XLIB_JVJARS \
-Dteradata.db.output.method=multiple.fastload \
-Dteradata.db.output.job.type=hdfs \
-Dteradata.db.output.file.format=textfile \
--connect jdbc:teradata://TDPROD/database=sqoop_db \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username SqoopTest \
--password xxxxxxxx \
--table sqoop_db.sqoop_sly_t099a \
--export-dir /tmp/t011H4 \
--num-mappers 5
```