Misc¶

Major components: PE, MPL, AMP, VDISK
hash of DECIMAL is same as INT if it has no decimal places
hash of uppercase and lowercase letter is same
TPA Node memory:
After allocating memory of all VPROCs and a small buffer, remaining memory is allocated to FSG Cache as % (90% for 66xx systems)
Full Cylinder reads: more efficient IO than data block IO. carved out of FSG Cache.
Network attached clients use CLI/ODBC -> MTDP (Micro TDP)-> MOSI (OS dependent libraries).
MTDP connects to Gateway that manages sessions
Gateway runs on every node. Consists of two tasks
- Assign (ycgastsk): one per system, handles initial connection and hands off to connect task.
- Connect (ycgcntsk): one per node. communicates with the PEs on that node.
Gateway can distribute session request to PE that is on a diff node
Multiple LANs can be assigned different host id, which essential is a Gateway
Unlike MP-RAS, gateway on Linux are VPROCs and there can be >1/node
Gateway receive parcels from clients and can forward it to session PE, which can be on a different node
TD Systems
One CMIC per cabinet that communicates via SWS (AWS)
Appliances don’t have HSN
Appliances generally emulate BYNET over ethernet
VMS combines CMIC+SWS+Viewpoint functionalities
AMP cluster is group of, usually, 2 AMPs, one primary and one fallback.
ROWID(SQL reserved word) = PART#+ROWHASH+UNIQNESS
TD background tasks: deadlock detection, AutoCylPack,Cylinder defrag, TJ purging, FSG cache flushing
TD12+: QueryID in DBQL is longer and unique => no need for ProcID in joins
User can have multiple account strings, whereas database can only have one.
Rowkey = Partition # + Rowhash
All DB space is per AMP basis. Skew will artificially cause out-of-space scenario
Spool/Temp space
cannot exceed that of the owner.
limit is per user including all sessions
limit is taken from Profile > explicit > parent
USI changes are logged in TJ, whereas NUSI changes are not
Nodes: TPA, HSN (Host standby), NONTPA (Application nodes, ECN), TMS. Except TMS, all nodes are connected to BYNET. All are managed by SWS
ECN: Extended Communication Node hosts hardware for connecting to Mainframe
Clique: Group of nodes connected to the same storage system.
VDISK: one per AMP, consists of multiple PDISKs, each PDISK is full LUN or partition
LUN maybe multiple disk array ranks.
Disk Array Rank is set of disks with same SCSI ID, but on different SCSI Buses
SI have fallback if base table has fallback
Deleted uniqueness values aren’t reused unless they are the highest values.
Databases have only one account string, whereas Users can have multiple
BYNET: Supports point-to-point, multicast and broadcast messaging
TD and TCP/IP network see only one logical BYNET.
communication using channels, which use signaling rather than messaging
A data block cannot have more free space that >= one sector
SP and UDF are always FALLBACK protected
CLI (as opposed to ODBC) clients use COP entries. Generally there are as many COP entries as there are nodes.
Channel attached clients connect using a separately running TDP.
Teradata Director Program manages, balances, queues sessions. IO from/to PEs. May enforce security.
FLOAD: Apply phase time increases log base 8 times volume.
PUT to deliver software updates via AWS
EDW systems: 6650, 6690, Appliances: 2650, 2690
In a clique, one node is a control node that monitors health of other nodes
// in AS or TITLE clause indicates new line
There is no YEAR/DAY interval.
subqueries implicitly include DISTINCT clause
Analytical Architecture Cube dimensions:
Context: Business, Information, Application, System
Def Level: Reference, Logical, Solution, Deployment
From..To: Current, Transition, Target
RIA: Corresponds to Reference, Information, Target from the AA cube
When TASM is not enabled, priority is determined by Acct string and is for the session
When TASM is enabled, WD classifications may cause each query to have different priority
Down AMP recovery journal records changes to AMP while it is down.
All ON clause in MERGE DML must use PI equality.
DBC might have higher CurrPerm than MaxPerm, because TransientJournal table even though reported in DBC, has space residing in WAL
TD12 DB Design: TJ space comes from DBC PERM. Must have enough PERM to make sure TJ always gets space
Free cylinder list is not maintained in Master Index, but in Filesys Info Block (Cyl 0)
Oversized row is one that requires one data block to itself
Changes to Cylinder Index are not written to TJ, instead an alternate CI is maintained for data integrity.
FASTLOAD
Default load sessions = # of AMPs rounded up to multiples of 16
TPT
SCHEMA * (deferred schema) can only be used with consumer operators
DDL op does not support SQL statements that return data nor the USING clause
private log is specific to an op. If specified, it’s used in place of the public log
Full Cylinder Read: TD<13.10 had reserved (typically 6 to 8) slots per AMP
AMP may use much more memory (230M) than initial (80M) for operations such as redistribution (180M) that comes from free space
PDE Manages FSG Cache and TD Intelligent Memory(VH Cache)
Free memory (managed by OS): Used by OS and VPROCs AMP, GTW, RSG (Relay Services Gateway used for replication), PDE, PE and TVS (Teradata Virtualize Storage)
Multi-node Single clique system may have more than one TVS vproc per node.
Volatile tables must have current username as qualifier if explicitly specified
RSS
Two components: PDE and Resource Sampling Subsystem.
Gather/Live Buffer -> Work Buffer -> Log Buffer -> ResUsage Tables
Gather/Live Buffer -> Collect Buffer -> ViewPoint
PM/API provides interfaces to collect session and resource level data.
- Some of the Session level data is cumulative.
- Resource level data is not cumulative, but overwritten at each interval
Legacy Archive methods: methods a, b, c are legacy
All AMP: Single job, single stream, sessions >= 1
Parallel All AMP: Multiple jobs, each with single stream, sessions >=1 1/job
- Each job backs up subset of objects.
- Need multiple user-ids for restore
Cluster: Backup AMP cluster. Single cluster job, multiple streams. Each stream has different Teradata DB job ID. Legacy, only supported by NetVault
- Need multiple user-ids for restore
- AMP configuration change is problem.
Group read: allows write locks, requires after image journal, restore point is archive end, uses HUT and rolling read locks. No DDLs are allowed.
- Archive: enable PJ, archive, checkpoint with save, archive journal, delete saved journal, disable PJ
- restore: restore data tables, restore journal tables, rollforward, disable PJ
- can backup partitions
- DBC can only be restored to an initialized TD system (after SYSINIT)
On restore if config has changed, AMP redistributes rows in BUILD phase (TD12)
Value ordered NUSI counts as two indexes
Spool defined for DB is only used for inheriting value for any owned USERs
ALTER TABLE DATABLOCKSIZE IMMEDIATE causes reblocking
Unnamed constraints cannot be modified. must be dropped and re-added
SP: Updatable cursor only in ANSI mode and if no triggers
TD 14: if total partitions < 65535 then 2 byte else 8 byte partition number
TD13.10+: Partition expression can be char type data
MSR are like single transaction, except in TD13.10+ multiple INSERT, where only failing INSERT is rolled back (Statement Independence Feature)
SET SESSION ACCOUNT: to change a/c for request or session without logging off
RANDOM function:
cannot be used in both SELECT and WHERE
cannot be used with aggregate, OLAP functions, GROUP BY or ORDER BY
default time/timestamp precision is 6
Timestamp cannot be cast to a lower precision
WITH BY includes detail rows and the GROUP BY line aggregates
Non-ANSI QUANTILE and WIDTH_BUCKET use GROUP BY for partitioning
Non-window func like QUANTILE aren’t valid with std or wind aggregates
TOP n with unordered select, results in same rows being returned.??
TOP n PERCENT always rounds up (e.g. 1.2 rows is 2 rows etc).
TVS cold data if it was always in FSG Cache

SLES10 PSF & TASM¶

AWT: Must divide counts by collect interval on <TD13
PS evaluates priority on each node at node level.
A query can be running in different AG on different nodes
Relative weight is calculated in integers, fractions are truncated
Only active AG and RP participate in weight calc => any unused CPU from an AG is distributed to other AG in the same RP first
Placing CPU limit inhibits use of spare CPU. Can be at AG, RP, System level.
Rollback/Abort processing will not honor CPU limit
Console utilities and some internal DB tasks run in RP0 (default RP)
AG:
Age interval: last n (default 60) sec which are evaluated for resource calc (reaction time)
Active time: inactivity time in seconds, after which AG is considered inactive.
Two ways to distribute CPU. S: session and N:none (processes) (obsolete)
- ROT: simple or single AMP queries:S; complex, all-amp:N
An AG can be assigned to >1 PG, as long as all PG belong to same RP
A Workload can be mapped to an AG only if they both have the same enforcement priority (Tactical, Priority, Normal, Background)
Performance AG (#200) is reserved for system activities such as recovery, abort, deadlock detection, TVS, GTW and AWT management
Performance Periods: Link PG to AG.
Milestone types: T: time of day (day optional), S: session CPU, Q:Query CPU sec
For a given PG, all milestone types must be same
Rollback if with USER priority, do not honor CPU limits
Before the first request is parsed in a new session, PG is used assign priority
ROT:
use dedicated RP for tactical RP
make RP0 assigned wt = highest non-tactical RP’s weight
if Age interval is shortened, shorten Active interval by same degree

Legacy Utilities¶

Fast-Load¶

2 phases:
Acquisition: supports chkpt, receives blocks, unblocks, redistributes to work area
Application: internal chkpt, sorts data and writes to perm table. AMP local
Restart:
loading phase: resubmit (remove ddls) and restart from last checkpoint
end loading: not needed normally. else submit abbr job
2 SQL sessions + n load sessions
TENACITY: # hours to try. SLEEP: # of minutes to pause between retries
cmds: BEGIN LOADING, SET RECORD, DEFINE, INSERT, END LOADING
Formats: FORMATTED(LEN+IND+LF), UNFORMATTED(IND), BINARY(LEN+IND), TEXT(LF), VARTEXT ‘c’
err1: translation or constraint errors. err2: UPI violations for non-dup rows
opts before .logon:tenacity, sleep, sessions
config file: floadcfg.dat, must be in curr dir or FLOADLIB env var
unpacked-signed decimal load using FORMAT with PIC clause (#TF 34-23)
PUBLIC needs SIUD access on SYSADMIN.FASTLOG restart log table
NOPI tables: Are accessible with ACCESS lock. Dup rows are not discarded
BEGIN LOADING … NODROP option to manually drop error tables later. Must be dropped.
ROT: For NUPI tables, not sorting on PI columns improves performance

Support Environment¶

.LOGTABLE before .LOGON
sys variables: &SYSDATE/4, &SYSRC, &SYSTIME, &SYSUSER
cmds: .accept,.display,.if,.else,.endif,.route,.run,.set,.system

Multi-load¶

5 phases: prelim (log/work tables, lock), DML tran (parse DML), Acq (block/deblock data, redist, sort, chg lock to write), Appl (process DML), cleanup
doesn’t honor freespace when loading a non-empty table
Import task can perform delete, update, upsert with a WHERE clause
If WHERE used, import task must reference PI, but delete must not
upto 5 tables per import, only 1 table per delete task
NO: USI, Trigger, JI, RI
BEGIN MLOAD … TABLEWAIT: hours to wait for another MLOAD to release table
Read lock in acq, rw lock in apply. Utility Acq lock to prevent any DDL except DROP
Locks are released after each phase; must wait for new lock to be acquired
cmds: .begin import mload, .layout, .field, .dml, .import, .end mload
cmds: .begin delete mload, .end mload
sysvar: &SYS{aply,noaply,del,ins,upd,rcd,rjct,et,uv}CNT{1..5}
checkpoint < 60 is minutes (15 is default), >=60 is number of records
.DML LABEL lbl [MARK|IGNORE (DUP INS, DUP UPD, MISS UPD, MISS DEL) ROWS)]
.DML LABEL lbl DO INSERT FOR MISSING UPDATE ROWS
Restart if mload failed in apply phase:
use OVERRIDE parm if log table has been dropped
any corrupt tables, drop data/work/error tables. submit as new mload
use abbreviated script (just BEGIN MLOAD and END MLOAD) if no script
RELEASE MLOAD has no effect if in APPLY phase or DELETE task is in DML phase.
fallback tables are written after base tables in background to allow quicker access.
Staging table has fallback, if the target table is defined with fallback
Work tables are alwaysfallback protected, have same PI as target table
Eval order: cmd line > script > mloadcfg.dat/$MLOADLIB > defaults
ET_ for all application phase errors and when PI cannot be build in apply phase
UV_ for all apply phase errors when PI has been built successfully.
if there is PJ, MLOAD maintains private journal and is transferred to PJ at end
AMPCHECK: For fallback tables, fail when AMP is down. NONE|APPLY|ALL
CONTINUEIF > INDICATORS
MLOAD Performance:
Too many errors and error table has fallback. Fallback is written at SQL speed
Too many NUSIs. Each NUSI requires change row after is table loaded
Inefficient for highly non-unique (>100 rows/value) NUPI due to dup row checking unless NUPI helps locality of reference and table is multiset
Efficient when it affects at least one row per data block
PPI helps locality leading to improved performance
NUSI is spread-out, it hurts performance.
If target has PJ, MLOAD maintains private PJ then transfers at the end to true PJ

TPUMP¶

Allows throttling, uses macros
serializeon: data with same keys processed in the same session, same order as source
For tables with JI, serialization can still deadlock if base and JI have diff PI
Use JI PI if base table P\I is unique or almost unique
Use common set of columns as key between base table and JI
Use single session
arraysupport: send multiple sets of data using 1 dml stmt (INS/UPD/DEL/MACRO)
pack: send multiple statements in 1 request
PACKMAXIMUM determines max pack rate by trial & error. Do not use in prod
robust on: required for certain complex dml, which cannot be repeated on restart
max 4 import commands
cmds: .begin load, .layout, .field, .dml, .import, .end load
errortable options: APPEND, NODROP QUEUETABLE
DML LABEL same as MLOAD + IGNORE|MARK EXTRA UPD|DEL ROWS
Can use atomic upsert (UPDATE ELSE INSERT)
NO: if UPD/INS trigger could be fired or JI/HI could be updated
NO: UPD FROM, INS ... SEL, sub-query
TPUMP automatically converts to atomic when possible
PARTITIONed DML: segregates DML into diff session => cache reuse
smart USING: only packs used columns in DML (in case of multi-table)
Can specify EXEC MACRO INS|UPD|DEL|UPSERT as DML
Monitoring: SysAdmin.TPumpStatusTbl: Updated/checked by Tpump every min.
SysAdmin.TPumpUpdateSelect(M), TPumpMacro.UserUpdateSelect(M): Set new stmt rate
SysAdmin.TPumpActionRequest(M), TPumpMacro.UserActionRequest(M)
Chkpt,Pause,Resume,Terminate(chkpt taken),Abort(immediate)
High DBS wait-time (response from DBS) in log => more sessions may help
When UPSERT processing, high # of INSERT => lower throughput
ROT: Use TPUMP if % of rows updated is less than
NUSI:5%, USI:5%, STJI:2-5%, STAJI: 0.2-1%, MTJI/MTAJI:0.2%
ROT: For UPSERT processing, large # of INSERT will reduce performance
Overhead: 1 NUSI (50%), 2 NUSI (100%), 1% errors (40%), fallback (45%), SERIALIZE (30%), ROBUST (15%)
ROT: Use NOMONITOR, persistent macros, for short jobs
ROT: If high amount of data contain errors, use lower pack factor to reduce large rollbacks

FastExport¶

Two sorts, vertical & horizontal
Formats: FASTLOAD(len+eor), BINARY(len), UNFORMAT, TEXT(eor)
Modes: RECORD, INDICATOR
MLSCRIPT option generates MLOAD script to loads the exported data.