Tuning¶

A query step that is already executing by an AMP cannot be reassigned a diff priority.
UPDATE of USI or hard RI uses complex update plan, that is, rows are deleted and then re-inserted, whereas MERGE of USI uses single pass update.
MERGE doesn’t create spool if source has no filter and src/tgt joined on PI. INSERT-SELECT does spool. Use 1=0 when using MERGE while doing pure INSERT-SELECT
INS-SEL can be faster than MERGE when there is no JI that includes base ROWID
GROUP BY: ARSA: Agg(Local)/Redist/Sort/Agg(Global). Global Agg is optional
Agg cache is used during local Agg. Of that 10% is cache overflow
overflow rows are written to spool, sorted and redist 1 row at a time.
90% cache rows redist using buffered row redist
DISTINCT: Redistribute/Sort (remove dups). Preferable if nearly unique or wide rows
NOT EXISTS > LEFT JOIN/IS NULL > NOT IN
Joins look at unique values and not detailed histogram values
View spooling (materialized) (source: Stephen Molini - Designing Complex Views)
Occurs when view contains, DISTINCT, OLAP, TOP n, SAMPLE
Occurs when view contains WHERE or JOIN and the query has TOP or SAMPLE
Occurs when view contains WITH RECURSIVE in view definition
Detected by noticing "Spool 1" in mid-way of EXPLAIN output
Set Branch Elimination optimization, eliminates tables from UNION ALL view if the WHERE clause has a predicate that compares a numeric column. In addition, if the view is joined to dimension tables, all tables that are unioned must have RI
ROT: Wide tables will cause high disk activity due to overflow in Agg Cache. Use DISTINCT
Rows Per Value (RPV, Cardinality) for non-existent values is affected by if extrapolation is enable (defaults to Yes, using Object Usage Count). Disabling (ExtrapolationForEQPredicate set to 1) will fix RPV for non-existent values
Use SET QUERY_BAND = 'DynamicPlan={OFF|SYSTEM};' FOR SESSION to disable|enable IPE

Join Index Strategies¶

Multi-table JI: Outer joins are better, can cover both inner and outer joins.
Field 1 must have all join-cols from outer table and at least 1 not null col from each inner table
Global JI: Generally single table and include ROWID, UPI or (NUPI +USI/ROWID)
Most frequent trade-off is creating NUSI
GJI isn’t used if col is in compressed list and used for outer join
With light load NUSI > GJI, because it’s 1 step.
Hashed NUSI: Unique enough columns (rows/value < 50% AMPs) can use GJI to convert All AMP op to Group AMP and table lock to row hash locks.
Join-back: Include base table PI, to allow join-back. NUSI may be better
Join-back to USI: Include base table USI, join back using nested-loop join to base table
AJI: COUNT(*) is automatically included, not compressible, no outer joins

Locking¶

levels: Rowhash, Table, Database
ACCESS lock will wait, if there is an exclusive lock in the lock queue ahead of it
CHECKSUM lock same as ACCESS lock, except timestamp is written to spool to check during cursor update if another user modified the row being written
if update includes PI columns, it locks table even if using PI in WHERE clause
DROP TABLE access required to issue exclusive lock
LOCKING … NOWAIT rollbacks entire transaction if requested locks can’t be obtained
global deadlock detection runs on PE (default 4 min) local runs on each AMP (fixed 30 sec)
locking logger segment is circular and may lose deadlock events.
An MSR taking row-locks on diff AMPs can deadlock with table-level locks
end transaction step in query plan becomes necessary to release lock and TJ
unless there is a locking modifier, PE is unaware of rowhash locks, since AMP takes them
LOCK ROW FOR ACCESS is taken at AMP level, requiring no PE intervention to take table level lock and releasing it as separate steps. Hence it’s preferable
Except access locks, table level requests require pseudo locks taken in tableid order
Exclusive locks are generally used by DDL type operations
HUT locks are placed by arc utility
are associated with the user ID rather than transaction or job
Only participating AMPs are locked
levels: read for dump, write for Rollback/forward, exclusive for restore.
Must be released explicitly
RELEASE LOCK can be run via BTEQ or ARC
TD Filesystem has internal locking mechanism to maintain data-block integrity during concurrent modifications. Few DB/AMP => less concurrency
ROT: Prefer MSR instead of transaction, to prevent table-level deadlock
ROT: Change DeadLockTimeOut to 30 sec if mixed workload
ROT: Table level locks => all-AMP steps => high cost. Prefer group AMP if possible
If there is no blocking host or session information, it may be a HUT lock

Statistics¶

types: 1) collected 2) RAS 3) derived (includes inherited (JI), CHECK and RI)
collect stats: # of distinct values, avg AMP rows / value, # of nulls, #of All NULLS, value range & freq, data skew. # of partial nulls = # of null - # of all nulls.
Index stats are not dropped if index is dropped
Four versions of statistics data. V4 is for >= TD12.03
200 detail intervals, interval 0 contains summary information
upto 2 high biased intervals, each can store 2 values
high biased values are > 0.25% of total rows.
Multi-col stats
Values are limited to first 16 bytes. short column should be first in multi-col stats.
Stats column order is determined by the table column order not collect stats cmd
Pre TD12, if one col was null, the entire multi-col group was considered as NULL. TD12+, partial NULLs are counted separately from all NULL values
Interval 0 has distinct values calculated for multi-col as:
distinct non-null values + 1 if all nulls + all partial null counts
# partial nulls are treated as distinct values even if there are duplicates.
VARCHAR columns are expanded fully for stats
Single col stats are in dbc.TVFields and multi-col stats are in dbc.Indexes
ROT: 10% change -> recollect.
ROT: multi-col stats are not very useful for ranges
ROT: collect stats on PARTITION. Else # rows/partition = Tot rows / tot partitions
Cannot use SAMPLE on partitioning column
SAMPLE is not truly random but works by scanning first few rows, can be a problem if column is correlated to partitionon column
Cannot use SUMMARY stats on volatile tables
Retrieve Confidence: High > Low > No
Only pred H AND H will yield H conf, if both are indexed. Low otherwise
Join confidence: index join > High > low > no
Index join: There is a unique constraint (RI)
Must have stats on both, PI index and the WHERE condition columns to have high confidence.
Single column statistics with multi column condition leads to low conf.
TD14+: Unlike dictionary cache, dedicated statistics cache isn’t purged every 4 hours
TD14+: Stats aren’t dropped when the indexes are dropped
Dynamic Amp Sample (DAS) stats are stored in table header and refreshed every 4 hours.
collects 1) est table row count 2) for each NUSI distinct values
Only good for equality conditions
NUSI distinct values = # of "rows in NUSI". For highly non-unique NUSI, the distinct values may take up >1 rows, thus overestimation of distinct values
NUPI distinct values are derived using heuristics (75%) rather than DAS
DAS always samples from the same AMP based on hash of table ID
RandomAmpSampling: D:1, L:2, M:5, N:Node, A:All amps
Always All AMP for volatile tables, single table sparse JI and single table AJI
DAS: Don’t collect stats on NUSI unless used in joins or have uneven distribution of values
DAS: Don’t collect stats on USI/UPI unless used in non-equality (range) predicates
extrapolation
When stats are missing, DAS are used for indexed columns only and only for equality
Extrapolation:
With either PI, PARTITION or SUMMARY stats, a copy of RAS is saved. The saved RAS is compared with current RAS for stats extra-polation.
Used only for DATE and TIMESTAMP columns and if >= 95% unique
isn’t used for small tables or (unless all-amp sampling is on) skewed PI tables
Recollection is not helpful for: shrinkage, small tables or skewed data
PPI stats: collect on PARTITION and partitioning columns.
Collect on (PARTITION, PI) to help sliding window, rowkey joins and DPE
(PARTITION, PI, part-cols) to help rowkey join costing
NUPI is assumed to be 75% unique in absence of any stats
Prefer collecting stats on base table rather than STJI (except sparse SJTI)
Derived stats based on CHECK and RI are max distinct values => assumes uniform dist
Inherited stats: Base table stats override derived
base table to non-sparse JI and hash indexes
single-table sparse JI to base table
base table to single-table sparse JI
Agg JI to base table
non-sparse JI or HI to base table (but does not extend to other JI or HI)
On a large table on columns with a very few cardinality, it's better to collect SAMPLED STATS first, so the subsequent full stats knows not to use ARSA
ROT: For multi-col stats, put filtering columns ahead of join columns (TD14 OB)
Default selectivity:
10% for equality for 1st col, 7.5% for 2 columns
20% for closed ranges (WHERE C1 BETWEEN v1 and v2)
TD14.10: Orphaned stats created by dropping UDF are invalidated only when they are used (eg SHOW STATS) next time, not at the time UDF is dropped.
OUC is not tracked for DBC tables since they use Express Requests (internal API)
TD14.10: ROT (OB)
Use SYSTEM SAMPLE globally.
Use SYSTEM SAMPLE with SYSTEM THRESHOLD to get both optimizations
Use statement level sampling only for special cases

Explain¶

Nested loop join: Joins two indexes. At least one must be unique.
directed lookups using each row on left side
via PI or USI: Issue directed lookup for each row on left side table
via NUSI: dup left side and then issue directed lookup for each row on left-side
only join that does not always require all AMPs
Hash join: Based on hash, but unlike merge joins, not both tables sorted by hash.
Smaller table in memory, then binary searched for each row of larger table
smaller table is always sorted, larger table may not be. May avoid redist of larger table.
Classic/Hybrid: if small table too large to fit into 1 partition, rows from both tables are partitioned and joined partitioned at a time
Dynamic: Small/Left table must fit in 1 partition and is duplicated. Large isn’t hashed to spool on join col, but dynamically hashed. Used only on non-PI join
TD14+: Can be used for outer, semi-join (inclusion/exclusion with IN or NOT IN), predicate with one side using expression and DPE
Merge join: rows on both side must be in the same hash order. may require pre-sort
Not good perf if high data skew - leads to hot amp
"fast path" merge joins access rows back and forth between joined table; “slow path” involves access path on left table and always goes from left table to right table. It’s not actually slow, just the naming convention
sliding window merge join requires, reading one data block of NPPI and one data block of all eligible partitions in memory for joining. If not enough memory is available to load all partition’s datablocks, NPPI table is read multiple times
Merge inclusion/exclusion join: Used for IN or EXISTS conditions. Unlike Merge join (which links all outer rows to inner rows), merge inclusion join returns rows that have at least one inner row.
Product join: Each row compared against rows from other side
traditional: dup small table. Efficient than dynamic hash if small table < 10 rows
enhanced by DPE: dup non-ppi. sort dup table by row-key. For efficiency, granularity of rowkey (of fact) must be 1-to-1 with small (dim table)
doesn’t require sort
STATFN: Sample 100 rows from each AMP and route to 1 AMP (if # rows > # AMPs)
Value dist: if distinct PARTITION keys < 5 * # AMPs, dist by range of key-values (PARTITION BY + ORDER BY composite)
Else, redistribute by PARTITION BY columns
Optimizer will choose All AMP operation over group-AMP op if it has to scan half of the AMPs. Generally that means number of rows for redist >= ½ AMPs
LT/ST (Star) joins: product-join all dimension values before joining to fact table.
In-list star join: only some dim are IN lists. Joined using LT/ST method
In-list access path: All dim tables are IN lists and joined to fact NUSI - joins materialized dim tables with NUSI and joins row-ids with LT
Partial group by: When partial aggregation can be performed before join.
Early Group By (SORT/GROUP in explain): grouping then join. GROUP BY and JOIN columns match and there is a unique index on join/group column
Partial SUM (in explain), when there is no unique index. Aggregate two times
Collect stats on both, join and group by columns
NO: if COUNT(*) is the only function, PPI, JI/HI, DISTINCT except INSERT-SELECT
DATE variable is treated as literal starting with TD12+, whereas it was USING variable earlier
Tricks:
Use bogus predicate to lower/increase cardinality estimates to favor e.g. dynamic hash join. e.g. AND TRIM(c1) not in (....)
Use expression or functions to bypass index or stats
When creating STJI, drop the corresponding NUSI if it existed before
PI is all-partition and 1 AMP op. NUSI is an all-AMP and 1 partition op. Define both to get optimal performance. Performance better than USI.
UNION maybe cheaper than using OR in WHERE, if it avoids product join
Histogram high biased intervals:
Non-Modal Values => -2 (or -1)
Mode Value => 1st loner value
Max Value => 2nd loner value
Mode Freq => # of 1st loner rows
Non-modal Rows => # of 2nd loner rows

Explain Verbs¶

Merge Join: Rowhash match scan
traversal of index => access through VOSI for range predicate
DELETE from PPI table, "of n partition" => optimized delete not if “from n partition”
DELETE must be the last stmt in transaction for optimization
"n/single partition of" => partition elimination
Partial SUM => PGB lacks a unique index and thus will require another group by
SORT/GROUP => PGB has unique index and group by/join columns align
all-rows scan => Product join or Merge Exclusion Join
"computed globally" => full ARSA, “computed locally” => only local aggregation
"END TRANSACTION" => Commit the request. TJ discarded, lock release.
EXPLAIN 1=1 is shown for CROSS JOIN with no WHERE clause
"lock DBC.AccessRights" => DDL statement is being executed
Aggregates normally have two steps, aggregating and formatting
if one or more rows inserted into spool … go to step … => recursive query
covering CP merge Merge several CP, no additional base table access needed.
from n partition .. by way of index #n partition elimination by scanning NUSI
redistributed … to few AMPs Few AMPs indicates light-weight redistribution

Auto-stats¶

Criteria for ranking recommendations
Cardinality accuracy: # of times actual result-set was significantly different from estimated
Query cost: Average cost of the queries where the stat is recommended
Freq: # of times the stat was recommended in a workload
Average importance: as assigned by optimizer