Skip to content

Tuning

  • A query step that is already executing by an AMP cannot be reassigned a diff priority.
  • UPDATE of USI or hard RI uses complex update plan, that is, rows are deleted and then re-inserted, whereas MERGE of USI uses single pass update.
  • MERGE doesn’t create spool if source has no filter and src/tgt joined on PI. INSERT-SELECT does spool. Use 1=0 when using MERGE while doing pure INSERT-SELECT
  • INS-SEL can be faster than MERGE when there is no JI that includes base ROWID
  • GROUP BY: ARSA: Agg(Local)/Redist/Sort/Agg(Global). Global Agg is optional
    • Agg cache is used during local Agg. Of that 10% is cache overflow
    • overflow rows are written to spool, sorted and redist 1 row at a time.
    • 90% cache rows redist using buffered row redist
  • DISTINCT: Redistribute/Sort (remove dups). Preferable if nearly unique or wide rows
  • NOT EXISTS > LEFT JOIN/IS NULL > NOT IN
  • Joins look at unique values and not detailed histogram values
  • View spooling (materialized) (source: Stephen Molini - Designing Complex Views)
    • Occurs when view contains, DISTINCT, OLAP, TOP n, SAMPLE
    • Occurs when view contains WHERE or JOIN and the query has TOP or SAMPLE
    • Occurs when view contains WITH RECURSIVE in view definition
    • Detected by noticing "Spool 1" in mid-way of EXPLAIN output
  • Set Branch Elimination optimization, eliminates tables from UNION ALL view if the WHERE clause has a predicate that compares a numeric column. In addition, if the view is joined to dimension tables, all tables that are unioned must have RI
  • ROT: Wide tables will cause high disk activity due to overflow in Agg Cache. Use DISTINCT
  • Rows Per Value (RPV, Cardinality) for non-existent values is affected by if extrapolation is enable (defaults to Yes, using Object Usage Count). Disabling (ExtrapolationForEQPredicate set to 1) will fix RPV for non-existent values
  • Use SET QUERY_BAND = 'DynamicPlan={OFF|SYSTEM};' FOR SESSION to disable|enable IPE

Join Index Strategies

  • Multi-table JI: Outer joins are better, can cover both inner and outer joins.
    • Field 1 must have all join-cols from outer table and at least 1 not null col from each inner table
  • Global JI: Generally single table and include ROWID, UPI or (NUPI +USI/ROWID)
    • Most frequent trade-off is creating NUSI
    • GJI isn’t used if col is in compressed list and used for outer join
    • With light load NUSI > GJI, because it’s 1 step.
    • Hashed NUSI: Unique enough columns (rows/value < 50% AMPs) can use GJI to convert All AMP op to Group AMP and table lock to row hash locks.
  • Join-back: Include base table PI, to allow join-back. NUSI may be better
  • Join-back to USI: Include base table USI, join back using nested-loop join to base table
  • AJI: COUNT(*) is automatically included, not compressible, no outer joins

Locking

  • levels: Rowhash, Table, Database
  • ACCESS lock will wait, if there is an exclusive lock in the lock queue ahead of it
  • CHECKSUM lock same as ACCESS lock, except timestamp is written to spool to check during cursor update if another user modified the row being written
  • if update includes PI columns, it locks table even if using PI in WHERE clause
  • DROP TABLE access required to issue exclusive lock
  • LOCKING … NOWAIT rollbacks entire transaction if requested locks can’t be obtained
  • global deadlock detection runs on PE (default 4 min) local runs on each AMP (fixed 30 sec)
    • locking logger segment is circular and may lose deadlock events.
  • An MSR taking row-locks on diff AMPs can deadlock with table-level locks
  • end transaction step in query plan becomes necessary to release lock and TJ
  • unless there is a locking modifier, PE is unaware of rowhash locks, since AMP takes them
  • LOCK ROW FOR ACCESS is taken at AMP level, requiring no PE intervention to take table level lock and releasing it as separate steps. Hence it’s preferable
  • Except access locks, table level requests require pseudo locks taken in tableid order
  • Exclusive locks are generally used by DDL type operations
  • HUT locks are placed by arc utility
    • are associated with the user ID rather than transaction or job
    • Only participating AMPs are locked
    • levels: read for dump, write for Rollback/forward, exclusive for restore.
    • Must be released explicitly
    • RELEASE LOCK can be run via BTEQ or ARC
  • TD Filesystem has internal locking mechanism to maintain data-block integrity during concurrent modifications. Few DB/AMP => less concurrency
  • ROT: Prefer MSR instead of transaction, to prevent table-level deadlock
  • ROT: Change DeadLockTimeOut to 30 sec if mixed workload
  • ROT: Table level locks => all-AMP steps => high cost. Prefer group AMP if possible
  • If there is no blocking host or session information, it may be a HUT lock

Statistics

  • types: 1) collected 2) RAS 3) derived (includes inherited (JI), CHECK and RI)
  • collect stats: # of distinct values, avg AMP rows / value, # of nulls, #of All NULLS, value range & freq, data skew. # of partial nulls = # of null - # of all nulls.
  • Index stats are not dropped if index is dropped
  • Four versions of statistics data. V4 is for >= TD12.03
  • 200 detail intervals, interval 0 contains summary information
    • upto 2 high biased intervals, each can store 2 values
    • high biased values are > 0.25% of total rows.
  • Multi-col stats
    • Values are limited to first 16 bytes. short column should be first in multi-col stats.
    • Stats column order is determined by the table column order not collect stats cmd
    • Pre TD12, if one col was null, the entire multi-col group was considered as NULL. TD12+, partial NULLs are counted separately from all NULL values
    • Interval 0 has distinct values calculated for multi-col as:
      distinct non-null values + 1 if all nulls + all partial null counts
    • # partial nulls are treated as distinct values even if there are duplicates.
  • VARCHAR columns are expanded fully for stats
  • Single col stats are in dbc.TVFields and multi-col stats are in dbc.Indexes
  • ROT: 10% change -> recollect.
  • ROT: multi-col stats are not very useful for ranges
  • ROT: collect stats on PARTITION. Else # rows/partition = Tot rows / tot partitions
  • Cannot use SAMPLE on partitioning column
    • SAMPLE is not truly random but works by scanning first few rows, can be a problem if column is correlated to partitionon column
  • Cannot use SUMMARY stats on volatile tables
  • Retrieve Confidence: High > Low > No
    • Only pred H AND H will yield H conf, if both are indexed. Low otherwise
  • Join confidence: index join > High > low > no
    • Index join: There is a unique constraint (RI)
    • Must have stats on both, PI index and the WHERE condition columns to have high confidence.
    • Single column statistics with multi column condition leads to low conf.
  • TD14+: Unlike dictionary cache, dedicated statistics cache isn’t purged every 4 hours
  • TD14+: Stats aren’t dropped when the indexes are dropped
  • Dynamic Amp Sample (DAS) stats are stored in table header and refreshed every 4 hours.
    • collects 1) est table row count 2) for each NUSI distinct values
    • Only good for equality conditions
    • NUSI distinct values = # of "rows in NUSI". For highly non-unique NUSI, the distinct values may take up >1 rows, thus overestimation of distinct values
    • NUPI distinct values are derived using heuristics (75%) rather than DAS
    • DAS always samples from the same AMP based on hash of table ID
    • RandomAmpSampling: D:1, L:2, M:5, N:Node, A:All amps
    • Always All AMP for volatile tables, single table sparse JI and single table AJI
    • DAS: Don’t collect stats on NUSI unless used in joins or have uneven distribution of values
    • DAS: Don’t collect stats on USI/UPI unless used in non-equality (range) predicates
    • extrapolation
    • When stats are missing, DAS are used for indexed columns only and only for equality
  • Extrapolation:
    • With either PI, PARTITION or SUMMARY stats, a copy of RAS is saved. The saved RAS is compared with current RAS for stats extra-polation.
    • Used only for DATE and TIMESTAMP columns and if >= 95% unique
    • isn’t used for small tables or (unless all-amp sampling is on) skewed PI tables
  • Recollection is not helpful for: shrinkage, small tables or skewed data
  • PPI stats: collect on PARTITION and partitioning columns.
    • Collect on (PARTITION, PI) to help sliding window, rowkey joins and DPE
    • (PARTITION, PI, part-cols) to help rowkey join costing
  • NUPI is assumed to be 75% unique in absence of any stats
  • Prefer collecting stats on base table rather than STJI (except sparse SJTI)
  • Derived stats based on CHECK and RI are max distinct values => assumes uniform dist
  • Inherited stats: Base table stats override derived
    • base table to non-sparse JI and hash indexes
    • single-table sparse JI to base table
    • base table to single-table sparse JI
    • Agg JI to base table
    • non-sparse JI or HI to base table (but does not extend to other JI or HI)
  • On a large table on columns with a very few cardinality, it's better to collect SAMPLED STATS first, so the subsequent full stats knows not to use ARSA
  • ROT: For multi-col stats, put filtering columns ahead of join columns (TD14 OB)
  • Default selectivity:
    • 10% for equality for 1st col, 7.5% for 2 columns
    • 20% for closed ranges (WHERE C1 BETWEEN v1 and v2)
  • TD14.10: Orphaned stats created by dropping UDF are invalidated only when they are used (eg SHOW STATS) next time, not at the time UDF is dropped.
    • OUC is not tracked for DBC tables since they use Express Requests (internal API)
  • TD14.10: ROT (OB)
    • Use SYSTEM SAMPLE globally.
    • Use SYSTEM SAMPLE with SYSTEM THRESHOLD to get both optimizations
    • Use statement level sampling only for special cases

Explain

  • Nested loop join: Joins two indexes. At least one must be unique.
    • directed lookups using each row on left side
    • via PI or USI: Issue directed lookup for each row on left side table
    • via NUSI: dup left side and then issue directed lookup for each row on left-side
    • only join that does not always require all AMPs
  • Hash join: Based on hash, but unlike merge joins, not both tables sorted by hash.
    • Smaller table in memory, then binary searched for each row of larger table
    • smaller table is always sorted, larger table may not be. May avoid redist of larger table.
    • Classic/Hybrid: if small table too large to fit into 1 partition, rows from both tables are partitioned and joined partitioned at a time
    • Dynamic: Small/Left table must fit in 1 partition and is duplicated. Large isn’t hashed to spool on join col, but dynamically hashed. Used only on non-PI join
    • TD14+: Can be used for outer, semi-join (inclusion/exclusion with IN or NOT IN), predicate with one side using expression and DPE
  • Merge join: rows on both side must be in the same hash order. may require pre-sort
    • Not good perf if high data skew - leads to hot amp
    • "fast path" merge joins access rows back and forth between joined table; “slow path” involves access path on left table and always goes from left table to right table. It’s not actually slow, just the naming convention
    • sliding window merge join requires, reading one data block of NPPI and one data block of all eligible partitions in memory for joining. If not enough memory is available to load all partition’s datablocks, NPPI table is read multiple times
  • Merge inclusion/exclusion join: Used for IN or EXISTS conditions. Unlike Merge join (which links all outer rows to inner rows), merge inclusion join returns rows that have at least one inner row.
  • Product join: Each row compared against rows from other side
    • traditional: dup small table. Efficient than dynamic hash if small table < 10 rows
    • enhanced by DPE: dup non-ppi. sort dup table by row-key. For efficiency, granularity of rowkey (of fact) must be 1-to-1 with small (dim table)
    • doesn’t require sort
  • STATFN: Sample 100 rows from each AMP and route to 1 AMP (if # rows > # AMPs)
    • Value dist: if distinct PARTITION keys < 5 * # AMPs, dist by range of key-values (PARTITION BY + ORDER BY composite)
    • Else, redistribute by PARTITION BY columns
  • Optimizer will choose All AMP operation over group-AMP op if it has to scan half of the AMPs. Generally that means number of rows for redist >= ½ AMPs
  • LT/ST (Star) joins: product-join all dimension values before joining to fact table.
    • In-list star join: only some dim are IN lists. Joined using LT/ST method
    • In-list access path: All dim tables are IN lists and joined to fact NUSI - joins materialized dim tables with NUSI and joins row-ids with LT
  • Partial group by: When partial aggregation can be performed before join.
    • Early Group By (SORT/GROUP in explain): grouping then join. GROUP BY and JOIN columns match and there is a unique index on join/group column
    • Partial SUM (in explain), when there is no unique index. Aggregate two times
    • Collect stats on both, join and group by columns
    • NO: if COUNT(*) is the only function, PPI, JI/HI, DISTINCT except INSERT-SELECT
  • DATE variable is treated as literal starting with TD12+, whereas it was USING variable earlier
  • Tricks:
    • Use bogus predicate to lower/increase cardinality estimates to favor e.g. dynamic hash join. e.g. AND TRIM(c1) not in (....)
    • Use expression or functions to bypass index or stats
    • When creating STJI, drop the corresponding NUSI if it existed before
    • PI is all-partition and 1 AMP op. NUSI is an all-AMP and 1 partition op. Define both to get optimal performance. Performance better than USI.
    • UNION maybe cheaper than using OR in WHERE, if it avoids product join
  • Histogram high biased intervals:
    • Non-Modal Values => -2 (or -1)
    • Mode Value => 1st loner value
    • Max Value => 2nd loner value
    • Mode Freq => # of 1st loner rows
    • Non-modal Rows => # of 2nd loner rows

Explain Verbs

  • Merge Join: Rowhash match scan
  • traversal of index => access through VOSI for range predicate
  • DELETE from PPI table, "of n partition" => optimized delete not if “from n partition”
    • DELETE must be the last stmt in transaction for optimization
  • "n/single partition of" => partition elimination
  • Partial SUM => PGB lacks a unique index and thus will require another group by
  • SORT/GROUP => PGB has unique index and group by/join columns align
  • all-rows scan => Product join or Merge Exclusion Join
  • "computed globally" => full ARSA, “computed locally” => only local aggregation
  • "END TRANSACTION" => Commit the request. TJ discarded, lock release.
  • EXPLAIN 1=1 is shown for CROSS JOIN with no WHERE clause
  • "lock DBC.AccessRights" => DDL statement is being executed
  • Aggregates normally have two steps, aggregating and formatting
  • if one or more rows inserted into spool … go to step … => recursive query
  • covering CP merge Merge several CP, no additional base table access needed.
  • from n partition .. by way of index #n partition elimination by scanning NUSI
  • redistributed … to few AMPs Few AMPs indicates light-weight redistribution

Auto-stats

  • Criteria for ranking recommendations
    • Cardinality accuracy: # of times actual result-set was significantly different from estimated
    • Query cost: Average cost of the queries where the stat is recommended
    • Freq: # of times the stat was recommended in a workload
    • Average importance: as assigned by optimizer