
Performance

  • Maximize the warehouse cache by scaling up (more server nodes => more local cache)
  • BP clustering key: large (TB+) infrequently changing tables, 1-3 columns in order of low to high cardinality
  • pruning: eliminating micro-partitions from the scan that are known not to satisfy the query filter
    • filtering via a sub-query doesn't benefit from pruning
    • more effective on dates and numbers, less effective on long, high cardinality strings such as UUID
  • query load for an interval = total execution time of all queries / total clock time
  • snowflake.account_usage: query_load_percent is the % of warehouse nodes that participated in the query
  • interpreting profile
    • Network refers to the number of bytes sent between nodes of a cluster
  • concurrency is determined by:
    • MAX_CONCURRENCY_LEVEL (default 8). Note: a fully parallel query counts as occupying a full slot per cluster; a query that uses fewer than all threads allows more concurrent queries than the limit (e.g., CALL needs only 1 thread)
    • available memory may reduce the ability to run more queries in a cluster
  • If CTAS has CLUSTER BY, it may run slower because the data is sorted synchronously. This may also require a larger warehouse because sorting is a resource-intensive operation (Ref Perf FCTO Daniel Gardner)
  • CTEs are inlined if referenced once, otherwise they may be materialized
  • Faster DML
    • Each micro-partition can optionally have one delta file. It exists only if there are logical deletes
    • The Snowflake optimizer can choose, per query and per micro-partition, either copy-on-write or a delta-file update, independently of the other micro-partitions within the same query
    • There is no background process that cleans up delta files; they are managed as part of foreground SQL DML statements (Ref: Darren Gardner)
  • small file compaction: INSERT or COPY will pick up recently added small files (FDNs) and combine them into a bigger file. This happens only if the table has no clustering key for auto-clustering.
  • QAS (Ref Darren Gardner)
    • For scans, QAS can only run scan, projection, filter and aggregate RSOs; QAS returns the results to the WH by writing to remote storage
    • For COPY INTO, QAS can run the query end-to-end
  • Gen2 WH (Ref Gardner):
    • Adaptive scanning (?) for FDN; ICEBERG tables can do adaptive scan in both Gen1 and Gen2 WH
    • Faster-DML for FDN tables. ICEBERG tables do not use Faster DML
  • get_query_operator_stats(<query-id>): table function that returns the individual operators of the query profile
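A minimal sketch of pulling operator-level stats for the most recent query in the session (column names per the GET_QUERY_OPERATOR_STATS documentation):

```sql
-- Run the query of interest first, then fetch its operator-level profile
SELECT operator_id, parent_operators, operator_type,
       operator_statistics, execution_time_breakdown
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));
```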

SQL Tips and Tricks

  • Optimizing GROUP BY
    • Limit GROUP BY clause to use only the true grouping columns and use ANY_VALUE function for other columns in SELECT
    • Use CTE to group rows first before joins
  • disjunctive joins (with an OR condition) normally show up as a cartesian join because the filter is applied after the join; rewrite as a UNION to speed them up
  • uncorrelated sub-queries are usually more efficient than correlated sub-queries
  • GROUP BY with many distinct values is less efficient than with fewer distinct values
  • When combining data from two tables, FULL OUTER JOIN doesn't allow predicate pushdown, but UNION ALL allows effective pruning
  • DIRECTED JOIN allows enforced join order. The tables are joined in the same order they appear in the FROM clause
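A sketch of the ANY_VALUE tip above, using hypothetical orders/customers tables:

```sql
-- Before: grouping by every selected column forces extra grouping work
SELECT c.customer_id, c.customer_name, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name;

-- After: group only by the true key; ANY_VALUE returns the functionally
-- dependent column without it participating in the grouping
SELECT c.customer_id, ANY_VALUE(c.customer_name) AS customer_name,
       SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```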
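The disjunctive-join rewrite, sketched with hypothetical tables a and b:

```sql
-- Before: an OR in the join condition is often planned as a cartesian
-- product with the predicate applied afterwards
SELECT a.id, b.id
FROM a JOIN b ON a.x = b.x OR a.y = b.y;

-- After: two equi-joins combined with UNION (not UNION ALL, so rows
-- satisfying both conditions are deduplicated)
SELECT a.id, b.id FROM a JOIN b ON a.x = b.x
UNION
SELECT a.id, b.id FROM a JOIN b ON a.y = b.y;
```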

Clustering

  • define a clustering key for large (TB+) tables
  • a clustering key can include expressions such as TO_DATE(), SUBSTRING, or a JSON path on a VARIANT column
  • high average_depth value indicates high overlap and low clustering
  • clustering key is carried over when using create table ... clone ..., but not when using create table ... like ...
  • billing for automatic clustering shows up in special AUTOMATIC_CLUSTERING VWH.
  • range predicates on high-cardinality columns that are clustered will likely perform poorly
  • For character columns only the first 5 or 6 bytes are used for clustering
    • Min/Max values for character columns stores up to 32 bytes
  • INSERT can cause optimizer to perform union (visible in SnoVi) that combines new and existing data prior to writing the data. This can happen if there are too many small files (< 2MB) in the target table and it doesn't already have a clustering key (Ref Darren Gardner).
    • Solution: enable auto-clustering, which prevents the optimizer from compacting
  • if CTAS creates a table that defines a clustering key, an automatic ORDER BY is added after the query (ref: DJ of fcto-perfsol)
    • prefer CLUSTER BY over ORDER BY when using CTAS; ORDER BY is expensive because it is a perfect sort, unlike CLUSTER BY
    • auto-clustering may later reduce the clustering quality of a perfectly sorted table obtained via ORDER BY
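A sketch combining the CTAS notes above; the events table and its columns are hypothetical:

```sql
-- CTAS with CLUSTER BY: data is sorted synchronously (an approximate sort,
-- cheaper than a perfect ORDER BY); expression keys are allowed
CREATE OR REPLACE TABLE events_clustered
  CLUSTER BY (TO_DATE(event_ts), region)
AS
SELECT * FROM events;
```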

Metrics

  • alter table t1 {suspend | resume} recluster for controlling re-clustering
    • note: newly created clone tables will have automatic clustering suspended
  • system$clustering_depth() returns a single number (clustering depth), (optionally specify a predicate to get depth of only a subset of data)
  • system$clustering_information(): returns following information
    • average_depth: the average number of micro-partitions containing overlapping value ranges for the clustering key
    • total_constant_partition_count: number of partitions which are in optimal state and won't benefit from re-clustering
    • average_overlaps: how many other partitions, on average, overlap with partition's MIN and MAX values
    • cluster_by_keys: LINEAR is the (only supported) re-clustering method.
    • partition_depth_histogram: histogram of partition counts by overlap depth
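The system functions above, sketched against the hypothetical events_clustered table:

```sql
-- Single-number depth, optionally restricted to a subset via a predicate
SELECT SYSTEM$CLUSTERING_DEPTH('events_clustered');
SELECT SYSTEM$CLUSTERING_DEPTH('events_clustered', '(region)',
                               'event_ts > ''2024-01-01''');

-- Full JSON report: average_depth, average_overlaps,
-- total_constant_partition_count, partition_depth_histogram, ...
SELECT SYSTEM$CLUSTERING_INFORMATION('events_clustered');

-- Pause and resume background reclustering
ALTER TABLE events_clustered SUSPEND RECLUSTER;
ALTER TABLE events_clustered RESUME RECLUSTER;
```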

BP

  • multi-column clustering: order columns from lowest-to-highest cardinality
  • multi-column clustering: no more than 3 or 4 columns
  • multi-column clustering: pick columns that are most used in selective filters, then in join filters
  • bad choices: cardinality too low (e.g. gender) or too high (timestamp)
  • target average_depth < 10 and constant partitions to be > 80%

Search Optimization

  • internally a bloom filter persisted in a hidden materialized view (Ref: Perf Opt Fundamentals lessonly.com)
  • permissions needed to configure: ADD SEARCH OPTIMIZATION on the schema and OWNERSHIP on the table
  • can configure for the entire table, or specifically columns with search methods. ... on <method>(* | <column>...)
    • supported search methods: equality, substring, and geography (but not geometry)
    • equality supports numeric, string, binary and variant types
  • SO on variant columns applies to specified object and its children, e.g. src:a.b includes src:a.b.c etc
  • substring patterns within regex predicates are supported
  • min 5 char for substring and regex searches
  • not supported: joins, casts (except fixed-point number to string)
  • searches against variant must be explicitly enabled with ON
  • ideal for: size=100+GB, cardinality=100-200k, queries run for 10s+
  • Variant columns containing JSON null are slow because they are not extracted
    • BP: Either extract the null into a column before loading, or use STRIP_NULL_VALUES if possible
    • BP: Each unique element stores values of single (JSON) data types
  • account_usage.search_optimization_benefits contains a partitions_pruned_additional column that indicates how many partitions were skipped because of SO
  • Use system$estimate_search_optimization_costs('<table>', '<method>(<target>)') to estimate the cost before enabling
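A sketch of configuring SO per method and column; table and column names are hypothetical:

```sql
-- Per-method, per-column configuration
ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(user_id),
                                              SUBSTRING(url);

-- VARIANT paths must be enabled explicitly; a path covers its children too
ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(payload:a.b);

-- Estimate build and maintenance cost before enabling
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('events',
                                                 'EQUALITY(user_id)');
```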

Loading

  • table size is more important than number of rows for loading
  • BP:
    • loading compressed CSV files is the fastest at about 3x compared to Parquet or ORC
    • a large number of columns requires more time
    • loading into semi-structured is ~20% slower than loading into structured
    • compressed data loads faster than uncompressed data
  • The JDBC driver uploads files differently depending on whether they are deemed big or small, with the cutoff being 200 MB.
    • When uploading multiple large files, each is split into chunks and the chunks are uploaded in parallel, but the large files themselves are uploaded serially.
    • Small files are not chunked, but multiple small files can be uploaded in parallel
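A minimal COPY INTO sketch reflecting the BP above (compressed CSV); the stage, path, and table are hypothetical:

```sql
-- Compressed CSV loads fastest per the notes above; the pattern
-- restricts the load to .csv.gz files under the stage path
COPY INTO raw_events
FROM @my_stage/events/
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP
               FIELD_OPTIONALLY_ENCLOSED_BY = '"')
PATTERN = '.*[.]csv[.]gz';
```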

Joins

  • Snowflake prefers right-deep join tree
    • small dimensions on left-side for build and big fact tables on right-side for probe
    • values on the left-side are hashed and used as a bloom filter for probing the right table
    • scanning the dimension tables first and pushing filters down to the fact table gives rise to the right-deep tree plan in the query profile
  • JoinFilter: indicates dynamic pruning; values from left (aka Build) (usually dimension) table are passed on to table scan of fact table (right, aka probe)
    • bloom filters are probabilistic; they may produce false positive, but never false negative
    • hashmaps from multiple build sides can be pushed down to the probe side, => a fact table can use values from multiple dimensions to limit the number of partitions to scan.
    • A large reduction in JoinFilter is always desirable.
  • Two types of data distribution for joins:
    • Broadcast: small tables replicated to all nodes
    • Hash-Partitioned: if the build table is too large to be copied to every node, all data are hashed and redistributed to each node
      • bytes sent over network are the shuffled bytes
  • Join-skew avoidance: Snowflake detects popular values during the build side and uses a broadcast join for them, and a direct join for the other values
  • The optimizer doesn't produce different query plans based on whether the table has a clustering key, nor does it have a sort-merge join

Caches

Metadata Cache

  • LRU caches query compilations
  • cache is per GS node
  • goes cold every week as customers move to new GS nodes during new releases
  • MIN, MAX and COUNT can be answered from this cache without a warehouse scan
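A sketch of a metadata-only query, assuming a plain (non-filtered) query over a hypothetical events table:

```sql
-- Can be served from metadata alone: no warehouse scan needed
SELECT COUNT(*), MIN(event_ts), MAX(event_ts) FROM events;
```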

Warehouse Data Cache

  • fetched S3 data is stored on local SSD using LRU
  • if a micro-partition is assigned to a specific node, subsequent queries needing that micro-partition will assign its scan to the same node
  • cache is by node (Ref Darren Gardner)

Results Cache

  • stores query results in S3 for reuse for 24 hours; each reuse resets the clock, up to a maximum of 31 days
  • this cache can be turned off by user using parameter USE_CACHED_RESULT
  • small (<100k) query results are stored in cloud-services layer
  • results are reused across users for SHOW if they share the same role, or for SELECT if the user has the required permissions on all tables
  • the cache can survive table changes if the micro-partitions needed for the query aren't affected
  • When using variables in query, cache will be reused only if variables haven't been set again since last cache.
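Toggling the parameter mentioned above, e.g. while benchmarking:

```sql
-- Turn result reuse off for the current session while measuring
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- ... run the queries being measured ...
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```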

AU/OU Views

  • table_query_pruning_history and column_query_pruning_history
    • partitions_pruned, partitions_scanned, rows_scanned, rows_matched, rows_pruned
    • BP: use for drilling down on column usage of a particular table
    • BP: group by table and look for pruning ratio and unused rows ratio
  • query_insight: identifies inefficiencies (insight_type_id) and opportunities (is_opportunity)
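A sketch of the per-table drill-down BP above; the table_name grouping column and the exact semantics of the pruning columns are assumed from the names listed in the notes:

```sql
-- Pruning ratio and unused-rows ratio per table (column semantics
-- assumed from the column names listed above)
SELECT table_name,
       SUM(partitions_pruned)
         / NULLIF(SUM(partitions_pruned) + SUM(partitions_scanned), 0)
         AS pruning_ratio,
       1 - SUM(rows_matched) / NULLIF(SUM(rows_scanned), 0)
         AS unused_rows_ratio
FROM snowflake.account_usage.table_query_pruning_history
GROUP BY table_name
ORDER BY pruning_ratio;
```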