Misc¶

GRANT and REVOKE are immediately effective without having the user to log in/out.
- internal cache is invalidated thus has overhead of getting privileges cached again
XML data is stored internally as JSON
A typical Snowflake node: c5d.2xlarge 16GB RAM, 200GB SSD (Ref prod deployment, SnoVi)
changes to time-travel period are managed using S3 policies and can take up to 24 hours
SCOPED TEMPORARY is used to [limit lifetime]*https://snowflake.slack.com/archives/C01BD4CSEFN/p1682697640049339?thread_ts=1682653288.573179&cid=C01BD4CSEFN) of the temporary object to stored-procedure call
- Python and JVM stored-procedures cannot create temporary objects; but can create scoped temporary objects
Large result-sets are passed to the client via blob; smaller (set of) results are transferred directly via Snowflake VPC over HTTPS (Ref: Jim Brody PrivateLink lunch-n-learn)
All SO bloom search-trees are built case-insensitive (Ref: perf fcto)
temporary tables are also written to remote storage, this is because they are tied to the session, keeping them on local storage makes them tied to the cluster they are running on
performance:
- select only the needed column due to Snowflake being a columnar database
- Suspending WH discards cache, thus slow initial start-up
- large ORDER BY, high-cardinality GROUP BY or large joins normally benefit from larger WH
Snowpipe: for periods less than 1 min, the latency reduction isn't guaranteed
Assume 200MB/sec/core for warehouse processing throughput (Ref: an example given in Adaptive WH Eng video)

WebUI¶

Web UI shows query history for 14 days, but ACCOUNT_USAGE schema in SNOWFLAKE share has more details and longer history
Warehouse size is blank in History tab for queries that don't use warehouse
queries execution in Web UI
- duration consist of 3 parts: Queuing (wait time for WH to assigned), Compilation (parsing), Execution
- Bytes Scanned: Remote (green), Local (blue)
Besides AccountAdmin, SecurityAdmin role can see the "Account" tab in Web UI
Min/Max Values and Scaling Policy options for Warehouse are shown only for Enterprise+ customers
only AccountAdmin can see "Notification" icon
Downloadable software - SnowSQL, ODBC and SnowCD (there is no "R" driver)
data load wizard has file size limitations

Glossary¶

DOP: Degree of Parallelism
RSO: Row Set Operator

SnoVi¶

Workers = nodes; instance = threads (usually 8/node)
Useful fields:
- ioLocalFdnWritesBytes: local/temp writes
- ioRemoteFdnWritesBytes: remote S3 writes
- QO Stats: query optimization (compile) time
- stable plan: shows when a plan is pinned; can be requested through support
Shows only XP level information. i.e. post GS layer activities
Execution Log -> XP Misc => large value means high overhead (not useful work)
Query retries
- Step # for retries are in 1000 + original step #;
- customers can see step retries, but not job retries
- a job failure is when the entire query is sent back to GS layer for rerunning
- Misc time (Query Details -> Black bar in Statistics) - prev retry time
Worker => Node, Instance => thread
QAS has a separate query-id, which is in the Query Detail -> Execution Log. This QID is internal and can't be searched. But can be put directly in the URL to bring it up
Each parameter that is active in in Query Detail -> Execution Log
Can compare two query-ids using Compare tab
- More useful compare: type a second query-id in the URL with , separating two query-ids
high mutex time => a lot of data exchanges (SnowVi Overview Video)
Skew = Tot threads / threads used, e.g. 16 threads / 2 busy threads => ~ 8x Skew
RSOs Skew tab: InputRecords is the number of rows flows in and NetSent is flow out of each RSO by instance and thread
Investigate Other time in RSO/RSO Normalized charts in All Stats Tab: Look at each RSO/thread to see when they started/ended
All Stats -> Instance Usage: Shows how many instances (threads) were busy
If an RSO shows distribution method as local_synchronous => no shuffle, for hash => shuffle
Box# are meaningful (as in order of execution) in SnoVi v/s meaningless in Profile
BLOOM_ASYNC Join (not the first choice of Snowflake) is reshuffle (Ref PerfSol call) of both (build and prob) sides, whereas BROADCAST (preferred) is where build/left side is broadcasted to all nodes of probe/right side
SnoVi's CPU utilization doesn't include CPU used by UDFs
DOP = Degree of Parallelism
MV on external tables, aggregates at file level to materialize (Ref Darren Gardner perf CTO)
ScanBack operator usually appears in DML operations and reads and then rewrites all rows from the affected MPs
- Total number of MPs/rows read/rewritten can be obtained by looking at filesUnregistered and numOriginalRows attributes of the ScanBack operator
Split operator can occur:
- MERGE when some of the matched rows remain unchanged (e.g. due to additional condition WHEN MATCHED AND ...), the RSO references only the target columns; and the other branch contains changed data (RSO references target and source data)
Small File Compaction may happen during an INSERT (per Darren FCTO), where Snowflake, when performing INSERTs may combine micro-partitions that are too small
Two aggregates nodes in succession (first one being called either bottom or child) is for local and global sorts
ScanBack operator is used for writing micro-partitions that have some of the rows that were updated. It is depicted with number of rows going in and number of rows coming out. They represent how many rows were needed to be updated and how many rows actually ended up being written. Very high ratio of rows-written to rows-changed could indicate updates are spread across large number of micro-partitions. Clustering or faster-DML is normally the solution