Snowflake
Architecture
- Snowflake Architecture
- Multi-Cluster Architecture
- Shared Disk and MPP Processing
- Virtual Warehouses
- Zero-Copy Cloning
Data Storage & Management
Data Loading & Integration
QueryProcessing & Optimization
- Query Caching
- Query Profile & Optimization
- Materialized Views
- Adaptive Query Execution
- Query Acceleration Service
SQL & Analytics Features
❄️ Snowflake Query Profile & Optimization: Mastering Query Performance
Have you ever wondered why some SQL queries in Snowflake are lightning fast while others take ages? The answer lies in understanding and optimizing your query execution plan — and that’s where the Snowflake Query Profile becomes your best friend.
The Query Profile is Snowflake’s built-in visualization tool that helps you see how your queries are executed step by step, identify bottlenecks, and fine-tune performance for efficiency and cost savings.
🧩 What is a Query Profile in Snowflake?
A Query Profile is a graphical and data-driven view of how a query was executed in Snowflake. It shows details such as:
- Execution steps (operators)
- Processing time at each step
- Data scanned
- Partitions pruned
- Join operations
- Network overhead
- Query statistics
In simpler terms — it’s like an X-ray of your SQL query, showing what happens inside Snowflake’s query engine.
⚙️ How to Access Query Profile
You can access the Query Profile in three ways:
-
Snowflake Web UI:
- Navigate to History → Query → Query ID → Query Profile
-
Snowsight (New UI):
- Select query → Click “Query Details” → “Query Profile”
-
Using SQL Command:
SELECT *FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())WHERE query_id = '<your_query_id>';
🌐 ** – Query Execution Flow in Snowflake**
🧠 Understanding the Query Profile Structure
Snowflake Query Profile consists of multiple nodes and edges. Each node represents a step in query execution, and edges show data flow between them.
Key Elements:
Term | Meaning |
---|---|
Node | Operation step (e.g., scan, join, filter) |
Node Time | Time taken by that step |
Rows Produced | Number of rows output from that step |
Bytes Scanned | Amount of data processed |
Pruning Ratio | % of partitions skipped |
Spill to Disk | When data exceeds memory capacity |
🧩 Major Phases of Query Execution in Snowflake
Phase | Description |
---|---|
1. Compilation | SQL parsing, optimization, and plan creation |
2. Execution | Data retrieval, filtering, joining |
3. Result Generation | Final aggregation and output |
4. Caching | Results cached for reuse |
🧠 Example 1: Analyzing Simple Query Profile
Let’s start with a basic query.
SELECT region, SUM(sales_amount)FROM salesGROUP BY region;
🔍 Steps Observed in Query Profile:
- Table Scan – Reads data from the sales table.
- Aggregation – Sums the sales by region.
- Result Output – Returns aggregated data.
🧾 Key Insights:
- If
Bytes Scanned
is high, consider clustering or micro-partition pruning. - If most time is spent on “Aggregation,” ensure data types are optimized.
💡 Optimization Tip:
Use clustering keys on high-cardinality columns to improve pruning efficiency.
ALTER TABLE sales CLUSTER BY (region);
🧠 Example 2: Query with JOIN and Filter
SELECT c.customer_name, SUM(o.order_total)FROM customers cJOIN orders o ON c.customer_id = o.customer_idWHERE o.order_date >= '2025-01-01'GROUP BY c.customer_name;
🔍 Query Profile Highlights:
- Join Node: Time spent combining data.
- Scan Node: Filtered partitions based on
order_date
. - Aggregation Node: Summation and grouping.
🧾 Performance Observations:
Bottleneck | Cause | Solution |
---|---|---|
Large JOIN Time | No pruning | Filter before join |
High Scan Cost | Non-clustered date column | Add clustering key |
Aggregation Delay | Skewed data | Use distributed aggregation |
💡 Optimization Tip:
Use filter pushdown (apply filters early) to minimize join input size.
SELECT c.customer_name, SUM(o.order_total)FROM customers cJOIN ( SELECT * FROM orders WHERE order_date >= '2025-01-01') o ON c.customer_id = o.customer_idGROUP BY c.customer_name;
🧠 Example 3: Window Function Performance
SELECT region, SUM(sales_amount) OVER (PARTITION BY region ORDER BY date) AS running_totalFROM sales;
🔍 Query Profile Highlights:
- Window Node: Most compute-intensive operation.
- Scan Node: Large data volume processed.
- Sort Node: Sorting data for window partition.
🧾 Optimization Strategies:
Issue | Cause | Solution |
---|---|---|
Slow Sorting | Large dataset | Pre-cluster data by region/date |
High I/O | Repetitive calculations | Use CTEs or temporary tables |
Compute Overload | Complex window | Use fewer partitions |
💡 Optimization Tip:
Pre-sort your data physically using clustering keys.
ALTER TABLE sales CLUSTER BY (region, date);
📊 Common Nodes in Query Profile
Node Name | Meaning |
---|---|
SCAN | Reads data from storage |
FILTER | Applies WHERE conditions |
JOIN | Combines data from two sources |
AGGREGATE | Performs SUM, COUNT, etc. |
WINDOW | Executes analytic functions |
SORT | Orders data |
RESULT | Returns data to user |
🧠 How to Interpret Query Profile Metrics
Metric | Description |
---|---|
Execution Time | Total time taken by query |
Bytes Scanned | Data volume read |
Rows Produced | Output count |
Partitions Scanned | Micro-partitions accessed |
Pruning Ratio | Efficiency of filtering |
Disk Spill | Data overflowed from memory to disk |
🌐 ** – Query Optimization Lifecycle**
🧩 Top Query Optimization Techniques
Optimization | Description | Example |
---|---|---|
Clustering Keys | Helps micro-partition pruning | ALTER TABLE orders CLUSTER BY (region) |
Result Caching | Reuse query results | USE_CACHED_RESULT = TRUE |
CTEs | Simplify complex queries | WITH temp AS (...) SELECT ... |
Materialized Views | Store computed results | CREATE MATERIALIZED VIEW mv AS SELECT ... |
Limit Columns | Avoid SELECT * | SELECT col1, col2 only |
Filter Pushdown | Apply filters early | Use WHERE in subqueries |
💡 Real-World Performance Tuning Example
Scenario:
A BI dashboard runs this query frequently:
SELECT category, AVG(price)FROM productsWHERE region = 'US'GROUP BY category;
Problem:
- Slow performance
- High bytes scanned
Optimization:
ALTER TABLE products CLUSTER BY (region);
After re-clustering, the Query Profile shows:
- 70% fewer partitions scanned
- 60% lower query time
✅ Result: Query runs twice as fast.
🧠 How to Remember for Interview
Use the mnemonic:
“P-C-E” → Parse, Compile, Execute
And another for query profile analysis:
“S-F-J-A” → Scan, Filter, Join, Aggregate
Memory Trick | Meaning |
---|---|
PCE | Query lifecycle stages |
SFJA | Common nodes in execution graph |
“Red Hot Node” | Node with highest time usage = bottleneck |
🧠 Interview Q&A Cheatsheet
Question | Answer |
---|---|
What is a Query Profile in Snowflake? | Visualization of query execution steps. |
How do you access it? | Through Query History or Snowsight UI. |
What are common bottlenecks? | Large scans, joins, and sorts. |
What metrics are important? | Bytes scanned, node time, pruning ratio. |
How do you optimize queries? | Clustering, filter pushdown, materialized views. |
How do you find data skew? | Using Query Profile’s “Rows Produced” per node. |
📈 Why Query Profile & Optimization Matter
Benefit | Description |
---|---|
1. Cost Efficiency | Reduce compute credits by optimizing heavy queries |
2. Speed | Improve query runtime for dashboards and reports |
3. Scalability | Handle higher concurrency smoothly |
4. Resource Management | Identify expensive operations early |
5. Reliability | Avoid long-running or failed queries |
🧠 Common Mistakes to Avoid
Mistake | Impact | Solution |
---|---|---|
Using SELECT * | Unnecessary data scan | Select only required columns |
No clustering | Full table scan | Use clustering keys |
Unfiltered joins | Large intermediate data | Apply WHERE before JOIN |
Repeated subqueries | Redundant computation | Use CTEs or temp tables |
Ignoring cache | Higher cost | Enable caching |
📊 Query Optimization Cycle
🧩 Monitoring Query Performance
Use Snowflake’s ACCOUNT_USAGE views:
SELECT query_id, query_text, total_elapsed_time, bytes_scanned, rows_producedFROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORYWHERE start_time > DATEADD('hour', -2, CURRENT_TIMESTAMP());
This helps identify expensive queries to optimize using Query Profiles.
🧭 Practical Optimization Checklist
Step | Action |
---|---|
✅ Step 1 | Run Query and Capture Query ID |
✅ Step 2 | Open Query Profile |
✅ Step 3 | Identify Top 3 Nodes by Execution Time |
✅ Step 4 | Analyze Data Scanned & Rows Produced |
✅ Step 5 | Apply Clustering or Pruning Optimization |
✅ Step 6 | Re-run and Compare Profiles |
🧠 Visualization – Query Bottleneck Flow
💡 Pro Tip: Combine Query Profile + Query History
Together, they reveal both what is slow and why it’s slow.
- Use Query History to spot slow queries.
- Use Query Profile to drill down into specific bottlenecks.
🧠 How to Prepare for Exams
- Practice using Snowsight Query Profile with different SQL queries.
- Memorize execution phases (Parse → Optimize → Execute).
- Learn common nodes and metrics (SCAN, FILTER, JOIN, AGGREGATE).
- Use mnemonic SFJA to recall node order.
- Be able to explain how clustering keys impact pruning.
🧩 ** – Query Profile Layers**
🎯 Conclusion
Snowflake Query Profile & Optimization isn’t just a diagnostic tool — it’s your window into query performance intelligence. By analyzing execution plans, understanding node-level metrics, and applying optimization techniques, you can turn sluggish queries into high-speed insights.
💡 Final Thought:
“In Snowflake, optimization starts with observation — and Query Profile is your map.”
✅ Quick Recap
Topic | Key Point |
---|---|
What is Query Profile? | Visual query execution analyzer |
Main Benefit | Detect bottlenecks |
Tools | Web UI, Snowsight, SQL |
Optimization Techniques | Clustering, pruning, caching |
Mnemonics | PCE (Parse-Compile-Execute), SFJA (Scan-Filter-Join-Aggregate) |