Advanced SQL Concepts

Once you’re comfortable with SELECT, JOIN, GROUP BY, and basic subqueries, a second tier of SQL capabilities opens up — one that’s expected of data engineers, analytics engineers, and senior analysts. This guide covers the patterns and techniques that separate intermediate SQL users from those who can solve genuinely complex analytical problems.

Window Functions: Analytics Without Collapsing Rows

Window functions perform calculations across a set of rows related to the current row — without collapsing results the way GROUP BY does. This makes them ideal for ranking, running totals, moving averages, and comparing rows to their neighbors.

Basic syntax:

function_name(expression) OVER (
    PARTITION BY column      -- reset the window for each group
    ORDER BY column          -- order within the window
    ROWS/RANGE frame_spec    -- optional: define the window size
)

Ranking functions:

SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_dense_rank,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

The difference:

ROW_NUMBER() — always unique, no ties
RANK() — ties get the same number, next number skips (1, 1, 3)
DENSE_RANK() — ties get the same number, next number doesn’t skip (1, 1, 2)

Running totals and moving averages:

SELECT
    order_date,
    daily_revenue,
    SUM(daily_revenue) OVER (ORDER BY order_date) AS running_total,
    AVG(daily_revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS seven_day_moving_avg
FROM daily_sales;

Accessing adjacent rows:

SELECT
    product_id,
    sale_date,
    revenue,
    LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_day_revenue,
    revenue - LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS day_over_day_change
FROM product_sales;

Common Table Expressions (CTEs)

CTEs use the WITH clause to define named intermediate result sets. They make complex queries readable by breaking them into logical steps.

WITH
customer_orders AS (
    SELECT
        customer_id,
        COUNT(*) AS order_count,
        SUM(amount) AS total_spent,
        MAX(order_date) AS last_order_date
    FROM orders
    GROUP BY customer_id
),
customer_segments AS (
    SELECT
        customer_id,
        total_spent,
        CASE
            WHEN total_spent >= 5000 THEN 'Platinum'
            WHEN total_spent >= 1000 THEN 'Gold'
            WHEN total_spent >= 250  THEN 'Silver'
            ELSE 'Bronze'
        END AS segment
    FROM customer_orders
)
SELECT
    cs.segment,
    COUNT(*) AS customer_count,
    AVG(co.total_spent) AS avg_spend,
    AVG(co.order_count) AS avg_orders
FROM customer_segments cs
JOIN customer_orders co USING (customer_id)
GROUP BY cs.segment
ORDER BY avg_spend DESC;

CTEs can reference each other in sequence, and in most databases multiple CTEs can be defined in the same WITH block, separated by commas.

Recursive CTEs

Recursive CTEs handle hierarchical or graph data — organizational trees, category hierarchies, bill-of-materials, network paths. They’re one of the most powerful but least-used features in SQL.

-- Walk an org chart from a given manager down
WITH RECURSIVE org_tree AS (
    -- Anchor: start with the given manager
    SELECT employee_id, name, manager_id, 0 AS depth
    FROM employees
    WHERE employee_id = 100  -- starting manager

    UNION ALL

    -- Recursive: find direct reports of each person in the tree
    SELECT e.employee_id, e.name, e.manager_id, ot.depth + 1
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.employee_id
)
SELECT name, depth
FROM org_tree
ORDER BY depth, name;

Structure:

Anchor member — the base case (starting rows)
UNION ALL
Recursive member — joins back to the CTE itself, referencing rows from the previous iteration

Lateral Joins

A lateral join (called CROSS APPLY / OUTER APPLY in SQL Server) allows the subquery on the right side to reference columns from the table on the left — something a regular subquery cannot do.

-- For each customer, get their 3 most recent orders
SELECT
    c.customer_id,
    c.name,
    recent.order_id,
    recent.amount,
    recent.order_date
FROM customers c
CROSS JOIN LATERAL (
    SELECT order_id, amount, order_date
    FROM orders o
    WHERE o.customer_id = c.customer_id   -- references c from outer query
    ORDER BY order_date DESC
    LIMIT 3
) recent;

Lateral joins are particularly useful for:

Top-N per group without window functions
Calling set-returning functions row by row
Unnesting arrays or JSON into rows per record

Advanced Filtering Patterns

FILTER clause on aggregates (cleaner than CASE WHEN):

SELECT
    COUNT(*) AS total_orders,
    COUNT(*) FILTER (WHERE status = 'completed') AS completed,
    COUNT(*) FILTER (WHERE status = 'refunded') AS refunded,
    SUM(amount) FILTER (WHERE status = 'completed') AS completed_revenue
FROM orders;

Conditional aggregation with CASE WHEN (works in all databases):

SELECT
    SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue,
    SUM(CASE WHEN status = 'refunded' THEN amount ELSE 0 END) AS refunded_revenue
FROM orders;

EXISTS vs IN for subqueries:

-- EXISTS is typically faster when the subquery result is large
SELECT customer_id, name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.amount > 1000
);

String and JSON Functions

Modern SQL databases handle semi-structured data natively.

PostgreSQL JSON:

-- Extract fields from JSON column
SELECT
    id,
    event_data->>'user_id' AS user_id,
    (event_data->>'amount')::numeric AS amount,
    event_data->'metadata'->>'source' AS source
FROM events
WHERE event_data->>'type' = 'purchase';

-- Query arrays in JSON
SELECT id, jsonb_array_elements_text(event_data->'tags') AS tag
FROM events;

String manipulation:

SELECT
    LOWER(TRIM(email)) AS normalized_email,
    SPLIT_PART(email, '@', 2) AS domain,
    LEFT(phone, 3) AS area_code,
    REGEXP_REPLACE(description, '[^a-zA-Z0-9 ]', '', 'g') AS clean_description
FROM users;

Query Performance and Optimization

Writing a query that returns the right answer is necessary but not sufficient. At scale, performance matters.

Use EXPLAIN ANALYZE to see what the database actually does:

EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2025-01-01'
GROUP BY c.customer_id, c.name
ORDER BY SUM(o.amount) DESC;

Look for:

Seq Scan on large tables — usually means a missing index
Hash Join vs Nested Loop — hash join is better for large result sets
High rows estimated vs rows actual discrepancy — stale statistics

Common performance anti-patterns:

-- BAD: function on indexed column prevents index use
WHERE YEAR(order_date) = 2025

-- GOOD: range comparison uses the index
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'

-- BAD: leading wildcard prevents index use
WHERE name LIKE '%smith%'

-- GOOD: leading literal uses index
WHERE name LIKE 'smith%'

-- BAD: SELECT * when you need 2 columns from a 50-column table
SELECT * FROM orders WHERE status = 'pending'

-- GOOD: only fetch what you need
SELECT order_id, amount FROM orders WHERE status = 'pending'

Pivoting Data

SQL doesn’t have native PIVOT syntax in PostgreSQL, but conditional aggregation achieves the same result:

-- Sales by product, pivoted to months as columns
SELECT
    product_name,
    SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 1 THEN amount END) AS jan,
    SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 2 THEN amount END) AS feb,
    SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 3 THEN amount END) AS mar,
    SUM(CASE WHEN EXTRACT(MONTH FROM sale_date) = 4 THEN amount END) AS apr
FROM sales
WHERE EXTRACT(YEAR FROM sale_date) = 2025
GROUP BY product_name;

Snowflake and SQL Server have native PIVOT operators; BigQuery uses PIVOT as well (added in 2021).

Set Operations

UNION, INTERSECT, and EXCEPT (or MINUS in Oracle) operate on entire result sets:

-- All customers and suppliers in one list
SELECT name, 'customer' AS type FROM customers
UNION
SELECT company_name, 'supplier' AS type FROM suppliers;

-- Customers who are also registered as suppliers
SELECT email FROM customers
INTERSECT
SELECT contact_email FROM suppliers;

-- Customers who have NOT placed an order in 2025
SELECT customer_id FROM customers
EXCEPT
SELECT DISTINCT customer_id FROM orders WHERE order_date >= '2025-01-01';

UNION deduplicates (like DISTINCT). UNION ALL keeps all rows including duplicates — and is faster because it skips the deduplication step.

Date and Time Operations

Date math is database-specific. Common patterns:

-- PostgreSQL
SELECT
    NOW() AS current_timestamp,
    CURRENT_DATE AS today,
    CURRENT_DATE - INTERVAL '30 days' AS thirty_days_ago,
    DATE_TRUNC('week', order_date) AS week_start,
    EXTRACT(DOW FROM order_date) AS day_of_week,  -- 0=Sunday, 6=Saturday
    order_date + INTERVAL '1 month' AS next_month
FROM orders;

-- Calculate age / days between dates
SELECT
    customer_id,
    CURRENT_DATE - first_order_date AS days_since_first_order,
    AGE(last_order_date, first_order_date) AS order_lifespan
FROM customer_summary;

The jump from basic SQL to advanced SQL is largely a matter of knowing these tools exist and practicing them on real data. Window functions and CTEs will change how you think about analytical problems — once you know them, you’ll find uses for them constantly.