SQL Best Practices

SQL is easy to write badly. Queries that work correctly but perform poorly, update the wrong rows, or expose injection vulnerabilities are everywhere. These practices help you avoid the most common traps.

1. Always Use Explicit Column Lists

Avoid SELECT * in production:

-- Avoid:
SELECT * FROM orders;

-- Better:
SELECT order_id, customer_id, amount, status, created_at
FROM orders;

SELECT * returns unnecessary data, breaks when column order changes, and makes queries harder to understand. The exception: quick one-off exploration queries where you genuinely want to see everything.

2. Format Queries for Readability

SQL has no enforced style, but consistent formatting makes queries reviewable:

-- Hard to read
select o.order_id,c.name,sum(oi.quantity*oi.unit_price) as total from orders o join customers c on o.customer_id=c.customer_id join order_items oi on o.order_id=oi.order_id where o.status='completed' group by o.order_id,c.name having sum(oi.quantity*oi.unit_price)>100 order by total desc;

-- Much better
SELECT
    o.order_id,
    c.name AS customer_name,
    SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY o.order_id, c.name
HAVING SUM(oi.quantity * oi.unit_price) > 100
ORDER BY total DESC;

Consistent conventions: uppercase keywords, one clause per line, table aliases, meaningful aliases.

3. Always Qualify Column Names with Table Aliases

When joining multiple tables, prefix every column with its table alias:

-- Ambiguous: which table does 'status' come from?
SELECT order_id, name, status
FROM orders JOIN customers ON orders.customer_id = customers.customer_id;

-- Clear:
SELECT o.order_id, c.name, o.status
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

This prevents ambiguity errors and makes it immediately clear where each column comes from.

4. Index What You Filter and Join On

An index on a column means the database can look up matching rows without scanning the entire table:

-- If you frequently filter by status and created_at:
CREATE INDEX idx_orders_status_date ON orders(status, created_at DESC);

-- If you join orders to customers on customer_id:
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- If you filter by a text column:
CREATE INDEX idx_products_category ON products(category);

Index design rules:

Columns used in WHERE conditions — especially high-cardinality columns (email, user_id)
Columns used in JOIN ON clauses (foreign keys)
Columns in ORDER BY that appear in queries with LIMIT
Composite indexes follow left-prefix rule: (a, b, c) helps queries filtering on a, a + b, or a + b + c — but not on b or c alone

5. Avoid Functions on Indexed Columns in WHERE

Wrapping an indexed column in a function prevents the index from being used:

-- Sargable (index can be used):
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

-- Not sargable (function prevents index use):
WHERE YEAR(created_at) = 2025
WHERE LOWER(email) = 'alice@example.com'

-- Better for case-insensitive search:
-- Create a functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Then query:
WHERE LOWER(email) = 'alice@example.com'

6. Use EXPLAIN ANALYZE to Diagnose Slow Queries

Before optimizing a slow query, understand what the database is actually doing:

EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name;

What to look for:

Seq Scan on a large table → usually means a missing index
Hash Join → efficient for large datasets (good)
Nested Loop → good for small tables, bad for large unconstrained ones
High actual rows vs estimated rows → stale table statistics; run ANALYZE table_name
High cost nodes → focus optimization there first

7. Never Concatenate User Input into SQL

SQL injection is still the most common database vulnerability. The fix is always parameterized queries:

-- NEVER DO THIS (SQL injection risk):
query = "SELECT * FROM users WHERE email = '" + user_input + "'"

-- Always use parameterized queries:
-- Python (psycopg2)
cursor.execute("SELECT * FROM users WHERE email = %s", (user_input,))

-- Node.js (pg)
client.query("SELECT * FROM users WHERE email = $1", [userInput])

-- Java (PreparedStatement)
PreparedStatement stmt = conn.prepareStatement(
    "SELECT * FROM users WHERE email = ?"
);
stmt.setString(1, userEmail);

The database driver handles escaping. User input is never interpreted as SQL.

8. Wrap Multi-Statement Operations in Transactions

Operations that span multiple statements should succeed or fail together:

BEGIN;
    -- Transfer funds between accounts
    UPDATE accounts SET balance = balance - 500 WHERE account_id = 101;
    UPDATE accounts SET balance = balance + 500 WHERE account_id = 202;
    -- If either fails, ROLLBACK; otherwise COMMIT
COMMIT;

Without a transaction, a crash between the two UPDATE statements would leave the database in an inconsistent state (money removed but not added).

9. Preview Before Deleting or Updating

Always confirm your WHERE clause selects the right rows:

-- Run this first:
SELECT COUNT(*), MIN(order_date), MAX(order_date)
FROM orders
WHERE status = 'cancelled' AND created_at < '2024-01-01';

-- If count is right, run:
DELETE FROM orders
WHERE status = 'cancelled' AND created_at < '2024-01-01';

Many engineers wrap destructive operations in a transaction with ROLLBACK until they’re certain:

BEGIN;
DELETE FROM orders WHERE status = 'cancelled' AND created_at < '2024-01-01';
-- Check: SELECT COUNT(*) FROM orders;
ROLLBACK;  -- or COMMIT once verified

10. Use CTEs for Readability, Not Just DRY

CTEs (Common Table Expressions) aren’t just for reuse — they make complex queries readable by naming intermediate steps:

WITH
completed_orders AS (
    SELECT customer_id, SUM(amount) AS lifetime_value
    FROM orders
    WHERE status = 'completed'
    GROUP BY customer_id
),
high_value_customers AS (
    SELECT customer_id
    FROM completed_orders
    WHERE lifetime_value > 1000
)
SELECT c.name, c.email, co.lifetime_value
FROM customers c
JOIN completed_orders co USING (customer_id)
WHERE c.customer_id IN (SELECT customer_id FROM high_value_customers);

Each CTE has a name that describes what it represents — the query reads like a series of named steps.

11. Avoid SELECT DISTINCT as a Crutch

DISTINCT is often used to “fix” duplicate rows from a join. But duplicates in a join usually mean the join itself is wrong:

-- If this returns duplicates, the JOIN is probably wrong
SELECT DISTINCT customer_id FROM orders o JOIN promotions p ON ...;

-- Fix the JOIN logic instead of masking it with DISTINCT

Legitimate uses of DISTINCT: getting unique values from a single column, or deduplicating results from a UNION.

12. Use NULLs Deliberately

NULL means “unknown” — not zero, not empty string. Treat them intentionally:

-- Don't use 0 or '' to represent missing data
-- Use NULL and handle it explicitly:
WHERE phone IS NOT NULL          -- filter for known phones
COALESCE(phone, 'Unknown')       -- display fallback
COUNT(phone) vs COUNT(*)         -- COUNT(column) excludes NULLs

And remember: NULL != NULL in SQL. Comparing with = NULL always returns NULL (false). Always use IS NULL / IS NOT NULL.