SQL Best Practices
SQL is easy to write badly. Queries that work correctly but perform poorly, update the wrong rows, or expose injection vulnerabilities are everywhere. These practices help you avoid the most common traps.
1. Always Use Explicit Column Lists
Avoid SELECT * in production:
-- Avoid:SELECT * FROM orders;
-- Better:SELECT order_id, customer_id, amount, status, created_atFROM orders;SELECT * returns unnecessary data, breaks when column order changes, and makes queries harder to understand. The exception: quick one-off exploration queries where you genuinely want to see everything.
2. Format Queries for Readability
SQL has no enforced style, but consistent formatting makes queries reviewable:
-- Hard to readselect o.order_id,c.name,sum(oi.quantity*oi.unit_price) as total from orders o join customers c on o.customer_id=c.customer_id join order_items oi on o.order_id=oi.order_id where o.status='completed' group by o.order_id,c.name having sum(oi.quantity*oi.unit_price)>100 order by total desc;
-- Much betterSELECT o.order_id, c.name AS customer_name, SUM(oi.quantity * oi.unit_price) AS totalFROM orders oJOIN customers c ON o.customer_id = c.customer_idJOIN order_items oi ON o.order_id = oi.order_idWHERE o.status = 'completed'GROUP BY o.order_id, c.nameHAVING SUM(oi.quantity * oi.unit_price) > 100ORDER BY total DESC;Consistent conventions: uppercase keywords, one clause per line, table aliases, meaningful aliases.
3. Always Qualify Column Names with Table Aliases
When joining multiple tables, prefix every column with its table alias:
-- Ambiguous: which table does 'status' come from?SELECT order_id, name, statusFROM orders JOIN customers ON orders.customer_id = customers.customer_id;
-- Clear:SELECT o.order_id, c.name, o.statusFROM orders oJOIN customers c ON o.customer_id = c.customer_id;This prevents ambiguity errors and makes it immediately clear where each column comes from.
4. Index What You Filter and Join On
An index on a column means the database can look up matching rows without scanning the entire table:
-- If you frequently filter by status and created_at:CREATE INDEX idx_orders_status_date ON orders(status, created_at DESC);
-- If you join orders to customers on customer_id:CREATE INDEX idx_orders_customer ON orders(customer_id);
-- If you filter by a text column:CREATE INDEX idx_products_category ON products(category);Index design rules:
- Columns used in
WHEREconditions — especially high-cardinality columns (email, user_id) - Columns used in
JOIN ONclauses (foreign keys) - Columns in
ORDER BYthat appear in queries withLIMIT - Composite indexes follow left-prefix rule:
(a, b, c)helps queries filtering ona,a + b, ora + b + c— but not onborcalone
5. Avoid Functions on Indexed Columns in WHERE
Wrapping an indexed column in a function prevents the index from being used:
-- Sargable (index can be used):WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
-- Not sargable (function prevents index use):WHERE YEAR(created_at) = 2025WHERE LOWER(email) = 'alice@example.com'
-- Better for case-insensitive search:-- Create a functional indexCREATE INDEX idx_users_email_lower ON users(LOWER(email));-- Then query:WHERE LOWER(email) = 'alice@example.com'6. Use EXPLAIN ANALYZE to Diagnose Slow Queries
Before optimizing a slow query, understand what the database is actually doing:
EXPLAIN ANALYZESELECT c.name, SUM(o.amount)FROM customers cJOIN orders o ON c.customer_id = o.customer_idWHERE o.status = 'completed'GROUP BY c.customer_id, c.name;What to look for:
Seq Scanon a large table → usually means a missing indexHash Join→ efficient for large datasets (good)Nested Loop→ good for small tables, bad for large unconstrained ones- High
actual rowsvsestimated rows→ stale table statistics; runANALYZE table_name - High cost nodes → focus optimization there first
7. Never Concatenate User Input into SQL
SQL injection is still the most common database vulnerability. The fix is always parameterized queries:
-- NEVER DO THIS (SQL injection risk):query = "SELECT * FROM users WHERE email = '" + user_input + "'"
-- Always use parameterized queries:-- Python (psycopg2)cursor.execute("SELECT * FROM users WHERE email = %s", (user_input,))
-- Node.js (pg)client.query("SELECT * FROM users WHERE email = $1", [userInput])
-- Java (PreparedStatement)PreparedStatement stmt = conn.prepareStatement( "SELECT * FROM users WHERE email = ?");stmt.setString(1, userEmail);The database driver handles escaping. User input is never interpreted as SQL.
8. Wrap Multi-Statement Operations in Transactions
Operations that span multiple statements should succeed or fail together:
BEGIN; -- Transfer funds between accounts UPDATE accounts SET balance = balance - 500 WHERE account_id = 101; UPDATE accounts SET balance = balance + 500 WHERE account_id = 202; -- If either fails, ROLLBACK; otherwise COMMITCOMMIT;Without a transaction, a crash between the two UPDATE statements would leave the database in an inconsistent state (money removed but not added).
9. Preview Before Deleting or Updating
Always confirm your WHERE clause selects the right rows:
-- Run this first:SELECT COUNT(*), MIN(order_date), MAX(order_date)FROM ordersWHERE status = 'cancelled' AND created_at < '2024-01-01';
-- If count is right, run:DELETE FROM ordersWHERE status = 'cancelled' AND created_at < '2024-01-01';Many engineers wrap destructive operations in a transaction with ROLLBACK until they’re certain:
BEGIN;DELETE FROM orders WHERE status = 'cancelled' AND created_at < '2024-01-01';-- Check: SELECT COUNT(*) FROM orders;ROLLBACK; -- or COMMIT once verified10. Use CTEs for Readability, Not Just DRY
CTEs (Common Table Expressions) aren’t just for reuse — they make complex queries readable by naming intermediate steps:
WITHcompleted_orders AS ( SELECT customer_id, SUM(amount) AS lifetime_value FROM orders WHERE status = 'completed' GROUP BY customer_id),high_value_customers AS ( SELECT customer_id FROM completed_orders WHERE lifetime_value > 1000)SELECT c.name, c.email, co.lifetime_valueFROM customers cJOIN completed_orders co USING (customer_id)WHERE c.customer_id IN (SELECT customer_id FROM high_value_customers);Each CTE has a name that describes what it represents — the query reads like a series of named steps.
11. Avoid SELECT DISTINCT as a Crutch
DISTINCT is often used to “fix” duplicate rows from a join. But duplicates in a join usually mean the join itself is wrong:
-- If this returns duplicates, the JOIN is probably wrongSELECT DISTINCT customer_id FROM orders o JOIN promotions p ON ...;
-- Fix the JOIN logic instead of masking it with DISTINCTLegitimate uses of DISTINCT: getting unique values from a single column, or deduplicating results from a UNION.
12. Use NULLs Deliberately
NULL means “unknown” — not zero, not empty string. Treat them intentionally:
-- Don't use 0 or '' to represent missing data-- Use NULL and handle it explicitly:WHERE phone IS NOT NULL -- filter for known phonesCOALESCE(phone, 'Unknown') -- display fallbackCOUNT(phone) vs COUNT(*) -- COUNT(column) excludes NULLsAnd remember: NULL != NULL in SQL. Comparing with = NULL always returns NULL (false). Always use IS NULL / IS NOT NULL.