SQL Joins and Table Relationships

Joins combine rows from two or more tables based on a related column. They’re how you answer questions that require data from multiple tables — “which customer placed this order?”, “which products are in each category?”, “which employees work in which departments?”.

How Joins Work

Every join needs:

Two tables (or a table joined to itself)
A join condition — how to match rows between them

SELECT columns
FROM table_a
[JOIN TYPE] JOIN table_b
    ON table_a.shared_column = table_b.shared_column;

The join condition is typically a foreign key relationship — orders.customer_id = customers.customer_id — but it can be any boolean expression.

INNER JOIN

Returns only rows that have a match in BOTH tables:

SELECT o.order_id, c.name, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

INNER JOIN Venn diagram

  orders      customers
  ┌──────┐   ┌──────────┐
  │      │███│          │
  │      │███│          │
  └──────┘   └──────────┘
        only matching rows

INNER JOIN and JOIN are synonymous — INNER is the default and can be omitted
Customers with no orders are excluded
Orders with an invalid customer_id are excluded

LEFT JOIN (LEFT OUTER JOIN)

Returns all rows from the left table plus matching rows from the right table. Unmatched right-table values are NULL:

-- All customers, including those with no orders
SELECT c.customer_id, c.name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

LEFT JOIN diagram

  customers    orders
  ┌──────────┐   ┌──────┐
  │██████████│███│      │
  │██████████│███│      │
  └──────────┘   └──────┘
  all left rows + matches

Finding rows with NO match (anti-join pattern):

-- Customers who have never placed an order
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

RIGHT JOIN (RIGHT OUTER JOIN)

The mirror of LEFT JOIN — returns all rows from the right table plus matches from the left. In practice, most developers rewrite RIGHT JOINs as LEFT JOINs by swapping the tables, which is more readable.

-- Equivalent queries:
SELECT * FROM orders o RIGHT JOIN customers c ON o.customer_id = c.customer_id;
-- Same as:
SELECT * FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;

FULL OUTER JOIN

Returns all rows from both tables. Rows without a match in the other table have NULLs for that table’s columns:

-- All customers and all orders, even if unmatched
SELECT c.name, o.order_id, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

Useful for reconciliation queries — finding rows that exist in one dataset but not the other.

MySQL note: MySQL doesn’t support FULL OUTER JOIN natively. Simulate it with a UNION:

SELECT c.name, o.order_id FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION
SELECT c.name, o.order_id FROM customers c RIGHT JOIN orders o ON c.customer_id = o.customer_id;

CROSS JOIN

Returns the Cartesian product — every row from table A combined with every row from table B:

-- 12 rows if months has 12 rows and products has 10 rows: 12 × 10 = 120 rows
SELECT m.month_name, p.product_name
FROM months m
CROSS JOIN products p;

Cross joins are useful for generating combinations — size × color grids, all (user, feature) pairs for A/B testing, or date × product dimensions in analytics. Avoid accidentally creating them (a FROM a, b without a WHERE clause is an implicit cross join).

Self Join

A table joined to itself — used for hierarchical data (org charts, categories) or comparing rows within the same table:

-- Find all employees and their manager's name
SELECT
    e.employee_id,
    e.name AS employee_name,
    m.name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

-- Find products that cost more than the average for their category
SELECT p1.name, p1.price, p1.category
FROM products p1
JOIN (
    SELECT category, AVG(price) AS avg_price
    FROM products
    GROUP BY category
) p2 ON p1.category = p2.category
WHERE p1.price > p2.avg_price;

Joining Multiple Tables

You can chain multiple joins — the result of each join becomes the input for the next:

SELECT
    c.name AS customer_name,
    o.order_id,
    p.name AS product_name,
    oi.quantity,
    oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2025-01-01'
ORDER BY o.order_date, c.name;

Using USING Instead of ON

When joining on columns with the same name in both tables, USING is a cleaner shorthand:

-- ON syntax
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.customer_id;

-- USING syntax (customer_id appears only once in the result)
SELECT * FROM orders JOIN customers USING (customer_id);

-- Multiple columns
JOIN products USING (product_id, category_id)

Join Performance

Indexes on join columns. The most impactful optimization. Foreign key columns in child tables are frequently joined on and should almost always have an index:

-- Create an index on the FK column that's joined frequently
CREATE INDEX idx_orders_customer ON orders(customer_id);

Join type affects rows returned, not performance. Whether you use INNER or LEFT JOIN doesn’t directly change how fast the join executes — it affects the number of rows in the result. Performance depends on indexes, table sizes, and the query planner’s chosen algorithm (nested loop, hash join, merge join).

Check your EXPLAIN ANALYZE output:

EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Look for Hash Join (efficient for larger datasets), Nested Loop (good for small tables or indexed lookups), and Seq Scan on large tables (usually means a missing index).

Common Mistakes

Forgetting the join condition:

-- Missing ON clause = implicit CROSS JOIN (every row × every row)
SELECT * FROM orders, customers;  -- dangerous!

Joining on a non-indexed column in a large table: The database falls back to a full scan of both tables to find matches.

NULL matching: NULL = NULL is always NULL (not TRUE) in SQL. Rows with NULL in the join column will never match, regardless of join type.

Duplicate rows from one-to-many joins: If one customer has 10 orders, joining to customer details produces 10 rows per customer. When aggregating, be aware of what’s being counted:

-- WRONG: counts order rows, not distinct customers
SELECT COUNT(*) FROM customers c JOIN orders o ON c.customer_id = o.customer_id;

-- RIGHT: distinct customers who have orders
SELECT COUNT(DISTINCT c.customer_id) FROM customers c JOIN orders o ON c.customer_id = o.customer_id;