Technology  /  SQL

🗄️ SQL 40 guides · updated 2026

The language of data — from SELECT and JOINs to window functions, query plans, and the performance tuning that separates juniors from seniors.

SQL DISTINCT

DISTINCT eliminates duplicate rows from query results. It’s used in SELECT to return only unique combinations of the selected columns.


Basic Usage

-- All unique city values
SELECT DISTINCT city FROM customers;
-- All unique status values
SELECT DISTINCT status FROM orders;

Without DISTINCT, these queries return one row per customer or order — with many duplicates if multiple customers are in the same city.


DISTINCT Applies to the Entire Row

DISTINCT evaluates the combination of all selected columns, not just the first one:

-- All unique city + country combinations
SELECT DISTINCT city, country
FROM customers
ORDER BY country, city;
-- A row is only excluded if BOTH city AND country are identical to another row

There’s no way to apply DISTINCT to only one column in a multi-column SELECT while keeping all other columns — use GROUP BY or a window function for that.


COUNT DISTINCT

Count unique values in a column:

SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
COUNT(DISTINCT product_id) AS unique_products_sold,
COUNT(DISTINCT DATE_TRUNC('day', order_date)) AS active_days
FROM orders
WHERE status = 'completed';

COUNT(DISTINCT column) ignores NULLs — it counts only unique non-null values.


DISTINCT vs GROUP BY

DISTINCT and GROUP BY both deduplicate, but they serve different purposes:

-- DISTINCT: just deduplicate (no aggregation)
SELECT DISTINCT city FROM customers;
-- GROUP BY: deduplicate AND aggregate
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;
-- Both return unique cities, but GROUP BY adds the count

When you only need unique values (no aggregation), DISTINCT is simpler. When you need aggregates, use GROUP BY.

Performance: On large datasets with good indexes, both are similar in cost. Without indexes, both require sorting or hashing to find duplicates.


DISTINCT in Practice: Deduplicating Data

A common use case is finding what distinct values exist in a dimension column:

-- What categories do we have?
SELECT DISTINCT category FROM products ORDER BY category;
-- What countries are our customers from?
SELECT DISTINCT country FROM customers ORDER BY country;
-- What status values exist in our orders?
SELECT DISTINCT status FROM orders ORDER BY status;

When DISTINCT Masks a Problem

DISTINCT is sometimes used to “fix” a query that returns duplicates because of a bad join:

-- This returns duplicates if a customer has multiple orders
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
-- DISTINCT hides the root cause: this should be:
SELECT customer_id, name FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
-- Or: check if the join is correct

If you’re adding DISTINCT to remove duplicates and aren’t sure why duplicates appear, investigate the join logic first — DISTINCT may be masking a deeper issue.


DISTINCT ON (PostgreSQL)

PostgreSQL’s DISTINCT ON keeps one row per group of the specified column(s), choosing which row by the ORDER BY clause. This is a PostgreSQL extension — not standard SQL:

-- Get the most recent order for each customer
SELECT DISTINCT ON (customer_id)
customer_id,
order_id,
amount,
created_at
FROM orders
ORDER BY customer_id, created_at DESC;

DISTINCT ON (customer_id) keeps the first row per customer in the ORDER BY sequence. Since we order by created_at DESC, we get the most recent order.

The equivalent with a window function (standard SQL):

SELECT customer_id, order_id, amount, created_at
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders
) t
WHERE rn = 1;

DISTINCT in Subqueries

-- Products that have been sold (without duplicates from multiple orders)
SELECT DISTINCT product_id FROM order_items;
-- Count of products ever sold
SELECT COUNT(DISTINCT product_id) AS products_sold FROM order_items;
-- Customers who ordered in more than one month
SELECT customer_id
FROM orders
GROUP BY customer_id
HAVING COUNT(DISTINCT DATE_TRUNC('month', order_date)) > 1;

Performance Considerations

DISTINCT requires comparing all selected columns across all rows to find duplicates. This involves either:

Both are expensive on large result sets. If you find yourself using DISTINCT on a large table frequently:

  1. Make sure an index exists on the columns
  2. Consider whether the query structure could return fewer duplicates before DISTINCT
  3. Check if EXISTS or a JOIN restructure could avoid the deduplication entirely