`source` and `ref` Functions in dbt

In the data transformation landscape, dbt (data build tool) has become a widely used tool for managing data models and analytics workflows. Two of its core functions, source and ref, are central to structuring and maintaining scalable, well-documented data pipelines.

✔ source Function – References raw data tables that are loaded into the warehouse outside of dbt.
✔ ref Function – Enables referencing models within a dbt project to ensure modularity and dependency management.

By the end of this guide, you will understand how source and ref improve dbt projects, where to use them, and how to implement them in real-world scenarios.

1. Understanding `source` and `ref` Functions in dbt

1.1 What is the `source` Function in dbt?

The source function references raw tables that already exist in your data warehouse (e.g., Snowflake, BigQuery, Redshift) but are loaded by tools outside dbt. Sources are declared in a YAML file, which documents where raw data originates and lets dbt include those tables in its lineage graph.

Example Usage:

SELECT * FROM {{ source('ecommerce', 'orders') }}

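✔ Refers to the orders table in the source named ecommerce. The actual schema is whatever the source definition specifies, which need not match the source name.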
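To make the substitution concrete, here is roughly what happens at compile time. This is a sketch under assumptions: the ecommerce source is declared with schema raw_data (as in Example 1 below), and the target database is called analytics, which is a hypothetical name. Exact quoting and casing depend on the adapter.

```sql
-- What you write in a model file:
SELECT * FROM {{ source('ecommerce', 'orders') }}

-- Roughly what dbt sends to the warehouse after compiling
-- (assumes database "analytics" and source schema "raw_data"):
-- SELECT * FROM analytics.raw_data.orders
```

Because the physical location lives in the YAML definition, moving the raw table to another schema means changing one line there rather than every model that reads it.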

1.2 What is the `ref` Function in dbt?

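The ref function references other dbt models within a project. dbt uses these references to infer the dependency graph, build models in the correct order, and resolve each model to the right database and schema for the current environment, which keeps the codebase modular and reusable.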

Example Usage:

SELECT * FROM {{ ref('customer_orders') }}

✔ Refers to the transformed customer_orders model within dbt.
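As an illustration of the environment handling, the same ref call can resolve to different relations depending on the active target. The database and schema names below (analytics, dbt_dev, prod) are assumptions chosen for the example, not values from this guide.

```sql
-- What you write, regardless of environment:
SELECT * FROM {{ ref('customer_orders') }}

-- Roughly what dbt compiles for a development target
-- (assumed database "analytics", schema "dbt_dev"):
-- SELECT * FROM analytics.dbt_dev.customer_orders

-- Roughly what dbt compiles for a production target
-- (assumed schema "prod"):
-- SELECT * FROM analytics.prod.customer_orders
```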

2. Key Benefits of `source` and `ref` in dbt

Feature	`source`	`ref`
References external raw tables	✅	❌
References transformed dbt models	❌	✅
Improves data lineage tracking	✅	✅
Ensures modular and reusable queries	❌	✅
Works with automated documentation	✅	✅
Helps in dependency management	❌	✅

✔ Using source ensures clear documentation of raw data origins.
✔ Using ref helps maintain model relationships dynamically.

3. Some Useful Examples of `source` and `ref` in dbt

Example 1: Defining a `source` for Orders Data in an E-commerce Store

Scenario: An e-commerce business loads raw order data into a warehouse (Snowflake/BigQuery). We want to reference it correctly using the source function.

Step 1: Define the source in sources.yml

version: 2

sources:
  - name: ecommerce
    description: "Raw e-commerce data"
    schema: raw_data
    tables:
      - name: orders
        description: "Contains all order transactions"
        columns:
          - name: order_id
            description: "Unique order identifier"
          - name: customer_id
            description: "ID of the customer who placed the order"

Step 2: Use the source function in a dbt model

SELECT
    order_id,
    customer_id,
    total_amount,
    order_date
FROM {{ source('ecommerce', 'orders') }}

✔ References orders from the ecommerce source.
✔ Ensures proper documentation and clear lineage.

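Example 2: Creating a Transformed Model for Use with `ref`

Scenario: We need to create a cleaned customer orders dataset from the raw orders table. This model reads from source; the later examples reference it with ref.

Step 1: Create a dbt model (customer_orders.sql) that downstream models can ref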

WITH orders AS (
    SELECT * FROM {{ source('ecommerce', 'orders') }}
)

SELECT
    order_id,
    customer_id,
    total_amount,
    order_date,
    CASE
        WHEN total_amount > 100 THEN 'VIP'
        ELSE 'Regular'
    END AS customer_type
FROM orders

✔ Uses source to reference raw orders.
✔ Labels each order as VIP or Regular based on its total_amount.

Example 3: Building a Sales Aggregation Model Using `ref`

Scenario: The finance team needs a report summarizing monthly revenue.

Step 1: Create monthly_revenue.sql Model Using ref

WITH customer_orders AS (
    SELECT * FROM {{ ref('customer_orders') }}
)

SELECT
    -- Snowflake/Redshift syntax; on BigQuery use DATE_TRUNC(order_date, MONTH)
    DATE_TRUNC('month', order_date) AS month,
    SUM(total_amount) AS monthly_revenue
FROM customer_orders
GROUP BY month
ORDER BY month

✔ Uses ref to pull transformed customer orders.
✔ Computes monthly revenue for financial analysis.

Example 4: Linking Customers with Orders Using `ref`

Scenario: A marketing team wants to combine customer details with order history for targeted campaigns.

Step 1: Create a dbt model (customer_order_summary.sql)

WITH customers AS (
    SELECT * FROM {{ source('ecommerce', 'customers') }}
)

, orders AS (
    SELECT * FROM {{ ref('customer_orders') }}
)

SELECT
    customers.customer_id,
    customers.full_name,
    COUNT(orders.order_id) AS total_orders,
    SUM(orders.total_amount) AS total_spent
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id
GROUP BY customers.customer_id, customers.full_name

✔ Uses source for customers and ref for orders.
✔ Links orders and customer data for personalized marketing.
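Note that source('ecommerce', 'customers') only compiles if a customers table is declared under the ecommerce source, and the sources.yml from Example 1 lists only orders. A minimal sketch of the extended file follows, assuming the raw customers table lives in the same raw_data schema and has the two columns used in the query above.

```yaml
version: 2

sources:
  - name: ecommerce
    schema: raw_data
    tables:
      - name: orders
      - name: customers   # assumed raw table, not in the original sources.yml
        description: "One row per customer"
        columns:
          - name: customer_id
          - name: full_name
```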

Example 5: Data Quality Check with `source` and `ref`

Scenario: The data team wants to validate data completeness by counting records in sources vs. models.

Step 1: Create a dbt model (data_quality_check.sql)

WITH source_orders AS (
    SELECT COUNT(*) AS source_count FROM {{ source('ecommerce', 'orders') }}
)

, transformed_orders AS (
    SELECT COUNT(*) AS model_count FROM {{ ref('customer_orders') }}
)

SELECT
    source_orders.source_count,
    transformed_orders.model_count,
    CASE
        WHEN source_orders.source_count = transformed_orders.model_count THEN 'Valid'
        ELSE 'Data Mismatch'
    END AS data_check_status
FROM source_orders
CROSS JOIN transformed_orders

✔ Compares raw vs. transformed row counts.
✔ Flags potential data loss or transformation issues.
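The query above is an ordinary model: it builds a one-row table that someone has to inspect. If the goal is for dbt test to fail on a mismatch, the same comparison can be written as a singular test, which is a SQL file in the tests directory that passes when it returns zero rows. The file name below is hypothetical.

```sql
-- tests/assert_order_counts_match.sql (hypothetical file name)
-- A singular test fails if this query returns any rows.
WITH source_orders AS (
    SELECT COUNT(*) AS source_count FROM {{ source('ecommerce', 'orders') }}
),

transformed_orders AS (
    SELECT COUNT(*) AS model_count FROM {{ ref('customer_orders') }}
)

SELECT
    source_orders.source_count,
    transformed_orders.model_count
FROM source_orders
CROSS JOIN transformed_orders
-- Return a row only when the counts differ
WHERE source_orders.source_count <> transformed_orders.model_count
```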

4. When and Where to Use `source` and `ref`

When to Use `source`

✔ When referencing raw tables that are loaded into the warehouse outside dbt.
✔ When documenting where raw data originates so it appears in lineage.
✔ When testing raw data or checking its freshness before transforming it.

When to Use `ref`

✔ When referencing other dbt models within a project.
✔ When ensuring dependencies are correctly handled.
✔ When maintaining modular and reusable SQL code.

5. How to Use `source` and `ref` in Your dbt Project

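Step 1: Install dbt

Install dbt Core together with the adapter for your warehouse; the bare dbt package name is not the supported way to install current versions. Snowflake is shown here; swap in dbt-bigquery or dbt-redshift as needed.

pip install dbt-core dbt-snowflake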

Step 2: Configure dbt Connection (`profiles.yml`)

Define database credentials for Snowflake, BigQuery, or Redshift.
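As a rough sketch, a Snowflake profile might look like the following. The profile name, role, database, warehouse, and schema values are all placeholders, and the environment variable names are assumptions; the keys for BigQuery and Redshift differ, so check the adapter's setup page.

```yaml
my_dbt_project:            # placeholder; must match `profile:` in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      # Read secrets from environment variables instead of hard-coding them
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer      # placeholder role
      database: analytics    # placeholder database
      warehouse: transforming
      schema: dbt_dev        # models are built into this schema
      threads: 4
```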

Step 3: Define `sources.yml` to Document Raw Data

Create a YAML file under your models directory (for example, models/sources.yml) and list the raw tables it should expose.

Step 4: Use `source` and `ref` in dbt Models

Write SQL transformations using source (for raw data) and ref (for transformed models).

Step 5: Run dbt Models

dbt run

Step 6: Test Data Quality

dbt test
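dbt test only has something to run once tests are declared. A minimal sketch follows, extending the sources.yml from Example 1 with built-in generic tests. That customer_id is never null is an assumption about the data, not something this guide establishes.

```yaml
version: 2

sources:
  - name: ecommerce
    schema: raw_data
    tables:
      - name: orders
        columns:
          - name: order_id
            description: "Unique order identifier"
            tests:            # dbt 1.8 and later also accept the key `data_tests`
              - unique
              - not_null
          - name: customer_id
            tests:
              - not_null      # assumes every order has a customer
```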

Using source and ref consistently keeps dbt transformations clean, structured, and scalable. source documents where raw data comes from, ref makes model dependencies explicit, and together they give dbt what it needs to build models in order and render lineage.

Start using source and ref today to build robust, scalable, and well-documented dbt projects! 🚀
