source and ref Functions in dbt
In the data transformation landscape, dbt (Data Build Tool) has emerged as a powerful tool that simplifies how organizations manage their data models and analytics workflows. Two fundamental functions in dbt, source and ref, play a crucial role in structuring and maintaining scalable, efficient, and well-documented data pipelines.
✔ source Function – References raw tables that are loaded into the warehouse by tools outside dbt.
✔ ref Function – References other models within a dbt project to ensure modularity and dependency management.
By the end of this guide, you will understand how source and ref improve dbt projects, where to use them, and how to implement them in real-world scenarios.
1. Understanding source and ref Functions in dbt
1.1 What is the source Function in dbt?
The source function in dbt is used to reference raw tables that already exist in your data warehouse (e.g., Snowflake, BigQuery, Redshift) and are loaded by tools outside dbt. It helps document data sources and ensures transparency by defining where raw data originates.
Example Usage:
SELECT * FROM {{ source('ecommerce', 'orders') }}
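✔ Refers to the orders table declared under the ecommerce source. The first argument is the source name from sources.yml, which does not have to match the warehouse schema name.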
1.2 What is the ref Function in dbt?
The ref function is used to reference other dbt models within a project. dbt uses these references to work out the order in which models must be built, which keeps the codebase modular and reusable.
Example Usage:
SELECT * FROM {{ ref('customer_orders') }}
✔ Refers to the transformed customer_orders model within dbt.
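To make the dependency handling concrete, here is a toy Python sketch that scans model SQL for ref calls and derives a build order from them. It is not dbt's actual implementation: the MODELS dict, the regular expression, and the find_refs and build_order helpers are illustrative assumptions, and the model names are borrowed from the examples later in this guide.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical in-memory stand-ins for .sql files in a dbt project's models/ folder.
MODELS = {
    "customer_orders": "SELECT * FROM {{ source('ecommerce', 'orders') }}",
    "monthly_revenue": "SELECT * FROM {{ ref('customer_orders') }}",
    "customer_order_summary": (
        "SELECT * FROM {{ source('ecommerce', 'customers') }} c "
        "LEFT JOIN {{ ref('customer_orders') }} o ON c.customer_id = o.customer_id"
    ),
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql: str) -> set[str]:
    """Return the names of all models referenced via ref() in the SQL text."""
    return set(REF_PATTERN.findall(sql))


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after every model it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)  # customer_orders comes first; the two downstream models follow
```

Because source calls point at tables dbt does not build, they add no ordering constraint in this sketch; only ref calls do.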
2. Key Benefits of source and ref in dbt
Feature | source | ref |
---|---|---|
References external raw tables | ✅ | ❌ |
References transformed dbt models | ❌ | ✅ |
Improves data lineage tracking | ✅ | ✅ |
Ensures modular and reusable queries | ❌ | ✅ |
Works with automated documentation | ✅ | ✅ |
Helps in dependency management | ❌ | ✅ |
✔ Using source ensures clear documentation of raw data origins.
✔ Using ref helps maintain model relationships dynamically.
3. Practical Examples of source and ref in dbt
Example 1: Defining a source for Orders Data in an E-commerce Store
Scenario: An e-commerce business loads raw order data into a warehouse (Snowflake/BigQuery). We want to reference it correctly using the source function.
Step 1: Define the source in sources.yml
version: 2
sources:
  - name: ecommerce
    description: "Raw e-commerce data"
    schema: raw_data
    tables:
      - name: orders
        description: "Contains all order transactions"
        columns:
          - name: order_id
            description: "Unique order identifier"
          - name: customer_id
            description: "ID of the customer who placed the order"
Step 2: Use the source function in a dbt model
SELECT
    order_id,
    customer_id,
    total_amount,
    order_date
FROM {{ source('ecommerce', 'orders') }}
✔ References orders from the ecommerce source.
✔ Ensures proper documentation and clear lineage.
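As a rough mental model of what happens at compile time, the sketch below mimics how a source('ecommerce', 'orders') call is looked up in the sources definition and turned into a schema-qualified table name. It is a simplification written in Python, not dbt's real resolver: the SOURCES dict and the resolve_source helper are hypothetical, and real dbt also deals with databases, identifiers, and quoting.

```python
# Hypothetical dict mirroring the sources.yml above (only the fields needed here).
SOURCES = {
    "ecommerce": {
        "schema": "raw_data",
        "tables": ["orders"],
    }
}


def resolve_source(source_name: str, table_name: str) -> str:
    """Return 'schema.table' for a declared source table, or raise if undeclared."""
    try:
        source = SOURCES[source_name]
    except KeyError:
        raise ValueError(f"Source '{source_name}' is not declared") from None
    if table_name not in source["tables"]:
        raise ValueError(
            f"Table '{table_name}' is not declared in source '{source_name}'"
        )
    return f"{source['schema']}.{table_name}"


print(resolve_source("ecommerce", "orders"))  # raw_data.orders
```

The point of the sketch is that the source name (ecommerce) and the schema (raw_data) are separate things, and that a table has to be declared before a model can reference it.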
Example 2: Creating a Transformed Model from a source
Scenario: We need to create a cleaned customer orders dataset from the raw orders table.
Step 1: Create a dbt model (customer_orders.sql) that selects from the source
WITH orders AS (
    SELECT * FROM {{ source('ecommerce', 'orders') }}
)
SELECT
    order_id,
    customer_id,
    total_amount,
    order_date,
    CASE
        WHEN total_amount > 100 THEN 'VIP'
        ELSE 'Regular'
    END AS customer_type
FROM orders
✔ Uses source to reference raw orders.
✔ Labels each order as VIP or Regular in customer_type, based on that order's total_amount.
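The CASE expression uses a strict comparison, so an order of exactly 100 is labelled Regular, and a NULL amount also falls through to the ELSE branch. A minimal Python sketch of the same rule (the function name customer_type is hypothetical, and None stands in for SQL NULL) makes those boundaries explicit:

```python
from typing import Optional


def customer_type(total_amount: Optional[float]) -> str:
    """Mirror the CASE expression: strictly greater than 100 is 'VIP'.

    None plays the role of SQL NULL, which fails the WHEN test and
    therefore lands in the ELSE branch.
    """
    if total_amount is not None and total_amount > 100:
        return "VIP"
    return "Regular"


for amount in (99.99, 100, 100.01, None):
    print(amount, customer_type(amount))
```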
Example 3: Building a Sales Aggregation Model Using ref
Scenario: The finance team needs a report summarizing monthly revenue.
Step 1: Create monthly_revenue.sql Model Using ref
WITH customer_orders AS (
    SELECT * FROM {{ ref('customer_orders') }}
)
SELECT
    -- Snowflake/Redshift argument order; BigQuery expects DATE_TRUNC(order_date, MONTH)
    DATE_TRUNC('month', order_date) AS month,
    SUM(total_amount) AS monthly_revenue
FROM customer_orders
GROUP BY month
ORDER BY month
✔ Uses ref to pull transformed customer orders.
✔ Computes monthly revenue for financial analysis.
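To see what the aggregation produces, the following Python sketch applies the same grouping to a few made-up rows. The sample data and the monthly_revenue helper are assumptions for illustration only; truncating a date to the first day of its month plays the role of DATE_TRUNC.

```python
from collections import defaultdict
from datetime import date

# Made-up sample rows standing in for the customer_orders model.
orders = [
    {"order_date": date(2024, 1, 5), "total_amount": 120.0},
    {"order_date": date(2024, 1, 20), "total_amount": 80.0},
    {"order_date": date(2024, 2, 3), "total_amount": 50.0},
]


def monthly_revenue(rows):
    """Sum total_amount per calendar month, sorted by month."""
    totals = defaultdict(float)
    for row in rows:
        # Truncate to the first day of the month, like DATE_TRUNC('month', ...).
        month_start = row["order_date"].replace(day=1)
        totals[month_start] += row["total_amount"]
    return sorted(totals.items())  # mirrors GROUP BY month ORDER BY month


result = monthly_revenue(orders)
for month, revenue in result:
    print(month, revenue)
```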
Example 4: Linking Customers with Orders Using ref
Scenario: A marketing team wants to combine customer details with order history for targeted campaigns.
Step 1: Create a dbt model (customer_order_summary.sql)
WITH customers AS (
    SELECT * FROM {{ source('ecommerce', 'customers') }}
),
orders AS (
    SELECT * FROM {{ ref('customer_orders') }}
)
SELECT
    customers.customer_id,
    customers.full_name,
    COUNT(orders.order_id) AS total_orders,
    SUM(orders.total_amount) AS total_spent
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id
GROUP BY customers.customer_id, customers.full_name
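✔ Uses source for customers and ref for orders. The customers table must also be declared under the ecommerce source in sources.yml, otherwise the model will not compile.
✔ Links orders and customer data for personalized marketing.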
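One consequence of the LEFT JOIN is worth checking: customers without orders still appear, with a total_orders of 0 and a total_spent of NULL. The sketch below reproduces the query from Python against an in-memory SQLite database. SQLite and the sample rows are stand-ins chosen for illustration, and the Jinja calls are replaced by plain table names because nothing here runs through dbt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, full_name TEXT);
    CREATE TABLE customer_orders (
        order_id INTEGER, customer_id INTEGER, total_amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada Lovelace'), (2, 'Alan Turing');
    INSERT INTO customer_orders VALUES (10, 1, 120.0), (11, 1, 30.0);
    """
)

summary = conn.execute(
    """
    SELECT
        customers.customer_id,
        customers.full_name,
        COUNT(orders.order_id) AS total_orders,
        SUM(orders.total_amount) AS total_spent
    FROM customers
    LEFT JOIN customer_orders AS orders
        ON customers.customer_id = orders.customer_id
    GROUP BY customers.customer_id, customers.full_name
    ORDER BY customers.customer_id
    """
).fetchall()
conn.close()

for row in summary:
    print(row)
```

If a zero is preferable to NULL for customers with no orders, wrapping the sum in COALESCE(SUM(orders.total_amount), 0) is a common choice.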
Example 5: Data Quality Check with source and ref
Scenario: The data team wants to validate data completeness by counting records in sources vs. models.
Step 1: Create a dbt model (data_quality_check.sql)
WITH source_orders AS (
    SELECT COUNT(*) AS source_count FROM {{ source('ecommerce', 'orders') }}
),
transformed_orders AS (
    SELECT COUNT(*) AS model_count FROM {{ ref('customer_orders') }}
)
SELECT
    source_orders.source_count,
    transformed_orders.model_count,
    CASE
        WHEN source_orders.source_count = transformed_orders.model_count THEN 'Valid'
        ELSE 'Data Mismatch'
    END AS data_check_status
FROM source_orders
CROSS JOIN transformed_orders
✔ Compares raw vs. transformed row counts.
✔ Flags potential data loss or transformation issues.
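The check can be tried out without a warehouse. The sketch below runs the same comparison from Python on an in-memory SQLite database, with a deliberately lossy view standing in for the model so that the mismatch branch fires. The table name raw_orders, the view definition, and the sample rows are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id INTEGER, total_amount REAL);
    INSERT INTO raw_orders VALUES (1, 120.0), (2, 80.0), (3, NULL);
    -- Hypothetical model that silently drops rows with a NULL amount.
    CREATE VIEW customer_orders AS
        SELECT * FROM raw_orders WHERE total_amount IS NOT NULL;
    """
)

CHECK_SQL = """
WITH source_orders AS (
    SELECT COUNT(*) AS source_count FROM raw_orders
),
transformed_orders AS (
    SELECT COUNT(*) AS model_count FROM customer_orders
)
SELECT
    source_count,
    model_count,
    CASE
        WHEN source_count = model_count THEN 'Valid'
        ELSE 'Data Mismatch'
    END AS data_check_status
FROM source_orders
CROSS JOIN transformed_orders
"""

check = conn.execute(CHECK_SQL).fetchone()
conn.close()
print(check)  # (3, 2, 'Data Mismatch')
```

Equal counts are only the right expectation when the model is meant to keep every source row; a model that filters or deduplicates on purpose would need a different check.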
4. When and Where to Use source and ref
When to Use source
✔ When referencing raw tables from external databases.
✔ When defining data lineage and documentation.
✔ When ensuring data validation and transparency.
When to Use ref
✔ When referencing other dbt models within a project.
✔ When ensuring dependencies are correctly handled.
✔ When maintaining modular and reusable SQL code.
5. How to Use source and ref in Your dbt Project
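Step 1: Install dbt
pip install dbt-core dbt-snowflake
The bare dbt package on PyPI is no longer supported. Install dbt-core together with the adapter for your warehouse (for example, dbt-bigquery or dbt-redshift in place of dbt-snowflake).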
Step 2: Configure dbt Connection (profiles.yml)
Define database credentials for Snowflake, BigQuery, or Redshift.
Step 3: Define sources.yml to Document Raw Data
Create a sources file and specify raw tables.
Step 4: Use source and ref in dbt Models
Write SQL transformations using source (for raw data) and ref (for transformed models).
Step 5: Run dbt Models
dbt run
Step 6: Test Data Quality
dbt test
Mastering source and ref in dbt ensures clean, structured, and scalable data transformations. These functions simplify data management, improve model dependencies, and enhance transparency in modern analytics workflows.
Start using source and ref today to build robust, scalable, and well-documented dbt projects! 🚀