source and ref Functions in dbt
In the data transformation landscape, dbt (data build tool) has become a widely used way for organizations to manage their data models and analytics workflows. Two of its core functions, source and ref, are central to building scalable, well-documented data pipelines.
✔ source Function: References raw tables that were loaded into the warehouse by tools outside dbt.
✔ ref Function: References other models within a dbt project, which gives dbt the information it needs for modularity and dependency management.
By the end of this guide, you will understand how source and ref improve dbt projects, where to use them, and how to implement them in real-world scenarios.
1. Understanding source and ref Functions in dbt
1.1 What is the source Function in dbt?
The source function in dbt references raw tables that already exist in your data warehouse (e.g., Snowflake, BigQuery, Redshift) and were loaded there by a process outside dbt. Each source is declared in a YAML file, which documents where the raw data originates and makes it visible in the project's lineage.
Example Usage:
    SELECT * FROM {{ source('ecommerce', 'orders') }}

✔ Refers to the orders table declared under the ecommerce source.
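To make the lookup concrete, here is a minimal Python sketch of the idea behind source: a declared (source, table) pair is mapped to a schema-qualified table name, and an undeclared pair is rejected. This is a toy illustration, not dbt's implementation. The SOURCES dictionary and the resolve_source helper are hypothetical names, and the raw_data schema is borrowed from the sources.yml shown in Example 1.

```python
# Toy stand-in for a parsed sources.yml; NOT dbt's internal representation.
# The schema name raw_data is an assumption taken from Example 1.
SOURCES = {
    "ecommerce": {
        "schema": "raw_data",
        "tables": {"orders"},
    }
}


def resolve_source(source_name: str, table_name: str) -> str:
    """Return 'schema.table' for a declared source table, or raise if undeclared."""
    if source_name not in SOURCES:
        raise KeyError(f"source '{source_name}' is not declared")
    source = SOURCES[source_name]
    if table_name not in source["tables"]:
        raise KeyError(
            f"table '{table_name}' is not declared in source '{source_name}'"
        )
    return f"{source['schema']}.{table_name}"


# Roughly what the Jinja expression is replaced with at compile time.
compiled = "SELECT * FROM " + resolve_source("ecommerce", "orders")
print(compiled)  # SELECT * FROM raw_data.orders
```

The point of the indirection is that the physical schema lives in one place: if the raw data moves, only the source declaration changes, not every model that reads from it.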
1.2 What is the ref Function in dbt?
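The ref function references other dbt models within a project. dbt uses each ref call to infer the dependency graph, so models are built in the correct order, and it resolves the referenced model to the right schema for the current environment. This keeps the codebase modular and reusable.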
Example Usage:
    SELECT * FROM {{ ref('customer_orders') }}

✔ Refers to the transformed customer_orders model within dbt.
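The dependency handling can be sketched in a few lines of Python: scan each model's SQL for ref calls, treat them as graph edges, and sort topologically. This is only an illustration of the concept under simplifying assumptions (single-quoted, one-argument ref calls, abbreviated model SQL); dbt's real parser is more involved. MODELS, REF_PATTERN, and build_order are hypothetical names, and the model names are the ones used in the examples later in this guide.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, abbreviated model files keyed by model name.
MODELS = {
    "customer_orders": "SELECT * FROM {{ source('ecommerce', 'orders') }}",
    "monthly_revenue": "SELECT * FROM {{ ref('customer_orders') }}",
    "customer_order_summary": "SELECT * FROM {{ ref('customer_orders') }}",
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names ordered so every model comes after the models it refs."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(MODELS)
print(order)  # customer_orders is listed before the two models that ref it
```

Because the order is derived from the SQL itself, adding a new model that refs customer_orders needs no separate scheduling step.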
2. Key Benefits of source and ref in dbt
| Feature | source | ref |
|---|---|---|
| References external raw tables | ✅ | ❌ |
| References transformed dbt models | ❌ | ✅ |
| Improves data lineage tracking | ✅ | ✅ |
| Ensures modular and reusable queries | ❌ | ✅ |
| Works with automated documentation | ✅ | ✅ |
| Helps in dependency management | ❌ | ✅ |
✔ Using source ensures clear documentation of raw data origins.
✔ Using ref helps maintain model relationships dynamically.
3. Practical Examples of source and ref in dbt
Example 1: Defining a source for Orders Data in an E-commerce Store
Scenario: An e-commerce business loads raw order data into a warehouse (Snowflake/BigQuery). We want to reference it correctly using the source function.
Step 1: Define the source in sources.yml
    version: 2

    sources:
      - name: ecommerce
        description: "Raw e-commerce data"
        schema: raw_data
        tables:
          - name: orders
            description: "Contains all order transactions"
            columns:
              - name: order_id
                description: "Unique order identifier"
              - name: customer_id
                description: "ID of the customer who placed the order"

Step 2: Use the source function in a dbt model
    SELECT
        order_id,
        customer_id,
        total_amount,
        order_date
    FROM {{ source('ecommerce', 'orders') }}

✔ References orders from the ecommerce source.
✔ Ensures proper documentation and clear lineage.
Example 2: Creating a Transformed Model from a Source
Scenario: We need to create a cleaned customer orders dataset from the raw orders table.
Step 1: Create a dbt model (customer_orders.sql) using source

    WITH orders AS (
        SELECT * FROM {{ source('ecommerce', 'orders') }}
    )

    SELECT
        order_id,
        customer_id,
        total_amount,
        order_date,
        CASE
            WHEN total_amount > 100 THEN 'VIP'
            ELSE 'Regular'
        END AS customer_type
    FROM orders

✔ Uses source to reference raw orders.
✔ Labels each order as VIP or Regular based on its total_amount, stored in the customer_type column.
Example 3: Building a Sales Aggregation Model Using ref
Scenario: The finance team needs a report summarizing monthly revenue.
Step 1: Create monthly_revenue.sql Model Using ref
    WITH customer_orders AS (
        SELECT * FROM {{ ref('customer_orders') }}
    )

    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(total_amount) AS monthly_revenue
    FROM customer_orders
    GROUP BY month
    ORDER BY month

✔ Uses ref to pull transformed customer orders.
✔ DATE_TRUNC('month', order_date) is the Snowflake and Redshift form; BigQuery expects DATE_TRUNC(order_date, MONTH).
✔ Computes monthly revenue for financial analysis.
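To see what the aggregation produces, the sketch below runs the same grouping logic on a few hypothetical rows using Python's built-in sqlite3 module as a stand-in for the warehouse. Two things here are assumptions rather than part of the model: the sample data, and the use of SQLite's strftime('%Y-%m', ...) in place of DATE_TRUNC, which SQLite does not provide. A plain table name stands in for the compiled ref call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_orders (
        order_id INTEGER,
        total_amount REAL,
        order_date TEXT
    );
    -- Hypothetical sample rows: two orders in January, one in February.
    INSERT INTO customer_orders VALUES
        (1, 120.0, '2024-01-05'),
        (2, 30.0, '2024-01-20'),
        (3, 75.5, '2024-02-02');
    """
)

# strftime('%Y-%m', ...) is a SQLite substitute for DATE_TRUNC('month', ...).
monthly = conn.execute(
    """
    SELECT
        strftime('%Y-%m', order_date) AS month,
        SUM(total_amount) AS monthly_revenue
    FROM customer_orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()

print(monthly)  # [('2024-01', 150.0), ('2024-02', 75.5)]
```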
Example 4: Linking Customers with Orders Using ref
Scenario: A marketing team wants to combine customer details with order history for targeted campaigns.
Step 1: Create a dbt model (customer_order_summary.sql)
    WITH customers AS (
        SELECT * FROM {{ source('ecommerce', 'customers') }}
    ),

    orders AS (
        SELECT * FROM {{ ref('customer_orders') }}
    )

    SELECT
        customers.customer_id,
        customers.full_name,
        COUNT(orders.order_id) AS total_orders,
        SUM(orders.total_amount) AS total_spent
    FROM customers
    LEFT JOIN orders
        ON customers.customer_id = orders.customer_id
    GROUP BY
        customers.customer_id,
        customers.full_name

✔ Uses source for customers and ref for orders. The customers table must also be declared under the ecommerce source in sources.yml.
✔ Links orders and customer data for personalized marketing.
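One detail worth checking in a join like this is what happens to customers who have no orders. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse, with hypothetical sample rows and plain table names in place of the compiled source and ref calls. It shows that COUNT returns 0 for such customers while SUM returns NULL, and that wrapping the sum in COALESCE is one way to report 0 instead. The total_spent_filled column is an addition for illustration, not part of the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, full_name TEXT);
    CREATE TABLE customer_orders (
        order_id INTEGER,
        customer_id INTEGER,
        total_amount REAL
    );
    -- Hypothetical data: Ada has two orders, Grace has none.
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO customer_orders VALUES (10, 1, 120.0), (11, 1, 30.0);
    """
)

rows = conn.execute(
    """
    SELECT
        customers.customer_id,
        customers.full_name,
        COUNT(customer_orders.order_id) AS total_orders,
        SUM(customer_orders.total_amount) AS total_spent,
        COALESCE(SUM(customer_orders.total_amount), 0) AS total_spent_filled
    FROM customers
    LEFT JOIN customer_orders
        ON customers.customer_id = customer_orders.customer_id
    GROUP BY customers.customer_id, customers.full_name
    ORDER BY customers.customer_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'Ada', 2, 150.0, 150.0)
# (2, 'Grace', 0, None, 0)
```

Whether a NULL or a 0 is the better value for total_spent depends on how the marketing team intends to filter the result, so treat the COALESCE as an option rather than a required fix.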
Example 5: Data Quality Check with source and ref
Scenario: The data team wants to validate data completeness by counting records in sources vs. models.
Step 1: Create a data quality model (data_quality_check.sql)

    WITH source_orders AS (
        SELECT COUNT(*) AS source_count
        FROM {{ source('ecommerce', 'orders') }}
    ),

    transformed_orders AS (
        SELECT COUNT(*) AS model_count
        FROM {{ ref('customer_orders') }}
    )

    SELECT
        source_orders.source_count,
        transformed_orders.model_count,
        CASE
            WHEN source_orders.source_count = transformed_orders.model_count THEN 'Valid'
            ELSE 'Data Mismatch'
        END AS data_check_status
    FROM source_orders
    CROSS JOIN transformed_orders

✔ Compares raw vs. transformed row counts.
✔ Flags potential data loss or transformation issues.
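dbt's singular tests treat any returned row as a failure, so the same comparison can also be phrased as a query that returns rows only when the counts differ. The sketch below demonstrates that pattern with Python's built-in sqlite3 module as a stand-in for the warehouse. The table names raw_orders and customer_orders and the sample rows are hypothetical, standing in for the compiled source and ref calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (order_id INTEGER);
    CREATE TABLE customer_orders (order_id INTEGER);
    -- Hypothetical data: the model is missing one order.
    INSERT INTO raw_orders VALUES (1), (2), (3);
    INSERT INTO customer_orders VALUES (1), (2);
    """
)

# Returns one row when the counts differ and no rows when they match.
MISMATCH_QUERY = """
    SELECT s.source_count, m.model_count
    FROM (SELECT COUNT(*) AS source_count FROM raw_orders) AS s
    CROSS JOIN (SELECT COUNT(*) AS model_count FROM customer_orders) AS m
    WHERE s.source_count <> m.model_count
"""

failing_rows = conn.execute(MISMATCH_QUERY).fetchall()
print(failing_rows)  # [(3, 2)]

# After the missing order is added, the check returns no rows.
conn.execute("INSERT INTO customer_orders VALUES (3)")
passing_rows = conn.execute(MISMATCH_QUERY).fetchall()
print(passing_rows)  # []
```

Note that equal row counts only rule out dropped or duplicated rows in aggregate; they do not prove the rows themselves are the same.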
4. When and Where to Use source and ref
When to Use source
✔ When referencing raw tables that were loaded into the warehouse outside of dbt.
✔ When defining data lineage and documentation.
✔ When ensuring data validation and transparency.
When to Use ref
✔ When referencing other dbt models within a project.
✔ When ensuring dependencies are correctly handled.
✔ When maintaining modular and reusable SQL code.
5. How to Use source and ref in Your dbt Project
Step 1: Install dbt
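    pip install dbt-core dbt-snowflake

Replace dbt-snowflake with the adapter for your warehouse, such as dbt-bigquery or dbt-redshift.

Step 2: Configure dbt Connection (profiles.yml)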
Define database credentials for Snowflake, BigQuery, or Redshift.
Step 3: Define sources.yml to Document Raw Data
Create a sources file and specify raw tables.
Step 4: Use source and ref in dbt Models
Write SQL transformations using source (for raw data) and ref (for transformed models).
Step 5: Run dbt Models
    dbt run

Step 6: Test Data Quality

    dbt test

Mastering source and ref in dbt ensures clean, structured, and scalable data transformations. These functions simplify data management, improve model dependencies, and enhance transparency in modern analytics workflows.
Start using source and ref today to build robust, scalable, and well-documented dbt projects! 🚀