Data Build Tools
Core Foundations of dbt
- What is dbt
- ELT vs ETL
- dbt Project Structure
- dbt_project.yml
- profiles.yml
- dbt Core vs dbt Cloud
- dbt Environment Management
Models and Materializations
๐ง dbt Model Dependencies: ref()
In data analytics engineering, managing relationships between data models is the backbone of efficient pipelines.
Imagine youโre transforming raw data into insights โ one SQL model depends on another. You need a system that automatically knows the order in which to run these models.
Thatโs where dbt (Data Build Tool) shines with its ref()
function, which links models together and builds a dependency graph automatically.
In this guide, youโll learn:
- What model dependencies are in dbt
- How
ref()
works - Three practical coding examples
- A visual Mer
- Memory tips for interviews
- Why itโs crucial for scalable data projects
๐ What Are Model Dependencies in dbt?
In dbt, each model is a SQL file that represents a transformation. However, these models often depend on each other.
For example:
- You first clean raw data in one model (
stg_orders.sql
). - Then you use that cleaned data in another model (
customer_orders.sql
).
To link them properly, dbt provides the ref()
function, which:
โ Creates a dependency between models. โ Ensures that models run in the correct order. โ Automatically updates schema or table references.
โ๏ธ How ref()
Works
The syntax is simple yet powerful:
select * from {{ ref('model_name') }}
- The
{{ }}
syntax indicates a Jinja expression. - The
ref()
function tells dbt: โThis model depends on another one named'model_name'
.โ
When dbt runs, it builds a Directed Acyclic Graph (DAG) โ a visual map showing model dependencies.
โ Example:
select * from {{ ref('stg_orders') }}
This means: โUse the output of stg_orders.sql
as input for this model.โ
๐งญ Model Dependency Flow
Hereโs how model dependencies work in dbt:
โโโโโโโโโโโโโโโโโโโโโโโโ โ raw_orders (source) โ โโโโโโโโโโโโโโโโโโโโโโโโ โ โผ โโโโโโโโโโโโโโโโโโโโโโโโ โ stg_orders (model) โ โ Cleans data โโโโโโโโโโโโโโโโโโโโโโโโ โ โผ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ customer_orders (model) โ โ Uses ref('stg_orders') โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โผ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ sales_summary (model) โ โ Uses ref('customer_orders') โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
This chain shows that dbt runs models in dependency order โ bottom models depend on top models.
๐งฉ Example Set 1: Basic Model Dependencies
Letโs start with a simple scenario.
๐งฎ Example 1: Building a Staging Model
File: models/staging/stg_customers.sql
-- Cleans raw customer dataselect id as customer_id, trim(name) as customer_name, lower(email) as emailfrom {{ source('raw', 'customers') }}
This model pulls raw data from the source and cleans it.
๐งฎ Example 2: Creating a Core Model that Depends on the Staging Model
File: models/core/customer_orders.sql
-- Aggregates data by customerselect c.customer_id, c.customer_name, count(o.order_id) as total_orders, sum(o.amount) as total_spentfrom {{ ref('stg_customers') }} as cjoin {{ ref('stg_orders') }} as o on c.customer_id = o.customer_idgroup by 1, 2
Here, dbt knows this model depends on both stg_customers
and stg_orders
.
โ dbt automatically builds them first before executing this model.
๐งฎ Example 3: Final Summary Model
File: models/marts/sales_summary.sql
-- Final summary for business reportsselect region, sum(total_spent) as revenue, count(distinct customer_id) as customersfrom {{ ref('customer_orders') }}group by region
This final layer uses data from the core model, forming a multi-level dependency chain.
๐ How dbt Builds the Dependency Graph (DAG)
When you run:
dbt run
dbt automatically:
- Scans each model for
ref()
calls. - Builds a dependency graph.
- Executes models in topological order.
Example build order:
stg_customers โ stg_orders โ customer_orders โ sales_summary
You can visualize this with:
dbt docs generatedbt docs serve
Then, open the DAG in your browser โ a perfect visualization of how models connect.
๐งฉ Example Set 2: Intermediate Dependency Scenarios
Now, letโs explore more advanced use cases.
๐งฎ Example 1: Conditional References
{% if target.name == 'prod' %} select * from {{ ref('sales_summary') }}{% else %} select * from {{ ref('sales_summary_dev') }}{% endif %}
๐ก Use Case: This approach lets you dynamically reference different models for different environments (like dev or prod).
๐งฎ Example 2: Multiple Ref() in Joins
select p.product_id, p.product_name, sum(o.quantity) as total_soldfrom {{ ref('stg_products') }} pjoin {{ ref('stg_orders') }} o on p.product_id = o.product_idgroup by 1, 2
Here, both models stg_products
and stg_orders
are dependencies. dbt ensures both are ready before running this.
๐งฎ Example 3: Ref with CTEs (Common Table Expressions)
with customer_data as ( select * from {{ ref('stg_customers') }} ), order_data as ( select * from {{ ref('stg_orders') }} )select c.customer_id, c.customer_name, sum(o.amount) as total_spentfrom customer_data cjoin order_data o on c.customer_id = o.customer_idgroup by 1, 2
Using ref()
inside CTEs keeps your SQL modular and readable.
๐งฉ Example Set 3: Advanced Project-Level Dependencies
๐งฎ Example 1: Cross-Project Reference
If you organize models by folders, you can use relative paths in ref()
.
select * from {{ ref('marts.sales_summary') }}
This tells dbt to look inside the marts folder for the sales_summary.sql
model.
๐งฎ Example 2: Macros Using Ref()
You can even call ref()
inside custom macros to make dynamic SQL reusable.
{% macro union_models(model_list) %} {% for model in model_list %} select * from {{ ref(model) }} {% if not loop.last %} union all {% endif %} {% endfor %}{% endmacro %}
Usage:
{{ union_models(['stg_orders', 'stg_customers']) }}
๐งฎ Example 3: Ref() with Incremental Models
You can combine ref()
with incremental strategies for large datasets.
{{ config(materialized='incremental') }}
select *from {{ ref('stg_transactions') }}where updated_at > (select max(updated_at) from {{ this }})
Here, dbt only processes new records, ensuring faster builds.
๐ง How to Remember dbt Model Dependencies (for Interview/Exam)
To remember how dbt links models, use the mnemonic โR-D-Gโ:
Letter | Concept | Memory Tip |
---|---|---|
R | Ref() | โUse ref() to reference another model.โ |
D | Dependencies | โRef builds the dependency DAG.โ |
G | Graph (DAG) | โdbt visualizes the graph automatically.โ |
๐ก Interview Flashcards
Question | Answer |
---|---|
What does ref() do in dbt? | It links models and defines dependencies. |
Why not hardcode table names? | ref() automatically updates schema references. |
How does dbt know model order? | It scans ref() calls to build a DAG. |
What is a DAG in dbt? | Directed Acyclic Graph showing model dependencies. |
Can you use ref() in macros? | Yes, to dynamically reference models. |
What happens if a referenced model fails? | dbt halts the run for dependent models. |
โก Why Itโs Important to Learn Model Dependencies
Understanding model dependencies is crucial for professional dbt use.
Benefit | Description |
---|---|
Automation | dbt runs models in the right order automatically |
Scalability | Simplifies managing 100s of interdependent models |
Clarity | DAG visualizes relationships for better understanding |
Maintainability | Easier to debug and modify pipelines |
Reusability | ref() enables modular design across teams |
Environment Agility | Automatically adapts to schema changes |
Collaboration | Developers can work independently on dependent models |
๐งฑ Best Practices
Practice | Description |
---|---|
Always use ref() instead of hardcoding | Ensures portability and dependency tracking |
Name models clearly | Helps visualize DAG structure |
Avoid circular dependencies | Keep graph acyclic (no loops) |
Group models by layer | Example: staging โ core โ marts |
Use macros for repetitive ref logic | Promotes clean, DRY code |
Visualize DAG regularly | Detects broken or missing dependencies early |
๐งฉ Common Mistakes & Fixes
Mistake | Problem | Fix |
---|---|---|
Hardcoded table names | Schema breaks after deployment | Replace with ref() |
Circular references | dbt run fails | Reorganize dependencies |
Model naming inconsistency | DAG confusion | Use naming convention (stg_ , core_ , mart_ ) |
No DAG visualization | Hidden dependency errors | Run dbt docs serve regularly |
๐ Real-World Use Cases
Industry | Use Case |
---|---|
E-Commerce | Chain of dependencies from raw orders โ cleaned โ sales summary |
Finance | Data lineage tracking for compliance |
Marketing | Campaign-level performance aggregation |
Healthcare | Layered models for patient, hospital, and outcome analysis |
SaaS Analytics | Building metrics dashboards across multiple teams |
โ๏ธ Visualizing Dependencies in dbt
You can view model relationships interactively using the dbt Docs tool.
Commands:
dbt docs generatedbt docs serve
The output shows a graph of models (DAG) โ each node represents a model, and edges represent ref()
relationships.
This makes debugging or optimizing pipelines much easier.
๐ง Quick Recap
โ
Model Dependencies define how models rely on one another.
โ
ref()
is the key dbt function for linking models.
โ
dbt builds a DAG to execute models in correct order.
โ
You can use ref()
in models, macros, and even conditionals.
โ
It ensures consistency, automation, and maintainability.
๐ Final Thoughts
Mastering model dependencies in dbt through the ref()
function is like mastering the foundation of a data city โ every building (model) stands on another, and ref()
is the blueprint that keeps everything connected.
Whether youโre building a simple two-model project or an enterprise-scale pipeline with 500 models, ref()
keeps your transformations organized, traceable, and scalable.
โWithout
ref()
, dbt would just be SQL scripts. Withref()
, it becomes an intelligent, interconnected data system.โ