Data Build Tools
Core Foundations of dbt
- What is dbt
- ELT vs ETL
- dbt Project Structure
- dbt_project.yml
- profiles.yml
- dbt Core vs dbt Cloud
- dbt Environment Management
Models and Materializations
๐ง dbt Model Dependencies: ref()
In data analytics engineering, managing relationships between data models is the backbone of efficient pipelines.
Imagine youโre transforming raw data into insights โ one SQL model depends on another. You need a system that automatically knows the order in which to run these models.
Thatโs where dbt (Data Build Tool) shines with its ref() function, which links models together and builds a dependency graph automatically.
In this guide, youโll learn:
- What model dependencies are in dbt
- How
ref()works - Three practical coding examples
- A visual Mer
- Memory tips for interviews
- Why itโs crucial for scalable data projects
๐ What Are Model Dependencies in dbt?
In dbt, each model is a SQL file that represents a transformation. However, these models often depend on each other.
For example:
- You first clean raw data in one model (
stg_orders.sql). - Then you use that cleaned data in another model (
customer_orders.sql).
To link them properly, dbt provides the ref() function, which:
โ Creates a dependency between models. โ Ensures that models run in the correct order. โ Automatically updates schema or table references.
โ๏ธ How ref() Works
The syntax is simple yet powerful:
select * from {{ ref('model_name') }}- The
{{ }}syntax indicates a Jinja expression. - The
ref()function tells dbt: โThis model depends on another one named'model_name'.โ
When dbt runs, it builds a Directed Acyclic Graph (DAG) โ a visual map showing model dependencies.
โ Example:
select * from {{ ref('stg_orders') }}This means: โUse the output of stg_orders.sql as input for this model.โ
๐งญ Model Dependency Flow
Hereโs how model dependencies work in dbt:
โโโโโโโโโโโโโโโโโโโโโโโโ โ raw_orders (source) โ โโโโโโโโโโโโโโโโโโโโโโโโ โ โผ โโโโโโโโโโโโโโโโโโโโโโโโ โ stg_orders (model) โ โ Cleans data โโโโโโโโโโโโโโโโโโโโโโโโ โ โผ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ customer_orders (model) โ โ Uses ref('stg_orders') โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โผ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ sales_summary (model) โ โ Uses ref('customer_orders') โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโThis chain shows that dbt runs models in dependency order โ bottom models depend on top models.
๐งฉ Example Set 1: Basic Model Dependencies
Letโs start with a simple scenario.
๐งฎ Example 1: Building a Staging Model
File: models/staging/stg_customers.sql
-- Cleans raw customer dataselect id as customer_id, trim(name) as customer_name, lower(email) as emailfrom {{ source('raw', 'customers') }}This model pulls raw data from the source and cleans it.
๐งฎ Example 2: Creating a Core Model that Depends on the Staging Model
File: models/core/customer_orders.sql
-- Aggregates data by customerselect c.customer_id, c.customer_name, count(o.order_id) as total_orders, sum(o.amount) as total_spentfrom {{ ref('stg_customers') }} as cjoin {{ ref('stg_orders') }} as o on c.customer_id = o.customer_idgroup by 1, 2Here, dbt knows this model depends on both stg_customers and stg_orders.
โ dbt automatically builds them first before executing this model.
๐งฎ Example 3: Final Summary Model
File: models/marts/sales_summary.sql
-- Final summary for business reportsselect region, sum(total_spent) as revenue, count(distinct customer_id) as customersfrom {{ ref('customer_orders') }}group by regionThis final layer uses data from the core model, forming a multi-level dependency chain.
๐ How dbt Builds the Dependency Graph (DAG)
When you run:
dbt rundbt automatically:
- Scans each model for
ref()calls. - Builds a dependency graph.
- Executes models in topological order.
Example build order:
stg_customers โ stg_orders โ customer_orders โ sales_summaryYou can visualize this with:
dbt docs generatedbt docs serveThen, open the DAG in your browser โ a perfect visualization of how models connect.
๐งฉ Example Set 2: Intermediate Dependency Scenarios
Now, letโs explore more advanced use cases.
๐งฎ Example 1: Conditional References
{% if target.name == 'prod' %} select * from {{ ref('sales_summary') }}{% else %} select * from {{ ref('sales_summary_dev') }}{% endif %}๐ก Use Case: This approach lets you dynamically reference different models for different environments (like dev or prod).
๐งฎ Example 2: Multiple Ref() in Joins
select p.product_id, p.product_name, sum(o.quantity) as total_soldfrom {{ ref('stg_products') }} pjoin {{ ref('stg_orders') }} o on p.product_id = o.product_idgroup by 1, 2Here, both models stg_products and stg_orders are dependencies. dbt ensures both are ready before running this.
๐งฎ Example 3: Ref with CTEs (Common Table Expressions)
with customer_data as ( select * from {{ ref('stg_customers') }} ), order_data as ( select * from {{ ref('stg_orders') }} )select c.customer_id, c.customer_name, sum(o.amount) as total_spentfrom customer_data cjoin order_data o on c.customer_id = o.customer_idgroup by 1, 2Using ref() inside CTEs keeps your SQL modular and readable.
๐งฉ Example Set 3: Advanced Project-Level Dependencies
๐งฎ Example 1: Cross-Project Reference
If you organize models by folders, you can use relative paths in ref().
select * from {{ ref('marts.sales_summary') }}This tells dbt to look inside the marts folder for the sales_summary.sql model.
๐งฎ Example 2: Macros Using Ref()
You can even call ref() inside custom macros to make dynamic SQL reusable.
{% macro union_models(model_list) %} {% for model in model_list %} select * from {{ ref(model) }} {% if not loop.last %} union all {% endif %} {% endfor %}{% endmacro %}Usage:
{{ union_models(['stg_orders', 'stg_customers']) }}๐งฎ Example 3: Ref() with Incremental Models
You can combine ref() with incremental strategies for large datasets.
{{ config(materialized='incremental') }}
select *from {{ ref('stg_transactions') }}where updated_at > (select max(updated_at) from {{ this }})Here, dbt only processes new records, ensuring faster builds.
๐ง How to Remember dbt Model Dependencies (for Interview/Exam)
To remember how dbt links models, use the mnemonic โR-D-Gโ:
| Letter | Concept | Memory Tip |
|---|---|---|
| R | Ref() | โUse ref() to reference another model.โ |
| D | Dependencies | โRef builds the dependency DAG.โ |
| G | Graph (DAG) | โdbt visualizes the graph automatically.โ |
๐ก Interview Flashcards
| Question | Answer |
|---|---|
What does ref() do in dbt? | It links models and defines dependencies. |
| Why not hardcode table names? | ref() automatically updates schema references. |
| How does dbt know model order? | It scans ref() calls to build a DAG. |
| What is a DAG in dbt? | Directed Acyclic Graph showing model dependencies. |
Can you use ref() in macros? | Yes, to dynamically reference models. |
| What happens if a referenced model fails? | dbt halts the run for dependent models. |
โก Why Itโs Important to Learn Model Dependencies
Understanding model dependencies is crucial for professional dbt use.
| Benefit | Description |
|---|---|
| Automation | dbt runs models in the right order automatically |
| Scalability | Simplifies managing 100s of interdependent models |
| Clarity | DAG visualizes relationships for better understanding |
| Maintainability | Easier to debug and modify pipelines |
| Reusability | ref() enables modular design across teams |
| Environment Agility | Automatically adapts to schema changes |
| Collaboration | Developers can work independently on dependent models |
๐งฑ Best Practices
| Practice | Description |
|---|---|
Always use ref() instead of hardcoding | Ensures portability and dependency tracking |
| Name models clearly | Helps visualize DAG structure |
| Avoid circular dependencies | Keep graph acyclic (no loops) |
| Group models by layer | Example: staging โ core โ marts |
| Use macros for repetitive ref logic | Promotes clean, DRY code |
| Visualize DAG regularly | Detects broken or missing dependencies early |
๐งฉ Common Mistakes & Fixes
| Mistake | Problem | Fix |
|---|---|---|
| Hardcoded table names | Schema breaks after deployment | Replace with ref() |
| Circular references | dbt run fails | Reorganize dependencies |
| Model naming inconsistency | DAG confusion | Use naming convention (stg_, core_, mart_) |
| No DAG visualization | Hidden dependency errors | Run dbt docs serve regularly |
๐ Real-World Use Cases
| Industry | Use Case |
|---|---|
| E-Commerce | Chain of dependencies from raw orders โ cleaned โ sales summary |
| Finance | Data lineage tracking for compliance |
| Marketing | Campaign-level performance aggregation |
| Healthcare | Layered models for patient, hospital, and outcome analysis |
| SaaS Analytics | Building metrics dashboards across multiple teams |
โ๏ธ Visualizing Dependencies in dbt
You can view model relationships interactively using the dbt Docs tool.
Commands:
dbt docs generatedbt docs serveThe output shows a graph of models (DAG) โ each node represents a model, and edges represent ref() relationships.
This makes debugging or optimizing pipelines much easier.
๐ง Quick Recap
โ
Model Dependencies define how models rely on one another.
โ
ref() is the key dbt function for linking models.
โ
dbt builds a DAG to execute models in correct order.
โ
You can use ref() in models, macros, and even conditionals.
โ
It ensures consistency, automation, and maintainability.
๐ Final Thoughts
Mastering model dependencies in dbt through the ref() function is like mastering the foundation of a data city โ every building (model) stands on another, and ref() is the blueprint that keeps everything connected.
Whether youโre building a simple two-model project or an enterprise-scale pipeline with 500 models, ref() keeps your transformations organized, traceable, and scalable.
โWithout
ref(), dbt would just be SQL scripts. Withref(), it becomes an intelligent, interconnected data system.โ