Data Build Tools
Core Foundations of dbt
- What is dbt
- ELT vs ETL
- dbt Project Structure
- dbt_project.yml
- profiles.yml
- dbt Core vs dbt Cloud
- dbt Environment Management
Models and Materializations
dbt Ephemeral Models
In dbt (data build tool), ephemeral models are a powerful concept for creating reusable, temporary SQL transformations that do not generate tables or views in your data warehouse. Instead, they are inlined as Common Table Expressions (CTEs) into downstream models.
Ephemeral models allow you to modularize logic, improve readability, and keep your warehouse clean without creating unnecessary physical objects. They are especially useful for small, intermediate transformations used in multiple downstream models.
In this article, we will explore:
- What ephemeral models are and how they work
- Key characteristics and rules
- Three detailed examples with SQL code
- A Merquine-style diagram showing ephemeral model execution
- Memory techniques for interviews and exams
- Why ephemeral models are important
- Common pitfalls and best practices
1. What Are Ephemeral Models?
Conceptual Definition
- An ephemeral model is a dbt model materialized as
ephemeral
. - It does not create a physical table or view in the warehouse.
- When referenced via
ref()
in a downstream model, dbt inlines its SQL as a CTE in the consuming model. - They are temporary: their logic exists only at compile time, not as a database object.
{{ config(materialized='ephemeral') }}
select user_id, lower(email) as email_cleanfrom {{ ref('raw_users') }}
If another model references this ephemeral model, dbt will insert the SQL above as a CTE in that model, rather than creating a separate table or view.
1.1 Key Characteristics
Feature | Explanation |
---|---|
No storage | Ephemeral models do not consume warehouse storage. |
Inline logic | SQL is inserted into downstream models at compile time. |
Reusable | Logic can be modularized and reused in multiple models. |
Fast iteration | Ideal for temporary transformations or helper logic. |
Cannot be queried directly | Since no table/view exists, you cannot run SELECT * FROM ephemeral_model . |
No incremental support | is_incremental() always returns false. |
1.2 Why Use Ephemeral Models?
- Reduce warehouse clutter: Avoid unnecessary tables and views.
- Encourage modularity: Break complex SQL into smaller, reusable blocks.
- Improve maintainability: Centralize transformations like cleaning, formatting, or filtering.
- Optimize query compilation: Complex transformations can be nested directly into downstream logic without intermediate storage.
2. Ephemeral Models vs Other Materializations
Materialization | Storage | Execution | Use Case |
---|---|---|---|
table | Yes | Precomputed | Large datasets, high query performance |
view | No | Recomputed on read | Lightweight, always fresh |
incremental | Yes | New/updated rows | Growing datasets, cost-efficient |
ephemeral | No | Inlined CTE | Temporary logic, helper transformations |
Key difference: ephemeral models are never stored, only inlined, making them ideal for intermediate transformations.
3. How Ephemeral Models Work
-
Write the ephemeral model
{{ config(materialized='ephemeral') }}select user_id, lower(email) as email_clean from {{ ref('raw_users') }} -
Reference it in another model
{{ config(materialized='table') }}with cleaned_users as (select * from {{ ref('ephemeral_clean_users') }})selectu.user_id,u.email_clean,o.order_idfrom cleaned_users ujoin {{ ref('raw_orders') }} o on u.user_id = o.customer_id -
dbt compile: The SQL from the ephemeral model is inserted into the CTE
cleaned_users
. -
dbt run: Downstream model executes the SQL, including inlined ephemeral logic.
Benefit: You never create an extra table, but downstream models can still reuse the logic.
4. Example Programs for Ephemeral Models
Below are three unique example scenarios showing ephemeral model usage.
Example Set 1: Cleaning User Data
ephemeral_clean_users.sql
{{ config(materialized='ephemeral') }}
select id as user_id, lower(trim(email)) as email_clean, coalesce(first_name, 'unknown') as first_name_clean, coalesce(last_name, 'unknown') as last_name_cleanfrom {{ ref('raw_users') }}where active = true
downstream model using it:
{{ config(materialized='table') }}
with cleaned_users as ( select * from {{ ref('ephemeral_clean_users') }})select u.user_id, u.email_clean, count(o.order_id) as order_countfrom cleaned_users ujoin {{ ref('raw_orders') }} o on u.user_id = o.customer_idgroup by 1, 2
Explanation:
- Cleaned users logic is reused but not stored as a separate table.
- Downstream table includes cleaned users directly in its CTE.
Example Set 2: Calculating Metrics for Reuse
ephemeral_user_metrics.sql
{{ config(materialized='ephemeral') }}
select user_id, sum(amount) as total_spent, count(distinct order_id) as orders_countfrom {{ ref('raw_orders') }}group by user_id
downstream model:
{{ config(materialized='table') }}
with metrics as ( select * from {{ ref('ephemeral_user_metrics') }})select u.id as user_id, u.signup_date, m.total_spent, m.orders_countfrom {{ ref('raw_users') }} uleft join metrics m on u.id = m.user_id
Benefit:
- Aggregation logic is centralized and reused in any downstream model without creating intermediate tables.
Example Set 3: Filtering & Transforming Event Data
ephemeral_filtered_events.sql
{{ config(materialized='ephemeral') }}
select event_id, user_id, event_type, event_tsfrom {{ ref('raw_events') }}where event_type in ('purchase', 'signup')
downstream model:
{{ config(materialized='table') }}
with filtered_events as ( select * from {{ ref('ephemeral_filtered_events') }})select e.user_id, count(*) as event_count, max(event_ts) as last_eventfrom filtered_events egroup by e.user_id
Explanation:
- Filtering logic for specific event types is reusable in multiple analyses.
- No extra storage is used.
5. Merquine-Style Diagram for Ephemeral Models
raw tables │ ▼ephemeral models (inlined SQL) │ ▼downstream models (table/view/incremental) │ ▼BI dashboards / analytics
- Compile phase: dbt inlines ephemeral SQL into downstream models.
- Execution phase: warehouse executes the full query including inlined logic.
- No ephemeral table or view is created.
6. Memory & Interview/Exam Tips
Mnemonics
- “E-C-T-E” → Ephemeral = CTE = Temporary = Embedded
- Think: “Ephemeral never exists outside downstream”
Key Points to Remember
- Ephemeral = inlined CTE
- Cannot be queried directly
- Reusable logic for downstream models
- No storage, no incremental logic
- Compile-time only
Flashcards / Q&A
- Q: Can you query an ephemeral model directly? A: No, it only exists inside downstream models.
- Q: How does dbt handle ephemeral models during
dbt run
? A: SQL is inlined into downstream CTEs; executed as part of downstream model. - Q: Why use ephemeral instead of view? A: To avoid creating temporary warehouse objects and to keep logic modular.
Interview Talking Points
- Explain modularity & reusability.
- Explain performance benefit: no extra tables to maintain.
- Mention trade-offs: debugging can be harder; cannot isolate ephemeral model directly.
7. Importance of Ephemeral Models
- Clean warehouse: Avoid unnecessary tables or views.
- Reusable logic: Modularize SQL transformations, maintain a single source of truth.
- Faster iterations: No separate build steps; ephemeral SQL executes inline.
- Cost efficiency: No storage or materialization cost.
- Encourages maintainable design: Simplifies dependency chains in large projects.
In modern analytics projects, ephemeral models are critical for intermediate transformations and modular pipelines.
8. Common Pitfalls & Best Practices
Pitfall | Explanation | How to Avoid |
---|---|---|
Overcomplicating ephemeral logic | Very complex transformations can slow downstream execution | Keep ephemeral models simple; consider materializing if complex |
Deep nesting | Multiple ephemeral models nested in one downstream query can impact compile/execution time | Limit nesting depth; consider materializing heavy intermediate results |
Debugging difficulties | Cannot query ephemeral directly | Use dbt compile to inspect inlined SQL |
Using ephemeral for huge datasets | Large datasets may perform better as table or incremental | Materialize intermediate models for big tables |
Forgetting dependencies | Reference upstream sources correctly via ref() | Always use ref() for lineage and dependency tracking |
Best Practices
- Use ephemeral models for helper transformations, cleaning, or filtering
- Keep SQL small and readable
- Avoid nesting multiple ephemeral models deeply
- Use downstream tables/views for heavy computations
- Document ephemeral logic clearly in dbt docs
9. Summary
- Ephemeral models are temporary CTE-based models that do not create tables or views.
- They are inline-only, reusable, and ideal for modular transformations.
- Use them for cleaning, filtering, and aggregating small intermediate datasets.
- Cannot query directly, no incremental support, and are executed at compile-time.
- Ephemeral models improve maintainability, efficiency, and modularity in dbt projects.
- Use diagrams, examples, and mnemonics to remember the concept for interviews and exams.