dbt (Data Build Tool)
Core Foundations of dbt
- What is dbt
- ELT vs ETL
- dbt Project Structure
- dbt_project.yml
- profiles.yml
- dbt Core vs dbt Cloud
- dbt Environment Management
Models and Materializations
🧠 What is dbt? – The Heart of Modern Data Transformation
In the modern data world, raw data flows from multiple sources — web apps, CRMs, APIs, and cloud warehouses. But before analysts can gain insights, this data must be cleaned, modeled, and transformed into a usable format.
This is where dbt (Data Build Tool) revolutionizes data workflows.
dbt isn’t a database, a visualization tool, or a pipeline orchestrator. It’s a transformation framework that lets analysts and engineers:
- Write SQL-based transformations,
- Version-control them using Git,
- Automate testing, documentation, and deployment,
- And follow the ELT (Extract, Load, Transform) approach efficiently.
🔍 Definition
dbt (Data Build Tool) is an open-source command-line tool that allows data analysts and engineers to transform data already loaded into a data warehouse using SQL and Jinja templating.
In simpler terms:
- You extract and load data using tools like Fivetran or Airbyte.
- Then, you transform that data inside the warehouse using dbt.
Hence, dbt forms the T in ELT — “Transform”.
🧩 How dbt Fits in the ELT Process
| Step | Tool | Description |
|---|---|---|
| Extract (E) | Fivetran, Airbyte, Stitch | Data is pulled from various sources. |
| Load (L) | Snowflake, BigQuery, Redshift | Raw data is loaded into the warehouse. |
| Transform (T) | dbt | SQL models clean, join, and enrich data. |
✅ dbt focuses solely on transformations, not ingestion or visualization.
🧠 **dbt in ELT Pipeline**
✅ Explanation: dbt sits between your data warehouse and BI tools, ensuring clean, analytics-ready data.
⚙️ Core Components of dbt
| Component | Description |
|---|---|
| Models | SQL files that define transformations. |
| Tests | Validate data quality (e.g., uniqueness, not null). |
| Sources | Define where raw data comes from. |
| Macros | Reusable SQL functions using Jinja. |
| Snapshots | Capture historical changes in data. |
| Docs | Auto-generate documentation for models. |
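A typical dbt project keeps each of these components in its own folder. A minimal sketch of the layout (the project and file names here are illustrative; the folder names are dbt's defaults):

```text
my_dbt_project/
├── dbt_project.yml            # project-wide configuration
├── models/
│   ├── schema.yml             # sources, documentation, and tests for models
│   └── orders_cleaned.sql     # one model = one SELECT statement per file
├── macros/
│   └── convert_to_usd.sql     # reusable Jinja macros
├── snapshots/
│   └── customers_snapshot.sql # SCD Type 2 history tracking
└── tests/                     # custom (singular) data tests
```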
🧱 Understanding dbt Models
In dbt, a model is simply a SQL file that represents a table or view in your data warehouse.
When you run `dbt run`, dbt:
- Compiles the SQL models,
- Executes them in dependency order,
- Creates views or tables in the warehouse, depending on each model's materialization (see the sketch below).
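Whether a model becomes a view or a table is controlled by its materialization. A minimal sketch, reusing the `orders_cleaned` model from the example below, sets it with a `config()` block at the top of the file:

```sql
-- models/orders_cleaned.sql (materialization sketch)
-- 'view' is the default; 'table', 'incremental', and 'ephemeral' are the other built-in options
{{ config(materialized='table') }}

SELECT *
FROM {{ source('raw', 'orders') }}
```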
💡 Example 1 – Basic dbt Model
Here’s a simple model file, `models/orders_cleaned.sql`:

```sql
-- models/orders_cleaned.sql
SELECT
    id,
    customer_id,
    order_date,
    status,
    total_amount
FROM {{ source('raw', 'orders') }}
WHERE status IS NOT NULL
```

Run it:

```bash
dbt run --select orders_cleaned
```

✅ Result: Creates a cleaned orders model (a view by default) in your warehouse.
💡 Example 2 – Using Jinja Templating in dbt
dbt supports Jinja, a templating language, to make SQL dynamic.
```sql
-- models/top_customers.sql
SELECT
    customer_id,
    SUM(total_amount) AS total_spent
FROM {{ ref('orders_cleaned') }}
GROUP BY customer_id
HAVING SUM(total_amount) > {{ var('high_spender_threshold', 1000) }}
```

You can define `high_spender_threshold` in your `dbt_project.yml`, as sketched below; otherwise the default of 1000 is used.
✅ Result: A dynamic query filtering customers based on configurable spending thresholds.
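A minimal sketch of how that variable could be declared (the project name and threshold value here are illustrative):

```yaml
# dbt_project.yml (excerpt)
name: my_dbt_project
version: '1.0.0'

vars:
  high_spender_threshold: 5000   # overrides the fallback of 1000 passed to var()
```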
💡 Example 3 – Chaining dbt Models
You can chain multiple models using the ref() function.
```sql
-- models/customer_summary.sql
SELECT
    c.customer_id,
    c.name,
    t.total_spent
FROM {{ ref('customers') }} AS c
JOIN {{ ref('top_customers') }} AS t
    ON c.customer_id = t.customer_id
```

✅ Result: dbt automatically builds the dependency graph and executes models in the correct order.
🧠 **dbt Model Dependencies**
✅ Note:
ref() automatically builds lineage so dbt knows which models depend on which.
🧪 Data Testing in dbt
dbt allows you to test your data just like software engineers test code.
Built-in tests:
- `unique`
- `not_null`
- `accepted_values`
- `relationships`
🧩 Example 1 – Adding Tests
In models/schema.yml:
```yaml
version: 2

models:
  - name: orders_cleaned
    columns:
      - name: id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['shipped', 'pending', 'cancelled']
```

Run the tests:

```bash
dbt test
```

✅ Result: Ensures data integrity and flags any quality issues.
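The `relationships` test from the built-in list above works the same way. A minimal sketch (the model and column names follow the earlier examples) that checks every `customer_id` in `orders_cleaned` exists in a `customers` model:

```yaml
models:
  - name: orders_cleaned
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id
```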
🧠 dbt and ELT Philosophy
Traditional ETL (Extract → Transform → Load) performs transformation before loading data. But in modern cloud systems, we use ELT — transform after loading.
| ETL | ELT |
|---|---|
| Transform outside warehouse | Transform inside warehouse |
| Requires an external transformation tool or server | Uses dbt with the warehouse's own compute |
| Extra data movement makes pipelines slower and more complex | Simpler pipelines that scale with the warehouse |
✅ dbt enables ELT by leveraging the power of cloud warehouses for transformations.
⚙️ dbt Workflow Summary
- Write SQL transformations as dbt models.
- Define sources and dependencies.
- Run transformations with `dbt run` (see the command sketch below).
- Validate data using `dbt test`.
- Generate lineage and documentation.
- Deploy in CI/CD pipelines or dbt Cloud.
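A minimal sketch of that workflow as local commands (the `--select` argument and model name are illustrative):

```bash
dbt deps                             # install packages listed in packages.yml
dbt run                              # build all models in dependency order
dbt run --select orders_cleaned+     # build one model plus everything downstream of it
dbt test                             # run schema and data tests
dbt docs generate                    # compile documentation and the lineage DAG
```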
🧠 **dbt Workflow**
🎯 Why dbt is Important to Learn
| Reason | Description |
|---|---|
| 💪 Transforms Inside Warehouse | Uses native SQL — no need for external servers. |
| ⚙️ Automation | Handles dependencies, tests, and documentation automatically. |
| 🔍 Data Quality | Ensures clean, tested data for analytics. |
| 🧠 Version Control | Integrates with Git for team collaboration. |
| ☁️ Integration | Works seamlessly with Snowflake, BigQuery, Redshift, Databricks. |
| 📊 Lineage Visibility | Visual DAGs show how models relate. |
| 🧾 Industry Demand | dbt is a must-have skill for modern data engineers. |
🧠 Memory Tricks – For Interview and Exams
| Concept | Trick |
|---|---|
| dbt = Data Build Tool | “dbt builds your data pipeline.” |
| ELT Model | “Extract, Load, then Transform.” |
| ref() Function | “dbt references other models smartly.” |
| Tests | “dbt tests data like code.” |
| Docs | “dbt explains itself automatically.” |
💡 Mnemonic:
“dbt = Define, Build, Test.”
🧩 Example 1 – dbt Source Definition
You can define your raw data sources using sources.yml.
```yaml
version: 2

sources:
  - name: raw
    database: analytics
    schema: public
    tables:
      - name: orders
      - name: customers
```

✅ Benefit: dbt knows where data originates and can generate lineage automatically.
🧩 Example 2 – dbt Snapshots
Snapshots help capture slowly changing dimensions (SCD Type 2).
```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

SELECT * FROM {{ source('raw', 'customers') }}

{% endsnapshot %}
```

✅ Result: Tracks how customer data evolves over time.
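Snapshots are built with their own command rather than `dbt run`:

```bash
dbt snapshot   # compares current rows to the snapshot table and records changes (SCD Type 2)
```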
🧩 Example 3 – dbt Macros
Macros are reusable SQL snippets that make transformations modular.
```sql
-- macros/convert_to_usd.sql
{% macro convert_to_usd(amount, currency) %}
    CASE
        WHEN {{ currency }} = 'USD' THEN {{ amount }}
        WHEN {{ currency }} = 'EUR' THEN {{ amount }} * 1.1
        ELSE {{ amount }}
    END
{% endmacro %}
```

Usage in a model:

```sql
SELECT
    order_id,
    {{ convert_to_usd('total_amount', 'currency') }} AS total_usd
FROM {{ ref('orders_cleaned') }}
```

✅ Result: Reusability reduces duplication and simplifies maintenance.
🧠 **dbt Model Dependency DAG**
✅ Explanation: dbt automatically visualizes these relationships through a Directed Acyclic Graph (DAG).
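You can explore that DAG locally through dbt's built-in documentation site:

```bash
dbt docs generate   # compile the project and write the catalog and manifest files
dbt docs serve      # open the docs site locally, including the interactive lineage graph
```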
💡 How to Remember dbt Concepts Easily
| Concept | Memory Anchor |
|---|---|
| Models | Think of SQL blueprints |
| Tests | Like software QA for data |
| Macros | SQL functions/templates |
| Sources | Starting point of data |
| Snapshots | Historical checkpoints |
| Docs | Your project’s “User Manual” |
💡 Visualization Trick: Imagine dbt as a kitchen:
- Sources = Raw ingredients
- Models = Recipes
- Tests = Taste check
- Docs = Recipe book
🧠 Interview Questions on dbt
- What is dbt and how does it differ from ETL?
- What is the purpose of the `ref()` and `source()` functions?
- Explain dbt tests and their types.
- What is the role of Jinja in dbt?
- What are dbt macros and how are they used?
- How does dbt ensure data quality?
- What is dbt Cloud vs dbt Core?
✅ Short Tip: dbt = Transform + Test + Document your data, all in SQL.
🧩 Best Practices for Using dbt
| Practice | Benefit |
|---|---|
| Use `ref()` for dependencies | Maintains data lineage |
| Use `source()` for raw tables | Enhances traceability |
| Add tests for critical columns | Ensures data integrity |
| Modularize models | Promotes reusability |
| Document all models | Improves collaboration |
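A minimal sketch of the “Modularize models” practice (the project and folder names are illustrative): many teams split models into staging and marts folders and set folder-level defaults in `dbt_project.yml`:

```yaml
# dbt_project.yml (excerpt)
models:
  my_dbt_project:
    staging:
      +materialized: view    # lightweight cleanup models stay as views
    marts:
      +materialized: table   # analytics-ready models are persisted as tables
```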
📊 **dbt End-to-End Lifecycle**
✅ Summary: dbt acts as the bridge between your warehouse and analytics layer.
🏁 Conclusion
dbt is transforming the way teams build, test, and manage data transformations in the cloud. By combining SQL simplicity, software engineering best practices, and automation, dbt enables data teams to move faster, collaborate better, and trust their data more.
In the ELT world, dbt is not just a tool — it’s the foundation of modern data transformation.
🌟 Final Thought:
“dbt lets data teams work like software engineers — testing, versioning, and automating everything with SQL.”