🧭 dbt Project Structure – Understanding Folder Hierarchy for Efficient Data Transformation
In modern data transformation workflows, dbt (data build tool) has become a widely adopted standard. It lets data engineers and analysts transform data directly in the warehouse using SQL and modular, version-controlled projects.
But to work effectively in dbt, you must first understand its project structure — how dbt organizes files, models, macros, and configurations.
Think of it like understanding a city map before navigating it. Once you know where things live, building transformations becomes effortless.
🧩 What Is a dbt Project?
A dbt project is simply a folder (directory) that contains:
- SQL models for transformations,
- YAML files for tests and documentation,
- Jinja macros,
- Configurations that tell dbt how to build your data pipeline.
A dbt project acts as a blueprint for your data transformations.
📁 Default dbt Project Structure
When you create a new dbt project (dbt init my_project), dbt automatically generates a folder hierarchy like this:
```
my_project/
├── models/
│   ├── staging/
│   ├── marts/
│   └── schema.yml
├── tests/
├── analyses/
├── macros/
├── snapshots/
├── seeds/
├── dbt_project.yml
└── README.md
```

Let’s explore each folder in depth.
🧱 1. models/
This is the heart of your dbt project — where you define SQL transformations.
- Each `.sql` file inside `models/` becomes a table or view in your data warehouse.
- dbt executes these SQL files in dependency order (using `ref()` links).
- Typically, models are organized by layers:

```
models/
├── staging/
│   └── stg_orders.sql
├── marts/
│   └── fct_sales.sql
└── schema.yml
```

Example 1: Simple dbt Model (stg_orders.sql)
```sql
-- models/staging/stg_orders.sql
-- No trailing semicolon: dbt wraps this SELECT in its own CREATE statement.
SELECT
    id AS order_id,
    customer_id,
    order_date,
    total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'shipped'
```

✅ Explanation:
- Reads from a raw table defined as a source.
- Applies a simple transformation.
- dbt builds this as a view/table inside your data warehouse.
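To make the last point concrete, here is a rough Python sketch of what a view materialization amounts to: wrapping the compiled SELECT in a CREATE VIEW statement. It uses the standard-library `sqlite3` module as a stand-in warehouse. The helper name `materialize_view`, the table `raw_orders`, and the sample rows are hypothetical; dbt's real adapters generate warehouse-specific DDL.

```python
import sqlite3


def materialize_view(conn, model_name, compiled_select):
    """Wrap a compiled SELECT in CREATE VIEW, roughly as a view materialization does."""
    conn.execute(f"DROP VIEW IF EXISTS {model_name}")
    conn.execute(f"CREATE VIEW {model_name} AS {compiled_select}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (id, customer_id, order_date, total_amount, status)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "2024-01-01", 50.0, "shipped"),
        (2, 11, "2024-01-02", 75.0, "pending"),
    ],
)

# Compiled form of stg_orders.sql: source() has been replaced by a real table name.
compiled = """
SELECT id AS order_id, customer_id, order_date, total_amount
FROM raw_orders
WHERE status = 'shipped'
"""
materialize_view(conn, "stg_orders", compiled)

rows = conn.execute("SELECT order_id, total_amount FROM stg_orders").fetchall()
print(rows)
```

Only the shipped order is visible through the view, because the filter lives in the model's SELECT.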
🧾 2. schema.yml (Documentation + Tests)
Each folder (like staging/ or marts/) usually has a schema.yml that documents models and defines tests.
Example 2: Schema File for Models
```yaml
version: 2

models:
  - name: stg_orders
    description: "Cleaned and standardized orders data"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_id
        description: "Unique ID of customer"
```

✅ Explanation:
- This YAML file ties metadata and tests to each model.
- Tests run automatically using `dbt test`.
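Conceptually, each schema test is a query that looks for offending rows, and the test passes when that query returns nothing. The sketch below imitates `not_null` and `unique` on `order_id` with `sqlite3`. The SQL is a simplified approximation of what dbt generates, and the helper `failing_rows` and the sample data are assumptions made for illustration.

```python
import sqlite3


def failing_rows(conn, sql):
    """Return the rows that violate a test; an empty list means the test passes."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id, customer_id)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (None, 13)],
)

# not_null: any row where the column is NULL is a failure.
not_null_sql = "SELECT * FROM stg_orders WHERE order_id IS NULL"

# unique: any value that appears more than once is a failure.
unique_sql = """
SELECT order_id, COUNT(*) AS n
FROM stg_orders
WHERE order_id IS NOT NULL
GROUP BY order_id
HAVING COUNT(*) > 1
"""

not_null_failures = failing_rows(conn, not_null_sql)
unique_failures = failing_rows(conn, unique_sql)
print("not_null:", "PASS" if not not_null_failures else "FAIL")
print("unique:", "PASS" if not unique_failures else "FAIL")
```

Both tests fail on this sample data: one row has a NULL `order_id`, and the value 2 appears twice.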
⚙️ 3. macros/
This folder holds Jinja macros — reusable functions written in SQL + Jinja templating.
They help you avoid repetition and implement dynamic logic.
Example 3: Custom Macro
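```sql
-- macros/calculate_discount.sql
{# 100.0 keeps the division decimal; 10/100 truncates to 0 on warehouses with integer division #}
{% macro calculate_discount(price, percent) %}
    ({{ price }} * (1 - {{ percent }} / 100.0))
{% endmacro %}
```

Now use this macro in any model:

```sql
-- total_amount is the amount column exposed by stg_orders
SELECT
    order_id,
    {{ calculate_discount('total_amount', 10) }} AS discounted_price
FROM {{ ref('stg_orders') }}
```

✅ Explanation: Macros make SQL cleaner and reusable, similar to Python functions.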
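Since macros are compared to Python functions, the following sketch takes the analogy literally: a Python function that returns the SQL fragment the macro would render, followed by a quick check of the compiled query in `sqlite3`. This is only an illustration of text substitution, not how dbt evaluates Jinja. The column `total_amount` and the sample row are assumptions, and the sketch divides by 100.0 so the result stays decimal on engines that truncate integer division.

```python
import sqlite3


def calculate_discount(price, percent):
    """Return a SQL fragment, similar to what the Jinja macro renders."""
    return f"({price} * (1 - {percent} / 100.0))"


# Macro calls are replaced by plain SQL text before the query reaches the warehouse.
compiled_sql = (
    f"SELECT order_id, {calculate_discount('total_amount', 10)} AS discounted_price "
    "FROM stg_orders"
)
print(compiled_sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, total_amount REAL)")
conn.execute("INSERT INTO stg_orders VALUES (1, 200.0)")
result = conn.execute(compiled_sql).fetchall()
print(result)
```

The warehouse never sees the macro itself, only the expanded expression.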
🌱 4. seeds/
Seeds are CSV files loaded directly into your warehouse as tables. Perfect for lookup tables or static reference data.
Example:
```
seeds/
└── countries.csv
```

To load them:

```bash
dbt seed
```

✅ Result:
A new table countries is created in your warehouse.
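As a rough picture of what loading a seed involves, the sketch below reads a CSV and creates a table named after the file, with columns taken from the header row. The CSV contents are hypothetical, `sqlite3` stands in for the warehouse, and dbt's real loader also infers column types, which this sketch skips.

```python
import csv
import io
import sqlite3

# Hypothetical contents of seeds/countries.csv.
countries_csv = "country_code,country\nUS,United States\nIN,India\n"

reader = csv.reader(io.StringIO(countries_csv))
header = next(reader)  # the first row supplies the column names

conn = sqlite3.connect(":memory:")
# The table takes its name from the file and its columns from the header.
conn.execute(f"CREATE TABLE countries ({', '.join(header)})")
conn.executemany("INSERT INTO countries VALUES (?, ?)", reader)

loaded = conn.execute("SELECT * FROM countries ORDER BY country_code").fetchall()
print(loaded)
```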
🕒 5. snapshots/
Snapshots track historical changes in your data over time (slowly changing dimensions).
Example:
```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

SELECT * FROM {{ ref('stg_customers') }}

{% endsnapshot %}
```

Run:

```bash
dbt snapshot
```

✅ Result: A table that stores versions of customer data over time.
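The idea behind the timestamp strategy can be sketched in a few lines of plain Python: when a source row arrives with a newer `updated_at`, the current version is closed and a new one is opened. The columns `dbt_valid_from` and `dbt_valid_to` are the validity columns dbt adds to snapshot tables. The function `apply_snapshot` and the sample rows are hypothetical, and the logic is a simplification of what dbt actually runs.

```python
def apply_snapshot(snapshot, source_rows, unique_key="customer_id", updated_at="updated_at"):
    """Timestamp-strategy sketch: close the current version when a newer row arrives."""
    # Index the open (current) version of each record by its unique key.
    current = {r[unique_key]: r for r in snapshot if r["dbt_valid_to"] is None}
    for row in source_rows:
        existing = current.get(row[unique_key])
        if existing is not None and row[updated_at] <= existing[updated_at]:
            continue  # nothing newer, keep the current version open
        if existing is not None:
            existing["dbt_valid_to"] = row[updated_at]  # close the old version
        snapshot.append({**row, "dbt_valid_from": row[updated_at], "dbt_valid_to": None})
    return snapshot


snapshot = []
# First run: the customer is recorded for the first time.
apply_snapshot(snapshot, [{"customer_id": 1, "city": "Pune", "updated_at": "2024-01-01"}])
# Second run: the source row changed, so a new version is added.
apply_snapshot(snapshot, [{"customer_id": 1, "city": "Delhi", "updated_at": "2024-03-01"}])

for version in snapshot:
    print(version)
```

After the second run the snapshot holds two versions of the same customer: the old one with a closing timestamp and the new one still open.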
🔍 6. analyses/
This folder stores ad-hoc queries or exploratory SQL files. They are not built as models but are version-controlled.
Example:
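```
analyses/
└── sales_trends.sql
```

dbt does not execute these files. Compile them to plain SQL (written under target/compiled/), then run the output manually in your warehouse:

```bash
dbt compile
```

🧪 7. tests/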
Contains custom tests beyond basic schema tests.
Example:
```sql
-- tests/valid_country.sql
SELECT *
FROM {{ ref('customers') }}
WHERE country NOT IN (SELECT country FROM {{ ref('countries') }})
```

Run:

```bash
dbt test
```

✅ Explanation: The test fails if this query returns any rows, so dbt flags the records that violate the condition, ensuring data quality.
⚙️ 8. dbt_project.yml
This is the configuration brain of the project.
Example:
```yaml
name: my_project
version: 1.0
profile: my_profile

models:
  my_project:
    # The + prefix marks a key as a config rather than a subfolder name.
    staging:
      +materialized: view
    marts:
      +materialized: table
```

✅ Explanation:
- Defines materialization type (view/table).
- Controls project behavior globally.
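Folder-level settings work by specificity: a model picks up the setting of the deepest folder on its path that defines one, and dbt's default materialization is a view. The sketch below imitates that lookup on a nested dictionary shaped like the `models:` block, written here with the `+` prefix. The function `resolve_materialization` and the model names are hypothetical; dbt's real resolution also considers `config()` blocks and YAML properties.

```python
def resolve_materialization(models_config, model_path, default="view"):
    """Walk the folder path; the most specific +materialized setting wins."""
    result = models_config.get("+materialized", default)
    node = models_config
    for part in model_path:
        node = node.get(part)
        if not isinstance(node, dict):
            break  # no deeper configuration for this path
        result = node.get("+materialized", result)
    return result


# Shaped like the models: block of a dbt_project.yml.
models_config = {
    "my_project": {
        "staging": {"+materialized": "view"},
        "marts": {"+materialized": "table"},
    }
}

marts_result = resolve_materialization(models_config, ["my_project", "marts", "fct_sales"])
other_result = resolve_materialization(models_config, ["my_project", "intermediate", "int_orders"])
print(marts_result)
print(other_result)
```

A model under `marts/` resolves to a table, while a model in a folder with no setting falls back to the default view.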
🧱 9. README.md
Documentation for your project — purpose, structure, and conventions. Keep it updated for onboarding and collaboration.
📈 dbt Project Hierarchy
[Diagram: the main directories and relationships within a dbt project.]
🧠 How dbt Organizes Execution
When you run dbt run:
- dbt reads dbt_project.yml for configurations.
- It compiles SQL + Jinja templates.
- Builds dependency order using `ref()` links.
- Executes the models in that order inside your warehouse, running independent models in parallel.
This structure ensures modularity, scalability, and reproducibility.
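The dependency-ordering step above can be sketched with the standard library: extract `ref()` calls from each model's SQL, build a graph, and sort it topologically. The model SQL, the `stg_customers` model, and the regular expression are simplified assumptions; dbt parses Jinja properly rather than using a regex.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files, reduced to the parts that matter for ordering.
models = {
    "stg_orders": "SELECT * FROM {{ source('raw', 'orders') }}",
    "stg_customers": "SELECT * FROM {{ source('raw', 'customers') }}",
    "fct_sales": (
        "SELECT * FROM {{ ref('stg_orders') }} "
        "JOIN {{ ref('stg_customers') }} USING (customer_id)"
    ),
}

# Matches {{ ref('model_name') }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")

# Map each model to the set of models it depends on.
graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}

# static_order() yields every model after all of its dependencies.
build_order = list(TopologicalSorter(graph).static_order())
print(build_order)
```

The two staging models have no dependency on each other, so they may appear in either order (and could run in parallel), but `fct_sales` always comes last.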
💡 Example 4 – Complete Mini Project
```
dbt_ecommerce/
├── models/
│   ├── staging/
│   │   └── stg_orders.sql
│   ├── marts/
│   │   └── fct_sales.sql
│   └── schema.yml
├── macros/
│   └── calculate_discount.sql
├── seeds/
│   └── countries.csv
└── dbt_project.yml
```

Run:

```bash
dbt run
dbt test
```

✅ Result:
- dbt builds both models
- Runs tests
- Uses macros dynamically
💡 Example 5 – Folder Customization
You can customize folder names in dbt_project.yml:
```yaml
model-paths: ["transformations"]
analysis-paths: ["queries"]
seed-paths: ["datafiles"]
```

✅ Result:
dbt looks for models in transformations/ instead of the default models/.
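One way to picture this is as a set of default paths that the project file selectively overrides. The sketch below merges the overrides from this example onto dbt's default folder names. The function `resolve_paths` is hypothetical and only illustrates the fallback behaviour.

```python
# Default folder names dbt uses when dbt_project.yml does not override them.
DEFAULT_PATHS = {
    "model-paths": ["models"],
    "analysis-paths": ["analyses"],
    "seed-paths": ["seeds"],
    "macro-paths": ["macros"],
    "snapshot-paths": ["snapshots"],
    "test-paths": ["tests"],
}


def resolve_paths(project_config):
    """Overlay the path settings from dbt_project.yml onto the defaults."""
    overrides = {key: project_config[key] for key in DEFAULT_PATHS if key in project_config}
    return {**DEFAULT_PATHS, **overrides}


# The overrides from the example above, as a parsed dictionary.
project_config = {
    "model-paths": ["transformations"],
    "analysis-paths": ["queries"],
    "seed-paths": ["datafiles"],
}

resolved = resolve_paths(project_config)
for key, value in resolved.items():
    print(key, value)
```

Paths that are not mentioned, such as `macro-paths`, keep their defaults.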
💡 Example 6 – Multi-Package Structure
Advanced dbt projects may include multiple sub-projects or packages.
```
data_team/
├── dbt_project.yml
└── models/
    ├── finance/
    ├── marketing/
    └── sales/
```

✅ Use case: Each team maintains its own folder, but shares global macros and seeds.
🧠 How to Remember dbt Project Structure
| Concept | Memory Tip |
|---|---|
| models/ | “Where transformations live” – the engine room |
| macros/ | “SQL shortcuts” – like Python functions |
| seeds/ | “Static data seeds growth” |
| snapshots/ | “Time machine for data” |
| tests/ | “Quality gatekeeper” |
| analyses/ | “Playground for analysts” |
| dbt_project.yml | “The project brain” |
💡 Mnemonic:
“Models Make Smart Transformations, Seeds Strengthen Snapshots, Tests Trust Data.”
🧠 Data Flow Through dbt
[Diagram: how data flows through the model layers organized in folders.]
💼 Why Understanding dbt Project Structure Matters
1. Collaboration
A clear hierarchy helps multiple engineers contribute without conflicts.
2. Maintainability
Each model and folder has a defined purpose — easy to debug and extend.
3. Scalability
Projects grow naturally with staging, marts, macros, and tests.
4. Interview Relevance
Common questions:
- “What’s inside a dbt project?”
- “How does dbt know what to build?”
- “Where do macros live?”
- “Difference between seeds and snapshots?”
5. Performance & Reuse
Macros and modular models reduce redundancy and improve maintainability.
💡 How to Prepare for Exams & Interviews
📘 Flashcards:
- Q: “Where are transformations defined in dbt?”
  A: Inside the `models/` folder as SQL files.
- Q: “What is `dbt_project.yml` used for?”
  A: Defines configurations, paths, and model settings.
🧠 Mnemonics:
“Models build, Macros think, Seeds grow, Snapshots remember.”
🔁 Practice:
- Build a mini dbt project from scratch.
- Create one macro, one model, and one test.
- Run `dbt run` and `dbt test` to see it in action.
🧭 Best Practices for Organizing dbt Projects
- Follow Layered Approach:
  - staging → intermediate → marts
- Keep Models Modular:
  - One logical transformation per file.
- Document Everything:
  - Use `schema.yml` for descriptions and tests.
- Use Macros Wisely:
  - Don’t repeat logic.
- Version Control:
  - Keep project in Git for collaboration.
- Consistent Naming:
  - Prefix models (e.g., `stg_`, `int_`, `fct_`).
🧠 Layered Folder Example
[Diagram: each layer builds upon the previous one, improving clarity and modularity.]
🧩 When to Customize Project Structure
You may customize folder hierarchy when:
- Working in large teams with multiple domains.
- Organizing multi-environment setups (dev, prod).
- Implementing domain-driven design (marketing, finance, ops).
Example:
```
models/
├── marketing/
│   ├── staging/
│   └── marts/
└── finance/
    ├── staging/
    └── marts/
```

✅ Each domain manages its own models independently.
🧠 Interview Cheat Sheet
| Question | Answer |
|---|---|
| What are the main folders in dbt? | models, macros, seeds, snapshots, analyses, tests |
| What’s dbt_project.yml? | Configuration file defining paths and settings |
| Where do transformations live? | Inside the models/ folder |
| Purpose of seeds? | Load CSV data directly as tables |
| What are macros? | Reusable Jinja functions for SQL |
| What is a snapshot? | Captures data state over time |
| How does dbt find model order? | Uses ref() dependencies |
🧩 Summary
| Folder | Purpose |
|---|---|
| models/ | Transformation SQL files |
| schema.yml | Documentation + tests |
| macros/ | Reusable SQL logic |
| seeds/ | Static CSV datasets |
| snapshots/ | Historical tracking |
| analyses/ | Exploratory queries |
| tests/ | Data quality validation |
| dbt_project.yml | Central configuration |
🏁 Conclusion
Understanding the dbt project structure is essential for every data engineer or analyst working in the modern ELT ecosystem.
It teaches how dbt organizes:
- Models (transformations)
- Macros (reusability)
- Seeds (static data)
- Tests (data quality)
- Snapshots (history tracking)
Once you understand this folder hierarchy, you can easily: ✅ Collaborate in teams, ✅ Write cleaner, modular SQL, and ✅ Debug transformations faster.
🌟 Final Thought:
“A well-structured dbt project is like a well-organized library — easy to find, understand, and grow.”