🧭 dbt Project Structure – Understanding Folder Hierarchy for Efficient Data Transformation


In modern data transformation workflows, dbt (data build tool) has become one of the most widely adopted tools. It lets data engineers and analysts transform data directly in the warehouse using SQL, organized as modular, version-controlled projects.

But to work effectively in dbt, you must first understand its project structure — how dbt organizes files, models, macros, and configurations.

Think of it like studying a city map before navigating it. Once you know where things live, building transformations becomes much easier.


🧩 What Is a dbt Project?

A dbt project is a folder (directory) with a dbt_project.yml file at its root. It contains:

  • SQL models for transformations,
  • YAML files for tests and documentation,
  • Jinja macros,
  • Configurations that tell dbt how to build your data pipeline.

A dbt project acts as a blueprint for your data transformations.


📁 Default dbt Project Structure

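When you create a new dbt project with dbt init my_project, dbt generates the top-level folders shown below. The staging/ and marts/ subfolders are a widely used convention that you add yourself; the generated scaffold starts with a models/example/ folder instead.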

my_project/
├── models/
│ ├── staging/
│ ├── marts/
│ └── schema.yml
├── tests/
├── analyses/
├── macros/
├── snapshots/
├── seeds/
├── dbt_project.yml
└── README.md

Let’s explore each folder in depth.


🧱 1. models/

This is the heart of your dbt project — where you define SQL transformations.

  • Each .sql file inside models/ defines one model, which dbt materializes in your data warehouse, most commonly as a view or a table.
  • dbt runs these models in dependency order, which it infers from ref() calls.
  • Typically, models are organized by layers:
models/
├── staging/
│ └── stg_orders.sql
├── marts/
│ └── fct_sales.sql
└── schema.yml

Example 1: Simple dbt Model (stg_orders.sql)

-- models/staging/stg_orders.sql
-- No trailing semicolon: dbt wraps this SELECT inside its own DDL statement.
SELECT
    id AS order_id,
    customer_id,
    order_date,
    total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'shipped'

Explanation:

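  • Reads from a raw table declared as a source (the raw source and its orders table must be defined under sources: in a YAML file).
  • Renames id to order_id and keeps only shipped orders.
  • dbt builds the result in your data warehouse as a view by default, or as a table if configured that way.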
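To make "dbt builds this as a view" concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in warehouse. It assumes source('raw', 'orders') has already compiled to a plain table called raw_orders (a hypothetical name) and wraps the compiled SELECT in a simplified CREATE VIEW. Real dbt adapters generate their own adapter-specific DDL.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")

# Hypothetical raw table that source('raw', 'orders') would resolve to.
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, order_date TEXT, "
    "total_amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "2024-01-05", 120.0, "shipped"),
        (2, 11, "2024-01-06", 80.0, "pending"),
        (3, 10, "2024-01-07", 45.5, "shipped"),
    ],
)

# The model body after Jinja rendering: plain SQL with no trailing semicolon.
compiled_sql = """
SELECT
    id AS order_id,
    customer_id,
    order_date,
    total_amount
FROM raw_orders
WHERE status = 'shipped'
"""

# A view materialization, simplified: wrap the SELECT in CREATE VIEW.
conn.execute(f"CREATE VIEW stg_orders AS {compiled_sql}")

rows = conn.execute(
    "SELECT order_id, total_amount FROM stg_orders ORDER BY order_id"
).fetchall()
print(rows)  # [(1, 120.0), (3, 45.5)]
```

Because the model is only a SELECT, dbt can wrap the same text in a view, a table, or another materialization without you rewriting the SQL.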

🧾 2. schema.yml (Documentation + Tests)

Each model folder (like staging/ or marts/) usually has a YAML properties file that documents models and defines tests. schema.yml is a common name, but dbt reads any .yml file inside the model paths.

Example 2: Schema File for Models

version: 2

models:
  - name: stg_orders
    description: "Cleaned and standardized orders data"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_id
        description: "ID of the customer who placed the order"

Explanation:

  • This YAML file attaches descriptions and tests to each model and column.
  • The tests run when you execute dbt test (or dbt build).
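dbt turns each generic test into a query that selects the rows violating the rule. The test passes when that query returns nothing. The sketch below imitates this for not_null and unique with sqlite3. The SQL is a simplified approximation, not the exact text dbt generates, and the sample rows are made up.

```python
import sqlite3


def not_null_sql(model: str, column: str) -> str:
    # Failing rows: any row where the column is NULL.
    return f"SELECT * FROM {model} WHERE {column} IS NULL"


def unique_sql(model: str, column: str) -> str:
    # Failing rows: any non-null value that appears more than once.
    return (
        f"SELECT {column}, COUNT(*) AS n_records FROM {model} "
        f"WHERE {column} IS NOT NULL GROUP BY {column} HAVING COUNT(*) > 1"
    )


def run_test(conn: sqlite3.Connection, sql: str) -> bool:
    # dbt-style rule: the test passes when the query returns zero rows.
    return len(conn.execute(sql).fetchall()) == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?)",
    [(1, 10), (2, 10), (2, 11), (None, 12)],  # a duplicate and a NULL order_id
)

results = {
    "not_null_order_id": run_test(conn, not_null_sql("stg_orders", "order_id")),
    "unique_order_id": run_test(conn, unique_sql("stg_orders", "order_id")),
    "not_null_customer_id": run_test(conn, not_null_sql("stg_orders", "customer_id")),
}
print(results)
```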

⚙️ 3. macros/

This folder holds Jinja macros — reusable functions written in SQL + Jinja templating.

They help you avoid repetition and implement dynamic logic.

Example 3: Custom Macro

-- macros/calculate_discount.sql
-- Divide by 100.0, not 100: many warehouses evaluate 10/100 as integer division (0).
{% macro calculate_discount(price, percent) %}
    ({{ price }} * (1 - {{ percent }} / 100.0))
{% endmacro %}

Now use this macro in any model:

-- total_amount is a column of stg_orders; the macro receives the name as text.
SELECT
    order_id,
    {{ calculate_discount('total_amount', 10) }} AS discounted_price
FROM {{ ref('stg_orders') }}

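Explanation: At compile time, dbt replaces the macro call with the SQL expression it generates, so the warehouse only ever sees plain SQL. Macros keep SQL clean and reusable, much like functions in Python.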
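A macro only produces SQL text at compile time, so you can think of it as a function that returns a string. This sketch mirrors calculate_discount with two hypothetical Python helpers and runs the rendered expressions in sqlite3. Like many warehouses, SQLite evaluates 10/100 as integer division, which is why the macro should divide by 100.0 rather than 100. The helpers are illustrative stand-ins, not how dbt renders Jinja internally.

```python
import sqlite3


def calculate_discount(price: str, percent: int) -> str:
    # Mirrors the macro: returns a SQL expression, not a number.
    return f"({price} * (1 - {percent} / 100.0))"


def calculate_discount_int_division(price: str, percent: int) -> str:
    # Same macro written with /100: the discount silently becomes 0.
    return f"({price} * (1 - {percent}/100))"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, total_amount REAL)")
conn.execute("INSERT INTO stg_orders VALUES (1, 200.0)")

fixed = conn.execute(
    f"SELECT {calculate_discount('total_amount', 10)} FROM stg_orders"
).fetchone()[0]
buggy = conn.execute(
    f"SELECT {calculate_discount_int_division('total_amount', 10)} FROM stg_orders"
).fetchone()[0]

print(calculate_discount("total_amount", 10))  # (total_amount * (1 - 10 / 100.0))
print(fixed, buggy)  # 180.0 200.0
```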


🌱 4. seeds/

Seeds are CSV files loaded directly into your warehouse as tables. Perfect for lookup tables or static reference data.

Example:

seeds/
└── countries.csv

To load them:

dbt seed

Result: A new table named countries is created in your warehouse, and models can reference it with {{ ref('countries') }}.
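Conceptually, dbt seed reads a CSV file and writes its rows into a warehouse table named after the file. The sketch below imitates that with the csv module and sqlite3. The countries.csv contents are hypothetical, and the sketch stores every column as TEXT and recreates the table on each run, whereas dbt infers column types and has its own refresh behavior.

```python
import csv
import io
import sqlite3

# Hypothetical contents of seeds/countries.csv (header row plus data rows).
countries_csv = """country,country_name
US,United States
DE,Germany
IN,India
"""


def load_seed(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from CSV text and return the number of rows loaded."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    # Simplification: drop and recreate the table every time.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_seed(conn, "countries", countries_csv)
print(loaded)  # 3
```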


🕒 5. snapshots/

Snapshots record how rows in a mutable table change over time, implementing Type 2 slowly changing dimensions.

Example:

-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}
SELECT * FROM {{ ref('stg_customers') }}
{% endsnapshot %}

Run:

dbt snapshot

Result: A snapshot table that keeps every version of each customer row. The dbt_valid_from and dbt_valid_to columns mark when each version was current.
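The timestamp strategy can be sketched in a few lines. If a key is new, a version is opened. If its updated_at has increased, the current version is closed and a new one opened. Otherwise nothing happens. This in-memory Python version is only an illustration under assumptions: the email column is hypothetical, and hard deletes and dbt's other metadata columns are ignored. dbt itself performs this logic with SQL inside the warehouse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotRow:
    customer_id: int
    email: str                      # hypothetical tracked column
    updated_at: str                 # ISO-8601 strings compare correctly as text
    dbt_valid_from: str
    dbt_valid_to: Optional[str]     # None marks the current version


def apply_snapshot(history: list, source_rows: list) -> None:
    """Simplified timestamp strategy: a newer updated_at creates a new version."""
    current = {r.customer_id: r for r in history if r.dbt_valid_to is None}
    for row in source_rows:
        key = row["customer_id"]
        existing = current.get(key)
        if existing is not None and row["updated_at"] <= existing.updated_at:
            continue  # unchanged since the last snapshot: keep the current version
        if existing is not None:
            existing.dbt_valid_to = row["updated_at"]  # close the old version
        new_version = SnapshotRow(
            key, row["email"], row["updated_at"], row["updated_at"], None
        )
        history.append(new_version)
        current[key] = new_version


history = []
apply_snapshot(history, [{"customer_id": 1, "email": "a@old.com", "updated_at": "2024-01-01"}])
apply_snapshot(history, [{"customer_id": 1, "email": "a@new.com", "updated_at": "2024-03-01"}])
apply_snapshot(history, [{"customer_id": 1, "email": "a@new.com", "updated_at": "2024-03-01"}])  # no change
for r in history:
    print(r.email, r.dbt_valid_from, r.dbt_valid_to)
# a@old.com 2024-01-01 2024-03-01
# a@new.com 2024-03-01 None
```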


🔍 6. analyses/

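This folder stores ad-hoc or exploratory SQL queries. dbt compiles them (rendering Jinja and resolving ref()) but never builds them as models. They stay under version control without creating any objects in the warehouse.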

Example:

analyses/
└── sales_trends.sql

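dbt does not execute analyses itself. Compile them with:

dbt compile

Then copy the compiled SQL from the target/compiled/ folder into your SQL client and run it there.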
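Compiling mostly means rendering Jinja. For an analysis, that usually amounts to swapping each {{ ref('...') }} for a concrete relation name. The sketch below does only that, with a regex and a hypothetical name-to-relation mapping (the analytics schema is an assumption). Real dbt resolves relations from your target, schema, and alias settings, and renders full Jinja.

```python
import re

# Hypothetical mapping from node name to the relation dbt would resolve it to.
RELATIONS = {
    "stg_orders": "analytics.stg_orders",
    "fct_sales": "analytics.fct_sales",
}

REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def compile_refs(sql: str, relations: dict) -> str:
    """Replace each {{ ref('name') }} with a concrete relation name (simplified)."""
    def resolve(match: re.Match) -> str:
        name = match.group(1)
        if name not in relations:
            raise KeyError(f"ref to unknown node: {name}")
        return relations[name]
    return REF_PATTERN.sub(resolve, sql)


# Hypothetical body of analyses/sales_trends.sql.
analysis_sql = (
    "SELECT order_date, SUM(total_amount) AS revenue\n"
    "FROM {{ ref('stg_orders') }}\n"
    "GROUP BY order_date"
)
compiled = compile_refs(analysis_sql, RELATIONS)
print(compiled)
```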

🧪 7. tests/

Contains singular tests: custom SQL queries that go beyond the generic tests (such as not_null and unique) declared in YAML.

Example:

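-- tests/valid_country.sql
-- Returns customers whose country is not in the countries seed.
-- The IS NOT NULL filter matters: one NULL in the subquery would make
-- NOT IN return no rows, so the test could never fail.
SELECT *
FROM {{ ref('customers') }}
WHERE country NOT IN (
    SELECT country FROM {{ ref('countries') }} WHERE country IS NOT NULL
)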

Run:

dbt test

Explanation: A singular test fails if its query returns any rows, so each returned row is a data-quality violation.
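A NOT IN check like valid_country.sql can pass silently when the reference table contains a NULL. Under SQL's three-valued logic, x NOT IN (..., NULL) is never true. This sqlite3 sketch with made-up rows compares a naive query against one that filters NULLs out of the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
conn.execute("CREATE TABLE countries (country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "US"), (2, "XX")])
# Suppose the reference table contains a NULL.
conn.executemany("INSERT INTO countries VALUES (?)", [("US",), ("DE",), (None,)])

naive = """
SELECT * FROM customers
WHERE country NOT IN (SELECT country FROM countries)
"""
guarded = """
SELECT * FROM customers
WHERE country NOT IN (SELECT country FROM countries WHERE country IS NOT NULL)
"""

naive_failures = conn.execute(naive).fetchall()      # NULL hides the bad row
guarded_failures = conn.execute(guarded).fetchall()  # catches customer 2 ("XX")
print(len(naive_failures), len(guarded_failures))  # 0 1
```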


⚙️ 8. dbt_project.yml

This is the configuration brain of the project.

Example:

name: my_project
version: '1.0.0'
profile: my_profile

models:
  my_project:
    staging:
      +materialized: view
    marts:
      +materialized: table

Explanation:

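  • profile names the connection profile in profiles.yml that dbt uses to reach your warehouse.
  • The models: block sets the materialization (view or table) for every model in each folder; a config() block inside a model file overrides it.
  • Other top-level keys, such as the *-paths settings, control project-wide behavior.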
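The folder-level settings behave like inherited defaults. The setting for the deepest matching folder wins, and a config() call in the model file wins over both. The sketch below walks that chain for materialized, with the key written with a + prefix as in current dbt versions. It is a simplified model that skips other precedence levels, such as properties declared in YAML files.

```python
from typing import Optional

# Mirrors the models: my_project: block of dbt_project.yml.
PROJECT_MODEL_CONFIG = {
    "staging": {"+materialized": "view"},
    "marts": {"+materialized": "table"},
}


def resolve_materialization(folders: list, project_config: dict,
                            model_config: Optional[dict] = None) -> str:
    """Simplified precedence: default < folder configs (deeper wins) < in-file config()."""
    materialized = "view"  # dbt's default materialization for models
    node = project_config
    for folder in folders:
        node = node.get(folder, {})
        if "+materialized" in node:
            materialized = node["+materialized"]
    if model_config and "materialized" in model_config:
        materialized = model_config["materialized"]
    return materialized


print(resolve_materialization(["marts"], PROJECT_MODEL_CONFIG))  # table
print(resolve_materialization(["marts"], PROJECT_MODEL_CONFIG, {"materialized": "incremental"}))  # incremental
```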

🧱 9. README.md

Documentation for your project — purpose, structure, and conventions. Keep it updated for onboarding and collaboration.


📈 dbt Project Hierarchy

dbt_project.yml (configures the folders below)
├── models/
│ ├── staging/
│ ├── marts/
│ └── schema.yml
├── macros/
├── seeds/
├── snapshots/
├── analyses/
└── tests/

Visualization: Shows the main directories and relationships within a dbt project.


🧠 How dbt Organizes Execution

When you run dbt run:

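  1. dbt reads dbt_project.yml for project configuration and profiles.yml for the warehouse connection.
  2. It parses every model and builds a dependency graph (DAG) from ref() and source() calls.
  3. It compiles each model's SQL + Jinja into plain SQL.
  4. It executes models in dependency order, running independent models in parallel up to the number of threads set in your profile.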

This structure ensures modularity, scalability, and reproducibility.
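The ordering step is a topological sort of the ref() graph. The sketch below extracts ref() names with a regex, a simplification of dbt's Jinja-based parsing, and groups models into batches with Python's graphlib. Models in one batch have no dependencies on each other, so they could run in parallel. The four model bodies are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files and their SQL bodies.
MODELS = {
    "stg_orders": "SELECT * FROM {{ source('raw', 'orders') }}",
    "stg_customers": "SELECT * FROM {{ source('raw', 'customers') }}",
    "int_orders_enriched": (
        "SELECT * FROM {{ ref('stg_orders') }} "
        "JOIN {{ ref('stg_customers') }} USING (customer_id)"
    ),
    "fct_sales": "SELECT * FROM {{ ref('int_orders_enriched') }}",
}

REF = re.compile(r"ref\(\s*'([^']+)'\s*\)")


def build_dag(models: dict) -> dict:
    """Map each model to the set of models it ref()s (its upstream dependencies)."""
    return {name: set(REF.findall(sql)) for name, sql in models.items()}


def execution_batches(dag: dict) -> list:
    """Group models into batches; each batch only depends on earlier batches."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if refs form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for batch in execution_batches(build_dag(MODELS)):
    print(batch)
# ['stg_customers', 'stg_orders']
# ['int_orders_enriched']
# ['fct_sales']
```

dbt's scheduler is more dynamic than fixed batches: it starts a model as soon as its own parents finish, limited by the configured thread count.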


💡 Example 4 – Complete Mini Project

dbt_ecommerce/
├── models/
│ ├── staging/
│ │ └── stg_orders.sql
│ ├── marts/
│ │ └── fct_sales.sql
│ └── schema.yml
├── macros/
│ └── calculate_discount.sql
├── seeds/
│ └── countries.csv
└── dbt_project.yml

Run:

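dbt seed
dbt run
dbt test

Result:

  • dbt seed loads countries.csv (dbt run does not load seeds)
  • dbt run builds both models, rendering the macro at compile time
  • dbt test runs the tests defined in schema.yml

Alternatively, dbt build runs seeds, models, snapshots, and tests together in dependency order.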

💡 Example 5 – Folder Customization

You can customize folder names in dbt_project.yml:

model-paths: ["transformations"]
analysis-paths: ["queries"]
seed-paths: ["datafiles"]

Result: dbt looks for models in transformations/, analyses in queries/, and seeds in datafiles/ instead of the default folders.
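Only the keys you set change; every other resource type keeps its usual folder. This tiny sketch makes that explicit. DEFAULT_PATHS mirrors the folder names the dbt init scaffold writes into dbt_project.yml, and resolve_paths is a hypothetical helper, not part of dbt.

```python
# Folder names used by the dbt init scaffold for each *-paths key.
DEFAULT_PATHS = {
    "model-paths": ["models"],
    "analysis-paths": ["analyses"],
    "seed-paths": ["seeds"],
    "macro-paths": ["macros"],
    "snapshot-paths": ["snapshots"],
    "test-paths": ["tests"],
}


def resolve_paths(project_yml: dict) -> dict:
    """Return the folders to scan, letting dbt_project.yml override the defaults."""
    return {key: project_yml.get(key, default) for key, default in DEFAULT_PATHS.items()}


custom = {
    "model-paths": ["transformations"],
    "analysis-paths": ["queries"],
    "seed-paths": ["datafiles"],
}
paths = resolve_paths(custom)
print(paths["model-paths"], paths["macro-paths"])  # ['transformations'] ['macros']
```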


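💡 Example 6 – Multi-Domain Structure

Larger dbt projects are often split by business domain inside a single project. (Separate dbt projects can also be installed into one another as packages listed in packages.yml.)

data_team/
├── dbt_project.yml
└── models/
    ├── finance/
    ├── marketing/
    └── sales/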

Use case: Each team maintains its own folder, but shares global macros and seeds.


🧠 How to Remember dbt Project Structure

| Concept | Memory Tip |
|---|---|
| models/ | “Where transformations live” – the engine room |
| macros/ | “SQL shortcuts” – like Python functions |
| seeds/ | “Static data seeds growth” |
| snapshots/ | “Time machine for data” |
| tests/ | “Quality gatekeeper” |
| analyses/ | “Playground for analysts” |
| dbt_project.yml | “The project brain” |

💡 Mnemonic:

“Models Make Smart Transformations, Seeds Strengthen Snapshots, Tests Trust Data.”


🧠 Data Flow Through dbt

Sources → Staging Models → Intermediate Models → Marts (final tables) → BI Tools (Tableau, Looker)

Explanation: Shows how data flows through model layers organized in folders.


💼 Why Understanding dbt Project Structure Matters

1. Collaboration

A clear hierarchy helps multiple engineers contribute with fewer conflicts.

2. Maintainability

Each model and folder has a defined purpose — easy to debug and extend.

3. Scalability

New sources and business areas fit into the existing staging, marts, macros, and tests folders without a reorganization.

4. Interview Relevance

Common questions:

  • “What’s inside a dbt project?”
  • “How does dbt know what to build?”
  • “Where do macros live?”
  • “Difference between seeds and snapshots?”

5. Performance & Reuse

Macros and modular models reduce redundancy and improve maintainability.


💡 How to Prepare for Exams & Interviews

📘 Flashcards:

  • Q: “Where are transformations defined in dbt?” A: Inside the models/ folder as SQL files.
  • Q: “What is dbt_project.yml used for?” A: Defines configurations, paths, and model settings.

🧠 Mnemonics:

“Models build, Macros think, Seeds grow, Snapshots remember.”

🔁 Practice:

  • Build a mini dbt project from scratch.
  • Create one macro, one model, and one test.
  • Run dbt run and dbt test to see it in action.

🧭 Best Practices for Organizing dbt Projects

  1. Follow Layered Approach:

    • staging → intermediate → marts
  2. Keep Models Modular:

    • One logical transformation per file.
  3. Document Everything:

    • Use schema.yml for descriptions and tests.
  4. Use Macros Wisely:

    • Don’t repeat logic.
  5. Version Control:

    • Keep project in Git for collaboration.
  6. Consistent Naming:

    • Prefix models (e.g., stg_, int_, fct_).
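Naming conventions are easy to check automatically, for example in CI. This sketch flags model files whose names lack the prefix expected for their layer folder. The prefix mapping is a hypothetical team convention (dim_ is added as a common companion to fct_), and naming_violations is an illustrative helper, not a dbt feature.

```python
from pathlib import PurePosixPath

# Hypothetical convention: expected filename prefixes per layer folder.
LAYER_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_"),
}


def naming_violations(model_files: list) -> list:
    """Return model files whose name lacks the prefix expected for their layer."""
    violations = []
    for file in model_files:
        path = PurePosixPath(file)
        layer = next((part for part in path.parts if part in LAYER_PREFIXES), None)
        if layer is not None and not path.stem.startswith(LAYER_PREFIXES[layer]):
            violations.append(file)
    return violations


files = [
    "models/staging/stg_orders.sql",
    "models/marts/fct_sales.sql",
    "models/marts/sales_summary.sql",
]
print(naming_violations(files))  # ['models/marts/sales_summary.sql']
```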

🧠 Layered Folder Example

models/staging → models/intermediate → models/marts → BI Dashboards

Explanation: Each layer builds upon the previous one, improving clarity and modularity.


🧩 When to Customize Project Structure

You may customize folder hierarchy when:

  • Working in large teams with multiple domains.
  • Separating dev and prod work (although environments are usually handled by targets in profiles.yml rather than by folders).
  • Implementing domain-driven design (marketing, finance, ops).

Example:

models/
├── marketing/
│ ├── staging/
│ └── marts/
└── finance/
  ├── staging/
  └── marts/

✅ Each domain manages its own models independently.


🧠 Interview Cheat Sheet

| Question | Answer |
|---|---|
| What are the main folders in dbt? | models, macros, seeds, snapshots, analyses, tests |
| What’s dbt_project.yml? | Configuration file defining paths and settings |
| Where do transformations live? | Inside the models/ folder |
| Purpose of seeds? | Load CSV data directly as tables |
| What are macros? | Reusable Jinja functions for SQL |
| What is a snapshot? | Captures data state over time |
| How does dbt find model order? | Uses ref() dependencies |

🧩 Summary

| Folder | Purpose |
|---|---|
| models/ | Transformation SQL files |
| schema.yml | Documentation + tests |
| macros/ | Reusable SQL logic |
| seeds/ | Static CSV datasets |
| snapshots/ | Historical tracking |
| analyses/ | Exploratory queries |
| tests/ | Data quality validation |
| dbt_project.yml | Central configuration |

🏁 Conclusion

Understanding the dbt project structure is essential for any data engineer or analyst working with dbt in an ELT workflow.

It shows how dbt organizes:

  • Models (transformations)
  • Macros (reusability)
  • Seeds (static data)
  • Tests (data quality)
  • Snapshots (history tracking)

Once you understand this folder hierarchy, you can easily: ✅ Collaborate in teams, ✅ Write cleaner, modular SQL, and ✅ Debug transformations faster.


🌟 Final Thought:

“A well-structured dbt project is like a well-organized library — easy to find, understand, and grow.”