dbt Analytics Engineering Certification Course

About This Course

dbt Analytics Engineering · 20 modules

This course covers every domain tested on the dbt Analytics Engineering exam. It is based on our 60+ real practice questions and was prepared by certification experts.

What you'll learn:

Every exam domain with detailed explanations
Common exam traps that catch unprepared candidates
Key concepts, syntax, and configurations
Real-world scenarios aligned with exam objectives
Quick-reference cheat sheets for last-minute review

1. Exam Overview

What the exam is testing

The dbt Analytics Engineering Certification validates whether you can use dbt in a production analytics workflow, not just whether you can remember commands. The exam expects you to reason through realistic project situations such as:

building a clean model DAG from raw sources to marts;
choosing the right materialization for performance and maintainability;
using ref() and source() correctly instead of hard-coded object names;
testing assumptions about source data and transformed models;
debugging model, YAML, SQL, dependency, and pipeline failures;
applying model governance features such as access, groups, versions, contracts, and grants;
using state-aware workflows for CI, slim CI, retries, and production-safe deployments;
managing packages, source freshness, exposures, and documentation so downstream users trust the data.

The real exam is scenario-heavy. A typical question gives you a project problem, then asks for the dbt-native fix. The best answer is usually the option that improves lineage, repeatability, maintainability, and production safety without over-engineering.

Current official exam structure to know

The official dbt Analytics Engineering Certification page currently lists these logistics and domains:

Item	Current detail
Duration	2 hours
Questions	65
Passing score	65%
Supported version	dbt 1.11
Expected background	SQL proficiency and practical dbt experience
Question style	Practical and scenario-based; expect multiple-choice and interactive-style reasoning

Older official guides and older question banks often mention dbt Core 1.7 and include a separate documentation domain. The current public exam page lists 7 domains and no separate documentation domain. Documentation still matters because it appears across sources, model properties, lineage, exposures, and governance scenarios.

How to think like the exam

Think like a production analytics engineer:

Use dbt abstractions before warehouse-specific shortcuts. Prefer ref(), source(), model configs, selectors, tests, contracts, and exposures over manual warehouse object names.
Preserve the DAG. Anything that hides lineage, bypasses dependencies, or relies on manually ordered SQL is suspicious.
Choose the simplest materialization that satisfies usage. Do not make everything incremental or table. Do not make a dashboard-facing mart ephemeral.
Test assumptions, not implementation trivia. Use tests where they express business rules or data quality guarantees.
Debug from dbt outward. Check logs, compiled SQL, YAML validity, dependencies, target/profile config, and then warehouse-specific SQL/errors.
Separate development from production. Use state, defer, clone, CI jobs, and PR review to avoid rebuilding or querying expensive production tables unnecessarily.
Govern public interfaces. If other teams depend on a model, use model access, versioning, contracts, docs, exposures, and deprecation instead of silently changing schemas.

How to use this course

Read Sections 1–3 once to orient yourself. Then study each domain in Section 4 and use Sections 5–10 as quick revision material. The course intentionally merges themes that recur across the source question bank into decision frameworks, so you can answer new scenarios rather than memorize answers.

2. Exam Domains

Current domain list

Priority from source bank	Official/current-style domain	Approx. share in analyzed source	What to master
1	Developing and optimizing dbt models	~26%	materializations, `ref`, `source`, sources, modular SQL, Jinja/macros, seeds, snapshots, configs, DAG, git workflow
2	Implementing dbt tests	~17%	generic/singular/custom tests, source tests, test config, severity, filtering, assumptions, CI testing
3	Debugging data modeling errors	~14%	compiled SQL, logs, YAML errors, SQL vs dbt issues, profiles, dependencies, model fixes
4	Troubleshooting and optimizing dbt pipelines	~13%	DAG failures, selectors, retries, clone, scheduling, CI, orchestration boundaries, production failure handling
5	Managing dbt models governance	~12%	access, groups, contracts, versions, deprecation, grants, stable public interfaces
6	Leveraging the dbt state	~10%	state selectors, result selectors, defer, slim CI, manifest/run_results, modified nodes, retry
7	Implementing and maintaining external dependencies	~9%	packages, `dbt deps`, package compatibility, exposures, source freshness, downstream dependency awareness

Priority notes

The largest share of the analyzed bank is model development and optimization. This makes sense because almost every other topic depends on a correct mental model of dbt resources, DAG lineage, materializations, and configuration precedence.

High-yield cross-domain concepts:

ref() vs source();
model materializations: view, table, incremental, ephemeral;
tests: generic vs singular vs custom generic;
source freshness and source testing;
compiled SQL for debugging;
YAML indentation and resource properties;
dbt build vs dbt run + dbt test;
model contracts, versions, access, and groups;
state selection, defer, and CI efficiency;
packages and macro compatibility.

What matters most

If the question says...	The exam usually wants you to think about...
Hard-coded schema/table names inside models	Replace with `ref()` for models or `source()` for raw tables
Model builds in wrong order	Missing `ref()` dependencies
Raw table dependency is not documented/tested	Define a source in YAML and use `source()`
Large append-only table	Incremental model with correct `is_incremental()` logic
Small stable mapping file	Seed
Point-in-time history of slowly changing source data	Snapshot
Business users query it often	Table or incremental, not ephemeral
Reusable CTE not directly queried	Ephemeral or macro depending on purpose
Downstream teams depend on the model	Public/protected access, versions, contracts, docs
Only changed models should run in CI	State selectors and `defer`
Failed job should resume safely	`dbt retry` / result selectors, not manual partial guessing
Source is late or stale	Source freshness, not a generic model test
BI dashboard depends on model	Exposure

3. Start-to-Finish Study Path

Foundation phase: build the dbt mental model

Learn first:

What a dbt project is: dbt_project.yml, models, macros, seeds, snapshots, tests, sources, packages.
How dbt builds a DAG from ref() and source().
Difference between development, CI, staging, and production environments.
Basic commands: dbt debug, dbt compile, dbt run, dbt test, dbt build, dbt docs generate, dbt deps, dbt seed, dbt snapshot, dbt source freshness.

Hands-on checklist:

Create a source YAML file with at least one source and table.
Create staging models using source().
Create intermediate/mart models using ref().
Run dbt compile and inspect target/compiled.
Generate docs and inspect lineage.

Intermediate phase: learn production patterns

Focus on:

materializations and their tradeoffs;
incremental models and is_incremental();
test types and test configuration;
model property YAML;
packages and macros;
git workflow and PR review;
docs and exposures.

Hands-on checklist:

Build one view, one table, one incremental model, one ephemeral model.
Add generic tests: not_null, unique, relationships, accepted_values.
Add a singular test for a business rule.
Add source freshness to a source.
Add an exposure for a dashboard.
Install a package and run dbt deps.
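
The generic tests in this checklist share one pattern: dbt compiles each test into a query that selects failing rows, and the test passes when that query returns nothing. The sketch below illustrates the idea with Python's built-in `sqlite3` standing in for a warehouse. The `stg_orders` table, its columns, and the `failing_rows` helper are hypothetical, and the SQL only approximates what dbt generates for `not_null` and `unique`.

```python
import sqlite3

def failing_rows(conn, sql):
    """Run a test query; every returned row counts as a failure."""
    return conn.execute(sql).fetchall()

# Approximations of the SQL behind two built-in generic tests (assumed shape).
NOT_NULL_SQL = "select order_id from stg_orders where order_id is null"
UNIQUE_SQL = """
    select order_id, count(*) as n_records
    from stg_orders
    where order_id is not null
    group by order_id
    having count(*) > 1
"""

# Hypothetical staging table with one duplicate key and one null key.
conn = sqlite3.connect(":memory:")
conn.execute("create table stg_orders (order_id integer, status text)")
conn.executemany(
    "insert into stg_orders values (?, ?)",
    [(1, "placed"), (2, "shipped"), (2, "shipped"), (None, "placed")],
)

not_null_failures = failing_rows(conn, NOT_NULL_SQL)
unique_failures = failing_rows(conn, UNIQUE_SQL)
print("not_null failing rows:", len(not_null_failures))
print("unique failing rows:", unique_failures)
```

A singular test follows the same rule: it is a hand-written query that should return zero rows when the business rule holds.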

Advanced phase: governance, state, and debugging

Focus on:

model contracts and column-level constraints;
model versions and deprecation;
access levels and groups;
grants for warehouse permissions;
slim CI using state:modified+ and --defer;
result selectors and retries;
debugging YAML, SQL, package, and pipeline failures.

Hands-on checklist:

Add a contract to a model and intentionally break it.
Add a v2 model and deprecate v1.
Make a protected/public model and test dependency behavior.
Run state selection against a previous manifest.
Debug a failing test from the failure output and compiled SQL.
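
To make the state-selection item concrete, here is a rough sketch of what a selector like `state:modified+` has to work out: which nodes differ from a previous manifest, plus everything downstream of them. It assumes manifests shaped like `manifest.json`, with a `nodes` mapping that carries a `checksum` and a top-level `child_map`. The function name and the sample project are hypothetical, and dbt's real comparison looks at more than file checksums, such as configs.

```python
def modified_plus(prod_manifest, dev_manifest):
    """Return node ids that are new or changed in dev, plus their descendants."""
    prod_nodes = prod_manifest["nodes"]
    changed = {
        node_id
        for node_id, node in dev_manifest["nodes"].items()
        if node_id not in prod_nodes
        or node["checksum"] != prod_nodes[node_id]["checksum"]
    }
    # Walk child_map to add everything downstream (the trailing "+").
    selected, stack = set(changed), list(changed)
    while stack:
        for child in dev_manifest["child_map"].get(stack.pop(), []):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected

def node(checksum):
    # Mimics the checksum entry of a manifest node (hypothetical values).
    return {"checksum": {"name": "sha256", "checksum": checksum}}

child_map = {
    "model.shop.stg_orders": ["model.shop.fct_orders"],
    "model.shop.fct_orders": [],
    "model.shop.dim_customers": [],
}
prod = {
    "nodes": {
        "model.shop.stg_orders": node("aaa"),
        "model.shop.fct_orders": node("bbb"),
        "model.shop.dim_customers": node("ccc"),
    },
    "child_map": child_map,
}
# In the dev branch only stg_orders was edited.
dev = {
    "nodes": {**prod["nodes"], "model.shop.stg_orders": node("zzz")},
    "child_map": child_map,
}

selected = modified_plus(prod, dev)
print(sorted(selected))
```

Only the edited model and its descendant are selected; `dim_customers` is untouched, which is why slim CI pairs this selection with `--defer` so unbuilt parents resolve to production relations.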

Final review phase

In the final review, do not reread everything equally. Focus on scenario triggers:

If the model is downstream-facing, think governance.
If the model is expensive and append-only, think incremental.
If dependencies are invisible, think ref()/source().
If only changed work should run, think state/defer.
If source data timeliness is the issue, think freshness.
If a dashboard breaks, think exposure, model contract/versioning, docs, and lineage.
If the error is unclear, inspect logs and compiled SQL before changing dbt configs.

4. Core Concepts by Domain

Domain 1: Developing and Optimizing dbt Models

Concepts

This is the highest-yield domain. You must know how dbt turns modular SQL files into a dependency graph and production data objects.

Key resource types:

Resource	What it represents	Common exam signal
Model	A SQL or Python transformation managed by dbt	Build clean staging/intermediate/mart layers
Source	Raw data object loaded outside dbt	Use when referring to raw tables
Seed	Static CSV version-controlled in the project	Small lookup/mapping/reference data
Snapshot	Point-in-time history of mutable source records	SCD-style history where source overwrites changes
Macro	Reusable Jinja logic	Repeated SQL pattern or generated logic
Test	Data assertion	Validate uniqueness, not null, relationships, accepted values, business rules
Exposure	Downstream asset such as BI dashboard, notebook, ML job	Show downstream dependency and ownership
Package	External reusable dbt project	Shared macros/models/tests from dbt Hub or Git

`ref()` and `source()`

ref() is for dbt models. source() is for raw objects loaded outside dbt.

Need	Use	Why
A mart depends on `stg_orders`	`{{ ref('stg_orders') }}`	Creates DAG dependency and environment-aware relation name
A staging model reads raw `stripe.payments`	`{{ source('stripe', 'payments') }}`	Documents raw dependency and supports source tests/freshness
A model directly queries `analytics_prod.stg_orders`	Replace with `ref()`	Hard-coding breaks lineage and environment portability
A model directly queries `raw.shopify.orders`	Replace with `source()`	Raw dependencies belong in source YAML

Exam trap: If the question says models build in the wrong order, do not choose a scheduler workaround. dbt derives build order from ref() dependencies, so the fix is to replace the hard-coded relation name with ref().
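
The link between ref() and build order can be sketched in a few lines. This is a toy illustration, not dbt's parser: it finds `ref()` calls with a regular expression and sorts the models with the standard-library `graphlib`. The model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")

def dependencies(sql):
    """Return the set of model names this SQL declares through ref()."""
    return set(REF_PATTERN.findall(sql))

def build_order(models):
    """models maps model name to SQL text; parents come before children."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())

models = {
    "fct_orders": (
        "select * from {{ ref('stg_orders') }} "
        "join {{ ref('stg_payments') }} using (order_id)"
    ),
    "stg_orders": "select * from {{ source('shop', 'orders') }}",
    "stg_payments": "select * from {{ source('stripe', 'payments') }}",
    # Hard-coded relation name: no dependency is recorded for this model.
    "orders_hardcoded": "select * from analytics_prod.stg_orders",
}

order = build_order(models)
print(order)
print(dependencies(models["orders_hardcoded"]))
```

The hard-coded model has an empty dependency set, so nothing guarantees it runs after `stg_orders`. That is the wrong-order symptom the trap describes.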

Materializations

Materialization	Use when	Avoid when	Exam trap
View	Logic should stay lightweight and always query fresh upstream data	Heavy repeated dashboard queries need fast performance	View does not store results; it can push cost to query time
Table	Model is expensive to compute and queried often	Data changes frequently and full rebuild is too expensive	Table rebuilds entire relation each run
Incremental	Large table, small new/changed subset per run	Large percentage updates each run or logic cannot isolate changes	Requires correct filter and unique key/strategy where needed
Ephemeral	Reusable intermediate logic not queried directly	Many downstream refs create repeated SQL; business users need to query it	Ephemeral is inlined as CTE, not created as a database object
Seed	Small static CSV controlled in git	Large dynamic data or frequently updated operational data	Seeds are not an ingestion system
Snapshot	Track historical changes in mutable source records	You only need latest state or immutable event data	Snapshot is not a materialization for performance
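
The ephemeral row is easier to remember once you see what inlining means. The sketch below is a simplified stand-in for dbt's compiler: it swaps a `ref()` to an ephemeral model for a CTE prepended to the query. It assumes the downstream SQL has no WITH clause of its own and writes refs exactly as `{{ ref('name') }}`. The model names and SQL are hypothetical.

```python
def inline_ephemeral(model_sql, ephemeral_models):
    """Replace refs to ephemeral models with CTEs prepended to the query."""
    ctes = []
    for name, sql in ephemeral_models.items():
        placeholder = "{{ ref('" + name + "') }}"
        if placeholder in model_sql:
            cte_name = "__dbt__cte__" + name
            ctes.append(cte_name + " as (\n  " + sql + "\n)")
            model_sql = model_sql.replace(placeholder, cte_name)
    if not ctes:
        return model_sql
    return "with " + ",\n".join(ctes) + "\n" + model_sql

# Hypothetical ephemeral model containing a heavy aggregation.
ephemeral = {
    "int_order_totals": (
        "select order_id, sum(amount) as total "
        "from analytics.stg_payments group by order_id"
    )
}

big_orders = inline_ephemeral(
    "select * from {{ ref('int_order_totals') }} where total > 100", ephemeral
)
small_orders = inline_ephemeral(
    "select * from {{ ref('int_order_totals') }} where total <= 100", ephemeral
)
print(big_orders)
```

Both downstream models end up carrying their own copy of the aggregation, and no `int_order_totals` object exists in the warehouse for anyone to query. That is the trade-off in the "Avoid when" column.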

Incremental models

Use incremental when a model has many rows and only a small subset is added or changed each run.

Core reasoning:

is_incremental() gates logic that should only run on incremental runs.
The SQL must be valid for both full-refresh and incremental runs.
Use a reliable event/update timestamp or high-water mark.
Use unique_key and an incremental strategy when records can update.
Use --full-refresh when logic changes require rebuilding historical rows.
Schema changes may require on_schema_change handling or full refresh depending on the warehouse and change type.
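
The reasoning above can be simulated without a warehouse. This is a plain-Python sketch under stated assumptions, not dbt's implementation: rows are dictionaries, the high-water mark is the maximum `updated_at` already in the target, and `unique_key` behaves like a merge. The function name and sample rows are hypothetical.

```python
def run_incremental(target, source, unique_key, updated_at, full_refresh=False):
    """Return the target table's rows after one run of an incremental model.

    target is None when the table does not exist yet (first run).
    """
    is_incremental = target is not None and not full_refresh
    if not is_incremental:
        # First run or --full-refresh: rebuild from every source row.
        return list(source)

    # Incremental run: only pick up rows newer than the high-water mark.
    high_water = max((row[updated_at] for row in target), default=None)
    new_rows = [
        row for row in source
        if high_water is None or row[updated_at] > high_water
    ]
    # unique_key lets a changed record replace its old version
    # instead of being appended as a duplicate.
    merged = {row[unique_key]: row for row in target}
    for row in new_rows:
        merged[row[unique_key]] = row
    return list(merged.values())

day1 = [
    {"order_id": 1, "status": "placed", "updated_at": 1},
    {"order_id": 2, "status": "placed", "updated_at": 1},
]
day2 = [
    {"order_id": 1, "status": "shipped", "updated_at": 2},  # updated record
    {"order_id": 2, "status": "placed", "updated_at": 1},   # unchanged
    {"order_id": 3, "status": "placed", "updated_at": 2},   # new record
]

target = run_incremental(None, day1, "order_id", "updated_at")
target = run_incremental(target, day2, "order_id", "updated_at")
statuses = {row["order_id"]: row["status"] for row in target}
print(statuses)
```

The same function handles both the first run and later runs, mirroring the requirement that the model's SQL stay valid whether or not `is_incremental()` is true.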

Common bad answers:

“Incremental models are always rebuilt.” False.
“Incremental models are always best for large tables.” Not if most rows change each run.
“You never need is_incremental().” Usually false for selective processing.
“Use ephemeral to make a large dashboard model faster.” Usually wrong; ephemeral can duplicate heavy SQL.

Sources

A source maps to a raw data location, commonly database + schema, with tables underneath. Use sources to centralize raw object naming and document external dependencies.

Good source YAML includes:

source name;
database/schema where needed;
tables;
source and column descriptions;
tests on raw data assumptions;
freshness where timeliness matters;
a loaded-at timestamp column (`loaded_at_field`) for freshness checks.

Exam trap: If multiple raw tables are in the same database/schema, they are usually one source with multiple tables, not multiple sources.
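
Freshness in the list above comes down to comparing the age of the newest loaded row against two thresholds. A minimal sketch of that comparison, assuming thresholds expressed as `timedelta` values that correspond to `warn_after` and `error_after`. The function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def freshness_status(max_loaded_at, now, warn_after, error_after):
    """Classify a source table by the age of its most recently loaded row."""
    age = now - max_loaded_at
    if age > error_after:
        return "error"
    if age > warn_after:
        return "warn"
    return "pass"

now = datetime(2024, 1, 2, 12, 0)
warn_after = timedelta(hours=12)
error_after = timedelta(hours=24)

fresh = freshness_status(datetime(2024, 1, 2, 6, 0), now, warn_after, error_after)
late = freshness_status(datetime(2024, 1, 1, 20, 0), now, warn_after, error_after)
stale = freshness_status(datetime(2023, 12, 31, 23, 0), now, warn_after, error_after)
print(fresh, late, stale)
```

This is a check on when data arrived, not on what it contains, which is why a stale source calls for freshness rather than a generic model test.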

Modularity and DRY SQL

Good dbt modeling decomposes SQL into layers:

Layer	Typical purpose	Typical materialization
Staging	Clean, rename, cast, standardize one source	View, sometimes ephemeral/table
Intermediate	Reusable transformations and joins	Ephemeral, view, table depending on cost
Mart	Business-facing facts/dimensions	Table/incremental for performance

Use macros for reusable logic patterns, not for hiding business-critical model lineage. Use models when you need DAG visibility and testable transformation steps.

Jinja, variables, and environment config

Feature	Use case	Trap
`var()`	Project variables defined in `dbt_project.yml` or passed with `--vars` on the CLI	Do not store secrets in vars
`env_var()`	Environment-specific values and secrets	Must be available in runtime environment
`target`	Branch logic by target/profile	Avoid excessive target-specific model logic that makes behavior hard to test
`config()`	Model-level configs in SQL	Remember config precedence
`dbt_project.yml`	Default project/folder configs	Bad indentation or wrong resource path breaks expectations
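
For the "config precedence" trap, a useful mental model is that the most specific setting wins: an in-file `config()` overrides a property set in a YAML file, which overrides a folder default in `dbt_project.yml`. The sketch below models that with dictionary layering. It is a simplification, since some configs such as tags and meta are combined rather than overridden. The function name and values are hypothetical.

```python
def resolve_config(project_yml, properties_yml, in_file):
    """Layer configs from least to most specific; later layers win."""
    resolved = {}
    for layer in (project_yml, properties_yml, in_file):
        resolved.update(layer)
    return resolved

resolved = resolve_config(
    project_yml={"materialized": "view", "schema": "staging"},  # folder default
    properties_yml={"materialized": "table"},                   # model YAML
    in_file={"materialized": "incremental"},                    # config() in SQL
)
print(resolved)
```

The model ends up incremental, and it still inherits the schema from the project default because no more specific layer overrides it.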

Python models

Python models are used for transformations that are easier to express in Python than in SQL, such as advanced data science-style transformations. Exam reasoning remains dbt-native:

They are still models in the DAG.
They can use dbt.ref() and dbt.source().
They are not a replacement for simple SQL transformations.
Support depends on the adapter/platform.
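
A Python model is a file that defines a `model(dbt, session)` function and returns a DataFrame. The sketch below shows that shape with a hypothetical upstream model named `stg_orders`. `FakeDbt` is only a local stand-in so the function can be exercised outside dbt; it is not part of dbt, and the real DataFrame type depends on the adapter.

```python
def model(dbt, session):
    # Config set in the model file, like config() in a SQL model.
    dbt.config(materialized="table")
    # dbt.ref() records the DAG dependency and returns the upstream data.
    orders = dbt.ref("stg_orders")
    # Transformations would go here; the DataFrame API depends on the adapter.
    return orders


class FakeDbt:
    """Local stand-in for the dbt object (an assumption for this sketch)."""

    def __init__(self, relations):
        self.relations = relations
        self.configs = {}
        self.refs = []

    def config(self, **kwargs):
        self.configs.update(kwargs)

    def ref(self, name):
        self.refs.append(name)
        return self.relations[name]


fake = FakeDbt({"stg_orders": [{"order_id": 1}, {"order_id": 2}]})
result = model(fake, session=None)
print(fake.refs, fake.configs, len(result))
```

The dependency on `stg_orders` is declared through `dbt.ref()`, so the model takes part in lineage and build order like any SQL model.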

Git workflow

Expected git skills:

create feature branches;
commit changes;
pull from main/head branch to stay updated;
resolve conflicts;
open pull requests;
use CI before merging.

Exam trap: If the question says your branch is behind main, the correct general action is to pull/reconcile with the head branch, not manually copy files or merge straight into production.

Patterns

Raw table reference → define source + use source().
Model-to-model dependency → use ref().
Repeated business logic → refactor into staging/intermediate models or macros.
Expensive dashboard model → table or incremental.
Append-only high-volume data → incremental.
Static mapping table → seed.
Source overwrites values but history needed → snapshot.
Direct warehouse object permissions required → grants.

Traps

Choosing a materialization only because it is “faster” without considering freshness, cost, and query pattern.
Using hard-coded schema names in dbt models.
Making every model a table, which increases rebuild time/storage.
Making every staging model ephemeral, which can duplicate heavy SQL downstream.
Treating seeds as ingestion for operational data.
Using macros where a model would provide better lineage and testing.
Forgetting that a table model fully rebuilds by default.
