arrow_back Cert

dbt dbt Analytics Engineering

Study & Mastery Mode

🔥 0 streak

Score 0% /65%

timer Mock Exam lock Pro menu_book Course description 3-Page download Free

menu_book

dbt Analytics Engineering

Compressed Course

dbt Analytics Engineering Certification — Compressed Exam Course

Exam target: dbt Analytics Engineering Certification Exam
Current alignment used in this guide: dbt Core / dbt platform concepts around dbt 1.11, with legacy v1.7/v1.9 themes retained where still exam-relevant.
Source pattern summary: 1,100 scenario-style practice rows, strongest concentration in developing/optimizing models, testing, debugging, pipeline troubleshooting, governance, state, and external dependencies.

1. Exam Overview

What the exam is testing

The dbt Analytics Engineering Certification validates whether you can use dbt in a production analytics workflow, not just whether you can remember commands. The exam expects you to reason through realistic project situations such as:

building a clean model DAG from raw sources to marts;
choosing the right materialization for performance and maintainability;
using ref() and source() correctly instead of hard-coded object names;
testing assumptions about source data and transformed models;
debugging model, YAML, SQL, dependency, and pipeline failures;
applying model governance features such as access, groups, versions, contracts, and grants;
using state-aware workflows for CI, slim CI, retries, and production-safe deployments;
managing packages, source freshness, exposures, and documentation so downstream users trust the data.

The real exam is scenario-heavy. A typical question gives you a project problem, then asks for the dbt-native fix. The best answer is usually the option that improves lineage, repeatability, maintainability, and production safety without over-engineering.

Current official exam structure to know

The official dbt Analytics Engineering Certification page currently lists these logistics and domains:

Item	Current detail
Duration	2 hours
Questions	65
Passing score	65%
Supported version	dbt 1.11
Expected background	SQL proficiency and practical dbt experience
Question style	Practical and scenario-based; expect multiple-choice and interactive-style reasoning

Older official guides and older question banks often mention dbt Core 1.7 and include a separate documentation domain. The current public exam page lists 7 domains and no separate documentation domain. Documentation still matters because it appears across sources, model properties, lineage, exposures, and governance scenarios.

How to think like the exam

Think like a production analytics engineer:

Use dbt abstractions before warehouse-specific shortcuts. Prefer ref(), source(), model configs, selectors, tests, contracts, and exposures over manual warehouse object names.
Preserve the DAG. Anything that hides lineage, bypasses dependencies, or relies on manually ordered SQL is suspicious.
Choose the simplest materialization that satisfies usage. Do not make everything incremental or table. Do not make a dashboard-facing mart ephemeral.
Test assumptions, not implementation trivia. Use tests where they express business rules or data quality guarantees.
Debug from dbt outward. Check logs, compiled SQL, YAML validity, dependencies, target/profile config, and then warehouse-specific SQL/errors.
Separate development from production. Use state, defer, clone, CI jobs, and PR review to avoid rebuilding or querying expensive production tables unnecessarily.
Govern public interfaces. If other teams depend on a model, use model access, versioning, contracts, docs, exposures, and deprecation instead of silently changing schemas.

How to use this course

Read Sections 1–3 once to orient yourself. Then study each domain in Section 4 and use Sections 5–10 as quick revision material. The course intentionally merges repeated CSV themes into decision frameworks so you can answer new scenarios rather than memorize answers.

2. Exam Domains

Current domain list

Priority from source bank	Official/current-style domain	Approx. share in analyzed source	What to master
1	Developing and optimizing dbt models	~26%	materializations, `ref`, `source`, sources, modular SQL, Jinja/macros, seeds, snapshots, configs, DAG, git workflow
2	Implementing dbt tests	~17%	generic/singular/custom tests, source tests, test config, severity, filtering, assumptions, CI testing
3	Debugging data modeling errors	~14%	compiled SQL, logs, YAML errors, SQL vs dbt issues, profiles, dependencies, model fixes
4	Troubleshooting and optimizing dbt pipelines	~13%	DAG failures, selectors, retries, clone, scheduling, CI, orchestration boundaries, production failure handling
5	Managing dbt models governance	~12%	access, groups, contracts, versions, deprecation, grants, stable public interfaces
6	Leveraging the dbt state	~10%	state selectors, result selectors, defer, slim CI, manifest/run_results, modified nodes, retry
7	Implementing and Maintaining External Dependencies	~9%	packages, `dbt deps`, package compatibility, exposures, source freshness, downstream dependency awareness

Priority notes

The largest share of the analyzed bank is model development and optimization. This makes sense because almost every other topic depends on a correct mental model of dbt resources, DAG lineage, materializations, and configuration precedence.

High-yield cross-domain concepts:

ref() vs source();
model materializations: view, table, incremental, ephemeral;
tests: generic vs singular vs custom generic;
source freshness and source testing;
compiled SQL for debugging;
YAML indentation and resource properties;
dbt build vs dbt run + dbt test;
model contracts, versions, access, and groups;
state selection, defer, and CI efficiency;
packages and macro compatibility.

What matters most

If the question says...	The exam usually wants you to think about...
Hard-coded schema/table names inside models	Replace with `ref()` for models or `source()` for raw tables
Model builds in wrong order	Missing `ref()` dependencies
Raw table dependency is not documented/tested	Define a source in YAML and use `source()`
Large append-only table	Incremental model with correct `is_incremental()` logic
Small stable mapping file	Seed
Point-in-time history of slowly changing source data	Snapshot
Business users query it often	Table or incremental, not ephemeral
Reusable CTE not directly queried	Ephemeral or macro depending on purpose
Downstream teams depend on the model	Public/protected access, versions, contracts, docs
Only changed models should run in CI	State selectors and `defer`
Failed job should resume safely	`dbt retry` / result selectors, not manual partial guessing
Source is late or stale	Source freshness, not a generic model test
BI dashboard depends on model	Exposure

3. Start-to-Finish Study Path

Foundation phase: build the dbt mental model

Learn first:

What a dbt project is: dbt_project.yml, models, macros, seeds, snapshots, tests, sources, packages.
How dbt builds a DAG from ref() and source().
Difference between development, CI, staging, and production environments.
Basic commands: dbt debug, dbt compile, dbt run, dbt test, dbt build, dbt docs generate, dbt deps, dbt seed, dbt snapshot, dbt source freshness.

Hands-on checklist:

Create a source YAML file with at least one source and table.
Create staging models using source().
Create intermediate/mart models using ref().
Run dbt compile and inspect target/compiled.
Generate docs and inspect lineage.

Intermediate phase: learn production patterns

Focus on:

materializations and their tradeoffs;
incremental models and is_incremental();
test types and test configuration;
model property YAML;
packages and macros;
git workflow and PR review;
docs and exposures.

Hands-on checklist:

Build one view, one table, one incremental model, one ephemeral model.
Add generic tests: not_null, unique, relationships, accepted_values.
Add a singular test for a business rule.
Add source freshness to a source.
Add an exposure for a dashboard.
Install a package and run dbt deps.

Advanced phase: governance, state, and debugging

Focus on:

model contracts and column-level constraints;
model versions and deprecation;
access levels and groups;
grants for warehouse permissions;
slim CI using state:modified+ and --defer;
result selectors and retries;
debugging YAML, SQL, package, and pipeline failures.

Hands-on checklist:

Add a contract to a model and intentionally break it.
Add a v2 model and deprecate v1.
Make a protected/public model and test dependency behavior.
Run state selection against a previous manifest.
Debug a failing test from the failure output and compiled SQL.

Final review phase

In the final review, do not reread everything equally. Focus on scenario triggers:

If the model is downstream-facing, think governance.
If the model is expensive and append-only, think incremental.
If dependencies are invisible, think ref()/source().
If only changed work should run, think state/defer.
If source data timeliness is the issue, think freshness.
If a dashboard breaks, think exposure, model contract/versioning, docs, and lineage.
If the error is unclear, inspect logs and compiled SQL before changing dbt configs.

4. Core Concepts by Domain

Domain 1 — Developing and Optimizing dbt Models

Concepts

This is the highest-yield domain. You must know how dbt turns modular SQL files into a dependency graph and production data objects.

Key resource types:

Resource	What it represents	Common exam signal
Model	A SQL or Python transformation managed by dbt	Build clean staging/intermediate/mart layers
Source	Raw data object loaded outside dbt	Use when referring to raw tables
Seed	Static CSV version-controlled in the project	Small lookup/mapping/reference data
Snapshot	Point-in-time history of mutable source records	SCD-style history where source overwrites changes
Macro	Reusable Jinja logic	Repeated SQL pattern or generated logic
Test	Data assertion	Validate uniqueness, not null, relationships, accepted values, business rules
Exposure	Downstream asset such as BI dashboard, notebook, ML job	Show downstream dependency and ownership
Package	External reusable dbt project	Shared macros/models/tests from dbt Hub or Git

`ref()` and `source()`

ref() is for dbt models. source() is for raw objects loaded outside dbt.

Need	Use	Why
A mart depends on `stg_orders`	`{{ ref('stg_orders') }}`	Creates DAG dependency and environment-aware relation name
A staging model reads raw `stripe.payments`	`{{ source('stripe', 'payments') }}`	Documents raw dependency and supports source tests/freshness
A model directly queries `analytics_prod.stg_orders`	Replace with `ref()`	Hard-coding breaks lineage and environment portability
A model directly queries `raw.shopify.orders`	Replace with `source()`	Raw dependencies belong in source YAML

Exam trap: If the question says models build in the wrong order, do not choose a scheduler workaround. dbt order comes from ref() dependencies.

Materializations

Materialization	Use when	Avoid when	Exam trap
View	Logic should stay lightweight and always query fresh upstream data	Heavy repeated dashboard queries need fast performance	View does not store results; it can push cost to query time
Table	Model is expensive to compute and queried often	Data changes frequently and full rebuild is too expensive	Table rebuilds entire relation each run
Incremental	Large table, small new/changed subset per run	Large percentage updates each run or logic cannot isolate changes	Requires correct filter and unique key/strategy where needed
Ephemeral	Reusable intermediate logic not queried directly	Many downstream refs create repeated SQL; business users need to query it	Ephemeral is inlined as CTE, not created as a database object
Seed	Small static CSV controlled in git	Large dynamic data or frequently updated operational data	Seeds are not an ingestion system
Snapshot	Track historical changes in mutable source records	You only need latest state or immutable event data	Snapshot is not a materialization for performance

Incremental models

Use incremental when a model has many rows and only a small subset is added or changed each run.

Core reasoning:

is_incremental() gates logic that should only run on incremental runs.
The SQL must be valid for both full-refresh and incremental runs.
Use a reliable event/update timestamp or high-water mark.
Use unique_key and an incremental strategy when records can update.
Use --full-refresh when logic changes require rebuilding historical rows.
Schema changes may require on_schema_change handling or full refresh depending on the warehouse and change type.

Common bad answers:

“Incremental models are always rebuilt.” False.
“Incremental models are always best for large tables.” Not if most rows change each run.
“You never need is_incremental().” Usually false for selective processing.
“Use ephemeral to make a large dashboard model faster.” Usually wrong; ephemeral can duplicate heavy SQL.

Sources

A source maps to a raw data location, commonly database + schema, with tables underneath. Use sources to centralize raw object naming and document external dependencies.

Good source YAML includes:

source name;
database/schema where needed;
tables;
source and column descriptions;
tests on raw data assumptions;
freshness where timeliness matters;
loaded timestamp field for freshness checks.

Exam trap: If multiple raw tables are in the same database/schema, they are usually one source with multiple tables, not multiple sources.

Modularity and DRY SQL

Good dbt modeling decomposes SQL into layers:

Layer	Typical purpose	Typical materialization
Staging	Clean, rename, cast, standardize one source	View, sometimes ephemeral/table
Intermediate	Reusable transformations and joins	Ephemeral, view, table depending on cost
Mart	Business-facing facts/dimensions	Table/incremental for performance

Use macros for reusable logic patterns, not for hiding business-critical model lineage. Use models when you need DAG visibility and testable transformation steps.

Jinja, variables, and environment config

Feature	Use case	Trap
`var()`	Project variables provided in `dbt_project.yml` or CLI	Do not store secrets in vars
`env_var()`	Environment-specific values and secrets	Must be available in runtime environment
`target`	Branch logic by target/profile	Avoid excessive target-specific model logic that makes behavior hard to test
`config()`	Model-level configs in SQL	Remember config precedence
`dbt_project.yml`	Default project/folder configs	Bad indentation or wrong resource path breaks expectations

Python models

Python models are used for transformations easier in Python than SQL, such as advanced data science-style transformations. Exam reasoning remains dbt-native:

They are still models in the DAG.
They can use dbt.ref() and dbt.source().
They are not a replacement for simple SQL transformations.
Support depends on the adapter/platform.

Git workflow

Expected git skills:

create feature branches;
commit changes;
pull from main/head branch to stay updated;
resolve conflicts;
open pull requests;
use CI before merging.

Exam trap: If the question says your branch is behind main, the correct general action is to pull/reconcile with the head branch, not manually copy files or merge straight into production.

Patterns

Raw table reference → define source + use source().
Model-to-model dependency → use ref().
Repeated business logic → refactor into staging/intermediate models or macros.
Expensive dashboard model → table or incremental.
Append-only high-volume data → incremental.
Static mapping table → seed.
Source overwrites values but history needed → snapshot.
Direct warehouse object permissions required → grants.

Traps

Choosing a materialization only because it is “faster” without considering freshness, cost, and query pattern.
Using hard-coded schema names in dbt models.
Making every model a table, which increases rebuild time/storage.
Making every staging model ephemeral, which can duplicate heavy SQL downstream.
Treating seeds as ingestion for operational data.
Using macros where a model would provide better lineage and testing.
Forgetting that a table model fully rebuilds by default.

Domain 2 — Managing dbt Models Governance

Concepts

Governance is about protecting model consumers. The exam often frames this as: “A downstream team relies on a model and the analytics team needs to change it safely.”

Key features:

Feature	Purpose	Typical scenario
Model access	Controls whether models can be referenced across groups/projects	Keep internal models private while exposing stable interfaces
Groups	Assign ownership to resources	Domain/team ownership and access control
Model versions	Evolve a public model without breaking consumers	Add/change columns while keeping old version available
Deprecation	Signal old version should be migrated away from	v1 still works but users should move to v2
Contracts	Enforce model shape and column data types	Prevent breaking schema changes
Grants	Apply warehouse permissions	Ensure analysts or BI tools can query produced relations
Documentation	Explain ownership, meaning, usage, and downstream impact	Make models discoverable and trustworthy

Model access

Typical access levels:

Access	Use when	Exam implication
Private	Internal implementation detail	Should not be depended on by other groups
Protected	Reusable within defined boundary/project/group pattern	Safer than public for internal shared layers
Public	Stable interface for broad consumption	Requires stronger docs, contracts, versioning discipline

If a model is used by many teams or BI assets, changing it silently is wrong. Use governance features.

Contracts

A model contract defines/enforces expected columns and types. It helps prevent a model from changing shape unexpectedly.

Use contracts when:

downstream consumers rely on a stable schema;
column presence/type matters;
breaking schema changes must fail early;
incremental loads should reject incompatible data before bad data lands.

Contracts are not a substitute for all data tests. They validate shape/type constraints, not every business rule.

Versions and deprecation

Use model versions when you need to introduce a breaking change while keeping existing consumers stable.

Pattern:

Keep v1 available.
Create v2 with changed interface.
Mark v1 deprecated when ready.
Update docs and communicate migration.
Remove old version only after consumers migrate.

Exam trap: Do not “just rename the model” or “change the column in place” when external consumers depend on it.

Grants

Use grants to configure warehouse-level access to dbt-created relations. Grants answer the question: who can query this object after dbt builds it?

Grants are not the same as model access. Model access controls dbt dependency/interface behavior. Grants control database permissions.

Patterns

Downstream team broke after column rename → use model versioning and contracts.
Internal staging model should not be referenced by other teams → mark private/protected and expose a public mart.
Dashboard user cannot query model after deploy → check grants/warehouse permissions.
Team needs ownership boundaries → use groups and model access.
Need to enforce columns and data types → use contract.

Traps

Confusing access with grants.
Assuming contracts replace not_null, unique, or relationship tests.
Deprecating/removing a model before documenting migration.
Using public access for every model.
Versioning non-breaking changes unnecessarily.

Domain 3 — Debugging Data Modeling Errors

Concepts

Debugging questions usually ask where to look first or what category of error it is.

Debugging order:

Read the error message and identify phase: parsing, compilation, execution, test, dependency, connection.
Run dbt compile if SQL/Jinja generation is suspicious.
Inspect compiled SQL in target/compiled or target/run.
Determine whether the failure is pure SQL/warehouse syntax or dbt/Jinja/YAML config.
Fix locally, run targeted commands, test before merging.

Error categories

Symptom	Likely area	Best next action
YAML parsing error	`.yml` indentation, invalid keys, wrong nesting	Validate YAML and resource property structure
Compilation error	Jinja, macro, `ref`, `source`, config	Run `dbt compile`, inspect compiled SQL/error line
Database syntax error	SQL generated by model	Inspect compiled SQL and run/debug in warehouse if needed
Relation not found	Missing `ref/source`, build order, environment, schema, permissions	Check DAG, source config, target, grants
Test failure	Data violates assertion	Inspect failing rows, severity, test condition
Package macro missing	Dependencies not installed or version mismatch	Run/check `dbt deps`, package version, namespace
Connection/profile error	`profiles.yml`, environment variables, credentials	Run `dbt debug`

Compiled SQL

Compiled SQL is critical because dbt model SQL contains Jinja, macros, and adapter-specific rendering. If a warehouse error references SQL that does not look like your model file, inspect compiled SQL.

Use compiled SQL to:

see expanded ref() and source() relation names;
see macro output;
verify is_incremental() branches;
diagnose SQL syntax errors after Jinja rendering;
distinguish dbt compilation problems from warehouse execution problems.

YAML debugging

YAML traps:

wrong indentation;
putting tests under the wrong level;
misspelling columns, description, data_tests/tests depending on version style;
placing model properties under the wrong model name;
using tabs or invalid quoting;
confusing source table properties with model properties.

SQL vs dbt issue

Question clue	More likely
Error mentions YAML parser, undefined macro, `ref` not found	dbt project/config issue
Error mentions warehouse SQL syntax, invalid identifier, function not found	SQL/warehouse issue after compilation
Error only appears in production target	environment/target/schema/permissions issue
Error appears after package upgrade	package compatibility or macro behavior
Error appears only on incremental runs	incremental branch, unique key, schema change, or high-water logic

Fix and test before merge

Good workflow:

Reproduce locally or in dev environment.
Inspect logs/compiled SQL/failing rows.
Apply minimal fix.
Run targeted model and downstream tests.
Open PR and let CI validate before merging.

Patterns

Failing model due to generated SQL → inspect compiled SQL.
source() relation not found → verify source YAML, database/schema/table, target config, and permissions.
YAML error after adding tests → fix indentation and property nesting.
Test failure after new source load → inspect failing rows; maybe source issue, not dbt code.
Error only in CI → compare CI target/profile/env vars/packages to local.

Traps

Editing compiled SQL instead of model SQL.
Assuming all SQL-looking errors are pure SQL; Jinja may have generated bad SQL.
Running all models when a targeted selector would prove the fix faster.
Ignoring package or adapter version compatibility.
Treating failing tests as “dbt broken” instead of data quality signal.

Domain 4 — Troubleshooting and Optimizing dbt Pipelines

Concepts

This domain tests production operation: jobs, DAG failures, retries, CI, clone, and orchestration boundaries.

Key idea: dbt transforms data inside the warehouse. External orchestrators schedule and coordinate broader workflows, but dbt should still own model dependencies and transformation correctness.

`dbt build` vs `dbt run` / `dbt test`

Command	What it does	Use when
`dbt run`	Builds models	You only want transformations
`dbt test`	Runs tests	You want to validate already-built resources
`dbt build`	Runs seeds, models, snapshots, tests in DAG-aware order	Production/CI workflow requiring build + validation
`dbt compile`	Compiles project without executing transformations	Debug generated SQL/Jinja
`dbt debug`	Checks connection/profile/project validity	Credential/profile/setup issues
`dbt retry`	Retries failed nodes from previous invocation when supported	Resume after transient/partial failure

DAG failure management

If an upstream model fails, downstream dependent models should not blindly run. The exam wants you to respect the DAG rather than bypass it.

Good actions:

fix the upstream failure first;
use selectors to rerun affected nodes;
use + operators to include parents/children as needed;
use dbt build for DAG-aware build/test ordering;
use retries or result selectors to avoid unnecessary rebuilds after transient failures.

Bad actions:

manually build downstream tables out of order;
remove tests to make the job pass;
hard-code production relations into dev models;
ignore failing upstream source freshness.

Clone

dbt clone can create environment copies by cloning relations where supported by the data platform. It is useful for efficient environment setup, especially when you do not want to rebuild all expensive models.

Use clone when:

creating a dev/CI environment from existing production objects;
supported by the adapter/data platform;
you need fast setup without full transformation cost.

Do not use clone as a substitute for fixing model logic or tests.

CI and deployment

Strong CI pattern:

feature branch opens PR;
CI runs only changed models and relevant downstream dependencies;
CI defers unchanged upstream references to production artifacts;
tests validate the changed surface area;
merge only after passing checks.

This reduces cost and catches issues without rebuilding the entire warehouse.

Orchestration boundaries

Tool responsibility	Examples
dbt	transform data, build DAG, test, document, select nodes, manage project dependencies
External orchestrator	schedule workflows, run upstream ingestion, coordinate cross-system jobs, alerting/escalation
Warehouse	execute SQL/Python, enforce permissions, store results, optimize physical execution
BI tool	consume curated models, expose dashboards, not transform core warehouse logic

Patterns

Production job failed halfway → use dbt retry or result selectors after fixing transient issue.
CI too expensive → use state selection and defer.
Need dev schema quickly from prod → dbt clone if adapter supports it.
Downstream jobs fail because upstream source is stale → add or enforce source freshness before downstream build.
Pipeline has wrong order → fix DAG dependencies, not scheduler order.

Traps

Treating the orchestrator as the dependency graph instead of dbt DAG.
Rebuilding everything in CI when slim CI solves the scenario.
Using clone on unsupported platforms or expecting clone to transform data.
Retrying blindly without understanding failed node category.
Skipping tests in production because they slow the job.

Domain 5 — Implementing dbt Tests

Concepts

Tests validate assumptions. The exam cares about selecting the right type of test and configuring it appropriately.

Test categories:

Test type	What it is	Use when
Generic test	Reusable parameterized assertion in YAML	`not_null`, `unique`, `relationships`, `accepted_values`, custom generic tests
Singular test	SQL file that returns failing rows	Complex business rule not easily expressed as generic test
Custom generic test	Reusable test macro	Repeated organization-specific assertion
Source test	Test applied to raw source/table/column	Validate assumptions at the ingestion boundary

Built-in generic tests

Test	Validates	Example scenario
`not_null`	Column has no nulls	Primary key cannot be null
`unique`	Values are unique	Order ID should appear once
`relationships`	Foreign key values exist in parent model	`orders.customer_id` exists in `customers.customer_id`
`accepted_values`	Column values are in allowed set	status must be `placed`, `shipped`, `returned`

Test configuration

Important configs:

Config	Why it matters
`severity`	Warn vs error behavior
`where`	Restrict tested rows, often for recent/incremental data
`limit`	Limit returned failures, not necessarily test scope semantics
`store_failures`	Persist failing rows for inspection
`error_if` / `warn_if`	Threshold-based pass/warn/error
`tags`	Select tests for workflows

Exam trap: store_failures helps inspect failures; it does not prevent scanning the table or reuse old test results as a pass.

Testing incremental models

For large incremental models, do not always test all historical rows if the scenario requires efficiency. Options include:

use a where condition to test recent partitions/new rows;
use contracts/warehouse constraints where supported for column shape/type/not-null-like enforcement;
design incremental logic so new rows are validated during load;
run full validation periodically if required.

Testing sources

Use source tests for assumptions about raw data:

IDs not null;
source natural key uniqueness where expected;
accepted values for raw status codes;
relationships between raw source tables where useful;
freshness checks for timeliness.

Freshness is not the same as a generic test. Freshness measures whether data has arrived recently based on a timestamp.

Singular vs generic tests

Situation	Better choice
Column should never be null	Generic `not_null`
Column should be unique	Generic `unique`
Status must be one of a list	Generic `accepted_values`
Foreign key must exist in parent	Generic `relationships`
Revenue should not be negative after refunds logic	Singular or custom generic depending reuse
Same business rule applies to many models	Custom generic test

Patterns

Need failing rows saved for debugging → store_failures.
Need test only recent incremental data → where config.
Need failure threshold → error_if / warn_if.
Need reusable organization rule → custom generic test.
Need one-off complex SQL assertion → singular test.
Need raw table timeliness → source freshness.

Traps

Over-testing every column without business value.
Confusing relationships with uniqueness.
Applying unique to a column that is only unique in combination with another column.
Using accepted_values when values change often and should come from a reference model.
Treating warnings as passing quality gates in production when the business requires failure.
Not testing sources before trusting staging models.

Domain 6 — Leveraging the dbt State

Concepts

State lets dbt compare the current project to artifacts from a previous run, usually production. This powers slim CI and targeted rebuilds.

Key artifacts:

Artifact	Use
`manifest.json`	Project graph, resources, configs, dependencies; used for state comparison
`run_results.json`	Results of prior invocations; used for result selectors/retry-like workflows
compiled/run artifacts	Debugging and inspection

State selectors

Common patterns:

Selector	Meaning
`state:modified`	Resources changed compared with prior state
`state:new`	Resources new compared with prior state
`state:old`	Resources removed/changed relative to old state context
`state:modified+`	Modified resources and downstream children
`+state:modified`	Parents of modified resources and modified resources
`state:modified+ --defer`	Common slim CI pattern: build changed nodes/downstream while deferring unchanged refs to prod

Exact selector behavior can vary by context and dbt version, but the reasoning is stable: use state to avoid unnecessary full rebuilds while still validating impacted areas.

Defer

--defer tells dbt to resolve references to unselected nodes using a previous state/production environment when possible. It is useful in CI because changed models can reference stable production versions of unchanged upstream models.

Use defer when:

CI should not rebuild the entire upstream graph;
production artifacts are available;
unchanged parent models are trusted;
you still want correct references and realistic dependencies.

Avoid defer when:

you need to validate changes to upstream parents too;
production state artifacts are stale or missing;
environment parity is broken.

Result selectors and retry

Result selectors use prior run results to select nodes by outcome, such as failed or errored nodes. dbt retry helps rerun failed work from the previous invocation in supported contexts.

Use result-aware workflows when:

a production job failed on a subset of nodes;
transient warehouse issue was fixed;
you do not want to rerun everything.

Patterns

PR changed one staging model → run state:modified+ to include impacted downstream models.
CI should not rebuild unchanged parents → use --defer to production state.
Previous production run had failed nodes → retry failed subset after fixing transient issue.
Need compare current branch with production → use prior manifest as state.

Traps

Using state selectors without providing a valid state artifact.
Assuming state:modified automatically includes downstream models; use + when needed.
Deferring to production when the changed model depends on changed upstream logic that is not selected.
Treating state as data freshness; state compares project artifacts, not source arrival times.
Confusing source freshness with state freshness.

Domain 7 — Implementing and Maintaining External Dependencies

Concepts

External dependencies include dbt packages, source freshness, and exposures. These features make a project more maintainable and connected to the broader data ecosystem.

Packages

Packages provide reusable macros, models, and tests. They are declared in package configuration and installed with dbt deps.

Use packages when:

a well-maintained package solves a common need;
your team wants standardized macros/tests;
compatibility with your adapter and dbt version is confirmed.

Package risks:

version incompatibility after dbt upgrade;
adapter-specific macro behavior;
hidden SQL complexity;
unmaintained dependencies;
namespace conflicts;
changing package behavior after upgrade.

Best practice:

pin versions intentionally;
review package changelogs before upgrades;
test package macros in CI;
do not blindly trust package output for critical logic;
run dbt deps when dependencies change or are missing.

Source freshness

Source freshness checks whether raw source data is recent enough. It is based on a timestamp column and configured thresholds.

Use source freshness when:

ingestion can be late;
downstream dashboards depend on timely data;
you need alerting/gating before transformations;
SLAs exist for source updates.

Freshness usually includes:

loaded_at_field;
warning threshold;
error threshold.

Exam trap: Freshness is about recency of loaded source data, not whether transformed model columns are unique/not null.

Exposures

Exposures document downstream dependencies such as dashboards, notebooks, ML jobs, or applications. They connect produced data models to business assets.

Use exposures when:

a BI dashboard depends on dbt models;
owners need to know impact before changing a model;
lineage should show downstream consumers;
failures should be assessed by business impact.

Exposures are documentation/lineage resources; they do not grant permissions or execute dashboards.

Patterns

Missing macro from package → run/check dbt deps, package name/version/namespace.
Package works locally but fails in CI → check dependency installation and version parity.
Package macro works on Snowflake but not BigQuery → check adapter compatibility/dispatch.
Raw data arrives late → configure source freshness.
Dashboard owner needs impact awareness → add exposure with owner and dependency refs.

Traps

Treating dbt deps as a model build command.
Upgrading packages without checking breaking changes.
Using source freshness to test transformed model logic.
Assuming exposures enforce access control.
Not documenting package-generated models/macros because they are “external.”

Documentation Skills Across Domains

Although the current public exam page no longer lists documentation as its own domain, documentation remains a repeated source-bank theme and a practical exam skill.

Know how to document:

model descriptions;
column descriptions;
source descriptions;
source table/column metadata;
exposures;
owner information;
docs blocks where reusable long-form docs help;
lineage through ref(), source(), and exposures.

Commands:

dbt docs generate builds documentation artifacts;
dbt docs serve is commonly used locally to view docs;
dbt Cloud has its own docs hosting/viewing workflows.

Exam trap: Documentation does not create dependencies by itself. ref() and source() create lineage edges; descriptions explain them.

5. Service Selection Guide

In dbt, “service selection” mostly means choosing the right dbt feature/resource/command for the scenario.

Resource selection

Scenario	Choose	Not this	Why
Raw table loaded by Fivetran/Airbyte/custom ingestion	Source	Model ref	Raw objects should be declared as sources
Transform source into clean business table	Model	Seed	Models are transformations
Small manually maintained mapping CSV	Seed	Source	Seed is version-controlled static data
Track historical changes in mutable source	Snapshot	Incremental model only	Snapshot records change history
Business dashboard consumes dataset	Mart model + exposure	Ephemeral model	Needs stable queryable relation and downstream lineage
Reused SQL snippet across many models	Macro	Copy-paste SQL	Macro supports DRY logic
Reusable intermediate transformation with lineage/tests	Intermediate model	Macro only	Model is visible in DAG and testable
External package macro needed	Package + `dbt deps`	Manual copy	Dependency management and versioning

Materialization selection

Question signal	Best answer
“Always fresh, lightweight, not heavily queried”	View
“Expensive transformation, queried often, acceptable rebuild”	Table
“Huge append/update dataset, only small changes per run”	Incremental
“Reusable CTE, not queried directly, lightweight”	Ephemeral
“Small static reference data in repo”	Seed
“Need SCD-like change history from mutable source”	Snapshot

Testing selection

Need	Feature
Column must not be null	`not_null` generic test
Column must be unique	`unique` generic test
Child key must exist in parent model	`relationships` generic test
Status values limited to list	`accepted_values` generic test
Complex one-off assertion	Singular test
Same custom assertion reused	Custom generic test
Test recent partition only	Test `where` config
Store failing records	`store_failures`
Warn before failing	`severity`, `warn_if`, `error_if`
Raw source arrival SLA	Source freshness

Governance selection

Need	Feature
Prevent breaking schema shape	Contract
Release breaking change safely	Model versioning
Mark old model version as not preferred	Deprecation
Control which models can be referenced	Access
Define team/domain ownership	Groups
Apply database permissions	Grants
Show dashboard dependency	Exposure

Pipeline command selection

Need	Command / approach
Check connection/project setup	`dbt debug`
Render SQL/Jinja without execution	`dbt compile`
Build models only	`dbt run`
Run tests only	`dbt test`
Build resources and run tests in DAG order	`dbt build`
Load seeds	`dbt seed`
Run snapshots	`dbt snapshot`
Install packages	`dbt deps`
Check source timeliness	`dbt source freshness`
Generate docs	`dbt docs generate`
Resume failed previous invocation	`dbt retry` / result selectors
Run changed models in CI	state selectors + `--defer`

6. Architecture Patterns

Pattern 1 — Clean source-to-mart DAG

Recommended flow:

Define raw systems as sources.
Build one staging model per source table for cleaning/casting/renaming.
Build intermediate models for reusable joins and business logic.
Build marts as facts/dimensions or reporting-ready tables.
Add tests at source, staging, and mart layers.
Add docs and exposures for downstream assets.

Why alternatives are wrong:

Directly querying raw tables in marts hides dependencies and duplicates cleaning logic.
Putting all business logic in one giant model is hard to test and debug.
Hard-coded production schemas break dev/CI portability.

Pattern 2 — High-volume event fact table

Scenario: events has billions of rows and only new events arrive daily.

Recommended solution:

Use an incremental model.
Filter new records using event timestamp or ingestion timestamp.
Use unique_key if updates/deduplication are required.
Add tests on recent data and critical keys.
Schedule periodic full validation or full refresh only when needed.

Wrong alternatives:

Table materialization: full rebuild is expensive.
View: dashboard users pay cost every query.
Ephemeral: duplicates huge SQL downstream.

Pattern 3 — Slowly changing customer attributes

Scenario: source customer table overwrites customer status and region, but analytics needs history.

Recommended solution:

Use a snapshot with a reliable unique key and change detection strategy.
Build current-state and historical models from the snapshot as needed.

Wrong alternatives:

Seed: data is not static.
View: shows only current state.
Incremental alone: can append/update transformed rows but does not automatically preserve historical state unless designed for it.

Pattern 4 — Shared business metric logic

Scenario: revenue calculation appears in many models.

Recommended solution:

If it is a repeated expression, use a macro.
If it is a reusable dataset, create an intermediate model.
Test the resulting model/business rule.

Wrong alternatives:

Copy-paste logic everywhere; leads to inconsistent metrics.
Put all logic in dashboard SQL; bypasses dbt governance and testing.

Pattern 5 — Safe breaking change to a public model

Scenario: downstream teams depend on dim_customer, but you need to rename/remove columns.

Recommended solution:

Create a new model version.
Keep old version available temporarily.
Add/deploy contract where appropriate.
Deprecate old version with docs and migration guidance.
Update exposures and notify owners.

Wrong alternatives:

Rename columns in place.
Remove the old model immediately.
Rely only on a Slack message.

Pattern 6 — Pull request CI for changed models

Scenario: CI is too expensive because it rebuilds everything.

Recommended solution:

Compare branch against production manifest.
Select state:modified+ or equivalent impacted graph.
Use --defer so unchanged parents resolve to production.
Run tests for selected models.

Wrong alternatives:

Skip CI tests.
Rebuild every model every time.
Hard-code production relations in development models.

Pattern 7 — Source data freshness SLA

Scenario: dashboard must fail/alert when upstream orders data is older than expected.

Recommended solution:

Configure source freshness on the source table with warning/error thresholds.
Run dbt source freshness before downstream build or as a separate job.
Add exposure so dashboard impact is visible.

Wrong alternatives:

Use not_null on loaded_at; that checks nulls, not recency.
Add a dashboard note; it does not enforce the SLA.

Pattern 8 — Debugging a failing model after macro/package change

Recommended solution:

Confirm package versions and dbt deps.
Run dbt compile.
Inspect compiled SQL.
Check adapter compatibility.
Run targeted build/tests.

Wrong alternatives:

Edit compiled SQL.
Remove the macro without understanding generated SQL.
Assume warehouse error means package is unrelated.

7. Exam Traps

Misleading wording patterns

Wording	Trap	Better thinking
“The table exists in the warehouse, so dbt should query it directly”	Hard-coded relation	Use source/ref abstraction
“The scheduler can run models in order”	Scheduler dependency workaround	dbt DAG should encode order via refs
“The dashboard is slow; make upstream models ephemeral”	Misuse of ephemeral	Ephemeral can duplicate SQL; use table/incremental for query performance
“Only a few records change, but table is huge”	Full rebuild	Use incremental
“Need history of changed values”	Incremental confusion	Snapshot is usually the signal
“Model contract exists, so no tests are needed”	Contract overreach	Contracts validate shape, tests validate data assumptions
“Source freshness failed; add `not_null` test”	Wrong quality dimension	Freshness is recency, not nullness
“Package macro is popular, so no need to inspect output”	Blind trust	Review compiled SQL and test behavior
“CI is expensive; skip downstream tests”	Weak CI	Use state selection/defer
“Model is public, so users can query it”	Access vs grants confusion	Public controls dbt interface; grants control warehouse permissions

Wrong-but-plausible answers

The best distractors usually contain one true idea but apply it to the wrong problem.

Examples:

dbt deps is necessary for packages, but it will not fix a bad SQL model.
dbt compile helps debug generated SQL, but it does not build data.
store_failures helps inspect failing rows, but it does not make the test pass or reduce scan scope by itself.
A table materialization improves query speed, but it may be too expensive for huge frequently changing data.
A macro reduces repetition, but it can hide lineage if a model is more appropriate.
Grants allow querying, but they do not define model governance boundaries.

Elimination strategy

When stuck, eliminate answers that:

hard-code database/schema names where dbt abstraction is expected;
bypass dbt DAG or lineage;
remove tests/docs/governance to “fix” failures;
use a feature for the wrong layer, such as source freshness for model values;
over-engineer a simple problem;
ignore production consumers;
require manual steps that dbt can automate reproducibly.

8. Quick Memory Rules

Rules of thumb

Model depends on model → ref().
Model depends on raw table → source().
Need raw data timeliness → source freshness.
Need downstream dashboard lineage → exposure.
Need static lookup CSV → seed.
Need history of mutable source → snapshot.
Need stable public schema → contract + versioning.
Need warehouse permissions → grants.
Need changed-only CI → state selectors + defer.
Need generated SQL debugging → compile and inspect compiled SQL.
Need package installed → dbt deps.
Need build + test in dependency order → dbt build.
Need rerun previous failures → dbt retry or result selectors.

Fast mapping

If you see...	Think...
“build order wrong”	missing `ref()`
“raw object dependency”	source YAML + `source()`
“many downstream dashboards”	public/protected model, docs, exposures, grants, contracts
“breaking schema change”	model versions and deprecation
“column removed unexpectedly”	contract
“large append-only fact”	incremental
“large model queried often”	table or incremental
“reusable CTE only”	ephemeral or macro
“mapping values in CSV”	seed
“late ingestion”	source freshness
“failed compiled SQL”	inspect `target/compiled`
“CI should be cheaper”	slim CI with state and defer
“failed production run subset”	retry/result selectors
“package macro not found”	`dbt deps`, namespace, version

Materialization memory aid

View = live logic.
Table = stored result.
Incremental = stored result plus changed-row strategy.
Ephemeral = CTE in someone else’s query.
Seed = tiny static data.
Snapshot = history of change.

Test memory aid

not_null: required field.
unique: one row per key.
relationships: child has parent.
accepted_values: controlled category list.
singular: special SQL assertion.
custom generic: reusable special assertion.
freshness: source arrived on time.

9. Final Revision Notes

Highest-yield review points

Know the difference between ref() and source() perfectly. This is the root of DAG correctness.
Master materialization tradeoffs. Most scenario questions depend on performance vs freshness vs maintainability.
Understand tests as assumptions. Pick the simplest test that expresses the assumption.
Use compiled SQL for debugging. Never debug complex Jinja only by staring at source SQL.
Govern shared models. Public interfaces need versions, contracts, docs, exposures, and sometimes grants.
State is for comparing project artifacts. Freshness is for data arrival. Do not confuse them.
CI should be selective but safe. Use state/defer rather than skipping validation.
Packages require dependency management. dbt deps, version pinning, adapter compatibility, and tests matter.
dbt build is often the production-friendly answer. It handles resources and tests in DAG-aware order.
Avoid manual warehouse shortcuts. The exam rewards reproducible dbt-native solutions.

Last-day revision list

Review these without going deep:

materialization table: view vs table vs incremental vs ephemeral;
command table: debug/compile/run/test/build/deps/source freshness/docs/retry;
governance table: access vs grants vs contracts vs versions;
testing table: generic vs singular vs custom generic;
state/defer workflow for slim CI;
source freshness and exposures;
common YAML indentation/property traps;
package dependency and macro debugging;
source/staging/intermediate/mart architecture;
incremental model failure modes.

Mini practice scenarios

Use these to self-test:

Scenario	Best answer
Mart references `analytics_dev.stg_orders` directly	Replace with `ref('stg_orders')`
Staging reads `raw.stripe.payments` directly	Define source and use `source('stripe','payments')`
Orders source loaded 5 hours late	Source freshness warning/error
Dashboard depends on `fct_revenue`	Add exposure and strong docs/tests
Rename column used by other teams	New model version, deprecate old version
Big append-only event table	Incremental model
Small country-code CSV	Seed
Need customer status history	Snapshot
Warehouse syntax error after macro expansion	Inspect compiled SQL
CI rebuilds entire project	State selection + defer

10. Exam-Day Checklist

Must-know topics

Before exam day, you should be able to explain without notes:

What ref() does and why it controls model dependencies.
What source() does and how source YAML is structured.
When to use view, table, incremental, ephemeral, seed, and snapshot.
How is_incremental() changes SQL behavior.
Why table models rebuild fully unless incremental behavior is used.
How to test primary keys, foreign keys, allowed values, and business rules.
How test configs like where, severity, error_if, warn_if, and store_failures change behavior.
How source freshness differs from source tests.
How to debug YAML, Jinja, SQL, connection, and package failures.
Where compiled SQL lives and why it matters.
What contracts enforce and what they do not enforce.
Difference between model access and grants.
When to use model versions and deprecation.
How groups map to ownership/governance.
What exposures represent.
How packages are declared and installed.
Why adapter/package compatibility matters.
What state artifacts are used for.
How state:modified+ and --defer support slim CI.
When dbt retry or result selectors help.
How dbt build differs from separate run and test commands.

Final confidence checklist

You are ready when you can quickly eliminate wrong answers that:

hard-code production schemas inside model SQL;
hide DAG dependencies;
use seeds for dynamic operational data;
use snapshots for simple performance optimization;
use ephemeral for dashboard-facing heavy models;
treat contracts as full data quality tests;
treat source freshness as uniqueness/null validation;
skip CI instead of narrowing CI;
confuse grants with model access;
change public models without versioning or migration;
ignore compiled SQL during debugging;
blame dbt for a pure warehouse SQL error without checking compiled SQL.

Appendix A — Source Bank Pattern Extraction

The analyzed practice bank emphasized these recurring terms and ideas:

Theme	Why it matters
Sources and freshness	Frequent scenarios around raw dependencies, source YAML, loaded timestamps, and stale data
CI and state	Many questions test selective builds, deferred refs, previous artifacts, and retry behavior
Tests	Questions repeatedly distinguish generic, singular, custom, source, and configured tests
`ref()` and DAG	Core dependency reasoning appears throughout development and troubleshooting
Materializations	View/table/incremental/ephemeral choices are central to performance scenarios
Seeds and snapshots	Common distractors when candidates confuse static data, mutable history, and transformed tables
Packages/macros	External dependencies and reusable logic create compatibility and debugging scenarios
Governance	Access, contracts, versions, and grants protect downstream consumers
Compiled SQL/YAML	Debugging questions often turn on where the error appears and what artifact to inspect

Consolidated wrong-answer patterns from the bank

Hard-code relations instead of using dbt abstractions.
Use a scheduler to fix a missing DAG dependency.
Use a materialization that optimizes the wrong thing.
Treat every failure as a warehouse issue.
Treat every warning as acceptable in production.
Overuse public models and underuse governance.
Ignore downstream dashboard impact.
Rerun the entire project when state-aware selection is available.
Install packages but fail to test package-generated behavior.

Appendix B — Official Reference Notes

Official dbt pages checked while preparing this file:

dbt Analytics Engineering Certification Exam page: https://www.getdbt.com/certifications/analytics-engineer-certification-exam
dbt Certification page: https://www.getdbt.com/dbt-certification
dbt Analytics Engineering Certification Study Guide PDF: https://www.getdbt.com/dbt-assets/certifications/dbt-certificate-study-guide

The official certification page currently lists dbt 1.11 support and 7 exam coverage areas. The older PDF study guide still contains useful v1.7-style topic details and sample question styles; this course prioritizes the current domain structure while preserving still-relevant concepts from the older outline and the analyzed question bank.

lock_open

Unlock the full course

All 20 modules with detailed explanations, code examples, and exam tips.

upgrade Get Full Access auto_stories View Full Course description 3-Page Summary

list_alt

Revision Queue

Wrong Only

Unanswered Only

Needs Attention

8 topics

Question 20 of 1200

Topic 1: Developing dbt models · 28%

A Databricks project has a staging model stg_inventory and a mart model fct_inventory. The mart currently queries analytics_dev.stg_inventory directly. What is the best change before promoting to production?

Replace the hard-coded relation with ref('stg_inventory')

Replace the model with a seed named stg_inventory

Move the SQL into profiles.yml

Add a source freshness check to the mart only

arrow_back Prev

0 correct

0 wrong

1200 left

2% done

grid_view

Questions

20/1200

Correct Wrong Current Unseen Flagged Locked

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

35 free · 1165 locked

Keyboard Shortcuts

1-5 Select option

↵ Check answer

←→ Prev / Next

F Flag for review

track_changes

Plan Progress

Quick Stats

Answered 0/1200

Correct 0

Wrong 0

Remaining 1200

dbt Analytics Engineering

dbt Analytics Engineering Certification — Compressed Exam Course

1. Exam Overview

What the exam is testing

Current official exam structure to know

How to think like the exam

How to use this course

2. Exam Domains

Current domain list

Priority notes

What matters most

3. Start-to-Finish Study Path

Foundation phase: build the dbt mental model

Intermediate phase: learn production patterns

Advanced phase: governance, state, and debugging

Final review phase

4. Core Concepts by Domain

Domain 1 — Developing and Optimizing dbt Models

Concepts

ref() and source()

Materializations

Incremental models

Sources

Modularity and DRY SQL

Jinja, variables, and environment config

Python models

Git workflow

Patterns

Traps

Domain 2 — Managing dbt Models Governance

Concepts

Model access

Contracts

Versions and deprecation

Grants

Patterns

Traps

Domain 3 — Debugging Data Modeling Errors

Concepts

Error categories

Compiled SQL

YAML debugging

SQL vs dbt issue

Fix and test before merge

Patterns

Traps

Domain 4 — Troubleshooting and Optimizing dbt Pipelines

Concepts

dbt build vs dbt run / dbt test

DAG failure management

Clone

CI and deployment

Orchestration boundaries

Patterns

Traps

Domain 5 — Implementing dbt Tests

Concepts

Built-in generic tests

Test configuration

Testing incremental models

Testing sources

Singular vs generic tests

Patterns

Traps

Domain 6 — Leveraging the dbt State

Concepts

State selectors

Defer

Result selectors and retry

Patterns

Traps

Domain 7 — Implementing and Maintaining External Dependencies

Concepts

Packages

Source freshness

Exposures

Patterns

Traps

Documentation Skills Across Domains

5. Service Selection Guide

`ref()` and `source()`

`dbt build` vs `dbt run` / `dbt test`