dbt Analytics Engineering
Compressed Course
dbt Analytics Engineering Certification โ Compressed Exam Course
Exam target: dbt Analytics Engineering Certification Exam
Current alignment used in this guide: dbt Core / dbt platform concepts around dbt 1.11, with legacy v1.7/v1.9 themes retained where still exam-relevant.
Source pattern summary: 1,100 scenario-style practice rows, strongest concentration in developing/optimizing models, testing, debugging, pipeline troubleshooting, governance, state, and external dependencies.
1. Exam Overview
What the exam is testing
The dbt Analytics Engineering Certification validates whether you can use dbt in a production analytics workflow, not just whether you can remember commands. The exam expects you to reason through realistic project situations such as:
- building a clean model DAG from raw sources to marts;
- choosing the right materialization for performance and maintainability;
- using
ref()andsource()correctly instead of hard-coded object names; - testing assumptions about source data and transformed models;
- debugging model, YAML, SQL, dependency, and pipeline failures;
- applying model governance features such as access, groups, versions, contracts, and grants;
- using state-aware workflows for CI, slim CI, retries, and production-safe deployments;
- managing packages, source freshness, exposures, and documentation so downstream users trust the data.
The real exam is scenario-heavy. A typical question gives you a project problem, then asks for the dbt-native fix. The best answer is usually the option that improves lineage, repeatability, maintainability, and production safety without over-engineering.
Current official exam structure to know
The official dbt Analytics Engineering Certification page currently lists these logistics and domains:
| Item | Current detail |
|---|---|
| Duration | 2 hours |
| Questions | 65 |
| Passing score | 65% |
| Supported version | dbt 1.11 |
| Expected background | SQL proficiency and practical dbt experience |
| Question style | Practical and scenario-based; expect multiple-choice and interactive-style reasoning |
Older official guides and older question banks often mention dbt Core 1.7 and include a separate documentation domain. The current public exam page lists 7 domains and no separate documentation domain. Documentation still matters because it appears across sources, model properties, lineage, exposures, and governance scenarios.
How to think like the exam
Think like a production analytics engineer:
- Use dbt abstractions before warehouse-specific shortcuts. Prefer
ref(),source(), model configs, selectors, tests, contracts, and exposures over manual warehouse object names. - Preserve the DAG. Anything that hides lineage, bypasses dependencies, or relies on manually ordered SQL is suspicious.
- Choose the simplest materialization that satisfies usage. Do not make everything incremental or table. Do not make a dashboard-facing mart ephemeral.
- Test assumptions, not implementation trivia. Use tests where they express business rules or data quality guarantees.
- Debug from dbt outward. Check logs, compiled SQL, YAML validity, dependencies, target/profile config, and then warehouse-specific SQL/errors.
- Separate development from production. Use state, defer, clone, CI jobs, and PR review to avoid rebuilding or querying expensive production tables unnecessarily.
- Govern public interfaces. If other teams depend on a model, use model access, versioning, contracts, docs, exposures, and deprecation instead of silently changing schemas.
How to use this course
Read Sections 1โ3 once to orient yourself. Then study each domain in Section 4 and use Sections 5โ10 as quick revision material. The course intentionally merges repeated CSV themes into decision frameworks so you can answer new scenarios rather than memorize answers.
2. Exam Domains
Current domain list
| Priority from source bank | Official/current-style domain | Approx. share in analyzed source | What to master |
|---|---|---|---|
| 1 | Developing and optimizing dbt models | ~26% | materializations, ref, source, sources, modular SQL, Jinja/macros, seeds, snapshots, configs, DAG, git workflow |
| 2 | Implementing dbt tests | ~17% | generic/singular/custom tests, source tests, test config, severity, filtering, assumptions, CI testing |
| 3 | Debugging data modeling errors | ~14% | compiled SQL, logs, YAML errors, SQL vs dbt issues, profiles, dependencies, model fixes |
| 4 | Troubleshooting and optimizing dbt pipelines | ~13% | DAG failures, selectors, retries, clone, scheduling, CI, orchestration boundaries, production failure handling |
| 5 | Managing dbt models governance | ~12% | access, groups, contracts, versions, deprecation, grants, stable public interfaces |
| 6 | Leveraging the dbt state | ~10% | state selectors, result selectors, defer, slim CI, manifest/run_results, modified nodes, retry |
| 7 | Implementing and Maintaining External Dependencies | ~9% | packages, dbt deps, package compatibility, exposures, source freshness, downstream dependency awareness |
Priority notes
The largest share of the analyzed bank is model development and optimization. This makes sense because almost every other topic depends on a correct mental model of dbt resources, DAG lineage, materializations, and configuration precedence.
High-yield cross-domain concepts:
ref()vssource();- model materializations: view, table, incremental, ephemeral;
- tests: generic vs singular vs custom generic;
- source freshness and source testing;
- compiled SQL for debugging;
- YAML indentation and resource properties;
dbt buildvsdbt run+dbt test;- model contracts, versions, access, and groups;
- state selection,
defer, and CI efficiency; - packages and macro compatibility.
What matters most
| If the question says... | The exam usually wants you to think about... |
|---|---|
| Hard-coded schema/table names inside models | Replace with ref() for models or source() for raw tables |
| Model builds in wrong order | Missing ref() dependencies |
| Raw table dependency is not documented/tested | Define a source in YAML and use source() |
| Large append-only table | Incremental model with correct is_incremental() logic |
| Small stable mapping file | Seed |
| Point-in-time history of slowly changing source data | Snapshot |
| Business users query it often | Table or incremental, not ephemeral |
| Reusable CTE not directly queried | Ephemeral or macro depending on purpose |
| Downstream teams depend on the model | Public/protected access, versions, contracts, docs |
| Only changed models should run in CI | State selectors and defer |
| Failed job should resume safely | dbt retry / result selectors, not manual partial guessing |
| Source is late or stale | Source freshness, not a generic model test |
| BI dashboard depends on model | Exposure |
3. Start-to-Finish Study Path
Foundation phase: build the dbt mental model
Learn first:
- What a dbt project is:
dbt_project.yml, models, macros, seeds, snapshots, tests, sources, packages. - How dbt builds a DAG from
ref()andsource(). - Difference between development, CI, staging, and production environments.
- Basic commands:
dbt debug,dbt compile,dbt run,dbt test,dbt build,dbt docs generate,dbt deps,dbt seed,dbt snapshot,dbt source freshness.
Hands-on checklist:
- Create a source YAML file with at least one source and table.
- Create staging models using
source(). - Create intermediate/mart models using
ref(). - Run
dbt compileand inspecttarget/compiled. - Generate docs and inspect lineage.
Intermediate phase: learn production patterns
Focus on:
- materializations and their tradeoffs;
- incremental models and
is_incremental(); - test types and test configuration;
- model property YAML;
- packages and macros;
- git workflow and PR review;
- docs and exposures.
Hands-on checklist:
- Build one view, one table, one incremental model, one ephemeral model.
- Add generic tests:
not_null,unique,relationships,accepted_values. - Add a singular test for a business rule.
- Add source freshness to a source.
- Add an exposure for a dashboard.
- Install a package and run
dbt deps.
Advanced phase: governance, state, and debugging
Focus on:
- model contracts and column-level constraints;
- model versions and deprecation;
- access levels and groups;
- grants for warehouse permissions;
- slim CI using
state:modified+and--defer; - result selectors and retries;
- debugging YAML, SQL, package, and pipeline failures.
Hands-on checklist:
- Add a contract to a model and intentionally break it.
- Add a v2 model and deprecate v1.
- Make a protected/public model and test dependency behavior.
- Run state selection against a previous manifest.
- Debug a failing test from the failure output and compiled SQL.
Final review phase
In the final review, do not reread everything equally. Focus on scenario triggers:
- If the model is downstream-facing, think governance.
- If the model is expensive and append-only, think incremental.
- If dependencies are invisible, think
ref()/source(). - If only changed work should run, think state/defer.
- If source data timeliness is the issue, think freshness.
- If a dashboard breaks, think exposure, model contract/versioning, docs, and lineage.
- If the error is unclear, inspect logs and compiled SQL before changing dbt configs.
4. Core Concepts by Domain
Domain 1 โ Developing and Optimizing dbt Models
Concepts
This is the highest-yield domain. You must know how dbt turns modular SQL files into a dependency graph and production data objects.
Key resource types:
| Resource | What it represents | Common exam signal |
|---|---|---|
| Model | A SQL or Python transformation managed by dbt | Build clean staging/intermediate/mart layers |
| Source | Raw data object loaded outside dbt | Use when referring to raw tables |
| Seed | Static CSV version-controlled in the project | Small lookup/mapping/reference data |
| Snapshot | Point-in-time history of mutable source records | SCD-style history where source overwrites changes |
| Macro | Reusable Jinja logic | Repeated SQL pattern or generated logic |
| Test | Data assertion | Validate uniqueness, not null, relationships, accepted values, business rules |
| Exposure | Downstream asset such as BI dashboard, notebook, ML job | Show downstream dependency and ownership |
| Package | External reusable dbt project | Shared macros/models/tests from dbt Hub or Git |
ref() and source()
ref() is for dbt models. source() is for raw objects loaded outside dbt.
| Need | Use | Why |
|---|---|---|
A mart depends on stg_orders |
{{ ref('stg_orders') }} |
Creates DAG dependency and environment-aware relation name |
A staging model reads raw stripe.payments |
{{ source('stripe', 'payments') }} |
Documents raw dependency and supports source tests/freshness |
A model directly queries analytics_prod.stg_orders |
Replace with ref() |
Hard-coding breaks lineage and environment portability |
A model directly queries raw.shopify.orders |
Replace with source() |
Raw dependencies belong in source YAML |
Exam trap: If the question says models build in the wrong order, do not choose a scheduler workaround. dbt order comes from ref() dependencies.
Materializations
| Materialization | Use when | Avoid when | Exam trap |
|---|---|---|---|
| View | Logic should stay lightweight and always query fresh upstream data | Heavy repeated dashboard queries need fast performance | View does not store results; it can push cost to query time |
| Table | Model is expensive to compute and queried often | Data changes frequently and full rebuild is too expensive | Table rebuilds entire relation each run |
| Incremental | Large table, small new/changed subset per run | Large percentage updates each run or logic cannot isolate changes | Requires correct filter and unique key/strategy where needed |
| Ephemeral | Reusable intermediate logic not queried directly | Many downstream refs create repeated SQL; business users need to query it | Ephemeral is inlined as CTE, not created as a database object |
| Seed | Small static CSV controlled in git | Large dynamic data or frequently updated operational data | Seeds are not an ingestion system |
| Snapshot | Track historical changes in mutable source records | You only need latest state or immutable event data | Snapshot is not a materialization for performance |
Incremental models
Use incremental when a model has many rows and only a small subset is added or changed each run.
Core reasoning:
is_incremental()gates logic that should only run on incremental runs.- The SQL must be valid for both full-refresh and incremental runs.
- Use a reliable event/update timestamp or high-water mark.
- Use
unique_keyand an incremental strategy when records can update. - Use
--full-refreshwhen logic changes require rebuilding historical rows. - Schema changes may require
on_schema_changehandling or full refresh depending on the warehouse and change type.
Common bad answers:
- โIncremental models are always rebuilt.โ False.
- โIncremental models are always best for large tables.โ Not if most rows change each run.
- โYou never need
is_incremental().โ Usually false for selective processing. - โUse ephemeral to make a large dashboard model faster.โ Usually wrong; ephemeral can duplicate heavy SQL.
Sources
A source maps to a raw data location, commonly database + schema, with tables underneath. Use sources to centralize raw object naming and document external dependencies.
Good source YAML includes:
- source name;
- database/schema where needed;
- tables;
- source and column descriptions;
- tests on raw data assumptions;
- freshness where timeliness matters;
- loaded timestamp field for freshness checks.
Exam trap: If multiple raw tables are in the same database/schema, they are usually one source with multiple tables, not multiple sources.
Modularity and DRY SQL
Good dbt modeling decomposes SQL into layers:
| Layer | Typical purpose | Typical materialization |
|---|---|---|
| Staging | Clean, rename, cast, standardize one source | View, sometimes ephemeral/table |
| Intermediate | Reusable transformations and joins | Ephemeral, view, table depending on cost |
| Mart | Business-facing facts/dimensions | Table/incremental for performance |
Use macros for reusable logic patterns, not for hiding business-critical model lineage. Use models when you need DAG visibility and testable transformation steps.
Jinja, variables, and environment config
| Feature | Use case | Trap |
|---|---|---|
var() |
Project variables provided in dbt_project.yml or CLI |
Do not store secrets in vars |
env_var() |
Environment-specific values and secrets | Must be available in runtime environment |
target |
Branch logic by target/profile | Avoid excessive target-specific model logic that makes behavior hard to test |
config() |
Model-level configs in SQL | Remember config precedence |
dbt_project.yml |
Default project/folder configs | Bad indentation or wrong resource path breaks expectations |
Python models
Python models are used for transformations easier in Python than SQL, such as advanced data science-style transformations. Exam reasoning remains dbt-native:
- They are still models in the DAG.
- They can use
dbt.ref()anddbt.source(). - They are not a replacement for simple SQL transformations.
- Support depends on the adapter/platform.
Git workflow
Expected git skills:
- create feature branches;
- commit changes;
- pull from main/head branch to stay updated;
- resolve conflicts;
- open pull requests;
- use CI before merging.
Exam trap: If the question says your branch is behind main, the correct general action is to pull/reconcile with the head branch, not manually copy files or merge straight into production.
Patterns
- Raw table reference โ define source + use
source(). - Model-to-model dependency โ use
ref(). - Repeated business logic โ refactor into staging/intermediate models or macros.
- Expensive dashboard model โ table or incremental.
- Append-only high-volume data โ incremental.
- Static mapping table โ seed.
- Source overwrites values but history needed โ snapshot.
- Direct warehouse object permissions required โ grants.
Traps
- Choosing a materialization only because it is โfasterโ without considering freshness, cost, and query pattern.
- Using hard-coded schema names in dbt models.
- Making every model a table, which increases rebuild time/storage.
- Making every staging model ephemeral, which can duplicate heavy SQL downstream.
- Treating seeds as ingestion for operational data.
- Using macros where a model would provide better lineage and testing.
- Forgetting that a table model fully rebuilds by default.
Domain 2 โ Managing dbt Models Governance
Concepts
Governance is about protecting model consumers. The exam often frames this as: โA downstream team relies on a model and the analytics team needs to change it safely.โ
Key features:
| Feature | Purpose | Typical scenario |
|---|---|---|
| Model access | Controls whether models can be referenced across groups/projects | Keep internal models private while exposing stable interfaces |
| Groups | Assign ownership to resources | Domain/team ownership and access control |
| Model versions | Evolve a public model without breaking consumers | Add/change columns while keeping old version available |
| Deprecation | Signal old version should be migrated away from | v1 still works but users should move to v2 |
| Contracts | Enforce model shape and column data types | Prevent breaking schema changes |
| Grants | Apply warehouse permissions | Ensure analysts or BI tools can query produced relations |
| Documentation | Explain ownership, meaning, usage, and downstream impact | Make models discoverable and trustworthy |
Model access
Typical access levels:
| Access | Use when | Exam implication |
|---|---|---|
| Private | Internal implementation detail | Should not be depended on by other groups |
| Protected | Reusable within defined boundary/project/group pattern | Safer than public for internal shared layers |
| Public | Stable interface for broad consumption | Requires stronger docs, contracts, versioning discipline |
If a model is used by many teams or BI assets, changing it silently is wrong. Use governance features.
Contracts
A model contract defines/enforces expected columns and types. It helps prevent a model from changing shape unexpectedly.
Use contracts when:
- downstream consumers rely on a stable schema;
- column presence/type matters;
- breaking schema changes must fail early;
- incremental loads should reject incompatible data before bad data lands.
Contracts are not a substitute for all data tests. They validate shape/type constraints, not every business rule.
Versions and deprecation
Use model versions when you need to introduce a breaking change while keeping existing consumers stable.
Pattern:
- Keep v1 available.
- Create v2 with changed interface.
- Mark v1 deprecated when ready.
- Update docs and communicate migration.
- Remove old version only after consumers migrate.
Exam trap: Do not โjust rename the modelโ or โchange the column in placeโ when external consumers depend on it.
Grants
Use grants to configure warehouse-level access to dbt-created relations. Grants answer the question: who can query this object after dbt builds it?
Grants are not the same as model access. Model access controls dbt dependency/interface behavior. Grants control database permissions.
Patterns
- Downstream team broke after column rename โ use model versioning and contracts.
- Internal staging model should not be referenced by other teams โ mark private/protected and expose a public mart.
- Dashboard user cannot query model after deploy โ check grants/warehouse permissions.
- Team needs ownership boundaries โ use groups and model access.
- Need to enforce columns and data types โ use contract.
Traps
- Confusing access with grants.
- Assuming contracts replace
not_null,unique, or relationship tests. - Deprecating/removing a model before documenting migration.
- Using public access for every model.
- Versioning non-breaking changes unnecessarily.
Domain 3 โ Debugging Data Modeling Errors
Concepts
Debugging questions usually ask where to look first or what category of error it is.
Debugging order:
- Read the error message and identify phase: parsing, compilation, execution, test, dependency, connection.
- Run
dbt compileif SQL/Jinja generation is suspicious. - Inspect compiled SQL in
target/compiledortarget/run. - Determine whether the failure is pure SQL/warehouse syntax or dbt/Jinja/YAML config.
- Fix locally, run targeted commands, test before merging.
Error categories
| Symptom | Likely area | Best next action |
|---|---|---|
| YAML parsing error | .yml indentation, invalid keys, wrong nesting |
Validate YAML and resource property structure |
| Compilation error | Jinja, macro, ref, source, config |
Run dbt compile, inspect compiled SQL/error line |
| Database syntax error | SQL generated by model | Inspect compiled SQL and run/debug in warehouse if needed |
| Relation not found | Missing ref/source, build order, environment, schema, permissions |
Check DAG, source config, target, grants |
| Test failure | Data violates assertion | Inspect failing rows, severity, test condition |
| Package macro missing | Dependencies not installed or version mismatch | Run/check dbt deps, package version, namespace |
| Connection/profile error | profiles.yml, environment variables, credentials |
Run dbt debug |
Compiled SQL
Compiled SQL is critical because dbt model SQL contains Jinja, macros, and adapter-specific rendering. If a warehouse error references SQL that does not look like your model file, inspect compiled SQL.
Use compiled SQL to:
- see expanded
ref()andsource()relation names; - see macro output;
- verify
is_incremental()branches; - diagnose SQL syntax errors after Jinja rendering;
- distinguish dbt compilation problems from warehouse execution problems.
YAML debugging
YAML traps:
- wrong indentation;
- putting
testsunder the wrong level; - misspelling
columns,description,data_tests/testsdepending on version style; - placing model properties under the wrong model name;
- using tabs or invalid quoting;
- confusing source table properties with model properties.
SQL vs dbt issue
| Question clue | More likely |
|---|---|
Error mentions YAML parser, undefined macro, ref not found |
dbt project/config issue |
| Error mentions warehouse SQL syntax, invalid identifier, function not found | SQL/warehouse issue after compilation |
| Error only appears in production target | environment/target/schema/permissions issue |
| Error appears after package upgrade | package compatibility or macro behavior |
| Error appears only on incremental runs | incremental branch, unique key, schema change, or high-water logic |
Fix and test before merge
Good workflow:
- Reproduce locally or in dev environment.
- Inspect logs/compiled SQL/failing rows.
- Apply minimal fix.
- Run targeted model and downstream tests.
- Open PR and let CI validate before merging.
Patterns
- Failing model due to generated SQL โ inspect compiled SQL.
source()relation not found โ verify source YAML, database/schema/table, target config, and permissions.- YAML error after adding tests โ fix indentation and property nesting.
- Test failure after new source load โ inspect failing rows; maybe source issue, not dbt code.
- Error only in CI โ compare CI target/profile/env vars/packages to local.
Traps
- Editing compiled SQL instead of model SQL.
- Assuming all SQL-looking errors are pure SQL; Jinja may have generated bad SQL.
- Running all models when a targeted selector would prove the fix faster.
- Ignoring package or adapter version compatibility.
- Treating failing tests as โdbt brokenโ instead of data quality signal.
Domain 4 โ Troubleshooting and Optimizing dbt Pipelines
Concepts
This domain tests production operation: jobs, DAG failures, retries, CI, clone, and orchestration boundaries.
Key idea: dbt transforms data inside the warehouse. External orchestrators schedule and coordinate broader workflows, but dbt should still own model dependencies and transformation correctness.
dbt build vs dbt run / dbt test
| Command | What it does | Use when |
|---|---|---|
dbt run |
Builds models | You only want transformations |
dbt test |
Runs tests | You want to validate already-built resources |
dbt build |
Runs seeds, models, snapshots, tests in DAG-aware order | Production/CI workflow requiring build + validation |
dbt compile |
Compiles project without executing transformations | Debug generated SQL/Jinja |
dbt debug |
Checks connection/profile/project validity | Credential/profile/setup issues |
dbt retry |
Retries failed nodes from previous invocation when supported | Resume after transient/partial failure |
DAG failure management
If an upstream model fails, downstream dependent models should not blindly run. The exam wants you to respect the DAG rather than bypass it.
Good actions:
- fix the upstream failure first;
- use selectors to rerun affected nodes;
- use
+operators to include parents/children as needed; - use
dbt buildfor DAG-aware build/test ordering; - use retries or result selectors to avoid unnecessary rebuilds after transient failures.
Bad actions:
- manually build downstream tables out of order;
- remove tests to make the job pass;
- hard-code production relations into dev models;
- ignore failing upstream source freshness.
Clone
dbt clone can create environment copies by cloning relations where supported by the data platform. It is useful for efficient environment setup, especially when you do not want to rebuild all expensive models.
Use clone when:
- creating a dev/CI environment from existing production objects;
- supported by the adapter/data platform;
- you need fast setup without full transformation cost.
Do not use clone as a substitute for fixing model logic or tests.
CI and deployment
Strong CI pattern:
- feature branch opens PR;
- CI runs only changed models and relevant downstream dependencies;
- CI defers unchanged upstream references to production artifacts;
- tests validate the changed surface area;
- merge only after passing checks.
This reduces cost and catches issues without rebuilding the entire warehouse.
Orchestration boundaries
| Tool responsibility | Examples |
|---|---|
| dbt | transform data, build DAG, test, document, select nodes, manage project dependencies |
| External orchestrator | schedule workflows, run upstream ingestion, coordinate cross-system jobs, alerting/escalation |
| Warehouse | execute SQL/Python, enforce permissions, store results, optimize physical execution |
| BI tool | consume curated models, expose dashboards, not transform core warehouse logic |
Patterns
- Production job failed halfway โ use
dbt retryor result selectors after fixing transient issue. - CI too expensive โ use state selection and defer.
- Need dev schema quickly from prod โ
dbt cloneif adapter supports it. - Downstream jobs fail because upstream source is stale โ add or enforce source freshness before downstream build.
- Pipeline has wrong order โ fix DAG dependencies, not scheduler order.
Traps
- Treating the orchestrator as the dependency graph instead of dbt DAG.
- Rebuilding everything in CI when slim CI solves the scenario.
- Using clone on unsupported platforms or expecting clone to transform data.
- Retrying blindly without understanding failed node category.
- Skipping tests in production because they slow the job.
Domain 5 โ Implementing dbt Tests
Concepts
Tests validate assumptions. The exam cares about selecting the right type of test and configuring it appropriately.
Test categories:
| Test type | What it is | Use when |
|---|---|---|
| Generic test | Reusable parameterized assertion in YAML | not_null, unique, relationships, accepted_values, custom generic tests |
| Singular test | SQL file that returns failing rows | Complex business rule not easily expressed as generic test |
| Custom generic test | Reusable test macro | Repeated organization-specific assertion |
| Source test | Test applied to raw source/table/column | Validate assumptions at the ingestion boundary |
Built-in generic tests
| Test | Validates | Example scenario |
|---|---|---|
not_null |
Column has no nulls | Primary key cannot be null |
unique |
Values are unique | Order ID should appear once |
relationships |
Foreign key values exist in parent model | orders.customer_id exists in customers.customer_id |
accepted_values |
Column values are in allowed set | status must be placed, shipped, returned |
Test configuration
Important configs:
| Config | Why it matters |
|---|---|
severity |
Warn vs error behavior |
where |
Restrict tested rows, often for recent/incremental data |
limit |
Limit returned failures, not necessarily test scope semantics |
store_failures |
Persist failing rows for inspection |
error_if / warn_if |
Threshold-based pass/warn/error |
tags |
Select tests for workflows |
Exam trap: store_failures helps inspect failures; it does not prevent scanning the table or reuse old test results as a pass.
Testing incremental models
For large incremental models, do not always test all historical rows if the scenario requires efficiency. Options include:
- use a
wherecondition to test recent partitions/new rows; - use contracts/warehouse constraints where supported for column shape/type/not-null-like enforcement;
- design incremental logic so new rows are validated during load;
- run full validation periodically if required.
Testing sources
Use source tests for assumptions about raw data:
- IDs not null;
- source natural key uniqueness where expected;
- accepted values for raw status codes;
- relationships between raw source tables where useful;
- freshness checks for timeliness.
Freshness is not the same as a generic test. Freshness measures whether data has arrived recently based on a timestamp.
Singular vs generic tests
| Situation | Better choice |
|---|---|
| Column should never be null | Generic not_null |
| Column should be unique | Generic unique |
| Status must be one of a list | Generic accepted_values |
| Foreign key must exist in parent | Generic relationships |
| Revenue should not be negative after refunds logic | Singular or custom generic depending reuse |
| Same business rule applies to many models | Custom generic test |
Patterns
- Need failing rows saved for debugging โ
store_failures. - Need test only recent incremental data โ
whereconfig. - Need failure threshold โ
error_if/warn_if. - Need reusable organization rule โ custom generic test.
- Need one-off complex SQL assertion โ singular test.
- Need raw table timeliness โ source freshness.
Traps
- Over-testing every column without business value.
- Confusing
relationshipswith uniqueness. - Applying
uniqueto a column that is only unique in combination with another column. - Using
accepted_valueswhen values change often and should come from a reference model. - Treating warnings as passing quality gates in production when the business requires failure.
- Not testing sources before trusting staging models.
Domain 6 โ Leveraging the dbt State
Concepts
State lets dbt compare the current project to artifacts from a previous run, usually production. This powers slim CI and targeted rebuilds.
Key artifacts:
| Artifact | Use |
|---|---|
manifest.json |
Project graph, resources, configs, dependencies; used for state comparison |
run_results.json |
Results of prior invocations; used for result selectors/retry-like workflows |
| compiled/run artifacts | Debugging and inspection |
State selectors
Common patterns:
| Selector | Meaning |
|---|---|
state:modified |
Resources changed compared with prior state |
state:new |
Resources new compared with prior state |
state:old |
Resources removed/changed relative to old state context |
state:modified+ |
Modified resources and downstream children |
+state:modified |
Parents of modified resources and modified resources |
state:modified+ --defer |
Common slim CI pattern: build changed nodes/downstream while deferring unchanged refs to prod |
Exact selector behavior can vary by context and dbt version, but the reasoning is stable: use state to avoid unnecessary full rebuilds while still validating impacted areas.
Defer
--defer tells dbt to resolve references to unselected nodes using a previous state/production environment when possible. It is useful in CI because changed models can reference stable production versions of unchanged upstream models.
Use defer when:
- CI should not rebuild the entire upstream graph;
- production artifacts are available;
- unchanged parent models are trusted;
- you still want correct references and realistic dependencies.
Avoid defer when:
- you need to validate changes to upstream parents too;
- production state artifacts are stale or missing;
- environment parity is broken.
Result selectors and retry
Result selectors use prior run results to select nodes by outcome, such as failed or errored nodes. dbt retry helps rerun failed work from the previous invocation in supported contexts.
Use result-aware workflows when:
- a production job failed on a subset of nodes;
- transient warehouse issue was fixed;
- you do not want to rerun everything.
Patterns
- PR changed one staging model โ run
state:modified+to include impacted downstream models. - CI should not rebuild unchanged parents โ use
--deferto production state. - Previous production run had failed nodes โ retry failed subset after fixing transient issue.
- Need compare current branch with production โ use prior manifest as state.
Traps
- Using state selectors without providing a valid state artifact.
- Assuming
state:modifiedautomatically includes downstream models; use+when needed. - Deferring to production when the changed model depends on changed upstream logic that is not selected.
- Treating state as data freshness; state compares project artifacts, not source arrival times.
- Confusing source freshness with state freshness.
Domain 7 โ Implementing and Maintaining External Dependencies
Concepts
External dependencies include dbt packages, source freshness, and exposures. These features make a project more maintainable and connected to the broader data ecosystem.
Packages
Packages provide reusable macros, models, and tests. They are declared in package configuration and installed with dbt deps.
Use packages when:
- a well-maintained package solves a common need;
- your team wants standardized macros/tests;
- compatibility with your adapter and dbt version is confirmed.
Package risks:
- version incompatibility after dbt upgrade;
- adapter-specific macro behavior;
- hidden SQL complexity;
- unmaintained dependencies;
- namespace conflicts;
- changing package behavior after upgrade.
Best practice:
- pin versions intentionally;
- review package changelogs before upgrades;
- test package macros in CI;
- do not blindly trust package output for critical logic;
- run
dbt depswhen dependencies change or are missing.
Source freshness
Source freshness checks whether raw source data is recent enough. It is based on a timestamp column and configured thresholds.
Use source freshness when:
- ingestion can be late;
- downstream dashboards depend on timely data;
- you need alerting/gating before transformations;
- SLAs exist for source updates.
Freshness usually includes:
loaded_at_field;- warning threshold;
- error threshold.
Exam trap: Freshness is about recency of loaded source data, not whether transformed model columns are unique/not null.
Exposures
Exposures document downstream dependencies such as dashboards, notebooks, ML jobs, or applications. They connect produced data models to business assets.
Use exposures when:
- a BI dashboard depends on dbt models;
- owners need to know impact before changing a model;
- lineage should show downstream consumers;
- failures should be assessed by business impact.
Exposures are documentation/lineage resources; they do not grant permissions or execute dashboards.
Patterns
- Missing macro from package โ run/check
dbt deps, package name/version/namespace. - Package works locally but fails in CI โ check dependency installation and version parity.
- Package macro works on Snowflake but not BigQuery โ check adapter compatibility/dispatch.
- Raw data arrives late โ configure source freshness.
- Dashboard owner needs impact awareness โ add exposure with owner and dependency refs.
Traps
- Treating
dbt depsas a model build command. - Upgrading packages without checking breaking changes.
- Using source freshness to test transformed model logic.
- Assuming exposures enforce access control.
- Not documenting package-generated models/macros because they are โexternal.โ
Documentation Skills Across Domains
Although the current public exam page no longer lists documentation as its own domain, documentation remains a repeated source-bank theme and a practical exam skill.
Know how to document:
- model descriptions;
- column descriptions;
- source descriptions;
- source table/column metadata;
- exposures;
- owner information;
- docs blocks where reusable long-form docs help;
- lineage through
ref(),source(), and exposures.
Commands:
dbt docs generatebuilds documentation artifacts;dbt docs serveis commonly used locally to view docs;- dbt Cloud has its own docs hosting/viewing workflows.
Exam trap: Documentation does not create dependencies by itself. ref() and source() create lineage edges; descriptions explain them.
5. Service Selection Guide
In dbt, โservice selectionโ mostly means choosing the right dbt feature/resource/command for the scenario.
Resource selection
| Scenario | Choose | Not this | Why |
|---|---|---|---|
| Raw table loaded by Fivetran/Airbyte/custom ingestion | Source | Model ref | Raw objects should be declared as sources |
| Transform source into clean business table | Model | Seed | Models are transformations |
| Small manually maintained mapping CSV | Seed | Source | Seed is version-controlled static data |
| Track historical changes in mutable source | Snapshot | Incremental model only | Snapshot records change history |
| Business dashboard consumes dataset | Mart model + exposure | Ephemeral model | Needs stable queryable relation and downstream lineage |
| Reused SQL snippet across many models | Macro | Copy-paste SQL | Macro supports DRY logic |
| Reusable intermediate transformation with lineage/tests | Intermediate model | Macro only | Model is visible in DAG and testable |
| External package macro needed | Package + dbt deps |
Manual copy | Dependency management and versioning |
Materialization selection
| Question signal | Best answer |
|---|---|
| โAlways fresh, lightweight, not heavily queriedโ | View |
| โExpensive transformation, queried often, acceptable rebuildโ | Table |
| โHuge append/update dataset, only small changes per runโ | Incremental |
| โReusable CTE, not queried directly, lightweightโ | Ephemeral |
| โSmall static reference data in repoโ | Seed |
| โNeed SCD-like change history from mutable sourceโ | Snapshot |
Testing selection
| Need | Feature |
|---|---|
| Column must not be null | not_null generic test |
| Column must be unique | unique generic test |
| Child key must exist in parent model | relationships generic test |
| Status values limited to list | accepted_values generic test |
| Complex one-off assertion | Singular test |
| Same custom assertion reused | Custom generic test |
| Test recent partition only | Test where config |
| Store failing records | store_failures |
| Warn before failing | severity, warn_if, error_if |
| Raw source arrival SLA | Source freshness |
Governance selection
| Need | Feature |
|---|---|
| Prevent breaking schema shape | Contract |
| Release breaking change safely | Model versioning |
| Mark old model version as not preferred | Deprecation |
| Control which models can be referenced | Access |
| Define team/domain ownership | Groups |
| Apply database permissions | Grants |
| Show dashboard dependency | Exposure |
Pipeline command selection
| Need | Command / approach |
|---|---|
| Check connection/project setup | dbt debug |
| Render SQL/Jinja without execution | dbt compile |
| Build models only | dbt run |
| Run tests only | dbt test |
| Build resources and run tests in DAG order | dbt build |
| Load seeds | dbt seed |
| Run snapshots | dbt snapshot |
| Install packages | dbt deps |
| Check source timeliness | dbt source freshness |
| Generate docs | dbt docs generate |
| Resume failed previous invocation | dbt retry / result selectors |
| Run changed models in CI | state selectors + --defer |
6. Architecture Patterns
Pattern 1 โ Clean source-to-mart DAG
Recommended flow:
- Define raw systems as sources.
- Build one staging model per source table for cleaning/casting/renaming.
- Build intermediate models for reusable joins and business logic.
- Build marts as facts/dimensions or reporting-ready tables.
- Add tests at source, staging, and mart layers.
- Add docs and exposures for downstream assets.
Why alternatives are wrong:
- Directly querying raw tables in marts hides dependencies and duplicates cleaning logic.
- Putting all business logic in one giant model is hard to test and debug.
- Hard-coded production schemas break dev/CI portability.
Pattern 2 โ High-volume event fact table
Scenario: events has billions of rows and only new events arrive daily.
Recommended solution:
- Use an incremental model.
- Filter new records using event timestamp or ingestion timestamp.
- Use
unique_keyif updates/deduplication are required. - Add tests on recent data and critical keys.
- Schedule periodic full validation or full refresh only when needed.
Wrong alternatives:
- Table materialization: full rebuild is expensive.
- View: dashboard users pay cost every query.
- Ephemeral: duplicates huge SQL downstream.
Pattern 3 โ Slowly changing customer attributes
Scenario: source customer table overwrites customer status and region, but analytics needs history.
Recommended solution:
- Use a snapshot with a reliable unique key and change detection strategy.
- Build current-state and historical models from the snapshot as needed.
Wrong alternatives:
- Seed: data is not static.
- View: shows only current state.
- Incremental alone: can append/update transformed rows but does not automatically preserve historical state unless designed for it.
Pattern 4 โ Shared business metric logic
Scenario: revenue calculation appears in many models.
Recommended solution:
- If it is a repeated expression, use a macro.
- If it is a reusable dataset, create an intermediate model.
- Test the resulting model/business rule.
Wrong alternatives:
- Copy-paste logic everywhere; leads to inconsistent metrics.
- Put all logic in dashboard SQL; bypasses dbt governance and testing.
Pattern 5 โ Safe breaking change to a public model
Scenario: downstream teams depend on dim_customer, but you need to rename/remove columns.
Recommended solution:
- Create a new model version.
- Keep old version available temporarily.
- Add/deploy contract where appropriate.
- Deprecate old version with docs and migration guidance.
- Update exposures and notify owners.
Wrong alternatives:
- Rename columns in place.
- Remove the old model immediately.
- Rely only on a Slack message.
Pattern 6 โ Pull request CI for changed models
Scenario: CI is too expensive because it rebuilds everything.
Recommended solution:
- Compare branch against production manifest.
- Select
state:modified+or equivalent impacted graph. - Use
--deferso unchanged parents resolve to production. - Run tests for selected models.
Wrong alternatives:
- Skip CI tests.
- Rebuild every model every time.
- Hard-code production relations in development models.
Pattern 7 โ Source data freshness SLA
Scenario: dashboard must fail/alert when upstream orders data is older than expected.
Recommended solution:
- Configure source freshness on the source table with warning/error thresholds.
- Run
dbt source freshnessbefore downstream build or as a separate job. - Add exposure so dashboard impact is visible.
Wrong alternatives:
- Use
not_nullonloaded_at; that checks nulls, not recency. - Add a dashboard note; it does not enforce the SLA.
Pattern 8 โ Debugging a failing model after macro/package change
Recommended solution:
- Confirm package versions and
dbt deps. - Run
dbt compile. - Inspect compiled SQL.
- Check adapter compatibility.
- Run targeted build/tests.
Wrong alternatives:
- Edit compiled SQL.
- Remove the macro without understanding generated SQL.
- Assume warehouse error means package is unrelated.
7. Exam Traps
Misleading wording patterns
| Wording | Trap | Better thinking |
|---|---|---|
| โThe table exists in the warehouse, so dbt should query it directlyโ | Hard-coded relation | Use source/ref abstraction |
| โThe scheduler can run models in orderโ | Scheduler dependency workaround | dbt DAG should encode order via refs |
| โThe dashboard is slow; make upstream models ephemeralโ | Misuse of ephemeral | Ephemeral can duplicate SQL; use table/incremental for query performance |
| โOnly a few records change, but table is hugeโ | Full rebuild | Use incremental |
| โNeed history of changed valuesโ | Incremental confusion | Snapshot is usually the signal |
| โModel contract exists, so no tests are neededโ | Contract overreach | Contracts validate shape, tests validate data assumptions |
โSource freshness failed; add not_null testโ |
Wrong quality dimension | Freshness is recency, not nullness |
| โPackage macro is popular, so no need to inspect outputโ | Blind trust | Review compiled SQL and test behavior |
| โCI is expensive; skip downstream testsโ | Weak CI | Use state selection/defer |
| โModel is public, so users can query itโ | Access vs grants confusion | Public controls dbt interface; grants control warehouse permissions |
Wrong-but-plausible answers
The best distractors usually contain one true idea but apply it to the wrong problem.
Examples:
dbt depsis necessary for packages, but it will not fix a bad SQL model.dbt compilehelps debug generated SQL, but it does not build data.store_failureshelps inspect failing rows, but it does not make the test pass or reduce scan scope by itself.- A table materialization improves query speed, but it may be too expensive for huge frequently changing data.
- A macro reduces repetition, but it can hide lineage if a model is more appropriate.
- Grants allow querying, but they do not define model governance boundaries.
Elimination strategy
When stuck, eliminate answers that:
- hard-code database/schema names where dbt abstraction is expected;
- bypass dbt DAG or lineage;
- remove tests/docs/governance to โfixโ failures;
- use a feature for the wrong layer, such as source freshness for model values;
- over-engineer a simple problem;
- ignore production consumers;
- require manual steps that dbt can automate reproducibly.
8. Quick Memory Rules
Rules of thumb
- Model depends on model โ
ref(). - Model depends on raw table โ
source(). - Need raw data timeliness โ source freshness.
- Need downstream dashboard lineage โ exposure.
- Need static lookup CSV โ seed.
- Need history of mutable source โ snapshot.
- Need stable public schema โ contract + versioning.
- Need warehouse permissions โ grants.
- Need changed-only CI โ state selectors + defer.
- Need generated SQL debugging โ compile and inspect compiled SQL.
- Need package installed โ
dbt deps. - Need build + test in dependency order โ
dbt build. - Need rerun previous failures โ
dbt retryor result selectors.
Fast mapping
| If you see... | Think... |
|---|---|
| โbuild order wrongโ | missing ref() |
| โraw object dependencyโ | source YAML + source() |
| โmany downstream dashboardsโ | public/protected model, docs, exposures, grants, contracts |
| โbreaking schema changeโ | model versions and deprecation |
| โcolumn removed unexpectedlyโ | contract |
| โlarge append-only factโ | incremental |
| โlarge model queried oftenโ | table or incremental |
| โreusable CTE onlyโ | ephemeral or macro |
| โmapping values in CSVโ | seed |
| โlate ingestionโ | source freshness |
| โfailed compiled SQLโ | inspect target/compiled |
| โCI should be cheaperโ | slim CI with state and defer |
| โfailed production run subsetโ | retry/result selectors |
| โpackage macro not foundโ | dbt deps, namespace, version |
Materialization memory aid
- View = live logic.
- Table = stored result.
- Incremental = stored result plus changed-row strategy.
- Ephemeral = CTE in someone elseโs query.
- Seed = tiny static data.
- Snapshot = history of change.
Test memory aid
- not_null: required field.
- unique: one row per key.
- relationships: child has parent.
- accepted_values: controlled category list.
- singular: special SQL assertion.
- custom generic: reusable special assertion.
- freshness: source arrived on time.
9. Final Revision Notes
Highest-yield review points
- Know the difference between
ref()andsource()perfectly. This is the root of DAG correctness. - Master materialization tradeoffs. Most scenario questions depend on performance vs freshness vs maintainability.
- Understand tests as assumptions. Pick the simplest test that expresses the assumption.
- Use compiled SQL for debugging. Never debug complex Jinja only by staring at source SQL.
- Govern shared models. Public interfaces need versions, contracts, docs, exposures, and sometimes grants.
- State is for comparing project artifacts. Freshness is for data arrival. Do not confuse them.
- CI should be selective but safe. Use state/defer rather than skipping validation.
- Packages require dependency management.
dbt deps, version pinning, adapter compatibility, and tests matter. - dbt build is often the production-friendly answer. It handles resources and tests in DAG-aware order.
- Avoid manual warehouse shortcuts. The exam rewards reproducible dbt-native solutions.
Last-day revision list
Review these without going deep:
- materialization table: view vs table vs incremental vs ephemeral;
- command table: debug/compile/run/test/build/deps/source freshness/docs/retry;
- governance table: access vs grants vs contracts vs versions;
- testing table: generic vs singular vs custom generic;
- state/defer workflow for slim CI;
- source freshness and exposures;
- common YAML indentation/property traps;
- package dependency and macro debugging;
- source/staging/intermediate/mart architecture;
- incremental model failure modes.
Mini practice scenarios
Use these to self-test:
| Scenario | Best answer |
|---|---|
Mart references analytics_dev.stg_orders directly |
Replace with ref('stg_orders') |
Staging reads raw.stripe.payments directly |
Define source and use source('stripe','payments') |
| Orders source loaded 5 hours late | Source freshness warning/error |
Dashboard depends on fct_revenue |
Add exposure and strong docs/tests |
| Rename column used by other teams | New model version, deprecate old version |
| Big append-only event table | Incremental model |
| Small country-code CSV | Seed |
| Need customer status history | Snapshot |
| Warehouse syntax error after macro expansion | Inspect compiled SQL |
| CI rebuilds entire project | State selection + defer |
10. Exam-Day Checklist
Must-know topics
Before exam day, you should be able to explain without notes:
- What
ref()does and why it controls model dependencies. - What
source()does and how source YAML is structured. - When to use view, table, incremental, ephemeral, seed, and snapshot.
- How
is_incremental()changes SQL behavior. - Why table models rebuild fully unless incremental behavior is used.
- How to test primary keys, foreign keys, allowed values, and business rules.
- How test configs like
where,severity,error_if,warn_if, andstore_failureschange behavior. - How source freshness differs from source tests.
- How to debug YAML, Jinja, SQL, connection, and package failures.
- Where compiled SQL lives and why it matters.
- What contracts enforce and what they do not enforce.
- Difference between model access and grants.
- When to use model versions and deprecation.
- How groups map to ownership/governance.
- What exposures represent.
- How packages are declared and installed.
- Why adapter/package compatibility matters.
- What state artifacts are used for.
- How
state:modified+and--defersupport slim CI. - When
dbt retryor result selectors help. - How
dbt builddiffers from separaterunandtestcommands.
Final confidence checklist
You are ready when you can quickly eliminate wrong answers that:
- hard-code production schemas inside model SQL;
- hide DAG dependencies;
- use seeds for dynamic operational data;
- use snapshots for simple performance optimization;
- use ephemeral for dashboard-facing heavy models;
- treat contracts as full data quality tests;
- treat source freshness as uniqueness/null validation;
- skip CI instead of narrowing CI;
- confuse grants with model access;
- change public models without versioning or migration;
- ignore compiled SQL during debugging;
- blame dbt for a pure warehouse SQL error without checking compiled SQL.
Appendix A โ Source Bank Pattern Extraction
The analyzed practice bank emphasized these recurring terms and ideas:
| Theme | Why it matters |
|---|---|
| Sources and freshness | Frequent scenarios around raw dependencies, source YAML, loaded timestamps, and stale data |
| CI and state | Many questions test selective builds, deferred refs, previous artifacts, and retry behavior |
| Tests | Questions repeatedly distinguish generic, singular, custom, source, and configured tests |
ref() and DAG |
Core dependency reasoning appears throughout development and troubleshooting |
| Materializations | View/table/incremental/ephemeral choices are central to performance scenarios |
| Seeds and snapshots | Common distractors when candidates confuse static data, mutable history, and transformed tables |
| Packages/macros | External dependencies and reusable logic create compatibility and debugging scenarios |
| Governance | Access, contracts, versions, and grants protect downstream consumers |
| Compiled SQL/YAML | Debugging questions often turn on where the error appears and what artifact to inspect |
Consolidated wrong-answer patterns from the bank
- Hard-code relations instead of using dbt abstractions.
- Use a scheduler to fix a missing DAG dependency.
- Use a materialization that optimizes the wrong thing.
- Treat every failure as a warehouse issue.
- Treat every warning as acceptable in production.
- Overuse public models and underuse governance.
- Ignore downstream dashboard impact.
- Rerun the entire project when state-aware selection is available.
- Install packages but fail to test package-generated behavior.
Appendix B โ Official Reference Notes
Official dbt pages checked while preparing this file:
- dbt Analytics Engineering Certification Exam page:
https://www.getdbt.com/certifications/analytics-engineer-certification-exam - dbt Certification page:
https://www.getdbt.com/dbt-certification - dbt Analytics Engineering Certification Study Guide PDF:
https://www.getdbt.com/dbt-assets/certifications/dbt-certificate-study-guide
The official certification page currently lists dbt 1.11 support and 7 exam coverage areas. The older PDF study guide still contains useful v1.7-style topic details and sample question styles; this course prioritizes the current domain structure while preserving still-relevant concepts from the older outline and the analyzed question bank.
Unlock the full course
All 20 modules with detailed explanations, code examples, and exam tips.
A Databricks project has a staging model stg_ad_spend and a mart model fct_ad_spend. The mart currently queries analytics_dev.stg_ad_spend directly. What is the best change before promoting to production?
This Question is Locked
You're viewing 35 of 1200 free questions.
trending_up Certified pros earn 20-30% more
Higher salary: IT certifications add $12,000-$25,000/year on average to your paycheck
Job security: 87% of hiring managers prefer candidates with certifications : you become irreplaceable
More opportunities: Freelance gigs, remote roles, and promotions open up instantly
Practice all questions: Comprehensive practice is the #1 predictor of passing
Real Exam : Upgrade to Unlock
Available in Q&A + Course + Mock Exam package
You've already started : one exam away from a career upgrade.