Cert-Pass
Log in Sign up
arrow_back Cert

dbt dbt Analytics Engineering

๐Ÿ”ฅ 0 streak
0%
timer Real Exam lock Pro menu_book Course download Free
menu_book

dbt Analytics Engineering

Compressed Course

dbt Analytics Engineering Certification โ€” Compressed Exam Course

Exam target: dbt Analytics Engineering Certification Exam
Current alignment used in this guide: dbt Core / dbt platform concepts around dbt 1.11, with legacy v1.7/v1.9 themes retained where still exam-relevant.
Source pattern summary: 1,100 scenario-style practice rows, strongest concentration in developing/optimizing models, testing, debugging, pipeline troubleshooting, governance, state, and external dependencies.


1. Exam Overview

What the exam is testing

The dbt Analytics Engineering Certification validates whether you can use dbt in a production analytics workflow, not just whether you can remember commands. The exam expects you to reason through realistic project situations such as:

  • building a clean model DAG from raw sources to marts;
  • choosing the right materialization for performance and maintainability;
  • using ref() and source() correctly instead of hard-coded object names;
  • testing assumptions about source data and transformed models;
  • debugging model, YAML, SQL, dependency, and pipeline failures;
  • applying model governance features such as access, groups, versions, contracts, and grants;
  • using state-aware workflows for CI, slim CI, retries, and production-safe deployments;
  • managing packages, source freshness, exposures, and documentation so downstream users trust the data.

The real exam is scenario-heavy. A typical question gives you a project problem, then asks for the dbt-native fix. The best answer is usually the option that improves lineage, repeatability, maintainability, and production safety without over-engineering.

Current official exam structure to know

The official dbt Analytics Engineering Certification page currently lists these logistics and domains:

Item Current detail
Duration 2 hours
Questions 65
Passing score 65%
Supported version dbt 1.11
Expected background SQL proficiency and practical dbt experience
Question style Practical and scenario-based; expect multiple-choice and interactive-style reasoning

Older official guides and older question banks often mention dbt Core 1.7 and include a separate documentation domain. The current public exam page lists 7 domains and no separate documentation domain. Documentation still matters because it appears across sources, model properties, lineage, exposures, and governance scenarios.

How to think like the exam

Think like a production analytics engineer:

  1. Use dbt abstractions before warehouse-specific shortcuts. Prefer ref(), source(), model configs, selectors, tests, contracts, and exposures over manual warehouse object names.
  2. Preserve the DAG. Anything that hides lineage, bypasses dependencies, or relies on manually ordered SQL is suspicious.
  3. Choose the simplest materialization that satisfies usage. Do not make everything incremental or table. Do not make a dashboard-facing mart ephemeral.
  4. Test assumptions, not implementation trivia. Use tests where they express business rules or data quality guarantees.
  5. Debug from dbt outward. Check logs, compiled SQL, YAML validity, dependencies, target/profile config, and then warehouse-specific SQL/errors.
  6. Separate development from production. Use state, defer, clone, CI jobs, and PR review to avoid rebuilding or querying expensive production tables unnecessarily.
  7. Govern public interfaces. If other teams depend on a model, use model access, versioning, contracts, docs, exposures, and deprecation instead of silently changing schemas.

How to use this course

Read Sections 1โ€“3 once to orient yourself. Then study each domain in Section 4 and use Sections 5โ€“10 as quick revision material. The course intentionally merges repeated CSV themes into decision frameworks so you can answer new scenarios rather than memorize answers.


2. Exam Domains

Current domain list

Priority from source bank Official/current-style domain Approx. share in analyzed source What to master
1 Developing and optimizing dbt models ~26% materializations, ref, source, sources, modular SQL, Jinja/macros, seeds, snapshots, configs, DAG, git workflow
2 Implementing dbt tests ~17% generic/singular/custom tests, source tests, test config, severity, filtering, assumptions, CI testing
3 Debugging data modeling errors ~14% compiled SQL, logs, YAML errors, SQL vs dbt issues, profiles, dependencies, model fixes
4 Troubleshooting and optimizing dbt pipelines ~13% DAG failures, selectors, retries, clone, scheduling, CI, orchestration boundaries, production failure handling
5 Managing dbt models governance ~12% access, groups, contracts, versions, deprecation, grants, stable public interfaces
6 Leveraging the dbt state ~10% state selectors, result selectors, defer, slim CI, manifest/run_results, modified nodes, retry
7 Implementing and Maintaining External Dependencies ~9% packages, dbt deps, package compatibility, exposures, source freshness, downstream dependency awareness

Priority notes

The largest share of the analyzed bank is model development and optimization. This makes sense because almost every other topic depends on a correct mental model of dbt resources, DAG lineage, materializations, and configuration precedence.

High-yield cross-domain concepts:

  • ref() vs source();
  • model materializations: view, table, incremental, ephemeral;
  • tests: generic vs singular vs custom generic;
  • source freshness and source testing;
  • compiled SQL for debugging;
  • YAML indentation and resource properties;
  • dbt build vs dbt run + dbt test;
  • model contracts, versions, access, and groups;
  • state selection, defer, and CI efficiency;
  • packages and macro compatibility.

What matters most

If the question says... The exam usually wants you to think about...
Hard-coded schema/table names inside models Replace with ref() for models or source() for raw tables
Model builds in wrong order Missing ref() dependencies
Raw table dependency is not documented/tested Define a source in YAML and use source()
Large append-only table Incremental model with correct is_incremental() logic
Small stable mapping file Seed
Point-in-time history of slowly changing source data Snapshot
Business users query it often Table or incremental, not ephemeral
Reusable CTE not directly queried Ephemeral or macro depending on purpose
Downstream teams depend on the model Public/protected access, versions, contracts, docs
Only changed models should run in CI State selectors and defer
Failed job should resume safely dbt retry / result selectors, not manual partial guessing
Source is late or stale Source freshness, not a generic model test
BI dashboard depends on model Exposure

3. Start-to-Finish Study Path

Foundation phase: build the dbt mental model

Learn first:

  • What a dbt project is: dbt_project.yml, models, macros, seeds, snapshots, tests, sources, packages.
  • How dbt builds a DAG from ref() and source().
  • Difference between development, CI, staging, and production environments.
  • Basic commands: dbt debug, dbt compile, dbt run, dbt test, dbt build, dbt docs generate, dbt deps, dbt seed, dbt snapshot, dbt source freshness.

Hands-on checklist:

  • Create a source YAML file with at least one source and table.
  • Create staging models using source().
  • Create intermediate/mart models using ref().
  • Run dbt compile and inspect target/compiled.
  • Generate docs and inspect lineage.

Intermediate phase: learn production patterns

Focus on:

  • materializations and their tradeoffs;
  • incremental models and is_incremental();
  • test types and test configuration;
  • model property YAML;
  • packages and macros;
  • git workflow and PR review;
  • docs and exposures.

Hands-on checklist:

  • Build one view, one table, one incremental model, one ephemeral model.
  • Add generic tests: not_null, unique, relationships, accepted_values.
  • Add a singular test for a business rule.
  • Add source freshness to a source.
  • Add an exposure for a dashboard.
  • Install a package and run dbt deps.

Advanced phase: governance, state, and debugging

Focus on:

  • model contracts and column-level constraints;
  • model versions and deprecation;
  • access levels and groups;
  • grants for warehouse permissions;
  • slim CI using state:modified+ and --defer;
  • result selectors and retries;
  • debugging YAML, SQL, package, and pipeline failures.

Hands-on checklist:

  • Add a contract to a model and intentionally break it.
  • Add a v2 model and deprecate v1.
  • Make a protected/public model and test dependency behavior.
  • Run state selection against a previous manifest.
  • Debug a failing test from the failure output and compiled SQL.

Final review phase

In the final review, do not reread everything equally. Focus on scenario triggers:

  • If the model is downstream-facing, think governance.
  • If the model is expensive and append-only, think incremental.
  • If dependencies are invisible, think ref()/source().
  • If only changed work should run, think state/defer.
  • If source data timeliness is the issue, think freshness.
  • If a dashboard breaks, think exposure, model contract/versioning, docs, and lineage.
  • If the error is unclear, inspect logs and compiled SQL before changing dbt configs.

4. Core Concepts by Domain

Domain 1 โ€” Developing and Optimizing dbt Models

Concepts

This is the highest-yield domain. You must know how dbt turns modular SQL files into a dependency graph and production data objects.

Key resource types:

Resource What it represents Common exam signal
Model A SQL or Python transformation managed by dbt Build clean staging/intermediate/mart layers
Source Raw data object loaded outside dbt Use when referring to raw tables
Seed Static CSV version-controlled in the project Small lookup/mapping/reference data
Snapshot Point-in-time history of mutable source records SCD-style history where source overwrites changes
Macro Reusable Jinja logic Repeated SQL pattern or generated logic
Test Data assertion Validate uniqueness, not null, relationships, accepted values, business rules
Exposure Downstream asset such as BI dashboard, notebook, ML job Show downstream dependency and ownership
Package External reusable dbt project Shared macros/models/tests from dbt Hub or Git

ref() and source()

ref() is for dbt models. source() is for raw objects loaded outside dbt.

Need Use Why
A mart depends on stg_orders {{ ref('stg_orders') }} Creates DAG dependency and environment-aware relation name
A staging model reads raw stripe.payments {{ source('stripe', 'payments') }} Documents raw dependency and supports source tests/freshness
A model directly queries analytics_prod.stg_orders Replace with ref() Hard-coding breaks lineage and environment portability
A model directly queries raw.shopify.orders Replace with source() Raw dependencies belong in source YAML

Exam trap: If the question says models build in the wrong order, do not choose a scheduler workaround. dbt order comes from ref() dependencies.

Materializations

Materialization Use when Avoid when Exam trap
View Logic should stay lightweight and always query fresh upstream data Heavy repeated dashboard queries need fast performance View does not store results; it can push cost to query time
Table Model is expensive to compute and queried often Data changes frequently and full rebuild is too expensive Table rebuilds entire relation each run
Incremental Large table, small new/changed subset per run Large percentage updates each run or logic cannot isolate changes Requires correct filter and unique key/strategy where needed
Ephemeral Reusable intermediate logic not queried directly Many downstream refs create repeated SQL; business users need to query it Ephemeral is inlined as CTE, not created as a database object
Seed Small static CSV controlled in git Large dynamic data or frequently updated operational data Seeds are not an ingestion system
Snapshot Track historical changes in mutable source records You only need latest state or immutable event data Snapshot is not a materialization for performance

Incremental models

Use incremental when a model has many rows and only a small subset is added or changed each run.

Core reasoning:

  • is_incremental() gates logic that should only run on incremental runs.
  • The SQL must be valid for both full-refresh and incremental runs.
  • Use a reliable event/update timestamp or high-water mark.
  • Use unique_key and an incremental strategy when records can update.
  • Use --full-refresh when logic changes require rebuilding historical rows.
  • Schema changes may require on_schema_change handling or full refresh depending on the warehouse and change type.

Common bad answers:

  • โ€œIncremental models are always rebuilt.โ€ False.
  • โ€œIncremental models are always best for large tables.โ€ Not if most rows change each run.
  • โ€œYou never need is_incremental().โ€ Usually false for selective processing.
  • โ€œUse ephemeral to make a large dashboard model faster.โ€ Usually wrong; ephemeral can duplicate heavy SQL.

Sources

A source maps to a raw data location, commonly database + schema, with tables underneath. Use sources to centralize raw object naming and document external dependencies.

Good source YAML includes:

  • source name;
  • database/schema where needed;
  • tables;
  • source and column descriptions;
  • tests on raw data assumptions;
  • freshness where timeliness matters;
  • loaded timestamp field for freshness checks.

Exam trap: If multiple raw tables are in the same database/schema, they are usually one source with multiple tables, not multiple sources.

Modularity and DRY SQL

Good dbt modeling decomposes SQL into layers:

Layer Typical purpose Typical materialization
Staging Clean, rename, cast, standardize one source View, sometimes ephemeral/table
Intermediate Reusable transformations and joins Ephemeral, view, table depending on cost
Mart Business-facing facts/dimensions Table/incremental for performance

Use macros for reusable logic patterns, not for hiding business-critical model lineage. Use models when you need DAG visibility and testable transformation steps.

Jinja, variables, and environment config

Feature Use case Trap
var() Project variables provided in dbt_project.yml or CLI Do not store secrets in vars
env_var() Environment-specific values and secrets Must be available in runtime environment
target Branch logic by target/profile Avoid excessive target-specific model logic that makes behavior hard to test
config() Model-level configs in SQL Remember config precedence
dbt_project.yml Default project/folder configs Bad indentation or wrong resource path breaks expectations

Python models

Python models are used for transformations easier in Python than SQL, such as advanced data science-style transformations. Exam reasoning remains dbt-native:

  • They are still models in the DAG.
  • They can use dbt.ref() and dbt.source().
  • They are not a replacement for simple SQL transformations.
  • Support depends on the adapter/platform.

Git workflow

Expected git skills:

  • create feature branches;
  • commit changes;
  • pull from main/head branch to stay updated;
  • resolve conflicts;
  • open pull requests;
  • use CI before merging.

Exam trap: If the question says your branch is behind main, the correct general action is to pull/reconcile with the head branch, not manually copy files or merge straight into production.

Patterns

  • Raw table reference โ†’ define source + use source().
  • Model-to-model dependency โ†’ use ref().
  • Repeated business logic โ†’ refactor into staging/intermediate models or macros.
  • Expensive dashboard model โ†’ table or incremental.
  • Append-only high-volume data โ†’ incremental.
  • Static mapping table โ†’ seed.
  • Source overwrites values but history needed โ†’ snapshot.
  • Direct warehouse object permissions required โ†’ grants.

Traps

  • Choosing a materialization only because it is โ€œfasterโ€ without considering freshness, cost, and query pattern.
  • Using hard-coded schema names in dbt models.
  • Making every model a table, which increases rebuild time/storage.
  • Making every staging model ephemeral, which can duplicate heavy SQL downstream.
  • Treating seeds as ingestion for operational data.
  • Using macros where a model would provide better lineage and testing.
  • Forgetting that a table model fully rebuilds by default.

Domain 2 โ€” Managing dbt Models Governance

Concepts

Governance is about protecting model consumers. The exam often frames this as: โ€œA downstream team relies on a model and the analytics team needs to change it safely.โ€

Key features:

Feature Purpose Typical scenario
Model access Controls whether models can be referenced across groups/projects Keep internal models private while exposing stable interfaces
Groups Assign ownership to resources Domain/team ownership and access control
Model versions Evolve a public model without breaking consumers Add/change columns while keeping old version available
Deprecation Signal old version should be migrated away from v1 still works but users should move to v2
Contracts Enforce model shape and column data types Prevent breaking schema changes
Grants Apply warehouse permissions Ensure analysts or BI tools can query produced relations
Documentation Explain ownership, meaning, usage, and downstream impact Make models discoverable and trustworthy

Model access

Typical access levels:

Access Use when Exam implication
Private Internal implementation detail Should not be depended on by other groups
Protected Reusable within defined boundary/project/group pattern Safer than public for internal shared layers
Public Stable interface for broad consumption Requires stronger docs, contracts, versioning discipline

If a model is used by many teams or BI assets, changing it silently is wrong. Use governance features.

Contracts

A model contract defines/enforces expected columns and types. It helps prevent a model from changing shape unexpectedly.

Use contracts when:

  • downstream consumers rely on a stable schema;
  • column presence/type matters;
  • breaking schema changes must fail early;
  • incremental loads should reject incompatible data before bad data lands.

Contracts are not a substitute for all data tests. They validate shape/type constraints, not every business rule.

Versions and deprecation

Use model versions when you need to introduce a breaking change while keeping existing consumers stable.

Pattern:

  1. Keep v1 available.
  2. Create v2 with changed interface.
  3. Mark v1 deprecated when ready.
  4. Update docs and communicate migration.
  5. Remove old version only after consumers migrate.

Exam trap: Do not โ€œjust rename the modelโ€ or โ€œchange the column in placeโ€ when external consumers depend on it.

Grants

Use grants to configure warehouse-level access to dbt-created relations. Grants answer the question: who can query this object after dbt builds it?

Grants are not the same as model access. Model access controls dbt dependency/interface behavior. Grants control database permissions.

Patterns

  • Downstream team broke after column rename โ†’ use model versioning and contracts.
  • Internal staging model should not be referenced by other teams โ†’ mark private/protected and expose a public mart.
  • Dashboard user cannot query model after deploy โ†’ check grants/warehouse permissions.
  • Team needs ownership boundaries โ†’ use groups and model access.
  • Need to enforce columns and data types โ†’ use contract.

Traps

  • Confusing access with grants.
  • Assuming contracts replace not_null, unique, or relationship tests.
  • Deprecating/removing a model before documenting migration.
  • Using public access for every model.
  • Versioning non-breaking changes unnecessarily.

Domain 3 โ€” Debugging Data Modeling Errors

Concepts

Debugging questions usually ask where to look first or what category of error it is.

Debugging order:

  1. Read the error message and identify phase: parsing, compilation, execution, test, dependency, connection.
  2. Run dbt compile if SQL/Jinja generation is suspicious.
  3. Inspect compiled SQL in target/compiled or target/run.
  4. Determine whether the failure is pure SQL/warehouse syntax or dbt/Jinja/YAML config.
  5. Fix locally, run targeted commands, test before merging.

Error categories

Symptom Likely area Best next action
YAML parsing error .yml indentation, invalid keys, wrong nesting Validate YAML and resource property structure
Compilation error Jinja, macro, ref, source, config Run dbt compile, inspect compiled SQL/error line
Database syntax error SQL generated by model Inspect compiled SQL and run/debug in warehouse if needed
Relation not found Missing ref/source, build order, environment, schema, permissions Check DAG, source config, target, grants
Test failure Data violates assertion Inspect failing rows, severity, test condition
Package macro missing Dependencies not installed or version mismatch Run/check dbt deps, package version, namespace
Connection/profile error profiles.yml, environment variables, credentials Run dbt debug

Compiled SQL

Compiled SQL is critical because dbt model SQL contains Jinja, macros, and adapter-specific rendering. If a warehouse error references SQL that does not look like your model file, inspect compiled SQL.

Use compiled SQL to:

  • see expanded ref() and source() relation names;
  • see macro output;
  • verify is_incremental() branches;
  • diagnose SQL syntax errors after Jinja rendering;
  • distinguish dbt compilation problems from warehouse execution problems.

YAML debugging

YAML traps:

  • wrong indentation;
  • putting tests under the wrong level;
  • misspelling columns, description, data_tests/tests depending on version style;
  • placing model properties under the wrong model name;
  • using tabs or invalid quoting;
  • confusing source table properties with model properties.

SQL vs dbt issue

Question clue More likely
Error mentions YAML parser, undefined macro, ref not found dbt project/config issue
Error mentions warehouse SQL syntax, invalid identifier, function not found SQL/warehouse issue after compilation
Error only appears in production target environment/target/schema/permissions issue
Error appears after package upgrade package compatibility or macro behavior
Error appears only on incremental runs incremental branch, unique key, schema change, or high-water logic

Fix and test before merge

Good workflow:

  1. Reproduce locally or in dev environment.
  2. Inspect logs/compiled SQL/failing rows.
  3. Apply minimal fix.
  4. Run targeted model and downstream tests.
  5. Open PR and let CI validate before merging.

Patterns

  • Failing model due to generated SQL โ†’ inspect compiled SQL.
  • source() relation not found โ†’ verify source YAML, database/schema/table, target config, and permissions.
  • YAML error after adding tests โ†’ fix indentation and property nesting.
  • Test failure after new source load โ†’ inspect failing rows; maybe source issue, not dbt code.
  • Error only in CI โ†’ compare CI target/profile/env vars/packages to local.

Traps

  • Editing compiled SQL instead of model SQL.
  • Assuming all SQL-looking errors are pure SQL; Jinja may have generated bad SQL.
  • Running all models when a targeted selector would prove the fix faster.
  • Ignoring package or adapter version compatibility.
  • Treating failing tests as โ€œdbt brokenโ€ instead of data quality signal.

Domain 4 โ€” Troubleshooting and Optimizing dbt Pipelines

Concepts

This domain tests production operation: jobs, DAG failures, retries, CI, clone, and orchestration boundaries.

Key idea: dbt transforms data inside the warehouse. External orchestrators schedule and coordinate broader workflows, but dbt should still own model dependencies and transformation correctness.

dbt build vs dbt run / dbt test

Command What it does Use when
dbt run Builds models You only want transformations
dbt test Runs tests You want to validate already-built resources
dbt build Runs seeds, models, snapshots, tests in DAG-aware order Production/CI workflow requiring build + validation
dbt compile Compiles project without executing transformations Debug generated SQL/Jinja
dbt debug Checks connection/profile/project validity Credential/profile/setup issues
dbt retry Retries failed nodes from previous invocation when supported Resume after transient/partial failure

DAG failure management

If an upstream model fails, downstream dependent models should not blindly run. The exam wants you to respect the DAG rather than bypass it.

Good actions:

  • fix the upstream failure first;
  • use selectors to rerun affected nodes;
  • use + operators to include parents/children as needed;
  • use dbt build for DAG-aware build/test ordering;
  • use retries or result selectors to avoid unnecessary rebuilds after transient failures.

Bad actions:

  • manually build downstream tables out of order;
  • remove tests to make the job pass;
  • hard-code production relations into dev models;
  • ignore failing upstream source freshness.

Clone

dbt clone can create environment copies by cloning relations where supported by the data platform. It is useful for efficient environment setup, especially when you do not want to rebuild all expensive models.

Use clone when:

  • creating a dev/CI environment from existing production objects;
  • supported by the adapter/data platform;
  • you need fast setup without full transformation cost.

Do not use clone as a substitute for fixing model logic or tests.

CI and deployment

Strong CI pattern:

  • feature branch opens PR;
  • CI runs only changed models and relevant downstream dependencies;
  • CI defers unchanged upstream references to production artifacts;
  • tests validate the changed surface area;
  • merge only after passing checks.

This reduces cost and catches issues without rebuilding the entire warehouse.

Orchestration boundaries

Tool responsibility Examples
dbt transform data, build DAG, test, document, select nodes, manage project dependencies
External orchestrator schedule workflows, run upstream ingestion, coordinate cross-system jobs, alerting/escalation
Warehouse execute SQL/Python, enforce permissions, store results, optimize physical execution
BI tool consume curated models, expose dashboards, not transform core warehouse logic

Patterns

  • Production job failed halfway โ†’ use dbt retry or result selectors after fixing transient issue.
  • CI too expensive โ†’ use state selection and defer.
  • Need dev schema quickly from prod โ†’ dbt clone if adapter supports it.
  • Downstream jobs fail because upstream source is stale โ†’ add or enforce source freshness before downstream build.
  • Pipeline has wrong order โ†’ fix DAG dependencies, not scheduler order.

Traps

  • Treating the orchestrator as the dependency graph instead of dbt DAG.
  • Rebuilding everything in CI when slim CI solves the scenario.
  • Using clone on unsupported platforms or expecting clone to transform data.
  • Retrying blindly without understanding failed node category.
  • Skipping tests in production because they slow the job.

Domain 5 โ€” Implementing dbt Tests

Concepts

Tests validate assumptions. The exam cares about selecting the right type of test and configuring it appropriately.

Test categories:

Test type What it is Use when
Generic test Reusable parameterized assertion in YAML not_null, unique, relationships, accepted_values, custom generic tests
Singular test SQL file that returns failing rows Complex business rule not easily expressed as generic test
Custom generic test Reusable test macro Repeated organization-specific assertion
Source test Test applied to raw source/table/column Validate assumptions at the ingestion boundary

Built-in generic tests

Test Validates Example scenario
not_null Column has no nulls Primary key cannot be null
unique Values are unique Order ID should appear once
relationships Foreign key values exist in parent model orders.customer_id exists in customers.customer_id
accepted_values Column values are in allowed set status must be placed, shipped, returned

Test configuration

Important configs:

Config Why it matters
severity Warn vs error behavior
where Restrict tested rows, often for recent/incremental data
limit Limit returned failures, not necessarily test scope semantics
store_failures Persist failing rows for inspection
error_if / warn_if Threshold-based pass/warn/error
tags Select tests for workflows

Exam trap: store_failures helps inspect failures; it does not prevent scanning the table or reuse old test results as a pass.

Testing incremental models

For large incremental models, do not always test all historical rows if the scenario requires efficiency. Options include:

  • use a where condition to test recent partitions/new rows;
  • use contracts/warehouse constraints where supported for column shape/type/not-null-like enforcement;
  • design incremental logic so new rows are validated during load;
  • run full validation periodically if required.

Testing sources

Use source tests for assumptions about raw data:

  • IDs not null;
  • source natural key uniqueness where expected;
  • accepted values for raw status codes;
  • relationships between raw source tables where useful;
  • freshness checks for timeliness.

Freshness is not the same as a generic test. Freshness measures whether data has arrived recently based on a timestamp.

Singular vs generic tests

Situation Better choice
Column should never be null Generic not_null
Column should be unique Generic unique
Status must be one of a list Generic accepted_values
Foreign key must exist in parent Generic relationships
Revenue should not be negative after refunds logic Singular or custom generic depending reuse
Same business rule applies to many models Custom generic test

Patterns

  • Need failing rows saved for debugging โ†’ store_failures.
  • Need test only recent incremental data โ†’ where config.
  • Need failure threshold โ†’ error_if / warn_if.
  • Need reusable organization rule โ†’ custom generic test.
  • Need one-off complex SQL assertion โ†’ singular test.
  • Need raw table timeliness โ†’ source freshness.

Traps

  • Over-testing every column without business value.
  • Confusing relationships with uniqueness.
  • Applying unique to a column that is only unique in combination with another column.
  • Using accepted_values when values change often and should come from a reference model.
  • Treating warnings as passing quality gates in production when the business requires failure.
  • Not testing sources before trusting staging models.

Domain 6 โ€” Leveraging the dbt State

Concepts

State lets dbt compare the current project to artifacts from a previous run, usually production. This powers slim CI and targeted rebuilds.

Key artifacts:

Artifact Use
manifest.json Project graph, resources, configs, dependencies; used for state comparison
run_results.json Results of prior invocations; used for result selectors/retry-like workflows
compiled/run artifacts Debugging and inspection

State selectors

Common patterns:

Selector Meaning
state:modified Resources changed compared with prior state
state:new Resources new compared with prior state
state:old Resources removed/changed relative to old state context
state:modified+ Modified resources and downstream children
+state:modified Parents of modified resources and modified resources
state:modified+ --defer Common slim CI pattern: build changed nodes/downstream while deferring unchanged refs to prod

Exact selector behavior can vary by context and dbt version, but the reasoning is stable: use state to avoid unnecessary full rebuilds while still validating impacted areas.

Defer

--defer tells dbt to resolve references to unselected nodes using a previous state/production environment when possible. It is useful in CI because changed models can reference stable production versions of unchanged upstream models.

Use defer when:

  • CI should not rebuild the entire upstream graph;
  • production artifacts are available;
  • unchanged parent models are trusted;
  • you still want correct references and realistic dependencies.

Avoid defer when:

  • you need to validate changes to upstream parents too;
  • production state artifacts are stale or missing;
  • environment parity is broken.

Result selectors and retry

Result selectors use prior run results to select nodes by outcome, such as failed or errored nodes. dbt retry helps rerun failed work from the previous invocation in supported contexts.

Use result-aware workflows when:

  • a production job failed on a subset of nodes;
  • transient warehouse issue was fixed;
  • you do not want to rerun everything.

Patterns

  • PR changed one staging model โ†’ run state:modified+ to include impacted downstream models.
  • CI should not rebuild unchanged parents โ†’ use --defer to production state.
  • Previous production run had failed nodes โ†’ retry failed subset after fixing transient issue.
  • Need compare current branch with production โ†’ use prior manifest as state.

Traps

  • Using state selectors without providing a valid state artifact.
  • Assuming state:modified automatically includes downstream models; use + when needed.
  • Deferring to production when the changed model depends on changed upstream logic that is not selected.
  • Treating state as data freshness; state compares project artifacts, not source arrival times.
  • Confusing source freshness with state freshness.

Domain 7 โ€” Implementing and Maintaining External Dependencies

Concepts

External dependencies include dbt packages, source freshness, and exposures. These features make a project more maintainable and connected to the broader data ecosystem.

Packages

Packages provide reusable macros, models, and tests. They are declared in package configuration and installed with dbt deps.

Use packages when:

  • a well-maintained package solves a common need;
  • your team wants standardized macros/tests;
  • compatibility with your adapter and dbt version is confirmed.

Package risks:

  • version incompatibility after dbt upgrade;
  • adapter-specific macro behavior;
  • hidden SQL complexity;
  • unmaintained dependencies;
  • namespace conflicts;
  • changing package behavior after upgrade.

Best practice:

  • pin versions intentionally;
  • review package changelogs before upgrades;
  • test package macros in CI;
  • do not blindly trust package output for critical logic;
  • run dbt deps when dependencies change or are missing.

Source freshness

Source freshness checks whether raw source data is recent enough. It is based on a timestamp column and configured thresholds.

Use source freshness when:

  • ingestion can be late;
  • downstream dashboards depend on timely data;
  • you need alerting/gating before transformations;
  • SLAs exist for source updates.

Freshness usually includes:

  • loaded_at_field;
  • warning threshold;
  • error threshold.

Exam trap: Freshness is about recency of loaded source data, not whether transformed model columns are unique/not null.

Exposures

Exposures document downstream dependencies such as dashboards, notebooks, ML jobs, or applications. They connect produced data models to business assets.

Use exposures when:

  • a BI dashboard depends on dbt models;
  • owners need to know impact before changing a model;
  • lineage should show downstream consumers;
  • failures should be assessed by business impact.

Exposures are documentation/lineage resources; they do not grant permissions or execute dashboards.

Patterns

  • Missing macro from package โ†’ run/check dbt deps, package name/version/namespace.
  • Package works locally but fails in CI โ†’ check dependency installation and version parity.
  • Package macro works on Snowflake but not BigQuery โ†’ check adapter compatibility/dispatch.
  • Raw data arrives late โ†’ configure source freshness.
  • Dashboard owner needs impact awareness โ†’ add exposure with owner and dependency refs.

Traps

  • Treating dbt deps as a model build command.
  • Upgrading packages without checking breaking changes.
  • Using source freshness to test transformed model logic.
  • Assuming exposures enforce access control.
  • Not documenting package-generated models/macros because they are โ€œexternal.โ€

Documentation Skills Across Domains

Although the current public exam page no longer lists documentation as its own domain, documentation remains a repeated source-bank theme and a practical exam skill.

Know how to document:

  • model descriptions;
  • column descriptions;
  • source descriptions;
  • source table/column metadata;
  • exposures;
  • owner information;
  • docs blocks where reusable long-form docs help;
  • lineage through ref(), source(), and exposures.

Commands:

  • dbt docs generate builds documentation artifacts;
  • dbt docs serve is commonly used locally to view docs;
  • dbt Cloud has its own docs hosting/viewing workflows.

Exam trap: Documentation does not create dependencies by itself. ref() and source() create lineage edges; descriptions explain them.


5. Service Selection Guide

In dbt, โ€œservice selectionโ€ mostly means choosing the right dbt feature/resource/command for the scenario.

Resource selection

Scenario Choose Not this Why
Raw table loaded by Fivetran/Airbyte/custom ingestion Source Model ref Raw objects should be declared as sources
Transform source into clean business table Model Seed Models are transformations
Small manually maintained mapping CSV Seed Source Seed is version-controlled static data
Track historical changes in mutable source Snapshot Incremental model only Snapshot records change history
Business dashboard consumes dataset Mart model + exposure Ephemeral model Needs stable queryable relation and downstream lineage
Reused SQL snippet across many models Macro Copy-paste SQL Macro supports DRY logic
Reusable intermediate transformation with lineage/tests Intermediate model Macro only Model is visible in DAG and testable
External package macro needed Package + dbt deps Manual copy Dependency management and versioning

Materialization selection

Question signal Best answer
โ€œAlways fresh, lightweight, not heavily queriedโ€ View
โ€œExpensive transformation, queried often, acceptable rebuildโ€ Table
โ€œHuge append/update dataset, only small changes per runโ€ Incremental
โ€œReusable CTE, not queried directly, lightweightโ€ Ephemeral
โ€œSmall static reference data in repoโ€ Seed
โ€œNeed SCD-like change history from mutable sourceโ€ Snapshot

Testing selection

Need Feature
Column must not be null not_null generic test
Column must be unique unique generic test
Child key must exist in parent model relationships generic test
Status values limited to list accepted_values generic test
Complex one-off assertion Singular test
Same custom assertion reused Custom generic test
Test recent partition only Test where config
Store failing records store_failures
Warn before failing severity, warn_if, error_if
Raw source arrival SLA Source freshness

Governance selection

Need Feature
Prevent breaking schema shape Contract
Release breaking change safely Model versioning
Mark old model version as not preferred Deprecation
Control which models can be referenced Access
Define team/domain ownership Groups
Apply database permissions Grants
Show dashboard dependency Exposure

Pipeline command selection

Need Command / approach
Check connection/project setup dbt debug
Render SQL/Jinja without execution dbt compile
Build models only dbt run
Run tests only dbt test
Build resources and run tests in DAG order dbt build
Load seeds dbt seed
Run snapshots dbt snapshot
Install packages dbt deps
Check source timeliness dbt source freshness
Generate docs dbt docs generate
Resume failed previous invocation dbt retry / result selectors
Run changed models in CI state selectors + --defer

6. Architecture Patterns

Pattern 1 โ€” Clean source-to-mart DAG

Recommended flow:

  1. Define raw systems as sources.
  2. Build one staging model per source table for cleaning/casting/renaming.
  3. Build intermediate models for reusable joins and business logic.
  4. Build marts as facts/dimensions or reporting-ready tables.
  5. Add tests at source, staging, and mart layers.
  6. Add docs and exposures for downstream assets.

Why alternatives are wrong:

  • Directly querying raw tables in marts hides dependencies and duplicates cleaning logic.
  • Putting all business logic in one giant model is hard to test and debug.
  • Hard-coded production schemas break dev/CI portability.

Pattern 2 โ€” High-volume event fact table

Scenario: events has billions of rows and only new events arrive daily.

Recommended solution:

  • Use an incremental model.
  • Filter new records using event timestamp or ingestion timestamp.
  • Use unique_key if updates/deduplication are required.
  • Add tests on recent data and critical keys.
  • Schedule periodic full validation or full refresh only when needed.

Wrong alternatives:

  • Table materialization: full rebuild is expensive.
  • View: dashboard users pay cost every query.
  • Ephemeral: duplicates huge SQL downstream.

Pattern 3 โ€” Slowly changing customer attributes

Scenario: source customer table overwrites customer status and region, but analytics needs history.

Recommended solution:

  • Use a snapshot with a reliable unique key and change detection strategy.
  • Build current-state and historical models from the snapshot as needed.

Wrong alternatives:

  • Seed: data is not static.
  • View: shows only current state.
  • Incremental alone: can append/update transformed rows but does not automatically preserve historical state unless designed for it.

Pattern 4 โ€” Shared business metric logic

Scenario: revenue calculation appears in many models.

Recommended solution:

  • If it is a repeated expression, use a macro.
  • If it is a reusable dataset, create an intermediate model.
  • Test the resulting model/business rule.

Wrong alternatives:

  • Copy-paste logic everywhere; leads to inconsistent metrics.
  • Put all logic in dashboard SQL; bypasses dbt governance and testing.

Pattern 5 โ€” Safe breaking change to a public model

Scenario: downstream teams depend on dim_customer, but you need to rename/remove columns.

Recommended solution:

  • Create a new model version.
  • Keep old version available temporarily.
  • Add/deploy contract where appropriate.
  • Deprecate old version with docs and migration guidance.
  • Update exposures and notify owners.

Wrong alternatives:

  • Rename columns in place.
  • Remove the old model immediately.
  • Rely only on a Slack message.

Pattern 6 โ€” Pull request CI for changed models

Scenario: CI is too expensive because it rebuilds everything.

Recommended solution:

  • Compare branch against production manifest.
  • Select state:modified+ or equivalent impacted graph.
  • Use --defer so unchanged parents resolve to production.
  • Run tests for selected models.

Wrong alternatives:

  • Skip CI tests.
  • Rebuild every model every time.
  • Hard-code production relations in development models.

Pattern 7 โ€” Source data freshness SLA

Scenario: dashboard must fail/alert when upstream orders data is older than expected.

Recommended solution:

  • Configure source freshness on the source table with warning/error thresholds.
  • Run dbt source freshness before downstream build or as a separate job.
  • Add exposure so dashboard impact is visible.

Wrong alternatives:

  • Use not_null on loaded_at; that checks nulls, not recency.
  • Add a dashboard note; it does not enforce the SLA.

Pattern 8 โ€” Debugging a failing model after macro/package change

Recommended solution:

  • Confirm package versions and dbt deps.
  • Run dbt compile.
  • Inspect compiled SQL.
  • Check adapter compatibility.
  • Run targeted build/tests.

Wrong alternatives:

  • Edit compiled SQL.
  • Remove the macro without understanding generated SQL.
  • Assume warehouse error means package is unrelated.

7. Exam Traps

Misleading wording patterns

Wording Trap Better thinking
โ€œThe table exists in the warehouse, so dbt should query it directlyโ€ Hard-coded relation Use source/ref abstraction
โ€œThe scheduler can run models in orderโ€ Scheduler dependency workaround dbt DAG should encode order via refs
โ€œThe dashboard is slow; make upstream models ephemeralโ€ Misuse of ephemeral Ephemeral can duplicate SQL; use table/incremental for query performance
โ€œOnly a few records change, but table is hugeโ€ Full rebuild Use incremental
โ€œNeed history of changed valuesโ€ Incremental confusion Snapshot is usually the signal
โ€œModel contract exists, so no tests are neededโ€ Contract overreach Contracts validate shape, tests validate data assumptions
โ€œSource freshness failed; add not_null testโ€ Wrong quality dimension Freshness is recency, not nullness
โ€œPackage macro is popular, so no need to inspect outputโ€ Blind trust Review compiled SQL and test behavior
โ€œCI is expensive; skip downstream testsโ€ Weak CI Use state selection/defer
โ€œModel is public, so users can query itโ€ Access vs grants confusion Public controls dbt interface; grants control warehouse permissions

Wrong-but-plausible answers

The best distractors usually contain one true idea but apply it to the wrong problem.

Examples:

  • dbt deps is necessary for packages, but it will not fix a bad SQL model.
  • dbt compile helps debug generated SQL, but it does not build data.
  • store_failures helps inspect failing rows, but it does not make the test pass or reduce scan scope by itself.
  • A table materialization improves query speed, but it may be too expensive for huge frequently changing data.
  • A macro reduces repetition, but it can hide lineage if a model is more appropriate.
  • Grants allow querying, but they do not define model governance boundaries.

Elimination strategy

When stuck, eliminate answers that:

  1. hard-code database/schema names where dbt abstraction is expected;
  2. bypass dbt DAG or lineage;
  3. remove tests/docs/governance to โ€œfixโ€ failures;
  4. use a feature for the wrong layer, such as source freshness for model values;
  5. over-engineer a simple problem;
  6. ignore production consumers;
  7. require manual steps that dbt can automate reproducibly.

8. Quick Memory Rules

Rules of thumb

  • Model depends on model โ†’ ref().
  • Model depends on raw table โ†’ source().
  • Need raw data timeliness โ†’ source freshness.
  • Need downstream dashboard lineage โ†’ exposure.
  • Need static lookup CSV โ†’ seed.
  • Need history of mutable source โ†’ snapshot.
  • Need stable public schema โ†’ contract + versioning.
  • Need warehouse permissions โ†’ grants.
  • Need changed-only CI โ†’ state selectors + defer.
  • Need generated SQL debugging โ†’ compile and inspect compiled SQL.
  • Need package installed โ†’ dbt deps.
  • Need build + test in dependency order โ†’ dbt build.
  • Need rerun previous failures โ†’ dbt retry or result selectors.

Fast mapping

If you see... Think...
โ€œbuild order wrongโ€ missing ref()
โ€œraw object dependencyโ€ source YAML + source()
โ€œmany downstream dashboardsโ€ public/protected model, docs, exposures, grants, contracts
โ€œbreaking schema changeโ€ model versions and deprecation
โ€œcolumn removed unexpectedlyโ€ contract
โ€œlarge append-only factโ€ incremental
โ€œlarge model queried oftenโ€ table or incremental
โ€œreusable CTE onlyโ€ ephemeral or macro
โ€œmapping values in CSVโ€ seed
โ€œlate ingestionโ€ source freshness
โ€œfailed compiled SQLโ€ inspect target/compiled
โ€œCI should be cheaperโ€ slim CI with state and defer
โ€œfailed production run subsetโ€ retry/result selectors
โ€œpackage macro not foundโ€ dbt deps, namespace, version

Materialization memory aid

  • View = live logic.
  • Table = stored result.
  • Incremental = stored result plus changed-row strategy.
  • Ephemeral = CTE in someone elseโ€™s query.
  • Seed = tiny static data.
  • Snapshot = history of change.

Test memory aid

  • not_null: required field.
  • unique: one row per key.
  • relationships: child has parent.
  • accepted_values: controlled category list.
  • singular: special SQL assertion.
  • custom generic: reusable special assertion.
  • freshness: source arrived on time.

9. Final Revision Notes

Highest-yield review points

  1. Know the difference between ref() and source() perfectly. This is the root of DAG correctness.
  2. Master materialization tradeoffs. Most scenario questions depend on performance vs freshness vs maintainability.
  3. Understand tests as assumptions. Pick the simplest test that expresses the assumption.
  4. Use compiled SQL for debugging. Never debug complex Jinja only by staring at source SQL.
  5. Govern shared models. Public interfaces need versions, contracts, docs, exposures, and sometimes grants.
  6. State is for comparing project artifacts. Freshness is for data arrival. Do not confuse them.
  7. CI should be selective but safe. Use state/defer rather than skipping validation.
  8. Packages require dependency management. dbt deps, version pinning, adapter compatibility, and tests matter.
  9. dbt build is often the production-friendly answer. It handles resources and tests in DAG-aware order.
  10. Avoid manual warehouse shortcuts. The exam rewards reproducible dbt-native solutions.

Last-day revision list

Review these without going deep:

  • materialization table: view vs table vs incremental vs ephemeral;
  • command table: debug/compile/run/test/build/deps/source freshness/docs/retry;
  • governance table: access vs grants vs contracts vs versions;
  • testing table: generic vs singular vs custom generic;
  • state/defer workflow for slim CI;
  • source freshness and exposures;
  • common YAML indentation/property traps;
  • package dependency and macro debugging;
  • source/staging/intermediate/mart architecture;
  • incremental model failure modes.

Mini practice scenarios

Use these to self-test:

Scenario Best answer
Mart references analytics_dev.stg_orders directly Replace with ref('stg_orders')
Staging reads raw.stripe.payments directly Define source and use source('stripe','payments')
Orders source loaded 5 hours late Source freshness warning/error
Dashboard depends on fct_revenue Add exposure and strong docs/tests
Rename column used by other teams New model version, deprecate old version
Big append-only event table Incremental model
Small country-code CSV Seed
Need customer status history Snapshot
Warehouse syntax error after macro expansion Inspect compiled SQL
CI rebuilds entire project State selection + defer

10. Exam-Day Checklist

Must-know topics

Before exam day, you should be able to explain without notes:

  • What ref() does and why it controls model dependencies.
  • What source() does and how source YAML is structured.
  • When to use view, table, incremental, ephemeral, seed, and snapshot.
  • How is_incremental() changes SQL behavior.
  • Why table models rebuild fully unless incremental behavior is used.
  • How to test primary keys, foreign keys, allowed values, and business rules.
  • How test configs like where, severity, error_if, warn_if, and store_failures change behavior.
  • How source freshness differs from source tests.
  • How to debug YAML, Jinja, SQL, connection, and package failures.
  • Where compiled SQL lives and why it matters.
  • What contracts enforce and what they do not enforce.
  • Difference between model access and grants.
  • When to use model versions and deprecation.
  • How groups map to ownership/governance.
  • What exposures represent.
  • How packages are declared and installed.
  • Why adapter/package compatibility matters.
  • What state artifacts are used for.
  • How state:modified+ and --defer support slim CI.
  • When dbt retry or result selectors help.
  • How dbt build differs from separate run and test commands.

Final confidence checklist

You are ready when you can quickly eliminate wrong answers that:

  • hard-code production schemas inside model SQL;
  • hide DAG dependencies;
  • use seeds for dynamic operational data;
  • use snapshots for simple performance optimization;
  • use ephemeral for dashboard-facing heavy models;
  • treat contracts as full data quality tests;
  • treat source freshness as uniqueness/null validation;
  • skip CI instead of narrowing CI;
  • confuse grants with model access;
  • change public models without versioning or migration;
  • ignore compiled SQL during debugging;
  • blame dbt for a pure warehouse SQL error without checking compiled SQL.

Appendix A โ€” Source Bank Pattern Extraction

The analyzed practice bank emphasized these recurring terms and ideas:

Theme Why it matters
Sources and freshness Frequent scenarios around raw dependencies, source YAML, loaded timestamps, and stale data
CI and state Many questions test selective builds, deferred refs, previous artifacts, and retry behavior
Tests Questions repeatedly distinguish generic, singular, custom, source, and configured tests
ref() and DAG Core dependency reasoning appears throughout development and troubleshooting
Materializations View/table/incremental/ephemeral choices are central to performance scenarios
Seeds and snapshots Common distractors when candidates confuse static data, mutable history, and transformed tables
Packages/macros External dependencies and reusable logic create compatibility and debugging scenarios
Governance Access, contracts, versions, and grants protect downstream consumers
Compiled SQL/YAML Debugging questions often turn on where the error appears and what artifact to inspect

Consolidated wrong-answer patterns from the bank

  • Hard-code relations instead of using dbt abstractions.
  • Use a scheduler to fix a missing DAG dependency.
  • Use a materialization that optimizes the wrong thing.
  • Treat every failure as a warehouse issue.
  • Treat every warning as acceptable in production.
  • Overuse public models and underuse governance.
  • Ignore downstream dashboard impact.
  • Rerun the entire project when state-aware selection is available.
  • Install packages but fail to test package-generated behavior.

Appendix B โ€” Official Reference Notes

Official dbt pages checked while preparing this file:

  • dbt Analytics Engineering Certification Exam page: https://www.getdbt.com/certifications/analytics-engineer-certification-exam
  • dbt Certification page: https://www.getdbt.com/dbt-certification
  • dbt Analytics Engineering Certification Study Guide PDF: https://www.getdbt.com/dbt-assets/certifications/dbt-certificate-study-guide

The official certification page currently lists dbt 1.11 support and 7 exam coverage areas. The older PDF study guide still contains useful v1.7-style topic details and sample question styles; this course prioritizes the current domain structure while preserving still-relevant concepts from the older outline and the analyzed question bank.

lock_open

Unlock the full course

All 20 modules with detailed explanations, code examples, and exam tips.

workspace_premium
You've answered 0 of 35 free questions 1165 questions locked : these will appear on exam day.
0/35
rocket_launch Unlock All
event_available
Day 1 of 17 72 questions/day Finish by Jun 16, 2026
Question 20 of 1200
Topic 1: Developing dbt models ยท 28%

A Databricks project has a staging model stg_inventory and a mart model fct_inventory. The mart currently queries analytics_dev.stg_inventory directly. What is the best change before promoting to production?

0 correct
0 wrong
1200 left
2% done