So you're going for the DP-700: Microsoft Fabric Data Engineer Associate exam. This is Microsoft's data engineering cert built entirely around Fabric: Lakehouses, Warehouses, Eventhouses, and the whole modern analytics stack. Whether you're migrating from Synapse or Databricks or building fresh in Fabric, this guide covers what actually shows up on the exam. Let's get into it.

DP-700 Exam Quick Facts

| Detail | Info |
|---|---|
| Exam Code | DP-700 |
| Certification | Microsoft Fabric Data Engineer Associate |
| Questions | ~45-60 |
| Time | 100 minutes |
| Cost | $165 USD |
| Format | Multiple choice, multiple select |

The Fabric Data Map

Every DP-700 question comes down to: given a scenario, which Fabric item and pattern is the best fit?

| Scenario | Answer | Not This |
|---|---|---|
| Code-first complex transformations | Notebook (PySpark) | Dataflow Gen2 (low-code) |
| Low-code ingestion/transformation | Dataflow Gen2 | Notebook (overkill for simple) |
| Relational dimensional modeling | Warehouse + T-SQL | Lakehouse |
| Raw file/Delta engineering | Lakehouse | Warehouse |
| Real-time event analytics | Eventhouse + KQL | Lakehouse |
| Event ingestion/routing | Eventstream | Pipeline |
| Orchestrate multi-step workloads | Data Pipeline | Notebook as orchestrator |
| Access data without copying | OneLake Shortcut | Copy activity |
| Operational DB replication | Mirroring | Manual copy |
| Version control + pull requests | Git Integration | Deployment pipeline |
| Dev to prod promotion | Deployment Pipeline | Git integration |

Domain 1: Implement and Manage (30-35%)

Git vs Deployment Pipelines is tested constantly. Git handles developer collaboration: branches, pull requests, code history. Deployment Pipelines handle environment promotion (dev > test > prod) with approval gates. The exam will try to blur the two. If the requirement is "promote items between environments with review," it's a Deployment Pipeline. If it's "branching and pull requests," it's Git.

Workspace vs item permissions: Use workspace roles for broad access (Admin, Member, Contributor, Viewer). Use item permissions for specific artifacts. Least privilege always. Don't grant Admin when the user only needs to read one report.

Security layers:

- SQL security = row-level, column-level, and object-level security for SQL access patterns
- OneLake security = file/folder/table access for OneLake paths
- Sensitivity labels = classification metadata, not row-level access control

Pipelines as orchestrator: Pipelines call notebooks, Dataflows Gen2, copy activities, and stored procedures as steps. They handle scheduling, dependencies, retries, parameters, and dynamic expressions. A notebook should not be the orchestrator for scheduled multi-step workloads.
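
To make "dependencies and retries" concrete, here is a minimal plain-Python sketch of what an orchestrator does for you. It is not Fabric's API (Fabric pipelines are configured as pipeline items, not written like this); `run_pipeline`, the step names, and the retry count are all hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies, max_retries=2):
    """Run steps in dependency order, retrying each failed step.

    steps: dict of step name -> zero-argument callable (hypothetical
           stand-ins for a notebook, a Dataflow Gen2 refresh, a copy, ...).
    dependencies: dict of step name -> set of steps that must finish first.
    Returns the step names in the order they completed.
    """
    # Include steps that have no listed dependencies in the graph.
    graph = {name: dependencies.get(name, set()) for name in steps}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                steps[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    # Out of retries: stop so downstream steps never run.
                    raise
    return completed
```

Writing this logic by hand inside a notebook is exactly the work the exam expects you to hand off to a Data Pipeline, which is why "notebook as orchestrator" is the wrong answer for scheduled multi-step workloads.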

Domain 2: Ingest and Transform (30-35%)

Loading patterns:

| Pattern | When | How |
|---|---|---|
| Full load | Small/replaceable data | COPY INTO or Dataflow Gen2 |
| Incremental load | Large changing data | Watermark-based, store last load timestamp |
| Streaming | Continuous events | Eventstream + Spark Structured Streaming or KQL |
| Mirroring | Operational DB replication | Minimal custom ETL, near real-time |

Incremental loads with watermarks are heavily tested. Store the last successful high-water mark. On the next load, select only rows newer than the watermark. This is the exam's favorite answer to "how do I load only new data efficiently?"

OneLake Shortcuts vs Copy: Shortcuts provide virtual access to data in another location. No physical copy. Use a shortcut when data should stay in place. Use copy when you need transformation during landing, isolation from the source, or physical control over the data.

Lakehouse vs Warehouse: Lakehouse = file/Delta-table oriented, open data layout, good for engineering and semi-structured data. Warehouse = relational SQL, good for BI, dimensional models, and T-SQL developers. The exam tests which is right for the workload.

Eventstream + Eventhouse pattern: Eventstream ingests and routes events. Eventhouse stores and analyzes them with KQL. For real-time telemetry, clickstream, and IoT data, this is the pattern. Not Lakehouse (batch-oriented), not Warehouse (relational).

Late-arriving data in streaming is tested. Use event-time windowing and proper watermarking logic, not processing-time windows alone.
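
The watermark-based incremental load described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a Fabric API: the function name `incremental_load` and the `modified_at` column are assumptions, and in practice the filter would run as a source query while the watermark lives in a control table.

```python
from datetime import datetime


def incremental_load(source_rows, last_watermark):
    """Return (new_rows, new_watermark) for one watermark-based load.

    source_rows: iterable of dicts with a 'modified_at' datetime
                 (hypothetical column name).
    last_watermark: high-water mark saved after the last successful load.
    """
    # Select only rows that changed after the stored watermark.
    new_rows = [r for r in source_rows if r["modified_at"] > last_watermark]
    # Advance the watermark only as far as the data actually loaded;
    # if nothing is new, keep the old watermark.
    new_watermark = max(
        (r["modified_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark


rows = [
    {"id": 1, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "modified_at": datetime(2024, 1, 5)},
    {"id": 3, "modified_at": datetime(2024, 1, 9)},
]
loaded, watermark = incremental_load(rows, datetime(2024, 1, 3))
```

Persist the new watermark only after the load succeeds. If the run fails partway, the old watermark stays in place and the next run picks up the same rows again.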

Domain 3: Monitor and Optimize (30-35%)

Diagnostic quick-reference:

| Problem | Where to Look | Likely Fix |
|---|---|---|
| Pipeline failure | Run details + activity output | Fix parameter, connection, schema, permission |
| Slow notebook | Spark UI, job metrics | Repartition, reduce shuffle, handle skew |
| Many small files | Delta optimization tools | Compact/optimize table |
| Dataflow Gen2 refresh fails | Refresh history + step errors | Fix transformation step, schema, credentials |
| Eventhouse ingestion fails | Ingestion diagnostics | Fix schema mapping, format, permission |
| Warehouse query slow | Query plan/performance | Reduce scans, improve joins, update statistics |

Spark performance tuning is tested: data skew (repartition), excessive shuffle (reduce shuffles, or use broadcast joins when one side is small), small files (compaction), and spilling (increase memory, or increase the partition count so each partition is smaller).

Lakehouse table maintenance: Know vacuum retention, compaction, and Delta log management, and know where to look when Delta files accumulate.
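
For the data-skew case above, one rough way to reason about it is to compare the largest partition with a typical one. The sketch below is plain Python and assumes you have already collected per-partition row counts (for example by reading them off the Spark UI). The helper names and the threshold of 5 are arbitrary assumptions for illustration, not a Spark or Fabric setting.

```python
from statistics import median


def skew_ratio(partition_row_counts):
    """Ratio of the largest partition to the median partition.

    Expects a non-empty list of row counts, one per partition.
    """
    med = median(partition_row_counts)
    if med == 0:
        # Avoid dividing by zero when most partitions are empty.
        return float("inf") if max(partition_row_counts) > 0 else 1.0
    return max(partition_row_counts) / med


def is_skewed(partition_row_counts, threshold=5.0):
    """Flag skew when one partition dwarfs the typical one.

    The threshold is an assumed rule of thumb, not an official value.
    """
    return skew_ratio(partition_row_counts) > threshold
```

A ratio near 1 means work is spread evenly. A large ratio means one task is doing most of the work, which is when repartitioning is the likely fix.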