The DP-700 exam has an entire domain dedicated to monitoring and troubleshooting. You need to know where to look when things break and what fix matches each symptom. This guide covers every diagnostic pattern that shows up in the exam.
The Diagnostic Decision Tree

When something fails in Fabric, the exam wants you to go to the right place first:

    What failed?
    ├── Pipeline → Pipeline run details → Activity output
    ├── Notebook → Spark UI → Stages, tasks, shuffle, skew
    ├── Dataflow Gen2 → Refresh history → Step errors
    ├── Eventstream → Ingestion diagnostics → Schema mapping
    ├── Eventhouse → Query diagnostics → KQL design
    ├── Warehouse → Query plan → Scans, joins, statistics
    └── OneLake shortcut → Path, permission, source availability
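As a study aid, the tree can also be written as a simple lookup. This is only a sketch: `FIRST_STOP` and `where_to_look` are hypothetical names chosen for illustration and are not part of any Fabric API.

```python
# Hypothetical study aid: maps each Fabric item type from the decision tree
# to the ordered places to check. Not a Fabric API.
FIRST_STOP = {
    "pipeline": ["Pipeline run details", "Activity output"],
    "notebook": ["Spark UI", "Stages, tasks, shuffle, skew"],
    "dataflow gen2": ["Refresh history", "Step errors"],
    "eventstream": ["Ingestion diagnostics", "Schema mapping"],
    "eventhouse": ["Query diagnostics", "KQL design"],
    "warehouse": ["Query plan", "Scans, joins, statistics"],
    "onelake shortcut": ["Path, permission, source availability"],
}


def where_to_look(item_type: str) -> list[str]:
    """Return the diagnostic path for an item type (case-insensitive)."""
    key = item_type.strip().lower()
    if key not in FIRST_STOP:
        raise KeyError(f"No diagnostic path recorded for {item_type!r}")
    return FIRST_STOP[key]


print(" -> ".join(where_to_look("Notebook")))
```

Quizzing yourself against a table like this is one way to drill the "go to the right place first" habit the exam rewards.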
Pipeline Failures

Where to look: Pipeline run history → click the failed run → check each activity's output.

Common pipeline failure causes:

| Symptom | Likely Cause | Fix |
|---|---|---|
| Activity fails with auth error | Connection/credential expired | Update the connection's credentials |
| Copy activity fails | Source file not found or schema changed | Fix path or update schema mapping |
| Dependency not met | Upstream activity failed or condition not met | Fix upstream, check dependency settings |
| Parameter error | Wrong expression or missing parameter | Fix dynamic expression or add parameter |
| Timeout | Large data or slow source | Increase timeout or optimize source query |

The exam pattern: "A pipeline copy activity fails. The error says the source file path is invalid. What should you check first?" The answer is always the most direct cause: check the file path and the connection settings, not "restart the capacity" or "switch to a notebook."
Notebook and Spark Failures

Where to look: Spark UI (accessible from the notebook or job run) → Stages tab → Tasks.

Key Spark UI metrics:

| Metric | What It Means | Action |
|---|---|---|
| Data skew | One task processes most data | Repartition, use salting |
| Shuffle read/write | Data moved between stages | Use broadcast joins for small tables |
| Spill (memory/disk) | Executor ran out of memory | Increase executor memory or increase the partition count so each partition is smaller |
| GC time high | Too much garbage collection | Cache less, reduce data shuffled |
| Scheduler delay | Tasks waiting for resources | Check cluster size and concurrency |

Broadcast joins are tested. When one table is small (under 10 MB, the default auto-broadcast threshold), broadcast it to all executors to avoid shuffling the large table. This is the exam-favorite fix for "join is slow and causes excessive shuffle."

Caching: Cache a DataFrame when it's used multiple times in the notebook. But don't cache everything: that wastes memory. The exam tests whether you know when caching helps (repeated access) vs. when it hurts (single pass).
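To see why salting helps with skew, here is a small pure-Python simulation rather than real Spark code. It assumes a CRC32-based partitioner as a deterministic stand-in for Spark's own hash partitioning, and the key names (`hot`, `cust0`, ...) and helper functions are made up for illustration.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner (Spark uses its own hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys: list[str], num_partitions: int) -> list[int]:
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys: list[str], num_salts: int) -> list[str]:
    """Append a round-robin salt so one hot key becomes num_salts distinct keys."""
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# Skewed input: one hot key carries 90% of the rows.
keys = ["hot"] * 9000 + [f"cust{i}" for i in range(1000)]

before = partition_sizes(keys, num_partitions=8)
after = partition_sizes(salt_keys(keys, num_salts=8), num_partitions=8)

print("largest partition before salting:", max(before))
print("largest partition after salting: ", max(after))
```

Without the salt, every `hot` row hashes to the same partition, so one task does most of the work. In an actual salted join, the other table also has to be expanded with every salt value so the keys still match; that step is omitted here.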
Dataflow Gen2 Failures

Where to look: Dataflow Gen2 refresh history → click the failed refresh → check step-level errors.

Common causes:
- Schema mismatch between source and destination
- Credential expiration on source connection
- Transformation step logic error (e.g., division by zero)
- Destination mapping changed (column renamed at source)

The exam pattern: "A Dataflow Gen2 refresh fails after a source column was renamed. What should you do?" Fix the column mapping in the transformation steps. Not "switch to a notebook" or "create a new pipeline."
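A renamed source column shows up as one column that the mapping expects but no longer finds, plus one new column that nothing maps. The sketch below illustrates that comparison in plain Python; the function `diff_columns` and the column names are hypothetical, and this is not how Dataflow Gen2 itself reports the error.

```python
def diff_columns(expected: list[str], actual: list[str]) -> dict[str, list[str]]:
    """Compare the columns a mapping expects with the columns the source now has."""
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),      # mapped, but gone from source
        "unexpected": sorted(actual_set - expected_set),   # in source, but not mapped
    }


# Hypothetical example: customer_id was renamed to cust_id at the source.
mapped_columns = ["order_id", "customer_id", "amount"]
source_columns = ["order_id", "cust_id", "amount"]

result = diff_columns(mapped_columns, source_columns)
print(result)
```

One missing column paired with one unexpected column is the typical signature of a rename, which points at fixing the mapping rather than rebuilding the dataflow.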
Eventstream and Eventhouse Failures

Eventstream diagnostic areas:
- Source connection (is the event source reachable?)
- Schema mapping (do incoming fields match the destination schema?)
- Throughput (is the event rate within capacity?)

Eventhouse diagnostic areas:
- Ingestion failures (check ingestion logs and error details)
- Schema mapping errors (field type mismatches)
- KQL query performance (full table scans, missing time filters)

KQL optimization tips tested on the exam:
- Always filter by time first (where Timestamp > ago(1d))
- Use summarize and make_list for aggregation
- Avoid scan when you can use indexed columns
- Use materialized views for repeated aggregations
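The "filter by time first" tip is about operator order in the query text. The following sketch builds a KQL string in Python with the time filter placed directly after the table name; the helper `build_kql`, the table name `Events`, and the output column `EventCount` are assumptions for illustration only.

```python
def build_kql(table: str, time_column: str, lookback: str, bin_size: str) -> str:
    """Build a KQL query that narrows by time before aggregating."""
    return "\n".join([
        table,
        # Time filter comes first so later operators see less data.
        f"| where {time_column} > ago({lookback})",
        f"| summarize EventCount = count() by bin({time_column}, {bin_size})",
    ])


# Hypothetical table and column names.
query = build_kql("Events", "Timestamp", "1d", "1h")
print(query)
```

The point of the ordering is that `summarize` only has to process the last day of data instead of the whole table.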
Warehouse Query Performance

Where to look: Query plan/performance view in the warehouse.

Common warehouse problems:

| Problem | Fix |
|---|---|
| Full table scan on large table | Add filters, consider partitioning |
| Inefficient join | Check join order, use appropriate join type |
| Outdated statistics | Update statistics so optimizer has current data |
| Repeated expensive aggregation | Create a materialized view |

Materialized views in warehouse are tested. They precompute and maintain results for repeated expensive queries. The exam uses this as the fix for "the same aggregation query runs 100 times per day and is slow."
OneLake Shortcut Failures

Common causes:
- Path doesn't exist or was renamed
- Permissions changed on the source
- Source storage account is unavailable
- Schema changed at the source

The exam pattern: "A OneLake shortcut to a lakehouse in another workspace returns an error. What should you check first?" Check the path and permissions. Not "switch to a copy activity": the shortcut is the right tool, it just needs fixing.
Lakehouse Table Maintenance

Delta table problems and fixes:

| Problem | Symptom | Fix |
|---|---|---|
| Too many small files | Slow queries, long file listing | OPTIMIZE (compaction) |
| Old snapshots slow metadata ops | Slow DESCRIBE HISTORY, slow VACUUM | VACUUM with appropriate retention |
| Table is large and queries scan everything | Slow filtered queries | Add partitioning or Z-ordering |

VACUUM retention is tested. Delta keeps old file versions for time travel. VACUUM removes files that are no longer referenced by the current table version and are older than the retention period. Setting retention too low risks breaking time travel. The default retention is 7 days (168 hours), not 0.
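The retention rule is just date arithmetic, which the sketch below works through. It is a simplified model, not Delta's implementation: `vacuum_cutoff` and `is_removable` are hypothetical helpers, and the only fact taken from Delta is the 168-hour default.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_HOURS = 168  # 7 days, the default VACUUM retention in Delta Lake


def vacuum_cutoff(now: datetime, retention_hours: int = DEFAULT_RETENTION_HOURS) -> datetime:
    """Files that became unreferenced before this instant are eligible for removal."""
    return now - timedelta(hours=retention_hours)


def is_removable(unreferenced_since: datetime, now: datetime,
                 retention_hours: int = DEFAULT_RETENTION_HOURS) -> bool:
    """Simplified model: an old file version is removable once it is past the cutoff."""
    return unreferenced_since < vacuum_cutoff(now, retention_hours)


# Hypothetical timeline: a file was replaced 5 days before VACUUM runs.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
replaced_at = datetime(2024, 1, 5, tzinfo=timezone.utc)

print(is_removable(replaced_at, now))                      # default 168 h retention
print(is_removable(replaced_at, now, retention_hours=24))  # aggressive 24 h retention
```

With the default, the five-day-old file survives and time travel to that version still works. With a 24-hour retention the same file is deleted, which is why the exam treats a very low retention as a risk to time travel.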