If you're targeting the DP-700 Microsoft Fabric Data Engineer Associate exam, you need more than service names. You need to make architecture decisions the way the exam expects. This guide covers the planning, governance, and operational patterns that show up in the scenario questions.
## Fabric Architecture Decisions

Every DP-700 scenario question is an architecture decision. Here's the decision tree the exam uses:

1. What type of data? (Batch files, streaming events, relational operational)
2. Who is the user? (Data engineer writing code, analyst using SQL, real-time dashboard)
3. What's the access pattern? (Exploration, BI/reporting, real-time queries)
4. What are the operational needs? (CI/CD, governance, security, monitoring)

Based on these, the exam picks the Fabric item. Here is the full map:

### When to use each Fabric item

Lakehouse:
- Bronze/Silver/Gold medallion architecture
- File-based or semi-structured data engineering
- PySpark and notebook transformations
- Delta tables with time travel and ACID
- Open data layout for sharing across teams

Warehouse:
- Curated reporting and BI workloads
- Dimensional modeling and star schemas
- T-SQL developer workflow
- SQL-first analysts and dashboards

Eventhouse:
- Real-time telemetry, clickstream, IoT data
- KQL queries against time-series data
- Sub-second query response on streaming data
- Time-based windowing and aggregations

Eventstream:
- Ingesting events from external sources
- Routing events to Eventhouse, lakehouse, or other destinations
- Real-time filtering and light transformation

Notebook:
- Complex PySpark/SQL code transformations
- Custom libraries, ML, advanced logic
- Reusable engineering logic

Dataflow Gen2:
- Low-code/no-code ingestion and transformation
- Power Query experience for analysts
- Simple scheduled refresh pipelines
- Not for complex PySpark or custom libraries

Data Pipeline:
- Orchestrating multi-step workloads
- Scheduling with dependencies
- Calling notebooks, Dataflows, stored procedures as steps
- Parameters, expressions, retries

### Medallion Architecture in Fabric

The Bronze/Silver/Gold pattern is tested on DP-700:

| Layer | What Lives Here | Format | Users |
|---|---|---|---|
| Bronze | Raw ingestion, landing zone | Files, Delta | Data engineers |
| Silver | Cleaned, deduplicated, conformed | Delta tables | Data engineers |
| Gold | Business-ready, aggregated, dimensional | Delta/Warehouse | Analysts, BI |

- Bronze: Raw data lands from source. Minimal transformation. Keep original structure.
- Silver: Clean the data. Remove duplicates. Conform schemas. Join reference data. This is where the engineering happens (notebooks are common here).
- Gold: Business-ready models, aggregations, dimensional models. Optimized for BI consumption. Warehouse items often live here.
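To make the three layers concrete, here is a minimal sketch of the Bronze to Silver to Gold flow. It uses plain Python lists so it runs anywhere; in Fabric the same steps would typically be PySpark transformations over Delta tables. The sample records and the `to_silver` and `to_gold` function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": " east ", "amount": "10.50"},
    {"order_id": "1", "region": " east ", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "region": "WEST", "amount": "4.00"},
    {"order_id": "3", "region": "East", "amount": "not-a-number"},  # malformed row
]


def to_silver(rows):
    """Clean, deduplicate, and conform types (the Silver step)."""
    seen_ids = set()
    silver_rows = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be conformed to a number
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # remove duplicates by business key
        seen_ids.add(order_id)
        silver_rows.append(
            {
                "order_id": order_id,
                "region": row["region"].strip().lower(),  # conform the value format
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(rows):
    """Aggregate to a business-ready shape (the Gold step): total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Note that Bronze is never modified: the raw list stays intact so the Silver and Gold steps can be re-run if the cleaning rules change.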
## Workspace Management

Workspace roles:

| Role | Can Do | Should Not Do |
|---|---|---|
| Admin | Full workspace management, all items | Day-to-day development |
| Member | Create and edit items, share | Manage workspace settings |
| Contributor | Edit items they have access to | Share or manage permissions |
| Viewer | Read-only access | Any editing |

Least privilege mistake on the exam: Granting Admin to someone who only needs to run reports. The answer is always the most restrictive role that satisfies the requirement.

Git Integration workflow:

1. Connect workspace to Azure DevOps or GitHub repo
2. Developers work in feature branches
3. Pull requests for code review and merge
4. Main branch is the stable version
5. Rollback to previous commits if needed

Deployment Pipeline workflow:

1. Assign workspaces to stages (dev, test, prod)
2. Deploy items from dev to test to prod
3. Rules and approval gates between stages
4. Item comparison between stages before deploy
5. Environment-specific configuration (different connections per stage)

The exam tests the difference between these two constantly:

- Git = developer collaboration, version control, code review
- Deployment pipeline = release management, environment promotion, governance
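The "most restrictive role that satisfies the requirement" rule can be sketched as a small lookup. The capability names (`read`, `edit`, `share`, `manage_workspace`) are a simplification of the roles table above, not Fabric's actual permission model, and `least_privileged_role` is a hypothetical helper.

```python
# Roles ordered from most to least restrictive, with simplified capabilities
# taken from the roles table (an assumption, not the real permission matrix).
ROLES = [
    ("Viewer", {"read"}),
    ("Contributor", {"read", "edit"}),
    ("Member", {"read", "edit", "share"}),
    ("Admin", {"read", "edit", "share", "manage_workspace"}),
]


def least_privileged_role(required):
    """Return the first (most restrictive) role covering every required capability."""
    required = set(required)
    for name, capabilities in ROLES:
        if required <= capabilities:
            return name
    raise ValueError(f"No role grants: {sorted(required)}")


print(least_privileged_role({"read"}))           # someone who only runs reports
print(least_privileged_role({"edit", "share"}))  # someone who builds and shares items
```

Because the list is scanned from the most restrictive role upward, Admin is only returned when nothing narrower covers the requirement, which mirrors how the exam expects you to reason.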
## CI/CD and Deployment

Real-world CI/CD pattern in Fabric:

Dev workspace → Git branch → Pull request → Merge to main → Deploy from dev → Test workspace → Deploy to prod workspace

The exam loves to ask: "A company wants to promote a lakehouse and its transformation notebook from development to production with approval gates." The answer is Deployment Pipeline (not Git alone, not manual copy).

Parameters and dynamic expressions are tested for environment-specific values. Don't hardcode connection strings, file paths, or workspace IDs. Use pipeline parameters so the same notebook works across dev, test, and prod.
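As a rough illustration of parameterizing instead of hardcoding, the sketch below resolves environment-specific values from a single `stage` parameter. Every name here (`StageSettings`, `lh_sales_dev`, the `Files/landing/...` paths) is hypothetical; in a real pipeline the stage value would arrive as a pipeline parameter rather than a literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSettings:
    lakehouse_name: str
    source_path: str


# Hypothetical per-stage values; only this mapping differs between environments.
SETTINGS = {
    "dev": StageSettings("lh_sales_dev", "Files/landing/dev"),
    "test": StageSettings("lh_sales_test", "Files/landing/test"),
    "prod": StageSettings("lh_sales_prod", "Files/landing/prod"),
}


def resolve_settings(stage: str) -> StageSettings:
    """Look up settings for the stage passed in as a parameter."""
    try:
        return SETTINGS[stage]
    except KeyError:
        raise ValueError(f"Unknown stage: {stage!r}") from None


def describe_load(stage: str) -> str:
    """The transformation logic is identical in every stage; only the inputs change."""
    settings = resolve_settings(stage)
    return f"load {settings.source_path} into {settings.lakehouse_name}"


for stage in ("dev", "test", "prod"):
    print(describe_load(stage))
```

Failing fast on an unknown stage is a deliberate choice here: a typo in the parameter should stop the run rather than silently fall back to a default environment.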
## Security in Fabric

SQL security = relational access control:

- Row-level security (filter rows per user/role)
- Column-level security (hide sensitive columns)
- Object-level security (grant SELECT on specific tables)

OneLake security = file system access control:

- Folder-level and file-level ACLs
- Works for Spark, OneLake shortcuts
- Different from SQL security (different access path)

Sensitivity labels = classification metadata:

- Classify data as Public, Confidential, Highly Confidential
- Can integrate with Microsoft Purview
- They do NOT restrict access by themselves (that's row/column security)

Audit logs track who did what. Use them for compliance investigations.

When to use which:

- Analyst queries warehouse tables → SQL security (row/column-level)
- Data engineer reads files through Spark → OneLake security
- Compliance department needs classification → Sensitivity labels
- Security team needs to trace actions → Audit logs
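The difference between row-level security, column-level security, and a sensitivity label can be modeled in a few lines. This is a conceptual toy, not how Fabric enforces security (that is configured in the warehouse, not in application code); the table, users, and `query_as` function are invented for illustration.

```python
# Hypothetical table, user-to-region mapping, and protected column list.
SALES = [
    {"region": "east", "customer": "Contoso", "email": "a@example.com", "amount": 120},
    {"region": "west", "customer": "Fabrikam", "email": "b@example.com", "amount": 80},
    {"region": "east", "customer": "Northwind", "email": "c@example.com", "amount": 45},
]
USER_REGION = {"alice": "east", "bob": "west"}
HIDDEN_COLUMNS = {"email"}

# A label is classification metadata only: nothing below ever consults it,
# which is the point the exam makes about labels not restricting access.
SENSITIVITY_LABEL = "Confidential"


def query_as(user: str) -> list[dict]:
    """Return what the user may see after row-level and column-level filtering."""
    region = USER_REGION.get(user)
    # Row-level security: keep only rows matching the user's region.
    visible_rows = [row for row in SALES if row["region"] == region]
    # Column-level security: drop protected columns from every visible row.
    return [
        {col: val for col, val in row.items() if col not in HIDDEN_COLUMNS}
        for row in visible_rows
    ]


print(query_as("alice"))
print(query_as("bob"))
```

A user with no region mapping gets an empty result, so the default is to deny rather than to expose everything.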
## Performance Optimization

Spark notebook slow? Check these:

1. Data skew: one partition has most of the data. Repartition or use salting.
2. Too much shuffle: joins cause excessive data movement. Use broadcast joins for small tables.
3. Many small files: use compaction (OPTIMIZE in Delta).
4. Spilling to disk: executor memory is too small for the data. Increase or repartition.
5. No caching: repeatedly accessed intermediate results should be cached.

Warehouse query slow? Check these:

1. Full table scans instead of selective filters
2. Missing statistics: the query optimizer doesn't have good cardinality estimates
3. Inefficient join order or join type
4. No materialization for repeated expensive aggregations

Lakehouse table slow to query?

1. Too many small Delta files: run OPTIMIZE
2. Old snapshots accumulating: review VACUUM retention
3. Non-selective file layout: consider partitioning
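Salting is the least intuitive fix in the list above, so here is a small simulation of the idea in plain Python. It is a sketch under assumptions: `zlib.crc32` stands in for a hash partitioner, the salt rotates deterministically (real jobs often use a random salt), and the dataset is made up. In an actual join, the other side would also need to be expanded with every salt value so the salted keys still match.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8
NUM_SALTS = 8


def partition_of(key: str) -> int:
    """Stand-in for a hash partitioner: stable hash of the key, modulo partitions."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys: list[str]) -> list[int]:
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(key) for key in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def add_salt(keys: list[str]) -> list[str]:
    """Append a rotating salt so one hot key becomes NUM_SALTS distinct keys."""
    return [f"{key}#{i % NUM_SALTS}" for i, key in enumerate(keys)]


# Hypothetical skewed dataset: one customer owns 9,000 of 10,000 rows.
keys = ["customer_1"] * 9000 + [f"customer_{i}" for i in range(2, 1002)]
salted_keys = add_salt(keys)

# All rows sharing a key go to the same partition, so the largest single-key
# group is a lower bound on the largest partition.
largest_group_before = Counter(keys).most_common(1)[0][1]
largest_group_after = Counter(salted_keys).most_common(1)[0][1]

print("rows per partition, unsalted:", partition_sizes(keys))
print("rows per partition, salted:  ", partition_sizes(salted_keys))
print("largest single-key group:", largest_group_before, "->", largest_group_after)
```

The trade-off is an extra aggregation step afterward: results computed per salted key have to be combined back to the original key.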
The DP-700 exam leads to the Microsoft Fabric Data Engineer Associate certification, a professional credential that validates your data engineering skills on Microsoft Fabric. It is recognized by employers globally.
Exam costs vary: AWS exams range from 100 to 300 USD, Microsoft exams cost 165 USD, Google Cloud exams cost 200 USD.
Most candidates need 4 to 8 weeks. Hands-on experience reduces study time significantly.
Free practice questions are available at cert-pass.com. Full prep courses start at 49 EUR.