Cert-Pass
Databricks

Data Engineer Associate Practice Questions

1030 exam-accurate questions with explanations

Free Sample Questions

Showing 10 of 1030 questions. Get full access to all questions, detailed explanations, and study materials.

1
Databricks Intelligence Platform

During a platform design review for a retail analytics workload, architects compare implementation choices. A team needs a reliable lakehouse foundation for CRM exports that supports rollback after bad writes, consistent access for SQL analysts, and strong Unity Catalog governance. Which approach best fits Databricks Data Intelligence Platform principles?

A Use temporary views for all curated data because views automatically provide physical rollback and audit history.
B Keep only Parquet files in object storage and grant users direct cloud IAM access.
C Use Delta Lake tables governed by Unity Catalog so ACID transactions, time travel, lineage, and shared access controls support BI and AI workloads. (correct)
D Store curated data as unmanaged CSV files in DBFS and rely on folder naming for versions.

Explanation

Delta Lake with Unity Catalog provides transactional reliability, governed discovery, lineage, and consistent access. The strongest distractors, keeping raw Parquet or CSV files in storage, lack a transaction log, platform governance, and simple rollback.

2
Databricks Intelligence Platform

For a retail analytics workload, analysts query curated invoice tables during the day while scheduled transformations run overnight. The platform team wants isolation and cost-aware compute. What should they choose?

A Use one all-purpose interactive cluster for every analyst query and production ETL job.
B Use a single-node cluster for dashboards because it has the lowest possible startup time.
C Use model serving compute because it is optimized for SQL dashboard concurrency.
D Use a SQL warehouse for analyst SQL workloads and job compute for scheduled ETL tasks. (correct)

Explanation

SQL warehouses are designed for SQL analytics concurrency, while job compute is suitable for scheduled workloads. A single all-purpose cluster mixes use cases and weakens isolation and cost control.

3
Databricks Intelligence Platform

For a retail analytics workload, a company is standardizing access to gold_sales_mart and wants discoverability, ownership, and governed permissions across teams. Which organization pattern is most appropriate?

A Create one workspace folder per business unit and store production tables as notebook outputs.
B Create a cluster per table and rely on cluster permissions to define table ownership.
C Use DBFS root paths as the main security boundary for governed datasets.
D Create catalogs and schemas in Unity Catalog and organize data by environment, domain, and sensitivity. (correct)

Explanation

Unity Catalog provides the catalog/schema/table hierarchy for governance and discovery. Workspace folders or cluster permissions do not govern data objects consistently.
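As a rough illustration of the catalog/schema/table hierarchy, the following Python sketch builds the SQL statements a team might run to register and share gold_sales_mart. The catalog name `prod`, schema name `sales`, and group name `analysts` are hypothetical; the helper functions are illustrative and not part of any Databricks library.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level Unity Catalog name: catalog.schema.table."""
    return ".".join(f"`{part}`" for part in (catalog, schema, table))


def governance_statements(catalog: str, schema: str, table: str, group: str) -> list[str]:
    """Return SQL statements that create the hierarchy and grant read access.

    The statements are only generated as strings here; running them
    would require a Unity Catalog enabled workspace.
    """
    full_name = qualified_name(catalog, schema, table)
    return [
        f"CREATE CATALOG IF NOT EXISTS `{catalog}`",
        f"CREATE SCHEMA IF NOT EXISTS `{catalog}`.`{schema}`",
        # Readers need USE privileges on the parents before SELECT works.
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{group}`",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{group}`",
        f"GRANT SELECT ON TABLE {full_name} TO `{group}`",
    ]


# Hypothetical names, chosen only for this example.
statements = governance_statements("prod", "sales", "gold_sales_mart", "analysts")
for statement in statements:
    print(statement)
```

Organizing by environment (catalog) and domain (schema) keeps permissions attached to data objects rather than to folders or clusters.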

4
Databricks Intelligence Platform

A BI team needs fast startup and concurrency for ad hoc SQL on gold_sales_mart. Which compute service is the best fit?

A Use a job cluster because job clusters are designed for long-running interactive BI sessions.
B Use Lakeflow Jobs as the query engine because Jobs replace SQL warehouses for ad hoc analytics.
C Use a Databricks SQL warehouse (for example, a serverless SQL warehouse) when the workload is interactive SQL analytics with many concurrent users. (correct)
D Use an all-purpose cluster with a single worker to guarantee the highest concurrency.

Explanation

SQL warehouses are optimized for SQL analytics and concurrent BI users. Job clusters are best for scheduled job runs, not always-on interactive analytics.

5
Databricks Intelligence Platform

For a retail analytics workload, a bad load introduced duplicate records into silver_events. The team needs to recover a previous valid state quickly. What platform feature is most relevant?

A Change the cluster runtime version and rerun the dashboard query.
B Use Delta time travel or restore capabilities after identifying the bad version. (correct)
C Grant SELECT to fewer users because permissions automatically remove corrupted rows.
D Delete all files in the table path and reload from a local CSV backup.

Explanation

Delta Lake transaction history enables time travel and restore workflows. Permissions or runtime changes do not revert data contents.
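A minimal sketch of that recovery workflow, written as Python helpers that only build the Delta SQL strings: inspect the history, preview an earlier version, then restore it. The table name and version number 12 are hypothetical, and the helper functions are illustrative rather than a Databricks API.

```python
def history_sql(table: str) -> str:
    """List the table's commits so the last good version can be identified."""
    return f"DESCRIBE HISTORY {table}"


def preview_sql(table: str, version: int) -> str:
    """Read the table as of an earlier version without changing it."""
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def restore_sql(table: str, version: int) -> str:
    """Roll the table back to an earlier version."""
    if version < 0:
        raise ValueError("Delta table versions start at 0")
    return f"RESTORE TABLE {table} TO VERSION AS OF {version}"


# Hypothetical table name and version, for illustration only.
table = "prod.retail.silver_events"
last_good_version = 12

for statement in (
    history_sql(table),
    preview_sql(table, last_good_version),
    restore_sql(table, last_good_version),
):
    print(statement)
```

Previewing the version before restoring is a cautious habit: it confirms the duplicates are absent before the table is rolled back.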

6
Databricks Intelligence Platform

For a retail analytics workload, a dashboard based on gold_device_health suddenly changed after a production deployment. The data engineer must identify upstream dependencies and recent modifications. What should be used first?

A Use Unity Catalog lineage and table history to trace upstream tables and recent writes. (correct)
B Inspect only the notebook revision history because it always shows every data write.
C Search DBFS file names because table lineage is encoded in file names.
D Ask every user to manually report which query they ran.

Explanation

Unity Catalog lineage and Delta history help trace dependencies and modifications. Notebook history alone may miss writes from jobs, SQL, or pipelines.
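To make "tracing upstream tables" concrete, here is a small Python sketch that walks lineage edges to find every table feeding a target. It assumes the edges have already been exported as (source, target) pairs, for instance from lineage metadata; the edge list and every table name other than gold_device_health are made up for illustration.

```python
from collections import deque


def upstream_tables(edges: list[tuple[str, str]], target: str) -> set[str]:
    """Return every table that directly or indirectly feeds `target`."""
    # Map each table to the set of tables it reads from.
    parents: dict[str, set[str]] = {}
    for source, dest in edges:
        parents.setdefault(dest, set()).add(source)

    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    # Guard against cycles that would list the target as its own ancestor.
    seen.discard(target)
    return seen


# Hypothetical lineage edges: (source table, target table).
lineage_edges = [
    ("bronze_devices", "silver_devices"),
    ("silver_devices", "gold_device_health"),
    ("ref_sites", "gold_device_health"),
    ("silver_devices", "gold_device_usage"),
]

print(sorted(upstream_tables(lineage_edges, "gold_device_health")))
```

Each upstream table found this way is a candidate for a table-history check to see which one was written during the deployment window.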

7
Databricks Intelligence Platform

A new security telemetry workload includes interactive exploration, scheduled ETL, and BI consumption. Which decision rule best reflects Databricks compute best practice?

A Always use serverless SQL for streaming ingestion because it replaces streaming compute.
B Always use the largest all-purpose cluster to avoid tuning decisions.
C Select compute based on workload characteristics such as interactivity, concurrency, scheduling, startup needs, and cost model. (correct)
D Always use single-node clusters for production because they are simpler to govern.

Explanation

Compute choice depends on workload type and cost/performance needs. The most tempting wrong answer, always using the largest all-purpose cluster, overprovisions and ignores workload isolation.
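The decision rule can be sketched as a tiny Python function. This is a deliberate simplification of the guidance in these questions, not an official Databricks sizing rule; the `Workload` fields and the returned labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload's main characteristics."""
    interactive: bool  # humans wait on results
    sql_only: bool     # expressed purely as SQL queries
    scheduled: bool    # triggered by a scheduler rather than a person


def suggest_compute(workload: Workload) -> str:
    """Map workload characteristics to a compute type (simplified)."""
    if workload.scheduled and not workload.interactive:
        # Scheduled ETL: compute that exists only for the job run.
        return "job compute"
    if workload.sql_only:
        # Concurrent BI and ad hoc SQL analytics.
        return "SQL warehouse"
    # Notebook-style exploration and development.
    return "all-purpose compute"


print(suggest_compute(Workload(interactive=True, sql_only=False, scheduled=False)))
print(suggest_compute(Workload(interactive=False, sql_only=False, scheduled=True)))
print(suggest_compute(Workload(interactive=True, sql_only=True, scheduled=False)))
```

A real decision would also weigh concurrency, startup time, and cost model, which this sketch leaves out for brevity.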

8
Databricks Intelligence Platform

For a retail analytics workload, the data engineering team is creating a governed source of truth for events data. Which storage and registration approach is most aligned to the lakehouse?

A Store only JSON files in the workspace Files area.
B Store data in notebook-scoped temporary views and export query results to CSV for consumers.
C Place all data in the DBFS root and manage sharing with cluster attach permissions.
D Store curated data in Delta tables registered in Unity Catalog. (correct)

Explanation

Delta tables in Unity Catalog are governed, discoverable, and reliable. DBFS root or workspace files are not the right governed source-of-truth pattern.

9
Databricks Intelligence Platform

A team needs a reliable lakehouse foundation for IoT sensor drops that supports rollback after bad writes, consistent access for SQL analysts, and minimal custom code. Which approach best fits Databricks Data Intelligence Platform principles?

A Use temporary views for all curated data because views automatically provide physical rollback and audit history.
B Keep only Parquet files in object storage and grant users direct cloud IAM access.
C Store curated data as unmanaged CSV files in DBFS and rely on folder naming for versions.
D Use Delta Lake tables governed by Unity Catalog so ACID transactions, time travel, lineage, and shared access controls support BI and AI workloads. (correct)

Explanation

Delta Lake with Unity Catalog provides transactional reliability, governed discovery, lineage, and consistent access. The strongest distractors, keeping raw Parquet or CSV files in storage, lack a transaction log, platform governance, and simple rollback.

10
Databricks Intelligence Platform

Analysts query curated transaction tables during the day while scheduled transformations run overnight. The platform team wants isolation and cost-aware compute. What should they choose?

A Use model serving compute because it is optimized for SQL dashboard concurrency.
B Use a SQL warehouse for analyst SQL workloads and job compute for scheduled ETL tasks. (correct)
C Use one all-purpose interactive cluster for every analyst query and production ETL job.
D Use a single-node cluster for dashboards because it has the lowest possible startup time.

Explanation

SQL warehouses are designed for SQL analytics concurrency, while job compute is suitable for scheduled workloads. A single all-purpose cluster mixes use cases and weakens isolation and cost control.
