GCP Professional Data Engineer Practice Questions

1100 exam-accurate questions with explanations

Free Sample Questions

Showing 10 of 1100 questions. Get full access to all questions, detailed explanations, and study materials.

Designing data processing systems

Case-based: A travel marketplace is designing a Google Cloud data platform for payment transactions. The team must separate development and production access while still allowing curated analytics. What should the data engineer recommend?

A Create separate projects or datasets for environments, apply IAM at the narrowest practical scope, and expose curated BigQuery data through authorized views or policy tags. (correct)

B Put all tables in one dataset and ask teams to use naming conventions for access control.

C Grant BigQuery Admin to analysts and rely on audit logs to detect misuse.

D Export curated data to CSV files in Cloud Storage and share bucket object ACLs with analysts.

Explanation

Option A is correct because separate environments, narrowly scoped IAM, authorized views, and policy tags align governance with least privilege. Option B, the strongest distractor, fails because naming conventions do not enforce access control.

Designing data processing systems

You need to choose the best architecture. A global energy utility must keep EU customer data in the EU and US customer data in the US. Analysts need aggregate reporting across both regions. Which design best satisfies the requirement?

A Store all raw data in a multi-region US dataset because BigQuery is globally available.

B Store regional datasets in matching BigQuery locations and create aggregate, non-sensitive derived datasets only where policy allows cross-region reporting. (correct)

C Replicate all EU raw data to US to simplify analyst access.

D Use one Cloud Storage bucket with dual-region placement and ignore dataset location.

Explanation

Option B is correct because data residency is enforced by dataset location, and only approved aggregates should cross regional boundaries. Option A, the strongest distractor, fails because global service availability does not override residency requirements.
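The pattern in the correct option can be sketched in plain Python. This is only an illustration of the idea, not BigQuery code: the dataset names `sales_eu` and `sales_us`, the record fields, and the helpers `route` and `regional_aggregate` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from customer region to a dataset in that region.
REGION_DATASET = {"EU": "sales_eu", "US": "sales_us"}

def route(records):
    """Place each raw record in the dataset for its own region only."""
    datasets = defaultdict(list)
    for record in records:
        datasets[REGION_DATASET[record["region"]]].append(record)
    return dict(datasets)

def regional_aggregate(rows):
    """Reduce raw rows to non-sensitive totals that may leave the region."""
    return {
        "customers": len({row["customer_id"] for row in rows}),
        "kwh": sum(row["kwh"] for row in rows),
    }

records = [
    {"region": "EU", "customer_id": "c1", "kwh": 10},
    {"region": "EU", "customer_id": "c1", "kwh": 5},
    {"region": "US", "customer_id": "c2", "kwh": 7},
]
datasets = route(records)
# Only the aggregates, never the raw rows, feed the cross-region report.
report = {name: regional_aggregate(rows) for name, rows in datasets.items()}
```

Raw rows never leave their regional container; the cross-region report is built only from counts and sums.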

Designing data processing systems

A data engineer is reviewing a proposed solution. A team is migrating an on-premises warehouse that loads 20 TB of data per month to BigQuery. They need to validate row counts and critical aggregates during cutover. What is the best migration approach?

A Use a one-time gsutil copy and immediately decommission the source system.

B Rewrite all dashboards first and validate only after production traffic is moved.

C Plan staged loads with BigQuery Data Transfer Service or Storage Transfer Service, run validation queries, reconcile counts and aggregates, then switch consumers. (correct)

D Use Pub/Sub because it is optimized for historical bulk warehouse migration.

Explanation

Option C is correct because a staged migration with validation reduces cutover risk. Option A, the strongest distractor, fails because a one-time copy without reconciliation can hide missing or corrupted data.
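The reconciliation step might look like the following minimal sketch. It uses two in-memory SQLite databases as stand-ins for the on-premises warehouse and BigQuery; the table `orders`, the column `amount_cents`, and the helpers `summarize`, `reconcile`, and `make_db` are hypothetical names chosen for illustration.

```python
import sqlite3

def summarize(conn, table, amount_col):
    """Return (row_count, total) for one table."""
    # Identifiers are hard-coded by the caller here, never taken from user input.
    sql = f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
    return conn.execute(sql).fetchone()

def reconcile(source, target, table, amount_col):
    """Compare the row count and one critical aggregate between two systems."""
    src_count, src_total = summarize(source, table, amount_col)
    tgt_count, tgt_total = summarize(target, table, amount_col)
    return {
        "row_count_match": src_count == tgt_count,
        "aggregate_match": src_total == tgt_total,
        "missing_rows": src_count - tgt_count,
    }

def make_db(rows):
    """Build an in-memory stand-in for a warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

source = make_db([(1, 100), (2, 250), (3, 75)])
complete_target = make_db([(1, 100), (2, 250), (3, 75)])
partial_target = make_db([(1, 100), (2, 250)])  # simulates a dropped row

ok_report = reconcile(source, complete_target, "orders", "amount_cents")
bad_report = reconcile(source, partial_target, "orders", "amount_cents")
```

Consumers would be switched only once a report like `ok_report` shows every check passing; `bad_report` shows how a dropped row surfaces in both the count and the aggregate.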

Designing data processing systems

During production planning, a healthcare provider has pipelines that sometimes produce duplicate records after retries. The business requires accurate financial reporting. Which design principle should be emphasized?

A Disable all retries so duplicates cannot happen.

B Use only append-only tables and let dashboard users filter duplicates manually.

C Increase Dataflow worker count so retries complete faster.

D Design idempotent ingestion with deterministic keys, deduplication logic, and validation checks before publishing curated tables. (correct)

Explanation

Option D is correct because idempotency and deduplication keep results accurate despite retries. Option A, the strongest distractor, fails because disabling retries lowers reliability and does not address partial failures.
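A minimal sketch of idempotent ingestion follows, assuming each record carries a stable business identifier. The field names (`source`, `transaction_id`, `amount_cents`) and the helpers `record_key` and `ingest` are hypothetical, and a plain dictionary stands in for the target table.

```python
import hashlib

def record_key(record):
    """Derive a deterministic key from business fields (a hypothetical choice)."""
    raw = f"{record['source']}|{record['transaction_id']}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def ingest(store, batch):
    """Upsert a batch by key and return how many new records were added."""
    inserted = 0
    for record in batch:
        key = record_key(record)
        if key not in store:
            inserted += 1
        store[key] = record  # the same key overwrites, so it never duplicates
    return inserted

store = {}
batch = [
    {"source": "billing", "transaction_id": "t-1", "amount_cents": 1200},
    {"source": "billing", "transaction_id": "t-2", "amount_cents": 800},
]
first = ingest(store, batch)
retry = ingest(store, batch)  # simulated retry of the same batch
total = sum(r["amount_cents"] for r in store.values())
```

Replaying the batch leaves the store and the financial total unchanged. In BigQuery, the same effect is commonly achieved with a `MERGE` statement keyed on the deterministic identifier.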

Designing data processing systems

A data platform must support future migration to multiple clouds and minimize lock-in for raw data. Query performance in BigQuery is still important. What should the architect choose?

A Store raw data in open formats such as Parquet in Cloud Storage, govern it with Dataplex, and use BigLake or BigQuery external/native tables where appropriate. (correct)

B Store all data only in proprietary application exports because they are easiest to ingest.

C Use only Cloud SQL for all analytics so the schema is portable.

D Avoid metadata catalogs because they make the design Google-specific.

Explanation

Option A is correct because open formats plus governance support portability while still enabling Google Cloud analytics. Option C, the strongest distractor, fails because Cloud SQL is not appropriate for large-scale analytical workloads.

Designing data processing systems

A healthcare provider wants automated data quality checks before data reaches trusted BigQuery tables. Which approach best fits Google Cloud best practices?

A Let Looker dashboards show NULL values so users can decide if data is valid.

B Add validation rules in Dataform or Dataflow, quarantine failed records, and publish only validated outputs to curated datasets. (correct)

C Use Cloud Monitoring only; metrics replace row-level validation.

D Grant analysts permission to edit production tables when they find bad records.

Explanation

Option B is correct because automated validation and quarantine prevent bad data from reaching trusted tables. Option C, the strongest distractor, fails because monitoring does not validate individual records or enforce quality gates.
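The validate-and-quarantine pattern can be sketched independently of any particular tool. The rules, the field names (`patient_id`, `amount_cents`), and the helpers `validate` and `split_batch` below are hypothetical; in practice the same logic would live in Dataform assertions or a Dataflow transform.

```python
def validate(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    if not record.get("patient_id"):
        errors.append("patient_id is required")
    amount = record.get("amount_cents")
    if not isinstance(amount, int) or amount < 0:
        errors.append("amount_cents must be a non-negative integer")
    return errors

def split_batch(batch):
    """Route passing records to curated output and failures to quarantine."""
    curated, quarantine = [], []
    for record in batch:
        errors = validate(record)
        if errors:
            # Keep the failure reasons so the record can be fixed and replayed.
            quarantine.append({"record": record, "errors": errors})
        else:
            curated.append(record)
    return curated, quarantine

batch = [
    {"patient_id": "p-1", "amount_cents": 500},
    {"patient_id": "", "amount_cents": 300},
    {"patient_id": "p-3", "amount_cents": -10},
]
curated, quarantine = split_batch(batch)
```

Only `curated` would be published to the trusted dataset; quarantined records keep their error list for later review.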

Designing data processing systems

A regulated gaming company wants encryption keys controlled by its security team for BigQuery and Cloud Storage datasets. What should be used?

A Default Google-owned encryption keys only, because customer control is not possible.

B Hard-code encryption keys in Dataflow pipeline options.

C Customer-managed encryption keys in Cloud KMS with IAM separation and key rotation policies. (correct)

D Store keys in a BigQuery table and join them during processing.

Explanation

Option C is correct because CMEK with Cloud KMS provides customer-controlled key management. Option A, the strongest distractor, fails because default encryption protects data but does not give the customer direct control over the keys.

Designing data processing systems

Case-based: A platform team must design project structure for many product teams. Each team owns data products but central governance must enforce policy. Which design is best?

A Put every workload in one project to make billing simple.

B Create one service account shared by every team to avoid permission complexity.

C Let each team create unmanaged buckets and datasets without central standards.

D Use separate projects or folders for teams, shared governance policies at folder or organization level, and Dataplex zones for discoverability and policy consistency. (correct)

Explanation

Option D is correct because project and folder boundaries plus central policy support federated governance. Option A, the strongest distractor, fails because one shared project makes isolation, quota management, and least privilege difficult.

Designing data processing systems

You need to choose the best architecture. A company wants to use generative AI to help analysts translate natural language to SQL, but the data contains sensitive fields. What is the safest design?

A Restrict model access to governed semantic layers or authorized views and mask sensitive fields using policy tags or data masking before query generation. (correct)

B Give the LLM service account BigQuery Admin so it can discover all schemas.

C Send raw tables including PII to a public prompt for better SQL quality.

D Disable IAM because the generated SQL will be reviewed by analysts.

Explanation

Option A is correct because AI-assisted query generation must respect governed views and masking. Option B, the strongest distractor, fails because broad admin access exposes sensitive data unnecessarily.
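One way to picture the safe design is a prompt builder that only ever sees governed metadata. The sketch below is an assumption-laden illustration: the tag names in `SENSITIVE_TAGS`, the view name `analytics.bookings_v`, the column list, and the helpers `prompt_schema` and `build_prompt` are all hypothetical, and no real model or BigQuery API is called.

```python
# Hypothetical policy-tag names that mark a column as sensitive.
SENSITIVE_TAGS = {"pii", "financial"}

def prompt_schema(columns):
    """Describe only the columns the model may see; tagged columns are left out."""
    visible = [c for c in columns if not (set(c["tags"]) & SENSITIVE_TAGS)]
    return ", ".join(f"{c['name']} {c['type']}" for c in visible)

def build_prompt(view_name, columns, question):
    """Build the text sent to the model from the governed view's schema."""
    schema = prompt_schema(columns)
    return f"View {view_name}({schema}). Write SQL for: {question}"

columns = [
    {"name": "booking_id", "type": "STRING", "tags": []},
    {"name": "email", "type": "STRING", "tags": ["pii"]},
    {"name": "country", "type": "STRING", "tags": []},
    {"name": "card_number", "type": "STRING", "tags": ["pii", "financial"]},
]
prompt = build_prompt("analytics.bookings_v", columns, "bookings per country")
```

Filtering the prompt is a complement to, not a substitute for, enforcement at query time: the generated SQL should still run under an identity whose access is limited by IAM, authorized views, and policy tags.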

Designing data processing systems

A data engineer is reviewing a proposed solution. A migration plan must move an Oracle database to Google Cloud with minimal downtime and continuous replication during testing. Which service should be considered?

A Transfer Appliance because it continuously replicates database changes.

B Datastream for change data capture into Google Cloud, combined with validation and a controlled cutover plan. (correct)

C Cloud Composer because it is a database replication engine.

D BigQuery BI Engine because it migrates operational databases.

Explanation

Option B is correct because Datastream supports CDC-style replication for migration patterns. Option A, the strongest distractor, fails because Transfer Appliance is for offline bulk transfer, not continuous replication of database changes.
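To make the CDC idea concrete, here is a minimal sketch of applying an ordered stream of change events to a replica. The event shape (`seq`, `op`, `pk`, `row`) and the helpers `apply_change` and `replay` are hypothetical and do not reflect Datastream's actual output format; a dictionary stands in for the destination table.

```python
def apply_change(replica, event):
    """Apply one change event to the replica, keyed by primary key."""
    key = event["pk"]
    if event["op"] in ("insert", "update"):
        replica[key] = event["row"]
    elif event["op"] == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown op: {event['op']}")

def replay(replica, events):
    """Apply events in commit order so the replica converges on the source."""
    for event in sorted(events, key=lambda e: e["seq"]):
        apply_change(replica, event)
    return replica

# Events may arrive out of order; the sequence number restores commit order.
events = [
    {"seq": 2, "op": "update", "pk": 1, "row": {"status": "shipped"}},
    {"seq": 1, "op": "insert", "pk": 1, "row": {"status": "new"}},
    {"seq": 3, "op": "insert", "pk": 2, "row": {"status": "new"}},
    {"seq": 4, "op": "delete", "pk": 2, "row": None},
]
replica = replay({}, events)
```

Because changes keep flowing while the source stays live, the replica can be validated during testing and the cutover scheduled once both sides agree.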
