Google Cloud Certified – Professional Data Engineer – Practice Exam (Question 50)
Question 1
You have an Apache Kafka cluster on-prem with topics containing web application logs.
You need to replicate the data to Google Cloud for analysis in Google BigQuery and Google Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.
What should you do?
- A. Deploy a Kafka cluster on Google Compute Engine VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in Google Compute Engine. Use a Dataproc cluster or Dataflow job to read from Kafka and write to Google Cloud Storage.
- B. Deploy a Kafka cluster on Google Compute Engine VM Instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to Google Cloud Storage.
- C. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read from PubSub and write to Google Cloud Storage.
- D. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read from PubSub and write to Google Cloud Storage.
Correct Answer: A
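In option A, the final step reads from the mirrored Kafka cluster on Compute Engine and writes to Cloud Storage. A minimal Apache Beam (Python) sketch of that step is shown below; the broker address, topic name, and bucket path are illustrative assumptions, not values from the question.

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Runner, project, region, etc. come from the command line
    # (e.g. --runner=DataflowRunner --project=... --region=...).
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            # Read from the Kafka mirror running on Compute Engine.
            # max_num_records keeps this sketch bounded; a production job
            # would run in streaming mode with windowed file writes.
            | "ReadFromKafka" >> ReadFromKafka(
                consumer_config={"bootstrap.servers": "kafka-mirror-gce:9092"},
                topics=["web-logs"],
                max_num_records=1000,
            )
            # Elements arrive as (key, value) byte tuples; keep the log line.
            | "ExtractValue" >> beam.Map(lambda kv: kv[1].decode("utf-8"))
            | "WriteToGCS" >> beam.io.WriteToText(
                "gs://example-log-bucket/web-logs/part"
            )
        )


if __name__ == "__main__":
    run()
```

The same pipeline could run on Dataflow or, with a Spark/Flink job, on Dataproc; the point of option A is that replication stays pure Kafka mirroring and no Kafka Connect plugin is deployed.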
Question 2
You have data pipelines running on Google BigQuery, Google Cloud Dataflow, and Google Cloud Dataproc.
You need to perform health checks, monitor their behavior, and notify the team that manages the pipelines if they fail. You also need to be able to work across multiple projects. Your preference is to use managed products or features of the platform.
What should you do?
- A. Export the information to Google Stackdriver, and set up an Alerting policy.
- B. Run a Virtual Machine in Google Compute Engine with Airflow, and export the information to Google Stackdriver.
- C. Export the logs to Google BigQuery, and set up Google App Engine to read that information and send emails if you find a failure in the logs.
- D. Develop a Google App Engine application to consume logs using GCP API calls, and send emails if you find a failure in the logs.
Correct Answer: A
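Option A relies on Cloud Monitoring (formerly Stackdriver), whose metrics scope can span multiple projects and whose alerting policies notify a channel such as email when a condition fires. A rough sketch of creating such a policy with the google-cloud-monitoring client follows; the project ID, notification channel, and the choice of the `dataflow.googleapis.com/job/is_failed` metric are illustrative assumptions.

```python
from google.cloud import monitoring_v3


def create_dataflow_failure_alert(project_id: str, email_channel: str):
    """Create an alerting policy that fires when a Dataflow job reports failure."""
    client = monitoring_v3.AlertPolicyServiceClient()

    # Condition: the Dataflow metric "job/is_failed" is greater than 0.
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="Dataflow job failed",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'metric.type="dataflow.googleapis.com/job/is_failed" '
                'AND resource.type="dataflow_job"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=0,
        ),
    )

    policy = monitoring_v3.AlertPolicy(
        display_name="Data pipeline health check",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
        # Resource name of an existing notification channel, e.g. an email
        # channel: "projects/<project>/notificationChannels/<id>" (hypothetical).
        notification_channels=[email_channel],
    )

    return client.create_alert_policy(
        name=f"projects/{project_id}", alert_policy=policy
    )
```

Similar conditions on BigQuery and Dataproc metrics (or log-based metrics) can be added to the same policy, which is what makes this a fully managed answer compared with running Airflow on a VM or writing a custom App Engine log consumer.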
Question 3
You have developed three data processing jobs.
The first executes a Google Cloud Dataflow pipeline that transforms data uploaded to Google Cloud Storage and writes the results to Google BigQuery. The second ingests data from on-premises servers and uploads it to Google Cloud Storage. The third is a Google Cloud Dataflow pipeline that retrieves information from third-party data providers and uploads it to Google Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed.
What should you do?
- A. Create a Directed Acyclic Graph (DAG) in Google Cloud Composer to schedule and monitor the jobs.
- B. Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.
- C. Develop a Google App Engine application to schedule and request the status of the jobs using GCP API calls.
- D. Set up cron jobs in a Google Compute Engine instance to schedule and monitor the pipelines using GCP API calls.
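For context on option A, a Cloud Composer DAG could schedule the three jobs on a shared cadence while the Airflow UI provides monitoring and manual triggering. The sketch below assumes Airflow 2 with the Google provider installed; all template paths, bucket names, and the ingest command are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

with DAG(
    dag_id="data_processing_workflows",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # scheduled daily; manual runs via "Trigger DAG"
    catchup=False,
) as dag:
    # Job 1: Dataflow pipeline that transforms Cloud Storage uploads into BigQuery.
    transform_to_bigquery = DataflowTemplatedJobStartOperator(
        task_id="transform_gcs_to_bigquery",
        template="gs://example-templates/transform_to_bq",  # hypothetical template
        location="us-central1",
    )

    # Job 2: ingest from on-premises servers into Cloud Storage
    # (placeholder command; in practice this would call your transfer tooling).
    ingest_onprem = BashOperator(
        task_id="ingest_onprem_to_gcs",
        bash_command="echo 'trigger on-prem upload to gs://example-landing-bucket'",
    )

    # Job 3: Dataflow pipeline that pulls third-party data into Cloud Storage.
    ingest_third_party = DataflowTemplatedJobStartOperator(
        task_id="third_party_to_gcs",
        template="gs://example-templates/third_party_ingest",  # hypothetical template
        location="us-central1",
    )

    # The three workflows are independent, so no task dependencies are declared;
    # the Airflow UI handles monitoring and per-task manual re-runs.
```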