Latest Mar 31, 2024 Professional-Machine-Learning-Engineer Brain Dump: A Study Guide with Tips & Tricks for Passing the Exam [Q95-Q118]


NEW QUESTION # 95
You recently developed a deep learning model using Keras, and now you are experimenting with different training strategies. First, you trained the model using a single GPU, but the training process was too slow. Next, you distributed the training across 4 GPUs using tf.distribute.MirroredStrategy (with no other changes), but you did not observe a decrease in training time. What should you do?

  • A. Create a custom training loop.
  • B. Increase the batch size.
  • C. Use a TPU with tf.distribute.TPUStrategy.
  • D. Distribute the dataset with tf.distribute.Strategy.experimental_distribute_dataset

Answer: B


NEW QUESTION # 96
You work for a bank. You have created a custom model to predict whether a loan application should be flagged for human review. The input features are stored in a BigQuery table. The model is performing well and you plan to deploy it to production. Due to compliance requirements, the model must provide explanations for each prediction. You want to add this functionality to your model code with minimal effort and provide explanations that are as accurate as possible. What should you do?

  • A. Create a BigQuery ML deep neural network model, and use the ML.EXPLAIN_PREDICT method with the num_integral_steps parameter.
  • B. Upload the custom model to Vertex AI Model Registry and configure feature-based attribution by using sampled Shapley with input baselines.
  • C. Update the custom serving container to include sampled Shapley-based explanations in the prediction outputs.
  • D. Create an AutoML tabular model by using the BigQuery data with integrated Vertex Explainable AI.

Answer: B


NEW QUESTION # 97
You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?

  • A. Two feature crosses as element-wise products: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type
  • B. One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type
  • C. One feature obtained as an element-wise product between latitude, longitude, and car type
  • D. Three individual features: binned latitude, binned longitude, and one-hot encoded car type

Answer: B

Explanation:
A feature cross is a synthetic feature that is obtained by combining two or more existing features, usually by taking their product or concatenation. A feature cross can help to capture the nonlinear and interaction effects between the original features, and improve the predictive performance of the model. A feature cross can be applied to different types of features, such as numeric, categorical, or geospatial features [1].
For the use case of building an ML model to predict car sales in different cities around the world, the best option is to use one feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type. This option involves creating a feature cross that combines three individual features: binned latitude, binned longitude, and one-hot encoded car type. Binning is a technique that transforms a continuous numeric feature into a discrete categorical feature by dividing its range into equal intervals, or bins. One-hot encoding is a technique that transforms a categorical feature into a binary vector, where each element corresponds to a possible category, and has a value of 1 if the feature belongs to that category, and 0 otherwise. By applying binning and one-hot encoding to the latitude, longitude, and car type features, the feature cross can capture the city-specific relationships between car type and number of sales, as each combination of bins and car types can represent a different city and its preference for a certain car type.
For example, the feature cross can learn that a city with a latitude bin of [40, 50], a longitude bin of [-80, -70], and a car type of SUV has a higher number of sales than a city with a latitude bin of [-10, 0], a longitude bin of [10, 20], and a car type of sedan. Therefore, using one feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type is the best option for this use case.
References:
* Feature Crosses | Machine Learning Crash Course
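
The crossing described above can be sketched in plain Python. This is a minimal illustration, not the exam's or any library's implementation: the bin edges, the car-type vocabulary, and the helper names are all hypothetical, and the "element-wise product" is interpreted here as the product taken over every combination of latitude bin, longitude bin, and car type.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical vocabulary and bin edges, chosen only for illustration.
CAR_TYPES = ["sedan", "suv", "truck"]
LAT_EDGES = [-90, -30, 30, 90]      # 3 latitude bins
LON_EDGES = [-180, -60, 60, 180]    # 3 longitude bins


def bin_index(value, edges):
    """Index of the bin containing value; out-of-range values are clamped."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def one_hot(index, size):
    """Binary vector with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(size)]


def feature_cross(lat, lon, car_type):
    """One slot per (latitude bin, longitude bin, car type) combination."""
    lat_vec = one_hot(bin_index(lat, LAT_EDGES), len(LAT_EDGES) - 1)
    lon_vec = one_hot(bin_index(lon, LON_EDGES), len(LON_EDGES) - 1)
    car_vec = one_hot(CAR_TYPES.index(car_type), len(CAR_TYPES))
    return [a * b * c for a, b, c in product(lat_vec, lon_vec, car_vec)]


crossed = feature_cross(45.0, -75.0, "suv")
```

Exactly one of the 27 slots is active for each example, so a linear model can learn a separate weight for each grid cell and car type pair, which is what lets it capture city-specific preferences.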


NEW QUESTION # 98
You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

  • A. An optimization objective that minimizes Log loss
  • B. An optimization objective that maximizes the Precision at a Recall value of 0.50
  • C. An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value
  • D. An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value

Answer: D

Explanation:
In this scenario, the goal is to create a custom fraud detection model using AutoML Tables. Fraud detection is a type of binary classification problem, where the model needs to predict whether a transaction is fraudulent or not. The optimization objective is a metric that defines how the model is trained and evaluated. AutoML Tables allows you to choose from different optimization objectives for binary classification problems, such as Log loss, Precision at a Recall value, AUC PR, and AUC ROC.
To choose the best optimization objective for fraud detection, we need to consider the characteristics of the problem and the data. Fraud detection is a problem where the positive class (fraudulent transactions) is very rare compared to the negative class (legitimate transactions). This means that the data is highly imbalanced, and the model needs to be sensitive to the minority class. Moreover, fraud detection is a problem where the cost of false negatives (missing a fraudulent transaction) is much higher than the cost of false positives (flagging a legitimate transaction as fraudulent). This means that the model needs to have high recall (the ability to detect all fraudulent transactions) while maintaining high precision (the ability to avoid false alarms).
Given these considerations, the best optimization objective for fraud detection is the one that maximizes the area under the precision-recall curve (AUC PR) value. The AUC PR value is a metric that measures the trade-off between precision and recall for different probability thresholds. A higher AUC PR value means that the model can achieve high precision and high recall at the same time. The AUC PR value is also more suitable for imbalanced data than the AUC ROC value, which measures the trade-off between the true positive rate and the false positive rate. The AUC ROC value can be misleading for imbalanced data, as it can give a high score even if the model has low recall or low precision.
Therefore, option D is the correct answer. Option A is not suitable, as Log loss measures the difference between the predicted probabilities and the actual labels and does not account for the trade-off between precision and recall. Option B is not suitable, as Precision at a Recall value measures the precision at one fixed recall level and does not account for the trade-off between precision and recall at other thresholds. Option C is not suitable, as AUC ROC can be misleading for imbalanced data, as explained above.
References:
* AutoML Tables documentation
* Optimization objectives for binary classification
* Precision-Recall Curves: How to Easily Evaluate Machine Learning Models in No Time
* ROC Curves and Area Under the Curve Explained (video)
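
To make the contrast between AUC ROC and AUC PR concrete, here is a small pure-Python sketch on made-up data (2 fraudulent and 18 legitimate transactions). Average precision is used as a common estimate of the area under the precision-recall curve; this is an assumption for illustration, and AutoML Tables may compute its metric differently.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy, highly imbalanced data: the two positives are ranked 2nd and 4th.
labels = [0, 1, 0, 1] + [0] * 16
scores = [1.0 - 0.05 * i for i in range(20)]

auc_roc = roc_auc(labels, scores)           # 33/36, about 0.92
auc_pr = average_precision(labels, scores)  # 0.5
```

The ranking looks strong by AUC ROC, yet half of the alerts raised at the thresholds that catch the fraud are false alarms, which the precision-recall view exposes.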


NEW QUESTION # 99
You have trained a text classification model in TensorFlow using AI Platform. You want to use the trained model for batch predictions on text data stored in BigQuery while minimizing computational overhead. What should you do?

  • A. Use Dataflow with the SavedModel to read the data from BigQuery.
  • B. Deploy and version the model on AI Platform.
  • C. Export the model to BigQuery ML.
  • D. Submit a batch prediction job on AI Platform that points to the model location in Cloud Storage.

Answer: D

Explanation:
This answer is correct because it allows you to use the trained TensorFlow model for batch predictions on text data stored in BigQuery without any additional processing or overhead. AI Platform provides a service for running batch prediction jobs that can take input data from BigQuery or Cloud Storage and write the output to BigQuery or Cloud Storage. You can use the SavedModel format to export your TensorFlow model to Cloud Storage and then submit a batch prediction job that points to the model location and the input data location. AI Platform will handle the scaling and distribution of the prediction requests and return the results in the specified output location.
References:
* AI Platform: Batch prediction overview
* AI Platform: Exporting a SavedModel for prediction


NEW QUESTION # 100
You have developed an application that uses a chain of multiple scikit-learn models to predict the optimal price for your company's products. The workflow logic is shown in the diagram. Members of your team use the individual models in other solution workflows. You want to deploy this workflow while ensuring version control for each individual model and the overall workflow. Your application needs to be able to scale down to zero. You want to minimize the compute resource utilization and the manual effort required to manage this solution. What should you do?

  • A. Create a custom container endpoint for the workflow that loads each model's individual files. Track the versions of each individual model in BigQuery.
  • B. Expose each individual model as an endpoint in Vertex AI Endpoints. Use Cloud Run to orchestrate the workflow.
  • C. Expose each individual model as an endpoint in Vertex AI Endpoints. Create a custom container endpoint to orchestrate the workflow.
  • D. Load each model's individual files into Cloud Run. Use Cloud Run to orchestrate the workflow. Track the versions of each individual model in BigQuery.

Answer: B

Explanation:
Option B is the most efficient and scalable solution for deploying a machine learning workflow with multiple models while ensuring version control and minimizing compute resource utilization. Exposing each model as an endpoint in Vertex AI Endpoints allows for easy versioning and management of the individual models. Using Cloud Run to orchestrate the workflow ensures that the application can scale down to zero, thus minimizing resource utilization when not in use. Cloud Run is a service that allows you to run stateless containers in a fully managed environment or on Google Kubernetes Engine. You can use Cloud Run to invoke the endpoints of each model in the workflow and pass the data between them. You can also use Cloud Run to handle the input and output of the workflow and provide an HTTP interface for the application.
References:
* Vertex AI Endpoints documentation
* Cloud Run documentation
* Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
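
As a rough sketch of the orchestration idea, the code that would run inside the Cloud Run service can treat each model endpoint as a callable and pass the output of one step to the next. The diagram is not reproduced here, so a simple linear chain is assumed; `vertex_predictor`, `run_workflow`, and the two stand-in models are hypothetical names, and the stand-ins replace real endpoints so the example runs locally.

```python
from typing import Callable, List

Predictor = Callable[[List[list]], List[list]]


def vertex_predictor(endpoint_id: str, project: str, location: str) -> Predictor:
    """Wrap a deployed Vertex AI endpoint as a plain callable."""
    def predict(instances):
        # Imported lazily; requires the google-cloud-aiplatform package
        # and credentials, so it is not exercised in this sketch.
        from google.cloud import aiplatform
        aiplatform.init(project=project, location=location)
        endpoint = aiplatform.Endpoint(endpoint_id)
        return endpoint.predict(instances=instances).predictions
    return predict


def run_workflow(instances: List[list], steps: List[Predictor]) -> List[list]:
    """Feed the output of each step into the next one (assumed linear chain)."""
    for step in steps:
        instances = step(instances)
    return instances


# Local stand-ins for two deployed models, purely illustrative.
def demand_model(rows):
    return [[r[0] * 2.0] for r in rows]


def price_model(rows):
    return [[r[0] + 1.0] for r in rows]


result = run_workflow([[1.0], [2.5]], [demand_model, price_model])
```

In a deployment, each stand-in would be replaced by `vertex_predictor(...)` for the corresponding endpoint, so the models stay individually versioned while the orchestrating container holds no model files.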


NEW QUESTION # 101
You work for an online grocery store. You recently developed a custom ML model that recommends a recipe when a user arrives at the website. You chose the machine type on the Vertex AI endpoint to optimize costs by using the queries per second (QPS) that the model can serve, and you deployed it on a single machine with 8 vCPUs and no accelerators.
A holiday season is approaching and you anticipate four times more traffic during this time than the typical daily traffic. You need to ensure that the model can scale efficiently to the increased demand. What should you do?

  • A. 1. Change the machine type on the endpoint to have 32 vCPUs.
    2. Set up a monitoring job and an alert for CPU usage.
    3. If you receive an alert, scale the vCPUs further as needed.
  • B. 1. Maintain the same machine type on the endpoint. Configure the endpoint to enable autoscaling based on vCPU usage.
    2. Set up a monitoring job and an alert for CPU usage.
    3. If you receive an alert, investigate the cause.
  • C. 1. Maintain the same machine type on the endpoint.
    2. Set up a monitoring job and an alert for CPU usage.
    3. If you receive an alert, add a compute node to the endpoint.
  • D. 1. Change the machine type on the endpoint to have a GPU. Configure the endpoint to enable autoscaling based on the GPU usage.
    2. Set up a monitoring job and an alert for GPU usage.
    3. If you receive an alert, investigate the cause.

Answer: B


NEW QUESTION # 102
You deployed an ML model into production a year ago. Every month, you collect all raw requests that were sent to your model prediction service during the previous month. You send a subset of these requests to a human labeling service to evaluate your model's performance. After a year, you notice that your model's performance sometimes degrades significantly after a month, while other times it takes several months to notice any decrease in performance. The labeling service is costly, but you also need to avoid large performance degradations. You want to determine how often you should retrain your model to maintain a high level of performance while minimizing cost. What should you do?

  • A. Compare the cost of the labeling service with the lost revenue due to model performance degradation over the past year. If the lost revenue is greater than the cost of the labeling service, increase the frequency of model retraining; otherwise, decrease the model retraining frequency.
  • B. Identify temporal patterns in your model's performance over the previous year. Based on these patterns, create a schedule for sending serving data to the labeling service for the next year.
  • C. Train an anomaly detection model on the training dataset, and run all incoming requests through this model. If an anomaly is detected, send the most recent serving data to the labeling service.
  • D. Run training-serving skew detection batch jobs every few days to compare the aggregate statistics of the features in the training dataset with recent serving data. If skew is detected, send the most recent serving data to the labeling service.

Answer: D

Explanation:
The best option for determining how often to retrain your model to maintain a high level of performance while minimizing cost is to run training-serving skew detection batch jobs every few days. Training-serving skew refers to the discrepancy between the distributions of the features in the training dataset and the serving data. This can cause the model to perform poorly on the new data, as it is not representative of the data that the model was trained on. By running training-serving skew detection batch jobs, you can monitor the changes in the feature distributions over time, and identify when the skew becomes significant enough to affect the model performance. If skew is detected, you can send the most recent serving data to the labeling service, and use the labeled data to retrain your model. This option has the following benefits:
* It allows you to retrain your model only when necessary, based on the actual data changes, rather than on a fixed schedule or a heuristic. This can save you the cost of the labeling service and the retraining process, and also avoid overfitting or underfitting your model.
* It leverages the existing tools and frameworks for training-serving skew detection, such as TensorFlow Data Validation (TFDV) and Vertex Data Labeling. TFDV is a library that can compute and visualize descriptive statistics for your datasets, and compare the statistics across different datasets. Vertex Data Labeling is a service that can label your data with high quality and low latency, using either human labelers or automated labelers.
* It integrates well with the MLOps practices, such as continuous integration and continuous delivery (CI/CD), which can automate the workflow of running the skew detection jobs, sending the data to the labeling service, retraining the model, and deploying the new model version.
The other options are less optimal for the following reasons:
* Option A: Comparing the cost of the labeling service with the lost revenue due to model performance degradation over the past year, and adjusting the frequency of model retraining accordingly, introduces additional challenges and trade-offs. This option requires estimating the cost of the labeling service and the lost revenue due to model performance degradation, which can be difficult and inaccurate, as they may depend on various factors that are not easily quantifiable or measurable. Moreover, this option requires finding the optimal balance between the cost and the performance, which can be subjective and variable, as different stakeholders may have different preferences and expectations. Furthermore, this option may not account for the potential impact of the model performance degradation on other aspects of the business, such as customer satisfaction, retention, or loyalty.
* Option B: Identifying temporal patterns in your model's performance over the previous year, and creating a schedule for sending serving data to the labeling service for the next year, introduces additional assumptions and risks. This option requires analyzing the historical data and model performance, and finding the patterns that can explain the variations in the model performance over time. However, this can be difficult and unreliable, as the patterns may not be consistent or predictable, and may depend on various factors that are not captured by the data. Moreover, this option requires creating a schedule based on the past patterns, which may not reflect the future changes in the data or the environment. This can lead to sending either too much or too little data to the labeling service, resulting in either wasted cost or degraded performance.
* Option C: Training an anomaly detection model on the training dataset, and running all incoming requests through this model, introduces additional complexity and overhead. This option requires building and maintaining a separate model for anomaly detection, which can be challenging and time-consuming. Moreover, this option requires running the anomaly detection model on every request, which can increase the latency and resource consumption of the prediction service. Additionally, this option may not capture the subtle changes in the feature distributions that can affect the model performance, as anomalies are usually defined as rare or extreme events.
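
The skew check described in this explanation can be sketched in a few lines of plain Python. The example compares the category frequencies of one feature in the training data against recent serving data by using the L-infinity distance, which is the measure TFDV applies to categorical features. The feature values and the threshold are hypothetical; a production job would more likely rely on TFDV itself.

```python
from collections import Counter


def category_distribution(values):
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def l_infinity_distance(train_values, serving_values):
    """Largest absolute difference in frequency across all categories."""
    p = category_distribution(train_values)
    q = category_distribution(serving_values)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


SKEW_THRESHOLD = 0.1  # hypothetical alerting threshold

# Made-up values of a "channel" feature in training and serving data.
train = ["web"] * 70 + ["app"] * 30
serving = ["web"] * 40 + ["app"] * 55 + ["kiosk"] * 5

distance = l_infinity_distance(train, serving)
needs_labeling = distance > SKEW_THRESHOLD
```

Only when `needs_labeling` is true would the recent serving data be sent to the labeling service, which is how this approach keeps labeling costs tied to actual changes in the data.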


NEW QUESTION # 103
You have created a Vertex AI pipeline that automates custom model training. You want to add a pipeline component that enables your team to most easily collaborate when running different executions and comparing metrics both visually and programmatically. What should you do?

  • A. Add a component to the Vertex AI pipeline that logs metrics to a BigQuery table. Load the table into a pandas DataFrame to compare different executions of the pipeline. Use Matplotlib to visualize metrics.
  • B. Add a component to the Vertex AI pipeline that logs metrics to Vertex ML Metadata. Use Vertex AI Experiments to compare different executions of the pipeline. Use Vertex AI TensorBoard to visualize metrics.
  • C. Add a component to the Vertex AI pipeline that logs metrics to a BigQuery table. Query the table to compare different executions of the pipeline. Connect BigQuery to Looker Studio to visualize metrics.
  • D. Add a component to the Vertex AI pipeline that logs metrics to Vertex ML Metadata. Load the Vertex ML Metadata into a pandas DataFrame to compare different executions of the pipeline. Use Matplotlib to visualize metrics.

Answer: B


NEW QUESTION # 104
You work for an online publisher that delivers news articles to over 50 million readers. You have built an AI model that recommends content for the company's weekly newsletter. A recommendation is considered successful if the article is opened within two days of the newsletter's published date and the user remains on the page for at least one minute.
All the information needed to compute the success metric is available in BigQuery and is updated hourly. The model is trained on eight weeks of data, on average its performance degrades below the acceptable baseline after five weeks, and training time is 12 hours. You want to ensure that the model's performance is above the acceptable baseline while minimizing cost. How should you monitor the model to determine when retraining is necessary?

  • A. Schedule a cron job in Cloud Tasks to retrain the model every week before the newsletter is created.
  • B. Use Vertex AI Model Monitoring to detect skew of the input features with a sample rate of 100% and a monitoring frequency of two days.
  • C. Schedule a weekly query in BigQuery to compute the success metric.
  • D. Schedule a daily Dataflow job in Cloud Composer to compute the success metric.

Answer: D


NEW QUESTION # 105
You work at an ecommerce startup. You need to create a customer churn prediction model. Your company's recent sales records are stored in a BigQuery table. You want to understand how your initial model is making predictions. You also want to iterate on the model as quickly as possible while minimizing cost. How should you build your first model?

  • A. Export the data to a Cloud Storage bucket. Load the data into a pandas DataFrame on Vertex AI Workbench and train a logistic regression model with scikit-learn.
  • B. Export the data to a Cloud Storage bucket. Create a tf.data.Dataset to read the data from Cloud Storage. Implement a deep neural network in TensorFlow.
  • C. Create a tf.data.Dataset by using the TensorFlow BigQueryClient. Implement a deep neural network in TensorFlow.
  • D. Prepare the data in BigQuery and associate the data with a Vertex AI dataset. Create an AutoMLTabularTrainingJob to train a classification model.

Answer: D

Explanation:
BigQuery is a service that allows you to store and query large amounts of data in a scalable and cost-effective way. You can use BigQuery to prepare the data for your customer churn prediction model, such as filtering, aggregating, and transforming the data. You can then associate the data with a Vertex AI dataset, which is a service that allows you to store and manage your ML data on Google Cloud. By using a Vertex AI dataset, you can easily access the data from other Vertex AI services, such as AutoML. AutoML is a service that allows you to create and train ML models without writing code. You can use AutoML to create an AutoMLTabularTrainingJob, which is a type of job that trains a classification model for tabular data, such as customer churn. By using an AutoMLTabularTrainingJob, you can benefit from the automated feature engineering, model selection, and hyperparameter tuning that AutoML provides. You can also use Vertex Explainable AI to understand how your model is making predictions, such as which features are most important and how they affect the prediction outcome. By using BigQuery, Vertex AI dataset, and AutoMLTabularTrainingJob, you can build your first model as quickly as possible while minimizing cost and complexity.
References:
* BigQuery documentation
* Vertex AI dataset documentation
* AutoMLTabularTrainingJob documentation
* Vertex Explainable AI documentation
* Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
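
The steps in this explanation map onto the Vertex AI SDK for Python roughly as follows. This is a sketch under assumptions: the display names, the `churned` target column, and the function name are hypothetical, and the call requires the `google-cloud-aiplatform` package, credentials, and a real BigQuery table, so the function is defined here but not invoked.

```python
def train_churn_model(project, location, bq_table, target_column="churned"):
    """Create a tabular dataset from BigQuery and train an AutoML classifier.

    bq_table is expected in the form "project.dataset.table".
    """
    # Imported lazily so the sketch can be read and loaded without the SDK.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)

    # Associate the prepared BigQuery table with a Vertex AI dataset.
    dataset = aiplatform.TabularDataset.create(
        display_name="churn-dataset",  # hypothetical name
        bq_source=f"bq://{bq_table}",
    )

    # Define the AutoML training job for a classification objective.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="churn-automl",  # hypothetical name
        optimization_prediction_type="classification",
    )

    # 1000 milli node hours equals one node hour of training budget.
    return job.run(
        dataset=dataset,
        target_column=target_column,
        budget_milli_node_hours=1000,
        model_display_name="churn-model",  # hypothetical name
    )
```

Keeping the training budget small is one way to iterate quickly and cheaply on a first model; feature attributions for the trained model can then be inspected through Vertex Explainable AI.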


NEW QUESTION # 106
Your company stores a large number of audio files of phone calls made to your customer call center in an on-premises database. Each audio file is in wav format and is approximately 5 minutes long. You need to analyze these audio files for customer sentiment. You plan to use the Speech-to-Text API. You want to use the most efficient approach. What should you do?

  • A. 1. Upload the audio files to Cloud Storage.
    2. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
    3. Call the predict method of an AutoML sentiment analysis model to analyze the transcriptions.
  • B. 1. Upload the audio files to Cloud Storage.
    2. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
    3. Create a Cloud Function that calls the Natural Language API by using the analyzeSentiment method.
  • C. 1. Iterate over your local files in Python.
    2. Use the Speech-to-Text Python library to create a speech.RecognitionAudio object and set the content to the audio file data.
    3. Call the speech:recognize API endpoint to generate transcriptions.
    4. Call the predict method of an AutoML sentiment analysis model to analyze the transcriptions.
  • D. 1. Iterate over your local files in Python.
    2. Use the Speech-to-Text Python library to create a speech.RecognitionAudio object, and set the content to the audio file data.
    3. Call the speech:longrunningrecognize API endpoint to generate transcriptions.
    4. Call the Natural Language API by using the analyzeSentiment method.

Answer: B


NEW QUESTION # 107
You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook. What should you do?

  • A. Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.
  • B. Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe.
  • C. From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.
  • D. Use AI Platform Notebooks' BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.

Answer: D


NEW QUESTION # 108
A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours.
With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s).
Which visualization will accomplish this?

  • A. A scatter plot showing the performance of the objective metric over each training iteration.
  • B. A histogram showing whether the most important input feature is Gaussian.
  • C. A scatter plot showing the correlation between maximum tree depth and the objective metric.
  • D. A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.

Answer: C


NEW QUESTION # 109
You work for a retail company. You have been asked to develop a model to predict whether a customer will purchase a product on a given day. Your team has processed the company's sales data, and created a table with the following rows:
* Customer_id
* Product_id
* Date
* Days_since_last_purchase (measured in days)
* Average_purchase_frequency (measured in 1/days)
* Purchase (binary class, if customer purchased product on the Date)
You need to interpret your model's results for each individual prediction. What should you do?

  • A. Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint. At each prediction, enable L1 regularization to detect non-informative features.
  • B. Create a BigQuery table. Use BigQuery ML to build a boosted tree classifier. Inspect the partition rules of the trees to understand how each prediction flows through the trees.
  • C. Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint and enable feature attributions. Use the "explain" method to get feature attribution values for each individual prediction.
  • D. Create a BigQuery table. Use BigQuery ML to build a logistic regression classification model. Use the values of the coefficients of the model to interpret the feature importance, with higher values corresponding to more importance.

Answer: C


NEW QUESTION # 110
You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at a low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?

  • A. Create a tf.data.Dataset.prefetch transformation.
  • B. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().
  • C. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.
  • D. Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().

Answer: C


NEW QUESTION # 111
You work for a pharmaceutical company based in Canada. Your team developed a BigQuery ML model to predict the number of flu infections for the next month in Canada. Weather data is published weekly and flu infection statistics are published monthly. You need to configure a model retraining policy that minimizes cost. What should you do?

  • A. Download the weather and flu data each month. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model monthly.
  • B. Download the weather and flu data each week. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model weekly.
  • C. Download the weather and flu data each week. Configure Cloud Scheduler to execute a Vertex AI pipeline to retrain the model every month.
  • D. Download the weather data each week, and download the flu data each month. Deploy the model to a Vertex AI endpoint with feature drift monitoring, and retrain the model if a monitoring alert is detected.

Answer: D


NEW QUESTION # 112
You are training a TensorFlow model on a structured data set with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

  • A. Load the data into BigQuery and read the data from BigQuery.
  • B. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS)
  • C. Load the data into Cloud Bigtable, and read the data from Bigtable
  • D. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage

Answer: D


NEW QUESTION # 113
Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data. How should you address the input differences in production?

  • A. Perform feature selection on the model, and retrain the model with fewer features
  • B. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service
  • C. Create alerts to monitor for skew, and retrain the model.
  • D. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features

Answer: C

Explanation:
The performance of a DNN regression model can degrade over time due to a change in the distribution of the input data. This phenomenon is known as data drift or concept drift, and it can affect the accuracy and reliability of the model predictions. Data drift can be caused by various factors, such as seasonal changes, population shifts, market trends, or external events [1].
To address the input differences in production, create alerts to monitor for skew, and retrain the model. Skew is a measure of how much the input data in production differs from the input data used for training the model. Skew can be detected by comparing the statistics and distributions of the input features in the training and production data, such as mean, standard deviation, histogram, or quantiles. Alerts can be set up to notify the model developers or operators when the skew exceeds a certain threshold, indicating a significant change in the input data [2].
When an alert is triggered, the model should be retrained with the latest data that reflects the current distribution of the input features. Retraining helps the model adapt to the new data and improve its performance. It can be done manually or automatically, depending on the frequency and severity of the data drift, and can also involve updating the model architecture, hyperparameters, or optimization algorithm, if necessary [3].
The other options are not as effective or feasible. Performing feature selection and retraining the model with fewer features is not a good idea, as it may reduce the expressiveness of the model and ignore important features that affect the output. Retraining the model and selecting an L2 regularization parameter with a hyperparameter tuning service is not relevant, as L2 regularization is a technique to prevent overfitting, not to handle data drift. Retraining the model on a monthly basis with fewer features is not optimal, as it may not capture timely changes in the input data and may compromise model performance.
References:
* 1: Data drift detection for machine learning models
* 2: Skew and drift detection
* 3: Retraining machine learning models
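
The skew check described in the explanation (compare feature distributions between training and production, alert above a threshold) can be sketched as follows. This is a minimal illustration, assuming a single numerical feature, hand-picked bin edges, and Jensen-Shannon divergence as the distance measure. The sample values and the threshold of 0.3 are hypothetical, not recommended settings.

```python
import math


def histogram(values, edges):
    """Normalized bin frequencies; out-of-range values fall into the edge bins.

    Assumes `values` is non-empty.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded by [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


edges = [0, 10, 20, 30, 40]
training_values = [5, 12, 15, 18, 22, 25, 28, 35]   # feature values seen at training time
serving_values = [25, 28, 31, 33, 35, 36, 38, 39]   # feature values seen in production

p = histogram(training_values, edges)
q = histogram(serving_values, edges)
skew = js_divergence(p, q)

ALERT_THRESHOLD = 0.3  # hypothetical value; tune per feature
skew_alert = skew > ALERT_THRESHOLD
print(f"skew={skew:.3f} alert={skew_alert}")
```

When `skew_alert` is true, the retraining step from the explanation would be triggered with recent production data. A managed monitoring service can replace this hand-written check.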


NEW QUESTION # 114
You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?

  • A.
  • B.
  • C.
  • D.

Answer: B

Explanation:
According to the official exam guide [1], one of the skills assessed in the exam is to "design, build, and productionalize ML models to solve business challenges using Google Cloud technologies". TPUs [2] are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed to handle large batch sizes, high dimensional data, and complex computations. TPUs can significantly reduce the training time and compute costs of large language models, especially when used with distributed training strategies, such as MultiWorkerMirroredStrategy [3]. Therefore, option D is the best way to configure a training architecture that minimizes both training time and compute costs for the given use case. The other options are not relevant or optimal for this scenario.
References:
* Professional ML Engineer Exam Guide
* TPUs
* MultiWorkerMirroredStrategy
* Google Professional Machine Learning Certification Exam 2023
* Latest Google Professional Machine Learning Engineer Actual Free Exam Questions


NEW QUESTION # 115
You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store the results for analytics and visualization. How should you configure the pipeline?

  • A. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions
  • B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable
  • C. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery
  • D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage

Answer: C

Explanation:
* Dataflow is a fully managed service for executing Apache Beam pipelines that can process streaming or batch data [1].
* AI Platform is a unified platform that enables you to build and run machine learning applications across Google Cloud [2].
* BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse designed for business agility [3].
These services are suitable for building an ML model to detect anomalies in real-time sensor data, as they can handle large-scale data ingestion, preprocessing, training, serving, storage, and visualization. The other options are not as suitable because:
* DataProc is a service for running Apache Spark and Apache Hadoop clusters, which are not optimized for streaming data processing [4].
* AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs [5]. However, it does not support custom models or real-time predictions.
* Cloud Bigtable is a scalable, fully managed NoSQL database service for large analytical and operational workloads. However, it is not designed for ad hoc queries or interactive analysis.
* Cloud Functions is a serverless execution environment for building and connecting cloud services. However, it is not suitable for storing or visualizing data.
* Cloud Storage is a service for storing and accessing data on Google Cloud. However, it is not a data warehouse and does not support SQL queries or visualization tools.


NEW QUESTION # 116
A large company has developed a BI application that generates reports and dashboards using data collected from various operational metrics. The company wants to provide executives with an enhanced experience so they can use natural language to get data from the reports. The company wants the executives to be able to ask questions using written and spoken interfaces.
Which combination of services can be used to build this conversational interface? (Choose three.)

  • A. Amazon Lex
  • B. Alexa for Business
  • C. Amazon Transcribe
  • D. Amazon Polly
  • E. Amazon Comprehend
  • F. Amazon Connect

Answer: C,E,F


NEW QUESTION # 117
You work at a leading healthcare firm developing state-of-the-art algorithms for various use cases. You have unstructured textual data with custom labels. You need to extract and classify various medical phrases with these labels. What should you do?

  • A. Use AutoML Entity Extraction to train a medical entity extraction model.
  • B. Use TensorFlow to build a custom medical entity extraction model.
  • C. Use the Healthcare Natural Language API to extract medical entities.
  • D. Use a BERT-based model to fine-tune a medical entity extraction model.

Answer: D

Explanation:
Medical entity extraction is a task that involves identifying and classifying medical terms or concepts from unstructured textual data, such as electronic health records, clinical notes, or research papers. Medical entity extraction can help with various use cases, such as information retrieval, knowledge discovery, decision support, and data analysis [1].
One possible approach to perform medical entity extraction is to use a BERT-based model to fine-tune a medical entity extraction model. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that can capture the contextual information from both left and right sides of a given token [2]. BERT can be fine-tuned on a specific downstream task, such as medical entity extraction, by adding a task-specific layer on top of the pre-trained model and updating the model parameters with a small amount of labeled data [3].
A BERT-based model can achieve high performance on medical entity extraction by leveraging the large-scale pre-training on general-domain corpora and the fine-tuning on domain-specific data. For example, Nesterov and Umerenkov [4] proposed a novel method of doing medical entity extraction from electronic health records as a single-step multi-label classification task by fine-tuning a transformer model pre-trained on a large EHR dataset. They showed that their model can achieve human-level quality for most frequent entities.
References:
* 1: Medical Named Entity Recognition from Un-labelled Medical Records based on Pre-trained Language Models and Domain Dictionary | Data Intelligence | MIT Press
* 2: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
* 3: Fine-tuning BERT for Medical Entity Extraction
* 4: Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality
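
Fine-tuning on "a small amount of labeled data" first requires turning the custom labels into per-token training targets. The sketch below shows one common way to do that, BIO tagging for a token-classification layer. It is an assumption for illustration, not the multi-label formulation used in reference 4, and the label names `SYMPTOM` and `DRUG`, the sentence, and the helper name are hypothetical.

```python
def spans_to_bio(tokens, spans):
    """Convert labeled token spans to BIO tags.

    tokens: list of token strings.
    spans:  list of (start_index, end_index_exclusive, label) tuples.
    """
    tags = ["O"] * len(tokens)          # "O" marks tokens outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # continuation tokens
    return tags


tokens = ["Patient", "reports", "chest", "pain", "after", "taking", "ibuprofen"]
spans = [(2, 4, "SYMPTOM"), (6, 7, "DRUG")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each tag becomes the target class for the corresponding token when the task-specific layer on top of the pre-trained encoder is trained.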


NEW QUESTION # 118
......
