Challenge: 5 Problems

AWS EMR Mastery

❓ Architecture

intermediate

Choosing the Right EMR Cluster Configuration

You want to run a Spark job on Amazon EMR that processes large datasets and needs high fault tolerance. Which cluster configuration best fits this need?

A. Multiple master nodes with no core or task nodes.
B. A single master node with multiple core nodes and no task nodes.
C. A single master node, multiple core nodes, and multiple task nodes.
D. Multiple task nodes only, with no master or core nodes.
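For context, the three node roles in this question correspond to instance groups in the `Instances["InstanceGroups"]` parameter of boto3's EMR `run_job_flow` call. The sketch below only builds that request fragment as plain dicts and makes no AWS call; the instance type, the counts, and the `has_required_roles` check are illustrative assumptions, not an official validation rule.

```python
from collections import Counter


def build_instance_groups(core_count: int, task_count: int) -> list[dict]:
    """Build an InstanceGroups list: one master, N core, and M task nodes."""
    groups = [
        # Master (primary) node: runs the YARN ResourceManager and HDFS NameNode.
        {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        # Core nodes: run YARN containers AND store HDFS blocks (DataNode).
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes: run YARN containers only; they hold no HDFS data.
        groups.append({"Name": "Task", "InstanceRole": "TASK", "Market": "ON_DEMAND",
                       "InstanceType": "m5.xlarge", "InstanceCount": task_count})
    return groups


def has_required_roles(groups: list[dict]) -> bool:
    """Hypothetical check for this scenario: one MASTER group plus at least one CORE group."""
    roles = Counter(g["InstanceRole"] for g in groups)
    return roles["MASTER"] == 1 and roles["CORE"] >= 1


groups = build_instance_groups(core_count=3, task_count=4)
print(has_required_roles(groups))  # True
```

The resulting list would be passed as `Instances={"InstanceGroups": groups, ...}`; other `run_job_flow` arguments such as `Name`, `ReleaseLabel`, and the IAM roles are omitted here.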

❓ Security

intermediate

Securing EMR Cluster Access

You want to allow SSH access to your EMR cluster only from your office IP address. Which AWS feature should you configure?

A. Enable EMR cluster encryption at rest.
B. Configure EMR bootstrap actions to disable SSH.
C. Use IAM roles to restrict access to the EMR cluster.
D. Modify the EMR cluster's security group to allow SSH only from your office IP address.
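As background, a security group rule that admits SSH from a single address looks like the sketch below. It assumes the `IpPermissions` entry shape used by boto3's EC2 `authorize_security_group_ingress` (`IpProtocol`, `FromPort`, `ToPort`, `IpRanges`); such a rule would typically go on the master node's security group. The address `203.0.113.10` is a documentation-range placeholder and `ssh_ingress_rule` is a hypothetical helper; the code only builds the dict and makes no AWS call.

```python
import ipaddress


def ssh_ingress_rule(office_ip: str) -> dict:
    """Build one IpPermissions entry that opens TCP port 22 to a single IPv4 host."""
    # Validate the address and express it as a /32 CIDR (exactly one host).
    # ipaddress raises ValueError for malformed input.
    network = ipaddress.ip_network(f"{office_ip}/32")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,  # SSH
        "ToPort": 22,
        "IpRanges": [{"CidrIp": str(network), "Description": "SSH from office"}],
    }


rule = ssh_ingress_rule("203.0.113.10")
print(rule["IpRanges"][0]["CidrIp"])  # 203.0.113.10/32
```

Any broader SSH rule on the same group (for example one open to `0.0.0.0/0`) would still admit other addresses, so it would need to be removed for the restriction to hold.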

❓ Service Behavior

advanced

EMR Cluster Auto Scaling Behavior

Given an EMR cluster with auto scaling enabled, what happens when the workload decreases significantly?

A. The cluster automatically terminates task nodes but keeps core nodes running.
B. The cluster does not change node count automatically.
C. The cluster shuts down the master node to save costs.
D. The cluster automatically terminates core nodes to save costs.
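The toy model below is not EMR's scaling algorithm. It is a hypothetical illustration of why scale-in tends to target task nodes first: task nodes hold no HDFS blocks, whereas removing a core node means decommissioning the HDFS data stored on it. The `scale_in` function, the `min_core` floor, and the "shrink to `nodes_needed` workers" rule are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ClusterSize:
    core: int
    task: int


def scale_in(size: ClusterSize, nodes_needed: int, min_core: int) -> ClusterSize:
    """Hypothetical scale-in: shrink to `nodes_needed` workers, removing task nodes first."""
    surplus = max(0, size.core + size.task - nodes_needed)
    # Task nodes store no HDFS data, so they are the cheapest to remove.
    drop_task = min(surplus, size.task)
    surplus -= drop_task
    # Only then shrink core nodes, and never below the configured floor.
    drop_core = min(surplus, max(0, size.core - min_core))
    return ClusterSize(core=size.core - drop_core, task=size.task - drop_task)


print(scale_in(ClusterSize(core=4, task=6), nodes_needed=5, min_core=4))
# ClusterSize(core=4, task=1)
```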

❓ Configuration

advanced

Configuring EMR to Use Spot Instances

You want to reduce EMR cluster costs by using Spot Instances for worker nodes. Which configuration follows recommended practice?

A. Use Spot Instances for task nodes and On-Demand Instances for core nodes.
B. Use a Spot Instance for the master node and On-Demand Instances for core nodes.
C. Use Spot Instances for all nodes, including the master node.
D. Use Spot Instances for core nodes and an On-Demand Instance for the master node.
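For reference, the purchasing option is set per instance group. The sketch assumes boto3's `run_job_flow` `InstanceGroups` shape, where each group's `Market` is `ON_DEMAND` or `SPOT`. The instance type and counts are placeholders, and `spot_safe` is a hypothetical check encoding the common guidance that the master and core groups, which hold cluster state and HDFS data, stay On-Demand. No AWS call is made.

```python
def instance_group(role: str, market: str, count: int) -> dict:
    """Build one InstanceGroups entry; the instance type is a placeholder."""
    return {"Name": role.title(), "InstanceRole": role, "Market": market,
            "InstanceType": "m5.xlarge", "InstanceCount": count}


groups = [
    instance_group("MASTER", "ON_DEMAND", 1),  # losing a single master ends the cluster
    instance_group("CORE", "ON_DEMAND", 3),    # core nodes hold HDFS blocks
    instance_group("TASK", "SPOT", 6),         # interruptions lose compute, not HDFS data
]


def spot_safe(groups: list[dict]) -> bool:
    """Hypothetical check: only TASK groups may use the SPOT market."""
    return all(g["Market"] == "ON_DEMAND" for g in groups if g["InstanceRole"] != "TASK")


print(spot_safe(groups))  # True
```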

🧠 Conceptual

expert

EMR Cluster Data Persistence After Termination

What happens to data stored in HDFS on an EMR cluster's core nodes when the cluster is terminated?

A. Data in HDFS remains available for 7 days after the cluster is terminated.
B. Data in HDFS is lost unless it is backed up to S3 before termination.
C. EMR automatically saves HDFS data to S3 when the cluster terminates.
D. HDFS data is automatically replicated to other clusters.
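As background, HDFS on EMR is stored on the core nodes' instance store and EBS volumes, which are released when the cluster terminates, so a common pattern is to copy results to Amazon S3 in a final step. The sketch assumes the step shape accepted by boto3's `add_job_flow_steps` and EMR's `command-runner.jar` running `s3-dist-cp`; the HDFS path, the bucket `s3://example-bucket/...`, and the `s3_dist_cp_step` helper are hypothetical placeholders. It only builds the step dict.

```python
def s3_dist_cp_step(hdfs_src: str, s3_dest: str) -> dict:
    """Build an EMR step that copies an HDFS directory to S3 with s3-dist-cp."""
    if not hdfs_src.startswith("hdfs://"):
        raise ValueError("source must be an hdfs:// URI")
    if not s3_dest.startswith("s3://"):
        raise ValueError("destination must be an s3:// URI")
    return {
        "Name": "Back up HDFS output to S3",
        # If the copy fails, cancel pending steps and keep the cluster waiting
        # instead of continuing toward termination with the data still in HDFS.
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", f"--src={hdfs_src}", f"--dest={s3_dest}"],
        },
    }


step = s3_dist_cp_step("hdfs:///output", "s3://example-bucket/emr-backup/")
print(step["HadoopJarStep"]["Args"])
# ['s3-dist-cp', '--src=hdfs:///output', '--dest=s3://example-bucket/emr-backup/']
```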
