AWS EMR setup in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
Architecture
intermediate
Choosing the Right EMR Cluster Configuration

You want to run a Spark job on AWS EMR that processes large datasets with high fault tolerance. Which cluster configuration best fits this need?

A. Multiple master nodes with no core or task nodes.
B. A single master node with multiple core nodes and no task nodes.
C. A single master node, multiple core nodes, and multiple task nodes.
D. Multiple task nodes only, no master or core nodes.
💡 Hint

Think about fault tolerance and workload distribution in EMR clusters.

Security
intermediate
Securing EMR Cluster Access

You want to restrict SSH access to your EMR cluster only from your office IP address. Which AWS feature should you configure?

A. Enable EMR cluster encryption at rest.
B. Configure EMR bootstrap actions to disable SSH.
C. Use IAM roles to restrict access to the EMR cluster.
D. Modify the EMR cluster's security group to allow SSH only from your office IP.
💡 Hint

Think about network-level access control.
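
As a minimal sketch of what network-level access control can look like in code, the helper below builds a single SSH ingress rule in the shape that boto3's EC2 `authorize_security_group_ingress` call accepts in its `IpPermissions` parameter. The function name `ssh_ingress_rule` and the address `203.0.113.10/32` (a documentation-only range) are illustrative assumptions, and nothing here contacts AWS.

```python
import ipaddress


def ssh_ingress_rule(office_cidr: str) -> dict:
    """Build one IpPermissions entry allowing TCP port 22 from a single CIDR.

    Hypothetical helper: it only constructs the request payload.
    """
    # Validate the CIDR; strict=True rejects values with host bits set,
    # such as "203.0.113.10/24".
    network = ipaddress.ip_network(office_cidr, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    if network.prefixlen == 0:
        raise ValueError("0.0.0.0/0 would open SSH to the whole internet")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": str(network), "Description": "Office SSH access"}],
    }


rule = ssh_ingress_rule("203.0.113.10/32")  # placeholder office address
print(rule)
```

With real credentials, a rule like this could be passed as `IpPermissions=[rule]` along with the `GroupId` of the relevant security group.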

Service behavior
advanced
EMR Cluster Auto Scaling Behavior

Given an EMR cluster with auto scaling enabled, what happens when the workload decreases significantly?

A. The cluster automatically terminates task nodes but keeps core nodes running.
B. The cluster does not change node count automatically.
C. The cluster shuts down the master node to save costs.
D. The cluster automatically terminates core nodes to save costs.
💡 Hint

Consider which nodes store data and which nodes are ephemeral.
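
To make the scaling discussion concrete, here is a hedged sketch of the limits a scaling policy operates within. It builds a dictionary in the shape that boto3's EMR `put_managed_scaling_policy` call takes for `ManagedScalingPolicy`. The helper name and the numbers are assumptions chosen for illustration, and no AWS call is made. This models EMR managed scaling, which may differ from the auto scaling setup the question has in mind.

```python
def managed_scaling_policy(min_units: int, max_units: int, max_core_units: int) -> dict:
    """Build a ManagedScalingPolicy payload measured in instances.

    Hypothetical helper: validates the bounds and returns the dict only.
    """
    if not 1 <= min_units <= max_units:
        raise ValueError("need 1 <= min_units <= max_units")
    if not 1 <= max_core_units <= max_units:
        raise ValueError("need 1 <= max_core_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,           # floor the cluster can shrink to
            "MaximumCapacityUnits": max_units,           # ceiling for core + task capacity
            "MaximumCoreCapacityUnits": max_core_units,  # cap on core capacity
        }
    }


policy = managed_scaling_policy(min_units=2, max_units=10, max_core_units=3)
print(policy["ComputeLimits"])
```

The limits apply to core and task capacity only. Capacity above `MaximumCoreCapacityUnits` has to come from task nodes.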

Configuration
advanced
Configuring EMR to Use Spot Instances

You want to reduce EMR cluster costs by using Spot Instances for worker nodes. Which configuration is correct?

A. Set task nodes as Spot Instances and core nodes as On-Demand.
B. Set master node as Spot Instance and core nodes as On-Demand.
C. Set all nodes as Spot Instances including master node.
D. Set core nodes as Spot Instances and master node as On-Demand.
💡 Hint

Think about which nodes are critical for cluster stability.
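
For reference, purchasing options are set per instance group when a cluster is created. The sketch below builds the `InstanceGroups` list in the shape boto3's EMR `run_job_flow` call expects under `Instances`. The helper name, instance type, and counts are assumptions, and every group is left at `ON_DEMAND` as a neutral baseline, so the choice of which groups to switch to `SPOT` stays with you.

```python
VALID_ROLES = {"MASTER", "CORE", "TASK"}
VALID_MARKETS = {"ON_DEMAND", "SPOT"}


def instance_group(role: str, market: str, count: int,
                   instance_type: str = "m5.xlarge") -> dict:
    """Build one InstanceGroups entry (hypothetical helper, no AWS call)."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    if market not in VALID_MARKETS:
        raise ValueError(f"unknown market: {market}")
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "Name": f"{role.lower()}-group",
        "InstanceRole": role,
        "Market": market,
        "InstanceType": instance_type,  # assumed type, pick one that fits the job
        "InstanceCount": count,
    }


# Neutral baseline: every group starts as On-Demand.
groups = [
    instance_group("MASTER", "ON_DEMAND", 1),
    instance_group("CORE", "ON_DEMAND", 2),
    instance_group("TASK", "ON_DEMAND", 4),
]
for group in groups:
    print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```

Changing the `market` argument for a given role is all it takes to move that group between purchasing options in this sketch.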

Conceptual
expert
EMR Cluster Data Persistence After Termination

What happens to the data stored on HDFS of EMR core nodes when the cluster is terminated?

A. Data on HDFS remains available after cluster termination for 7 days.
B. Data on HDFS is lost unless backed up to S3 before termination.
C. Data on HDFS is automatically saved to S3 by EMR on termination.
D. Data on HDFS is replicated to other clusters automatically.
💡 Hint

Consider the nature of ephemeral storage on EMR core nodes.