Challenge - 5 Problems

Rack Awareness Mastery

🧠 Conceptual

intermediate

Why is rack awareness important in HDFS?

Rack awareness helps HDFS decide where to place data blocks. What is the main benefit of using rack awareness?

A. It improves fault tolerance by spreading replicas across racks, so data survives a rack failure, while keeping cross-rack network traffic low.

B. It speeds up data processing by storing all replicas on the same rack for faster access.

C. It simplifies the file system by ignoring network topology during data placement.

D. It increases storage capacity by duplicating data on the same node multiple times.

🧠 Conceptual

intermediate

How does HDFS use rack awareness during block placement?

When HDFS writes a block, how does rack awareness influence where replicas are stored?

A. All replicas are stored on the same node to maximize speed.

B. One replica is stored on the local node, another on a different node in the same rack, and the third on a node in a different rack.

C. Replicas are randomly placed without considering racks.

D. Two replicas are stored on the same rack and one in a different data center.

❓ data_output

advanced

Output of rack-aware block placement simulation

Given the following simplified Python simulation of HDFS rack-aware block placement, what is the output?

import random

nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

# Place one replica on node1
replicas = ['node1']

# Place second replica on different node in same rack
same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
replicas.append(random.choice(same_rack_nodes))

# Place third replica on a node in a different rack
other_rack = 'rack2'
replicas.append(random.choice(nodes[other_rack]))

print(sorted(replicas))

A. ['node2', 'node3', 'node4']

B. ['node1', 'node2', 'node1']

C. ['node1', 'node2', 'node3'] or ['node1', 'node2', 'node4']

D. ['node1', 'node3', 'node4']
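
Any candidate replica list can be checked against the placement rule used in the simulation (writer's node, one more node on the writer's rack, one node on another rack). The sketch below is a minimal checker under that assumption; `rack_of` and `follows_quiz_policy` are hypothetical helper names and are not part of HDFS.

```python
nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

# Reverse lookup (hypothetical helper): node name -> rack name
rack_of = {node: rack for rack, members in nodes.items() for node in members}

def follows_quiz_policy(replicas, writer):
    """Check the rule as stated in this quiz, not the real HDFS code path:
    three distinct nodes, the writer's node included, and exactly two of
    them on the writer's rack (so the third sits on another rack)."""
    local_rack = rack_of[writer]
    on_local_rack = {n for n in replicas if rack_of[n] == local_rack}
    return (
        len(set(replicas)) == 3
        and writer in replicas
        and len(on_local_rack) == 2
    )

print(follows_quiz_policy(['node1', 'node2', 'node4'], 'node1'))
print(follows_quiz_policy(['node2', 'node3', 'node4'], 'node1'))
```

The first call satisfies every condition; the second fails because the writer's node is missing from the list.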

🔧 Debug

advanced

Identify the error in rack-aware replica placement code

What error will this Python code raise when simulating rack-aware replica placement?

nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

replicas = []

# Place one replica on node1
replicas.append('node1')

# Place second replica on different node in same rack
same_rack_nodes = [n for n in nodes['rack1'] if n != 'node1']
replicas.append(same_rack_nodes[1])

# Place third replica on a node in a different rack
other_rack = 'rack2'
replicas.append(nodes[other_rack][0])

print(replicas)

A. No error, code runs and prints the replica list.

B. KeyError because 'rack3' does not exist in nodes dictionary.

C. TypeError because replicas list cannot append strings.

D. IndexError because same_rack_nodes has only 1 element but index 1 is accessed.
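
When picking a second node from a filtered list, hard-coded indexes are fragile because the list may hold fewer entries than expected. A defensive variant might look like the sketch below; `pick_same_rack_peer` is a hypothetical helper written for this illustration, not an HDFS API.

```python
def pick_same_rack_peer(nodes, rack, exclude):
    """Return another node on `rack` besides `exclude`, or None if the
    rack has no other node. Hypothetical helper for this quiz only."""
    candidates = [n for n in nodes[rack] if n != exclude]
    # Guard against an empty list instead of indexing blindly.
    return candidates[0] if candidates else None

nodes = {'rack1': ['node1', 'node2'], 'rack2': ['node3', 'node4']}

print(pick_same_rack_peer(nodes, 'rack1', 'node1'))        # node2
print(pick_same_rack_peer({'rackX': ['solo']}, 'rackX', 'solo'))  # None
```

Returning `None` leaves the caller to decide how to handle a rack with no spare node, for example by falling back to another rack.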

🚀 Application

expert

Analyzing network traffic reduction using rack awareness

Suppose an HDFS cluster has 3 racks with 4 nodes each. Without rack awareness, replicas are placed randomly. With rack awareness, replicas are placed according to the policy described in this quiz (one on the local node, one on another node in the same rack, one on a node in a different rack). Which option best describes the expected impact on cross-rack network traffic during data reads?

A. Rack awareness reduces cross-rack traffic by ensuring at least two replicas are on the same rack, minimizing data transfer across racks.

B. Rack awareness has no effect on network traffic since data is always read from the closest node regardless of rack.

C. Rack awareness increases cross-rack traffic because replicas are spread across all racks evenly.

D. Rack awareness causes all replicas to be on the same rack, eliminating cross-rack traffic completely.
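
To reason about read locality on the 3-rack, 4-node cluster, it can help to count how many readers have no replica on their own rack. The sketch below assumes a made-up node naming scheme (`node1` to `node12`) and a hypothetical helper `cross_rack_reads`; it models only which rack a read is served from, not real HDFS scheduling.

```python
# Hypothetical topology: 3 racks, 4 nodes each (node1..node12).
topology = {
    f"rack{r}": [f"node{(r - 1) * 4 + i}" for i in range(1, 5)]
    for r in range(1, 4)
}
rack_of = {node: rack for rack, members in topology.items() for node in members}

def cross_rack_reads(replicas, readers):
    """Count readers that have no replica on their own rack and would
    therefore need to fetch the block across racks."""
    replica_racks = {rack_of[n] for n in replicas}
    return sum(1 for reader in readers if rack_of[reader] not in replica_racks)

# Placement following the policy as stated in the question: the writer's
# node (node1), another node on rack1 (node2), and one node on rack2 (node5).
policy_replicas = ["node1", "node2", "node5"]

print(cross_rack_reads(policy_replicas, topology["rack1"]))  # 0
print(cross_rack_reads(policy_replicas, list(rack_of)))      # 4
```

Readers on the writer's rack are always served within the rack; in this example only the four nodes on `rack3` have to read across racks.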
