
Rack awareness in Hadoop HDFS - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is rack awareness in HDFS?
Rack awareness is an HDFS feature that tells the NameNode which rack each DataNode sits on. HDFS uses this information to place block replicas so that the cluster tolerates rack failures and makes better use of network bandwidth.
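HDFS learns each DataNode's rack from a mapping that the administrator supplies. A common way to supply it is a topology script referenced by the `net.topology.script.file.name` property in `core-site.xml`. Hadoop runs the script with one or more DataNode hostnames or IP addresses as arguments and reads back one rack path per argument.

The minimal sketch below assumes a static lookup table. The table contents, the `resolve` helper and the `/dc1/rackN` paths are hypothetical. Unknown hosts fall back to `/default-rack`, the rack HDFS assigns when no topology is configured.

```python
#!/usr/bin/env python3
"""Sketch of an HDFS topology script backed by a hypothetical host-to-rack table.

Hadoop passes DataNode hostnames/IPs as arguments and expects one rack path
per argument on standard output, in the same order.
"""
import sys

# Hypothetical inventory: replace with your own hosts and rack paths.
HOST_TO_RACK = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}

DEFAULT_RACK = "/default-rack"  # returned for hosts missing from the table


def resolve(hosts):
    """Return one rack path per host, preserving the input order."""
    return [HOST_TO_RACK.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

For example, running the script with the arguments `10.0.1.11 10.9.9.9` prints `/dc1/rack1` followed by `/default-rack`. A real deployment would usually generate the table from an inventory system rather than hard-coding it.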
beginner
Why does HDFS use rack awareness when storing data?
HDFS uses rack awareness to reduce the risk of data loss by storing replicas of each block on more than one rack. If an entire rack fails (for example, because its switch or power supply goes down), the block is still available from a replica on another rack.
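The fault-tolerance argument reduces to a simple check. A block survives the loss of any single rack exactly when its replicas span at least two racks.

The sketch below tests this for a hypothetical two-rack cluster. The node names, rack paths and helper functions are illustrative assumptions, not part of HDFS.

```python
# Hypothetical cluster: DataNode name -> rack path.
NODE_RACK = {
    "dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1",
    "dn4": "/rack2", "dn5": "/rack2", "dn6": "/rack2",
}


def survives_rack_failure(replicas, failed_rack, node_rack):
    """True if at least one replica sits on a rack other than failed_rack."""
    return any(node_rack[node] != failed_rack for node in replicas)


def survives_any_single_rack_failure(replicas, node_rack):
    """Only racks that hold a replica can take the block down with them."""
    racks_used = {node_rack[node] for node in replicas}
    return all(survives_rack_failure(replicas, rack, node_rack) for rack in racks_used)


spread = ["dn1", "dn4", "dn5"]     # replicas on two racks
same_rack = ["dn1", "dn2", "dn3"]  # every replica on /rack1
print(survives_any_single_rack_failure(spread, NODE_RACK))     # True
print(survives_any_single_rack_failure(same_rack, NODE_RACK))  # False
```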
intermediate
How does rack awareness improve network efficiency in HDFS?
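Because HDFS knows the rack of every DataNode, it can place replicas to limit cross-rack traffic. With the default policy, two of the three replicas share a rack, so a write pipeline crosses between racks only once. On reads, HDFS serves the client from the closest replica (same node first, then same rack, then a remote rack), which reduces congestion on inter-rack links and lowers latency.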
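"Closest replica" is usually measured as a network distance in the topology tree. Hadoop's `NetworkTopology` is commonly described as counting hops to the nearest common ancestor:

- 0 for the same node
- 2 for the same rack
- 4 for a different rack in the same data center
- 6 for a different data center

The sketch below reproduces that convention for paths such as `/dc1/rack1/dn1` and sorts replicas for a read. The paths and the helper names `distance` and `order_for_read` are hypothetical, and the real implementation may break ties between equally distant replicas differently.

```python
def distance(a, b):
    """Hops between two nodes via their closest common ancestor in the topology tree."""
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def order_for_read(client, replicas):
    """Closest replicas first; sorted() is stable, so ties keep their input order."""
    return sorted(replicas, key=lambda replica: distance(client, replica))


client = "/dc1/rack1/dn1"
replicas = ["/dc1/rack2/dn4", "/dc1/rack1/dn2", "/dc1/rack2/dn5"]
print([distance(client, r) for r in replicas])  # [4, 2, 4]
print(order_for_read(client, replicas))  # same-rack replica dn2 comes first
```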
intermediate
What happens if rack awareness is not configured in HDFS?
If rack awareness is not configured, HDFS assigns every DataNode to a single default rack (/default-rack), so it cannot tell nodes on different physical racks apart. All replicas of a block may then end up on the same physical rack, and a failure of that rack can make the block unavailable or lose it entirely.
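A rough worked example shows how often this happens. Assume a simplified model:

- The first replica sits on the writer's node.
- The other two replicas are two distinct nodes drawn uniformly at random from the rest of the cluster.
- Node load and free-space checks that the real policy applies are ignored.

Under these assumptions, with R physical racks of k nodes each, the chance that all three replicas share the writer's rack is C(k-1, 2) / C(Rk-1, 2). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_all_on_writer_rack(racks, nodes_per_rack):
    """P(all 3 replicas share the writer's physical rack) when HDFS sees one rack.

    Simplified model: replica 1 is on the writer's node; replicas 2 and 3 are two
    distinct nodes drawn uniformly at random from the remaining nodes.
    """
    total_nodes = racks * nodes_per_rack
    favourable = comb(nodes_per_rack - 1, 2)  # both extra replicas on the writer's rack
    possible = comb(total_nodes - 1, 2)       # any two of the other nodes
    return Fraction(favourable, possible)


p = prob_all_on_writer_rack(racks=4, nodes_per_rack=10)
print(p, f"{float(p):.4f}")  # 12/247 0.0486
```

About 4.9% per block looks small. However, in a cluster holding a million blocks this model predicts tens of thousands of blocks whose every replica would disappear with one rack.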
advanced
How does HDFS decide where to place replicas using rack awareness?
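With the default replication factor of 3, HDFS places replicas as follows:
- The first replica goes on the writer's local node, when the client runs on a DataNode.
- The second goes on a node in a different (remote) rack.
- The third goes on a different node in that same remote rack.

The three replicas therefore span two racks. Losing either rack still leaves a copy, and the write pipeline crosses between racks only once, which balances fault tolerance against network cost.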
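The default layout can be sketched as a small function. This sketch makes several assumptions:

- The writer runs on a DataNode.
- Some remote rack has at least two DataNodes.
- The real policy's load and free-space checks are left out.

The cluster map and the `place_replicas` helper are hypothetical.

```python
import random
from collections import Counter

# Hypothetical 3-rack cluster: DataNode name -> rack path.
CLUSTER = {
    "dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack1",
    "dn4": "/rack2", "dn5": "/rack2", "dn6": "/rack2",
    "dn7": "/rack3", "dn8": "/rack3", "dn9": "/rack3",
}


def place_replicas(writer, node_rack, rng=random):
    """Sketch of the default 3-replica layout for a writer running on a DataNode."""
    local_rack = node_rack[writer]
    rack_sizes = Counter(node_rack.values())
    # Replica 2: a node on a different rack that can also host replica 3.
    candidates = [n for n, r in node_rack.items()
                  if r != local_rack and rack_sizes[r] >= 2]
    if not candidates:
        raise ValueError("need a remote rack with at least two DataNodes")
    second = rng.choice(candidates)
    remote_rack = node_rack[second]
    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n, r in node_rack.items()
                        if r == remote_rack and n != second])
    # Replica 1 stays on the writer's own node.
    return [writer, second, third]


print(place_replicas("dn1", CLUSTER, random.Random(7)))
```

Every result has the writer's node first, followed by two distinct nodes that share one remote rack. That is why HDFS uses two racks, not three, for three replicas.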
What is the main purpose of rack awareness in HDFS?
A. To encrypt data for security
B. To speed up data processing by using GPUs
C. To improve fault tolerance by spreading replicas across racks
D. To compress data blocks for storage
If rack awareness is not configured, which risk increases in HDFS?
A. Data will be duplicated too many times
B. Data will be encrypted incorrectly
C. DataNodes will run out of memory
D. Data blocks may be lost if a rack fails
How many racks does HDFS try to use when placing three replicas of a block?
A. Two racks
B. One rack only
C. Three racks
D. Four racks
Which of these is NOT a benefit of rack awareness?
A. Reduced cross-rack network traffic
B. Faster data encryption
C. Improved fault tolerance
D. Better data locality
What information does HDFS need to implement rack awareness?
A. Mapping of DataNodes to racks
B. CPU speed of each node
C. Disk size of each node
D. User access permissions
Explain how rack awareness helps improve fault tolerance and network efficiency in HDFS.
Think about how spreading data copies across racks helps keep data safe and speeds up access.
Describe what could happen if rack awareness is not configured in an HDFS cluster.
Consider the risks of not knowing where nodes are physically located.