Hadoop is a popular tool in data science. What is its main use?
Think about handling very big data that one computer cannot manage alone.
Hadoop helps store and process huge amounts of data by spreading it over many computers working together.
Hadoop has parts that handle storage and parts that handle processing. Which part stores the data?
Think about the file system that spreads data across computers.
HDFS (the Hadoop Distributed File System) is Hadoop's storage layer: it splits files into blocks and saves those blocks across many machines.
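A minimal sketch of the splitting idea, assuming a toy 4-byte block size (real HDFS defaults to 128 MB blocks; the function name here is illustrative, not an HDFS API):

```python
# Toy illustration of how HDFS chops a file into fixed-size blocks.
# Block size is tiny here so the output is easy to read; HDFS's
# default is 128 MB.
BLOCK_SIZE = 4

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw bytes into fixed-size blocks, the way HDFS splits a file."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"hello hdfs")
print(blocks)  # [b'hell', b'o hd', b'fs']
```

Each block would then be stored on a different machine in the cluster.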
Hadoop uses YARN as one of its components. What is YARN's main job?
Think about how Hadoop decides which computer does what work.
YARN (Yet Another Resource Negotiator) manages cluster resources and schedules tasks so Hadoop can process data efficiently across many machines.
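A toy sketch of the scheduling idea, assuming made-up node names and memory figures (real YARN uses a ResourceManager, NodeManagers, and containers; this is only the general pattern):

```python
# Toy scheduler in the spirit of YARN: assign each task to a node
# that still has enough free memory, then reserve that memory.
nodes = {"node1": 8, "node2": 4}                     # free memory (GB) per node
tasks = [("taskA", 6), ("taskB", 4), ("taskC", 2)]   # (task name, GB needed)

assignments = {}
for task, need in tasks:
    for node, free in nodes.items():
        if free >= need:
            assignments[task] = node   # schedule the task here
            nodes[node] -= need        # reserve the resources
            break

print(assignments)  # {'taskA': 'node1', 'taskB': 'node2', 'taskC': 'node1'}
```

The real scheduler also handles CPU, queues, and priorities, but the core job is the same: match work to available resources.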
MapReduce is a key part of Hadoop. What does it do?
Think about how Hadoop breaks down big jobs into smaller pieces to process.
MapReduce splits data processing into two steps: mapping (transforming input into key-value pairs) and reducing (summarizing the values for each key).
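The classic illustration is word count. This is a plain-Python sketch of the pattern, not real Hadoop code (Hadoop jobs are typically written in Java or run via Hadoop Streaming); the shuffle step between map and reduce is what Hadoop performs for you:

```python
from collections import defaultdict

def map_step(document):
    # Map: emit a (word, 1) pair for each word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key; Hadoop does this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # Reduce: summarize all values for one key.
    return key, sum(values)

documents = ["big data is big", "hadoop processes big data"]
mapped = [pair for doc in documents for pair in map_step(doc)]
counts = dict(reduce_step(k, v) for k, v in shuffle(mapped).items())
print(counts["big"])  # 3
```

Because each document is mapped independently, the map work can run in parallel on many machines, which is the whole point of the model.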
Hadoop is known for fault tolerance. What feature makes it fault-tolerant?
Think about how Hadoop protects data if one machine breaks.
Hadoop keeps several copies (replicas) of each data block on different machines, so if one machine fails, the data is not lost.
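A toy sketch of the replication idea, assuming made-up block and node names (HDFS's default replication factor is 3; the placement logic here is a simple round-robin, not HDFS's actual rack-aware policy):

```python
import itertools

# Toy replica placement: copy each block to several machines so that
# losing any one machine never loses a block.
REPLICATION_FACTOR = 3
machines = ["node1", "node2", "node3", "node4"]

def place_replicas(blocks, machines, factor=REPLICATION_FACTOR):
    placement = {}
    nodes = itertools.cycle(machines)
    for block in blocks:
        placement[block] = [next(nodes) for _ in range(factor)]
    return placement

placement = place_replicas(["block-A", "block-B"], machines)

# Simulate one machine failing: every block still has surviving copies.
failed = "node1"
for block, replicas in placement.items():
    survivors = [m for m in replicas if m != failed]
    assert survivors, f"{block} would be lost!"
```

With three replicas per block, any single-machine failure leaves at least two live copies, and Hadoop re-replicates in the background to restore the full count.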