Which statement best describes the primary difference between HBase and HDFS?
Think about how each system handles data access and storage format.
HBase is a NoSQL database built on top of HDFS to provide fast random access to large datasets. HDFS is a distributed file system optimized for storing large files and batch processing.
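To make the contrast concrete, here is a toy Python sketch. It is not HBase or HDFS code, and every name in it is hypothetical. It models batch-style access as a sequential scan over records, and random access as a binary search over row keys kept in sorted order, which is loosely how HBase locates a row.

```python
from bisect import bisect_left

# Hypothetical records, kept sorted by row key (HBase also stores rows sorted by key).
records = [("user1", "Alice"), ("user2", "Bob"), ("user3", "Carol")]
keys = [key for key, _ in records]

def scan_lookup(row_key):
    """Batch-style access: read records one by one until a match is found."""
    for key, value in records:
        if key == row_key:
            return value
    return None

def indexed_lookup(row_key):
    """Random access: binary-search the sorted keys and jump to the row."""
    i = bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return records[i][1]
    return None

print(scan_lookup("user2"))     # Bob
print(indexed_lookup("user2"))  # Bob
```

Both functions return the same value; the difference is that the scan touches every record before the match, while the indexed lookup needs only a logarithmic number of comparisons.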
Given the following HBase table snapshot, what will be the output of retrieving the value for row key 'user1' and column 'info:name'?
Row key: user1
  Column Family: info
    name: Alice
    age: 30
Row key: user2
  Column Family: info
    name: Bob
    age: 25
Focus on the value stored at the specified row and column.
The value stored at row 'user1' and column 'info:name' is 'Alice'.
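The lookup can be sketched in Python with an in-memory stand-in for the table. The dictionary layout and the `get_cell` helper are illustrative assumptions, not part of any HBase client. The layout mirrors what HappyBase's `table.row(b'user1', columns=[b'info:name'])` returns against a live cluster: a dict that maps `family:qualifier` byte strings to byte values.

```python
# In-memory stand-in for the table snapshot: row key -> {b"family:qualifier": value}.
table = {
    b"user1": {b"info:name": b"Alice", b"info:age": b"30"},
    b"user2": {b"info:name": b"Bob", b"info:age": b"25"},
}

def get_cell(table, row_key, column):
    """Return the value at (row_key, column), or None if the cell is absent."""
    return table.get(row_key, {}).get(column)

value = get_cell(table, b"user1", b"info:name")
print(value.decode("utf-8"))  # Alice
```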
What will be the output of the following command showing the replication factor of a file in HDFS?
hdfs dfs -stat %r /user/data/file1.txt
By default, what is the replication factor for files in HDFS?
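The %r format specifier prints the file's replication factor. The default replication factor in HDFS is 3 (the dfs.replication setting), so for a file written with default settings the command prints 3. Each block of the file is stored on three different DataNodes for fault tolerance.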
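A short worked example shows what a replication factor of 3 means for storage. The 300 MB file size is a hypothetical input, the 128 MB block size is assumed (it is the dfs.blocksize default in Hadoop 2 and later), and the helper name is made up for illustration.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size (dfs.blocksize default in Hadoop 2+)
REPLICATION = 3      # default replication factor (dfs.replication)

def block_replicas(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, total block replicas stored in the cluster)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication

blocks, replicas = block_replicas(300)
print(blocks, replicas)   # 3 9
print(300 * REPLICATION)  # 900 (MB of raw storage used)
```

The file is split into three blocks, each block is stored three times, and the cluster spends about 900 MB of raw disk to hold 300 MB of data.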
Which use case is best suited for HBase rather than HDFS?
Consider which system supports fast random reads and writes.
HBase supports real-time read/write access and is ideal for use cases requiring frequent updates and quick queries, such as user profiles.
What error will the following Python code produce when trying to insert data into HBase using the HappyBase library?
import happybase

connection = happybase.Connection('localhost')
table = connection.table('users')
table.put('user1', {'info:name': 'Alice', 'info:age': 30})
Check the data types required by HappyBase for values.
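HappyBase requires cell values to be bytes, but the age value is passed as an integer. An integer cannot be serialized for the Thrift call, so the put fails, typically with a TypeError. Encoding the value first, for example as b'30', avoids the error.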
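One way to avoid the problem is to encode every column name and value before calling `put`. The sketch below is a minimal illustration; `to_bytes` and `encode_row` are hypothetical helpers, not part of HappyBase, and the example runs without an HBase server.

```python
def to_bytes(value):
    """Encode a str or int as UTF-8 bytes; pass bytes through unchanged."""
    if isinstance(value, bytes):
        return value
    return str(value).encode("utf-8")

def encode_row(data):
    """Encode every column name and value of a row dict as bytes."""
    return {to_bytes(col): to_bytes(val) for col, val in data.items()}

row = encode_row({"info:name": "Alice", "info:age": 30})
print(row)  # {b'info:name': b'Alice', b'info:age': b'30'}

# With a reachable HBase Thrift server (not assumed here), the encoded row
# could then be written with: table.put(b"user1", row)
```

Storing numbers as their decimal text keeps the cells readable in the HBase shell, but values then sort as strings; a fixed-width binary encoding would be the alternative if numeric ordering matters.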