Bucketing for sampling in Hadoop - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a bucketed table with 4 buckets.

Hadoop
CREATE TABLE user_data_bucketed (user_id INT, name STRING) CLUSTERED BY (user_id) INTO [1] BUCKETS STORED AS ORC;
Options:
A. 4
B. 2
C. 6
D. 8
Common Mistakes
Using a bucket count that does not match the sampling requirement.
Forgetting to specify the CLUSTERED BY column.
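
As a minimal sketch of how a bucketed table is typically declared and populated in Hive, the statements below use the hypothetical tables `page_views_raw` and `page_views_bucketed` and a bucket count of 16; none of these names or values come from the exercise. The commented-out `SET` line is an assumption that applies only to Hive releases before 2.0, where bucketing was not enforced by default.

```sql
-- Hypothetical tables, for illustration only.
-- On Hive releases before 2.0, uncomment the next line so inserts honour the bucket layout.
-- SET hive.enforce.bucketing = true;

CREATE TABLE page_views_bucketed (
  user_id INT,
  url     STRING
)
CLUSTERED BY (user_id) INTO 16 BUCKETS
STORED AS ORC;

-- Populate from an existing (assumed) staging table; Hive hashes user_id
-- to decide which of the 16 buckets each row is written to.
INSERT INTO TABLE page_views_bucketed
SELECT user_id, url
FROM page_views_raw;
```

Both the CLUSTERED BY column and the bucket count are fixed when the table is created, so any later TABLESAMPLE query has to be written against that layout.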
2. Fill in the blank (medium)

Complete the code to sample 25% of data from a bucketed table.

Hadoop
SELECT * FROM user_data_bucketed TABLESAMPLE(BUCKET [1] OUT OF 4 ON user_id);
Options:
A. 2
B. 4
C. 3
D. 1
Common Mistakes
Using bucket number 0, which is invalid; bucket numbers start at 1.
Assuming that a bucket number equal to the total bucket count returns all data; BUCKET 4 OUT OF 4 returns only the fourth bucket.
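
To illustrate how the bucket number and the denominator interact, here is a short sketch against a hypothetical table `page_views_bucketed`, assumed to be clustered by `user_id` into 16 buckets. It is not part of the exercise; it only shows that `BUCKET x OUT OF y` picks one slice out of `y`, with `x` counted from 1.

```sql
-- Hypothetical table, assumed CLUSTERED BY (user_id) INTO 16 BUCKETS.
-- Each query reads a single bucket, so each count is roughly 1/16 of the table
-- (the exact share depends on how the user_id values hash).
SELECT COUNT(*) FROM page_views_bucketed TABLESAMPLE(BUCKET 1 OUT OF 16 ON user_id);
SELECT COUNT(*) FROM page_views_bucketed TABLESAMPLE(BUCKET 16 OUT OF 16 ON user_id);

-- The bucket number x is expected to lie between 1 and y,
-- so BUCKET 0 OUT OF 16 is not a valid sample clause.
```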
3. Fill in the blank (hard)

Complete the bucketing clause so that the table is bucketed by user_id.

Hadoop
CREATE TABLE user_logs (user_id INT, action STRING) CLUSTERED BY ([1]) INTO 8 BUCKETS STORED AS PARQUET;
Options:
A. user
B. user_id
C. userid
D. id
Common Mistakes
Using a column name that does not exist in the table.
Misspelling the column name.
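
One way to check which column a table is actually bucketed on is to inspect its metadata. The sketch below assumes the `user_logs` table from this task has already been created successfully.

```sql
-- The storage section of the output lists "Num Buckets" and "Bucket Columns",
-- which should match the CLUSTERED BY clause used at creation time.
DESCRIBE FORMATTED user_logs;
```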
4. Fill in the blanks (hard)

Fill both blanks to create a bucketed table and sample bucket 3 out of 5.

Hadoop
CREATE TABLE sales_data (sale_id INT, amount FLOAT) CLUSTERED BY ([1]) INTO [2] BUCKETS STORED AS TEXTFILE;
SELECT * FROM sales_data TABLESAMPLE(BUCKET 3 OUT OF 5 ON sale_id);
Options:
A. sale_id
B. amount
C. 5
D. 3
Common Mistakes
Using a different bucket count in the table definition and in the TABLESAMPLE clause.
Bucketing on a different column from the one used for sampling.
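
The sketch below, which follows the behaviour described in the Hive sampling documentation, shows what happens when the TABLESAMPLE denominator differs from the table's bucket count. The table `events_bucketed` is hypothetical and assumed to be clustered by `id` into 32 buckets.

```sql
-- Hypothetical table, assumed CLUSTERED BY (id) INTO 32 BUCKETS.

-- Denominator equals the bucket count: reads exactly the 3rd stored bucket.
SELECT * FROM events_bucketed TABLESAMPLE(BUCKET 3 OUT OF 32 ON id);

-- Denominator is a factor of the bucket count: each sample bucket spans
-- 32/16 = 2 stored buckets, here the 3rd and the 19th.
SELECT * FROM events_bucketed TABLESAMPLE(BUCKET 3 OUT OF 16 ON id);

-- Denominator is a multiple of the bucket count: each sample bucket covers
-- 32/64 = 1/2 of a stored bucket, here half of the 3rd.
SELECT * FROM events_bucketed TABLESAMPLE(BUCKET 3 OUT OF 64 ON id);
```

Hive can limit the scan to the required buckets only when the column in the ON clause matches the CLUSTERED BY column; otherwise it reads the whole table and filters rows as it goes.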
5. Fill in the blanks (hard)

Fill all three blanks to create a bucketed table, insert data, and sample bucket 2 out of 4.

Hadoop
CREATE TABLE employee_data (emp_id INT, name STRING) CLUSTERED BY ([1]) INTO [2] BUCKETS STORED AS ORC;
INSERT INTO employee_data VALUES (1, 'Alice'), (2, 'Bob');
SELECT * FROM employee_data TABLESAMPLE(BUCKET [3] OUT OF 4 ON emp_id);
Options:
A. emp_id
B. 4
C. 2
D. name
Common Mistakes
Using the wrong column for bucketing.
Choosing a sampling bucket number or denominator that does not match the table's bucket count.
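
For comparison, Hive can also sample independently of the stored bucket layout by using `rand()` in the ON clause. The sketch below assumes the `employee_data` table from this task exists; the alias `e` and the 1-in-2 ratio are illustrative choices, not part of the exercise.

```sql
-- Sampling on rand() ignores the CLUSTERED BY column: Hive scans the whole
-- table and keeps roughly half of the rows, chosen at random on each run.
-- The table alias is placed after the TABLESAMPLE clause.
SELECT e.emp_id, e.name
FROM employee_data TABLESAMPLE(BUCKET 1 OUT OF 2 ON rand()) e;
```

With only a handful of rows in the table, either form of sampling may return very few rows or none at all, so small test tables are not a reliable check of the sampling fraction.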