Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a bucketed table with 4 buckets.
Hadoop
CREATE TABLE user_data_bucketed (user_id INT, name STRING) CLUSTERED BY (user_id) INTO [1] BUCKETS STORED AS ORC;
Common Mistakes
Using a bucket count that does not match the 4 buckets the task requires.
Forgetting to specify the CLUSTERED BY column.
Answer: 4. INTO 4 BUCKETS, together with CLUSTERED BY (user_id), creates a table whose rows are distributed into 4 buckets by user_id.
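As a rough illustration of what bucketing by user_id into 4 buckets means, here is a minimal Python sketch. It assumes a simplified hash in which an integer key hashes to itself and the bucket index is the hash modulo the bucket count; the real hash function depends on the Hive version and column type, and the helper names `bucket_index` and `assign_buckets` are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # mirrors INTO 4 BUCKETS in the task


def bucket_index(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Simplified stand-in for the engine's hash (an assumption):
    the integer value itself, made non-negative, modulo the bucket count."""
    return (user_id & 0x7FFFFFFF) % num_buckets


def assign_buckets(user_ids, num_buckets: int = NUM_BUCKETS) -> dict:
    """Group user_ids by the 0-based bucket index they would land in."""
    buckets = defaultdict(list)
    for uid in user_ids:
        buckets[bucket_index(uid, num_buckets)].append(uid)
    return dict(buckets)


layout = assign_buckets(range(1, 13))
for idx in sorted(layout):
    print(idx, layout[idx])
```

In this model every row with the same user_id always lands in the same bucket, and twelve consecutive ids spread evenly, three per bucket.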
2. Fill in the blank (medium)
Complete the code to sample about 25% of the data from a bucketed table.
Hadoop
SELECT * FROM user_data_bucketed TABLESAMPLE(BUCKET [1] OUT OF 4 ON user_id);
Common Mistakes
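Using bucket number 0, which is invalid because bucket numbering starts at 1.
Expecting a bucket number equal to the total (BUCKET 4 OUT OF 4) to return all data; it still selects a single bucket, about 25%.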
Answer: 1. Sampling bucket 1 out of 4 returns one of the four buckets, approximately 25% of the data.
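To see why one bucket out of four is about a quarter of the rows, the sampling predicate can be modeled in Python. This is a sketch under the same simplifying assumption (an integer key hashes to itself); `sample_bucket` is a hypothetical helper, not a Hive API.

```python
def sample_bucket(keys, x: int, y: int) -> list:
    """Keep the keys that fall into bucket x (1-based) out of y,
    using a simplified hash: the integer key modulo y."""
    if not 1 <= x <= y:
        raise ValueError("bucket number must be between 1 and y")
    return [k for k in keys if (k & 0x7FFFFFFF) % y == x - 1]


user_ids = list(range(1, 1001))          # 1000 illustrative user_id values
sample = sample_bucket(user_ids, 1, 4)   # models BUCKET 1 OUT OF 4 ON user_id
fraction = len(sample) / len(user_ids)
print(len(sample), fraction)
```

With evenly spread keys the sample holds exactly a quarter of the rows; with real, skewed data the share is only approximately 25%.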
3. Fill in the blank (hard)
Fix the error in the bucketing clause so the table is bucketed by user_id.
Hadoop
CREATE TABLE user_logs (user_id INT, action STRING) CLUSTERED BY ([1]) INTO 8 BUCKETS STORED AS PARQUET;
Common Mistakes
Using a column name that does not exist in the table.
Misspelling the column name.
Answer: user_id. The CLUSTERED BY clause must name a column defined in the table, so the blank is user_id, spelled as in the column list.
4. Fill in the blanks (hard)
Fill both blanks to create a bucketed table and sample bucket 3 out of 5.
Hadoop
CREATE TABLE sales_data (sale_id INT, amount FLOAT) CLUSTERED BY ([1]) INTO [2] BUCKETS STORED AS TEXTFILE;
SELECT * FROM sales_data TABLESAMPLE(BUCKET 3 OUT OF 5 ON sale_id);
Common Mistakes
Using a table bucket count that differs from the 5 in the sampling clause.
Bucketing by a column other than sale_id, the column used for sampling.
Answer: sale_id, 5. The table is bucketed by sale_id into 5 buckets, matching the sampling clause BUCKET 3 OUT OF 5 ON sale_id.
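The value of matching the table's bucket count to the sampling clause can be illustrated with a small Python model. It assumes a simplified hash (the integer key itself) and only shows where the sampled rows would be stored; whether the engine actually skips the other buckets is not modeled. `stored_buckets_touched` is a hypothetical helper.

```python
def stored_buckets_touched(table_buckets: int, x: int, y: int, hashes=range(1000)) -> list:
    """Return the 0-based stored bucket indexes that hold rows selected by
    'BUCKET x OUT OF y', under a simplified modulo hash."""
    return sorted({h % table_buckets for h in hashes if h % y == x - 1})


# Table bucketed into 5, sampled 3 out of 5: every sampled row sits in one stored bucket.
print(stored_buckets_touched(5, 3, 5))

# Table bucketed into 8, sampled 3 out of 5: sampled rows are scattered over all stored buckets.
print(stored_buckets_touched(8, 3, 5))
```

In this model, a matching count means the sample corresponds to exactly one stored bucket, while a mismatched count spreads the sampled rows across every bucket.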
5. Fill in the blanks (hard)
Fill all three blanks to create a bucketed table, insert data, and sample bucket 2 out of 4.
Hadoop
CREATE TABLE employee_data (emp_id INT, name STRING) CLUSTERED BY ([1]) INTO [2] BUCKETS STORED AS ORC;
INSERT INTO employee_data VALUES (1, 'Alice'), (2, 'Bob');
SELECT * FROM employee_data TABLESAMPLE(BUCKET [3] OUT OF 4 ON emp_id);
Common Mistakes
Using wrong column for bucketing.
Using a bucket count that does not match the 4 in the sampling clause, or a sample bucket number other than 2.
Answer: emp_id, 4, 2. The table is bucketed by emp_id into 4 buckets, and sampling bucket 2 out of 4 selects one of the four buckets, approximately 25% of the data.