Given the following Pig Latin script, what will be the content of filtered_data?
data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
filtered_data = FILTER data BY age > 30;
DUMP filtered_data;
Assume input.txt contains:
John,25,New York
Mary,35,Chicago
Bob,40,Seattle
Think about the filter condition age > 30 and which rows satisfy it.
The filter keeps only rows where age is greater than 30. John is 25, so his row is excluded; Mary (35) and Bob (40) are kept.
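As a worked illustration of the explanation above, the script can be annotated with the tuples DUMP would print for the three sample rows. The output is shown in Pig's parenthesized tuple notation; the rows appear in input order here, although Pig does not guarantee ordering in general.

```pig
data = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
filtered_data = FILTER data BY age > 30;  -- keeps the rows with age 35 and 40
DUMP filtered_data;
-- Expected console output for the three sample rows:
-- (Mary,35,Chicago)
-- (Bob,40,Seattle)
```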
Consider this Pig Latin script:
records = LOAD 'data.csv' USING PigStorage(',') AS (id:int, score:int);
passed = FILTER records BY score >= 50;
STORE passed INTO 'passed_output';
If data.csv has 5 records with scores: 45, 50, 60, 30, 55, how many records will be stored in passed_output?
Count how many scores are 50 or more.
Scores 50, 60, and 55 meet the condition, so 3 records remain.
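One way to check this count in Pig itself, rather than by hand, is to group the filtered relation and apply COUNT. This is a minimal sketch, not part of the original question: the aliases grouped and total are illustrative names, and it assumes the id field is never null, since COUNT skips tuples whose first field is null.

```pig
records = LOAD 'data.csv' USING PigStorage(',') AS (id:int, score:int);
passed = FILTER records BY score >= 50;
-- Collect all surviving tuples into a single group so they can be counted.
grouped = GROUP passed ALL;
total = FOREACH grouped GENERATE COUNT(passed);
DUMP total;  -- for scores 45, 50, 60, 30, 55 this should print (3)
```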
What error will this Pig Latin script produce?
data = LOAD 'file.txt' USING PigStorage(',') AS (name:chararray, age:int);
filtered = FILTER data BY age > '30';
DUMP filtered;
Check the data types used in the filter condition.
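The filter compares the int field age with the chararray literal '30'. Pig's type checker rejects the > comparison as having incompatible types, so the script fails before any data is processed.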
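A corrected version of the script, as a minimal sketch, drops the quotes so that both sides of the comparison are integers:

```pig
data = LOAD 'file.txt' USING PigStorage(',') AS (name:chararray, age:int);
-- Compare the int field against an int literal (no quotes around 30).
filtered = FILTER data BY age > 30;
DUMP filtered;
```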
You want to load a CSV file, filter rows where status is 'active', and save the result. Which command correctly stores the filtered data?
data = LOAD 'users.csv' USING PigStorage(',') AS (user_id:int, status:chararray);
active_users = FILTER data BY status == 'active';
Remember that STORE names the relation, the output path, and (optionally, via USING) the storage function.
Option A correctly stores the filtered relation active_users using PigStorage.
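A STORE statement of the kind this explanation describes might look like the sketch below. The output path 'active_users_output' is a hypothetical name chosen for illustration, and the comma delimiter is an assumption made to match the input format.

```pig
data = LOAD 'users.csv' USING PigStorage(',') AS (user_id:int, status:chararray);
active_users = FILTER data BY status == 'active';
-- Write the filtered relation as comma-separated text.
-- 'active_users_output' is a hypothetical output directory.
STORE active_users INTO 'active_users_output' USING PigStorage(',');
```

Omitting the USING clause is also valid; Pig then falls back to PigStorage with its default tab delimiter.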
Consider this Pig Latin script:
data = LOAD 'records.txt' USING PigStorage(',') AS (id:int, value:int);
STORE data INTO 'output_dir' USING PigStorage();
What will be the content of output_dir?
Think about what STORE does with a loaded relation.
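STORE writes every tuple in the relation. Filtering is optional, so without a FILTER step all loaded records are stored. Because PigStorage() is called here with no argument, the output fields are separated by tabs, which is its default delimiter.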
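To make the effect concrete, here is the same script annotated with a small hypothetical input. The two sample records are invented for illustration, and `<TAB>` stands for a literal tab character.

```pig
-- Hypothetical records.txt (comma-separated):
--   1,10
--   2,20
data = LOAD 'records.txt' USING PigStorage(',') AS (id:int, value:int);
-- No FILTER step, so every loaded tuple is written out.
STORE data INTO 'output_dir' USING PigStorage();
-- The part files under output_dir would then hold the same records,
-- with fields separated by the default tab delimiter:
--   1<TAB>10
--   2<TAB>20
```

To keep the output comma-separated like the input, pass the delimiter explicitly, as in PigStorage(',').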