Pig Latin basics in Hadoop

Introduction

Pig Latin is the scripting language of Apache Pig. You describe data processing as a sequence of simple steps, and Pig turns those steps into jobs that run on a Hadoop cluster. Typical situations where it helps:

You want to clean or filter large data files quickly.

You need to join two big data tables to find connections.

You want to group data to find totals or averages.

You want to sort data to see top or bottom results.

You want to load data from files and save results back.

Syntax

alias = LOAD 'datafile' USING loader AS (field1:type, field2:type, ...);
filtered = FILTER alias BY condition;
grouped = GROUP filtered BY field;
result = FOREACH grouped GENERATE group, COUNT(filtered);
STORE result INTO 'output';

Each step creates a new alias (name) for the data.

Use LOAD to read data, FILTER to select rows, GROUP to collect rows by key, FOREACH ... GENERATE to compute fields for each row or each group, and STORE to save results.
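As a rough sketch of how these operators fit together, the script below applies FOREACH to a plain relation and then to a grouped one. The file name `people.txt`, its tab-separated layout, and the output path `age_counts` are assumptions made for illustration.

```pig
-- Hypothetical input: tab-separated lines of name and age.
people = LOAD 'people.txt' USING PigStorage('\t') AS (name:chararray, age:int);

-- FOREACH on an ungrouped relation runs once per row.
next_year = FOREACH people GENERATE name, age + 1 AS age_next_year;

-- GROUP produces one row per key: a field named 'group' holding the key,
-- plus a bag named after the input alias ('people') holding the matching rows.
by_age = GROUP people BY age;
DESCRIBE by_age;

-- FOREACH on a grouped relation runs once per group.
counts = FOREACH by_age GENERATE group AS age, COUNT(people) AS n;

-- 'age_counts' is a hypothetical output directory.
STORE counts INTO 'age_counts';
```

Note that COUNT takes the bag name (`people`), not the grouped alias (`by_age`).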

Examples

Load a CSV file with user info, defining each column's name and type.

data = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

Select only users who are 18 or older.

adults = FILTER data BY age >= 18;

Group users by their city.

grouped_by_city = GROUP adults BY city;

Count how many adults are in each city.

count_by_city = FOREACH grouped_by_city GENERATE group, COUNT(adults);
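The examples above cover loading, filtering, grouping, and counting. The join, average, and sort use cases from the introduction could look roughly like the sketch below. The second file, `orders.csv`, and its columns are hypothetical and not part of the original examples.

```pig
users  = LOAD 'users.csv'  USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- Hypothetical second table: one line per order, keyed by user name.
orders = LOAD 'orders.csv' USING PigStorage(',') AS (name:chararray, amount:double);

-- Inner join: keeps only users that have at least one order.
joined = JOIN users BY name, orders BY name;

-- After a join, 'name' exists on both sides and must be written as
-- users::name or orders::name. 'city' and 'amount' are unambiguous.
by_city = GROUP joined BY city;

-- Average order amount per city.
avg_by_city = FOREACH by_city GENERATE group AS city, AVG(joined.amount) AS avg_amount;

-- Sort from highest to lowest average and keep the top three cities.
ranked = ORDER avg_by_city BY avg_amount DESC;
top3   = LIMIT ranked 3;

DUMP top3;
```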

Sample Program

This program loads user data, filters adults, groups them by city, counts adults per city, and shows the result.

users = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
adults = FILTER users BY age >= 18;
grouped = GROUP adults BY city;
count_by_city = FOREACH grouped GENERATE group AS city, COUNT(adults) AS adult_count;
DUMP count_by_city;
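To make the result concrete, here is the same pipeline run against a small made-up input. The four rows in the comments are assumed for illustration only. An ORDER step is added so that the printed rows come out in a predictable order, since Pig does not otherwise guarantee the order of groups.

```pig
-- Hypothetical contents of users.csv (not from the original tutorial):
--   alice,30,Paris
--   bob,17,Paris
--   carol,25,London
--   dave,40,Paris

users = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
adults = FILTER users BY age >= 18;          -- drops bob (17)
grouped = GROUP adults BY city;
count_by_city = FOREACH grouped GENERATE group AS city, COUNT(adults) AS adult_count;

-- Sort by city so the printed order is deterministic.
sorted = ORDER count_by_city BY city;
DUMP sorted;

-- Tuples printed for the rows above:
--   (London,1)
--   (Paris,2)
```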

Important Notes

Define the schema (field names and types) when loading data. Without one, fields have no names, default to the bytearray type, and must be referenced by position ($0, $1, ...).

Use DUMP to print results to the console during testing, and STORE to write them to files. STORE fails if the output directory already exists.

Pig Latin statements are written in order, and each step refers to an alias defined by an earlier step. Pig does not run any job until it reaches a DUMP or STORE.
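The notes above can be tried out with a short script such as the following sketch. The output path `adults_out` is a hypothetical name chosen for illustration.

```pig
-- Loading with a schema gives every field a name and a type.
users = LOAD 'users.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- DESCRIBE prints the schema of an alias; it does not read the data.
DESCRIBE users;

-- Defining this alias does not run a job yet.
adults = FILTER users BY age >= 18;

-- During testing: run the pipeline and print the rows to the console.
DUMP adults;

-- For real runs: write the rows as comma-separated files.
-- 'adults_out' is a hypothetical directory and must not already exist.
STORE adults INTO 'adults_out' USING PigStorage(',');
```

In practice a script usually ends with either DUMP or STORE for a given alias, since each one triggers its own run of the pipeline.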

Summary

Pig Latin is a simple language for processing big data step by step.

Use LOAD, FILTER, GROUP, FOREACH, and STORE to handle data easily.

It helps you clean, group, count, and save big data results quickly.