User-defined functions (UDFs) in Hadoop - Step-by-Step Execution

Concept Flow - User-defined functions (UDFs)

Define UDF class

↓

Implement evaluate() method

↓

Compile into a JAR and register the UDF with Hive

↓

Use UDF in Hive query

↓

Hive calls evaluate() for each row

↓

Return processed result

↓

Query outputs transformed data

This flow shows how you create a UDF. You define a class with an evaluate() method, compile it into a JAR, and register it with Hive. You then call it in a query, and Hive returns the transformed results.
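For a classic UDF subclass, Hive locates evaluate() by reflection instead of through an abstract interface method, and then invokes it once per row. The sketch below is a much-simplified, hypothetical model of that step, not Hive's actual resolver. UpperSketch is a made-up stand-in for MyUpper that uses String instead of Hadoop's Text, so it runs on a plain JVM. FlowSimulation.runQuery is also illustrative only.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for MyUpper: String replaces Hadoop's Text so the
// sketch compiles without Hadoop or Hive on the classpath.
class UpperSketch {
    public String evaluate(String input) {
        if (input == null) return null;   // pass NULL through unchanged
        return input.toUpperCase();
    }
}

// Simplified model of the flow: find a method named "evaluate" by reflection,
// then call it once per row. Hive's real method resolution is more involved.
class FlowSimulation {
    static List<Object> runQuery(Object udf, List<String> rows) {
        try {
            Method evaluate = udf.getClass().getMethod("evaluate", String.class);
            List<Object> results = new ArrayList<>();
            for (String row : rows) {
                results.add(evaluate.invoke(udf, row));   // one call per row
            }
            return results;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("UDF has no usable evaluate(String)", e);
        }
    }

    public static void main(String[] args) {
        List<Object> out = runQuery(new UpperSketch(), Arrays.asList("hello", "world", null));
        System.out.println(out);   // prints [HELLO, WORLD, null]
    }
}
```

Because evaluate() is found by name and parameter types, a misspelled method name only shows up when the UDF is resolved, not when the class compiles.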

Execution Sample

Java

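import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hive UDF that upper-cases a string column; Hive calls evaluate() once per row.
public class MyUpper extends UDF {
  public Text evaluate(Text input) {
    // A NULL column value arrives as null: return null instead of throwing.
    if (input == null) return null;
    // Convert to a Java String, upper-case it, and wrap the result back in Text.
    return new Text(input.toString().toUpperCase());
  }
}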

This UDF converts input text to uppercase when called from Hive.
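One detail worth checking in MyUpper is that toUpperCase() with no argument uses the JVM's default locale. Under a Turkish default locale, for example, a lowercase "i" becomes a dotted capital "İ". The sketch below is a hypothetical, Hadoop-free copy of the logic that uses String instead of Text. The class UpperLogic and its method names are made up. It compares the default-locale call with toUpperCase(Locale.ROOT), which does not depend on the default locale.

```java
import java.util.Locale;

// Hypothetical Hadoop-free copy of MyUpper's logic (String instead of Text),
// so it can be checked on a plain JVM.
class UpperLogic {
    // As in MyUpper: toUpperCase() with no argument uses the JVM default locale.
    static String toUpperDefault(String input) {
        if (input == null) return null;
        return input.toUpperCase();
    }

    // Variant: Locale.ROOT gives the same result regardless of the default locale.
    static String toUpperRoot(String input) {
        if (input == null) return null;
        return input.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale.setDefault(Locale.forLanguageTag("tr"));               // simulate a Turkish-locale JVM
        System.out.println(toUpperDefault("title").equals("TITLE"));  // prints false (dotted capital I)
        System.out.println(toUpperRoot("title").equals("TITLE"));     // prints true
    }
}
```

Whether this matters depends on how the JVMs running the query are configured. Passing Locale.ROOT simply removes the dependency.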

Execution Table

Step	Action	Input	evaluate() Output	Hive Query Output
1	Call evaluate() with 'hello'	'hello'	'HELLO'	'HELLO'
2	Call evaluate() with 'world'	'world'	'WORLD'	'WORLD'
3	Call evaluate() with null	null	null	null
4	No more rows	-	-	-

💡 All input rows have been processed, so the UDF calls are complete.
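The execution table above can be replayed with a small plain-Java loop. This sketch assumes a hypothetical String-based static evaluate() that behaves like MyUpper. It does not run inside Hive. It only feeds the same three inputs in order, records what evaluate() returns for each, and adds the final "no more rows" step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Replays the execution table: one evaluate() call per row, then a final
// step with no input. evaluate() here is a hypothetical String stand-in for MyUpper.
class ExecutionTrace {
    static String evaluate(String input) {
        if (input == null) return null;
        return input.toUpperCase();
    }

    // Formats a value the way the table does: quoted text, or bare null.
    static String quote(String s) {
        return s == null ? "null" : "'" + s + "'";
    }

    static List<String> trace(List<String> rows) {
        List<String> lines = new ArrayList<>();
        lines.add("Step\tInput\tevaluate() Output");
        int step = 1;
        for (String row : rows) {
            lines.add(step + "\t" + quote(row) + "\t" + quote(evaluate(row)));
            step++;
        }
        lines.add(step + "\t-\t-");   // no more rows: evaluate() is not called
        return lines;
    }

    public static void main(String[] args) {
        trace(Arrays.asList("hello", "world", null)).forEach(System.out::println);
    }
}
```

Running main prints four numbered steps that match the Step, Input, and evaluate() Output columns of the table.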

Variable Tracker

Variable	Start	After 1	After 2	After 3	Final
input	null	'hello'	'world'	null	null
output	null	'HELLO'	'WORLD'	null	null

Key Moments - 3 Insights

Why does evaluate() return null when input is null?

How does Hive use the UDF for each row?

What happens if you forget to compile and add the UDF to Hive?

Visual Quiz - 3 Questions

Test your understanding

Using the execution table, what does evaluate() return when the input is 'world'?

A. 'world'

B. 'WORLD'

C. null

D. 'World'

Concept Snapshot

User-defined functions (UDFs) let you add custom processing logic to Hive queries on Hadoop.
Define a Java class extending UDF with an evaluate() method.
evaluate() processes one input row and returns a result.
Compile the class, package it as a JAR, and register it with Hive before use (for example with ADD JAR and CREATE TEMPORARY FUNCTION).
Hive calls evaluate() for each row during query execution.
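Always check for null inputs: a NULL value reaches evaluate() as null (step 3 in the execution table), and calling a method on it throws a NullPointerException.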
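To see why the null check matters, the sketch below compares a hypothetical evaluate() body without the guard against one with it. It uses String in place of Text, and the class and method names are illustrative only.

```java
// Hypothetical comparison of an evaluate() body with and without a null guard,
// using String in place of Hadoop's Text so it runs on a plain JVM.
class NullHandling {
    // Without the guard: calling toUpperCase() on a null input throws.
    static String evaluateUnguarded(String input) {
        return input.toUpperCase();
    }

    // With the guard, as in MyUpper: null in, null out.
    static String evaluateGuarded(String input) {
        if (input == null) return null;
        return input.toUpperCase();
    }

    // Returns true if the unguarded version fails on a null input.
    static boolean throwsOnNull() {
        try {
            evaluateUnguarded(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnNull());          // prints true
        System.out.println(evaluateGuarded(null));   // prints null
    }
}
```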

Full Transcript

User-defined functions (UDFs) let you add your own code to process data in Hive queries. You write a Java class that extends UDF and implement an evaluate() method, which takes a value from each row and returns a result. After you compile the UDF into a JAR file and register it with Hive, you can call it in your queries. Hive runs evaluate() on every row and produces one transformed value per input row. Check for null inputs in evaluate(), because calling a method on a null input throws a NullPointerException. The execution table shows evaluate() called with 'hello', 'world', and null, returning 'HELLO', 'WORLD', and null.