
User-defined functions (UDFs) in Hadoop - Step-by-Step Execution

Concept Flow - User-defined functions (UDFs)
Define UDF class
Implement evaluate() method
Compile into a JAR and add the UDF to Hive
Use UDF in Hive query
Hive calls evaluate() for each row
Return processed result
Query outputs transformed data
This flow shows how you create a UDF by defining a class with an evaluate method, compile it, use it in Hive, and get transformed results.
Execution Sample
Java
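import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF: upper-cases a string column value.
public class MyUpper extends UDF {
  // Hive calls evaluate() once per row with that row's value.
  public Text evaluate(Text input) {
    // A SQL NULL arrives as a Java null; pass it through unchanged.
    if (input == null) return null;
    return new Text(input.toString().toUpperCase());
  }
}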
This UDF converts input text to uppercase when called from Hive.
Execution Table
Step | Action | Input | evaluate() Output | Hive Query Output
1 | Call evaluate() with 'hello' | 'hello' | 'HELLO' | 'HELLO'
2 | Call evaluate() with 'world' | 'world' | 'WORLD' | 'WORLD'
3 | Call evaluate() with null | null | null | null
4 | No more rows | - | - | -
💡 All input rows processed, UDF calls complete.
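The trace above can be reproduced without a Hive installation. The following is a minimal sketch, not Hive's actual execution engine: UpperSketch is a hypothetical stand-in that mirrors MyUpper.evaluate() using String instead of Hadoop's Text, and the loop in applyToRows() plays the role of Hive calling the method once per row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for MyUpper: same logic, but on String instead of
// org.apache.hadoop.io.Text, so it runs with the JDK alone.
class UpperSketch {
  static String evaluate(String input) {
    if (input == null) return null;   // same null guard as the UDF
    return input.toUpperCase();
  }

  // Plays the role of Hive: call evaluate() once for each row's value.
  static List<String> applyToRows(List<String> rows) {
    List<String> out = new ArrayList<>();
    for (String row : rows) {
      out.add(evaluate(row));
    }
    return out;
  }

  public static void main(String[] args) {
    // The three inputs from the Execution Table, in order.
    List<String> rows = Arrays.asList("hello", "world", null);
    System.out.println(applyToRows(rows)); // prints [HELLO, WORLD, null]
  }
}
```

Each element of the returned list corresponds to one row of the Execution Table, with the null row passed through unchanged.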
Variable Tracker
Variable | Start | After 1 | After 2 | After 3 | Final
input | null | 'hello' | 'world' | null | null
output | null | 'HELLO' | 'WORLD' | null | null
Key Moments - 3 Insights
Why does evaluate() return null when input is null?
Because the code checks whether input is null and returns null immediately (see Execution Table, step 3). Without that check, input.toString() would throw a NullPointerException.
How does Hive use the UDF for each row?
Hive calls the evaluate() method once per row, passing that row's input value (see Execution Table, steps 1-3).
What happens if you forget to compile and add the UDF to Hive?
Hive will not recognize the function name, and any query that calls it will fail with an error.
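Registering a UDF typically takes two HiveQL statements, ADD JAR and CREATE TEMPORARY FUNCTION, before the function can be called in a query. The sketch below only builds those statements as strings, so it runs without a Hive server. The JAR path, the function name my_upper, and the table and column names are all hypothetical, and MyUpper is assumed to live in the default package as in the sample.

```java
import java.util.Arrays;
import java.util.List;

class UdfRegistrationSketch {
  // Builds the HiveQL needed before a query can call the UDF.
  // All arguments are illustrative placeholders, not values from a real cluster.
  static List<String> registrationStatements(String jarPath, String functionName, String className) {
    return Arrays.asList(
        "ADD JAR " + jarPath,
        "CREATE TEMPORARY FUNCTION " + functionName + " AS '" + className + "'");
  }

  public static void main(String[] args) {
    List<String> setup = registrationStatements("/tmp/my-upper-udf.jar", "my_upper", "MyUpper");
    for (String statement : setup) {
      System.out.println(statement + ";");
    }
    // Hypothetical query against a hypothetical table `words` with a column `word`.
    System.out.println("SELECT my_upper(word) FROM words;");
  }
}
```

The printed statements would be run in a Hive session in this order. A temporary function lasts only for the session that created it, so the registration has to be repeated in each new session.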
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, what is the output of evaluate() when the input is 'world'?
A. 'world'
B. 'WORLD'
C. null
D. 'World'
💡 Hint
Check row 2 of the Execution Table, under 'evaluate() Output'.
At which step does evaluate() receive a null input?
A. Step 3
B. Step 2
C. Step 1
D. Step 4
💡 Hint
Look at row 3 of the Execution Table, under 'Input'.
If you remove the null check in evaluate(), what will happen when input is null?
A. It will return null safely
B. It will return empty string
C. It will throw an error
D. It will convert null to 'NULL'
💡 Hint
Refer to the Key Moments note on null input handling and to step 3 of the Execution Table.
Concept Snapshot
User-defined functions (UDFs) in Hadoop allow custom processing in Hive queries.
Define a Java class extending UDF with an evaluate() method.
evaluate() receives the input value from one row and returns a result.
Compile and add the UDF jar to Hive before use.
Hive calls evaluate() for each row during query execution.
Always handle null inputs to avoid errors.
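To see why the null check matters, the sketch below compares a guarded and an unguarded version of the same logic. It is an illustration under assumptions: both methods are hypothetical stand-ins that use String rather than Text, and inside Hive the exception would most likely surface as a failed query rather than a value the caller can catch.

```java
class NullGuardSketch {
  // Guarded version: mirrors the null check in MyUpper.evaluate().
  static String guarded(String input) {
    if (input == null) return null;
    return input.toUpperCase();
  }

  // Unguarded version: calling a method on null throws NullPointerException.
  static String unguarded(String input) {
    return input.toUpperCase();
  }

  // Returns true if unguarded(null) throws NullPointerException.
  static boolean unguardedFailsOnNull() {
    try {
      unguarded(null);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(guarded(null));           // prints null
    System.out.println(unguardedFailsOnNull());  // prints true
  }
}
```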
Full Transcript
User-defined functions (UDFs) let you add your own code to process data in Hive queries. You write a Java class that extends UDF and implement an evaluate() method. This method takes input from each row and returns a result. After compiling your UDF into a jar file and adding it to Hive, you can call it in your queries. Hive runs evaluate() on every row, transforming data as you want. It's important to check for null inputs in evaluate() to prevent errors. The execution table shows how evaluate() is called with different inputs and returns uppercase strings or null safely.