User-defined functions (UDFs) let you add your own functions to Hive queries that process data in Hadoop. They handle tasks that the built-in functions cannot do easily.
User-defined functions (UDFs) in Hadoop
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class MyCustomUDF extends UDF {
        public String evaluate(String input) {
            if (input == null) return null;
            // Your custom logic here
            return input.toUpperCase();
        }
    }
The class must extend org.apache.hadoop.hive.ql.exec.UDF, which ships with Apache Hive.
The evaluate method is where you write your code. Hive looks it up by name, so you can overload it with different input types and return types.
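Because evaluate can take different input types and return types, one class can offer several overloads. The following is a minimal sketch of that idea, not code from Hive itself. The class name OverloadedEvaluateSketch and the three behaviours are hypothetical, and "extends UDF" is deliberately left out so the file runs without Hive on the classpath.

```java
// Hypothetical example class. In a real Hive UDF it would also
// import org.apache.hadoop.hive.ql.exec.UDF and declare "extends UDF".
class OverloadedEvaluateSketch {

    // String input, Integer result: the length of the string.
    public Integer evaluate(String input) {
        if (input == null) return null;
        return input.length();
    }

    // Integer input, Long result: the value doubled.
    public Long evaluate(Integer input) {
        if (input == null) return null;
        return input.longValue() * 2;
    }

    // Two String inputs, String result: the inputs joined with "_".
    public String evaluate(String left, String right) {
        if (left == null || right == null) return null;
        return left + "_" + right;
    }

    public static void main(String[] args) {
        OverloadedEvaluateSketch udf = new OverloadedEvaluateSketch();
        System.out.println(udf.evaluate("hadoop")); // prints 6
        System.out.println(udf.evaluate(21));       // prints 42
        System.out.println(udf.evaluate("a", "b")); // prints a_b
    }
}
```

Each overload repeats its own null check, because any argument can arrive as null regardless of its type.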
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class ToUpperCaseUDF extends UDF {
        public String evaluate(String input) {
            if (input == null) return null;
            return input.toUpperCase();
        }
    }
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class AddPrefixUDF extends UDF {
        public String evaluate(String input) {
            if (input == null) return null;
            return "prefix_" + input;
        }
    }
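    import org.apache.hadoop.hive.ql.exec.UDF;

    public class SafeDivideUDF extends UDF {
        public Double evaluate(Double numerator, Double denominator) {
            // A null numerator would throw a NullPointerException when unboxed,
            // so check it together with the denominator.
            if (numerator == null || denominator == null || denominator == 0) return null;
            return numerator / denominator;
        }
    }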
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class NullInputUDF extends UDF {
        public String evaluate(String input) {
            if (input == null) return "empty";
            return input;
        }
    }
This program defines a UDF that reverses a string. It then tests the UDF with a normal string and a null input, printing the results.
    import org.apache.hadoop.hive.ql.exec.UDF;

    public class ReverseStringUDF extends UDF {
        public String evaluate(String input) {
            if (input == null) return null;
            return new StringBuilder(input).reverse().toString();
        }
    }

    // Sample usage in Hive:
    // CREATE TEMPORARY FUNCTION reverse_string AS 'ReverseStringUDF';
    // SELECT reverse_string(name) FROM users;

    class TestReverseStringUDF {
        public static void main(String[] args) {
            ReverseStringUDF reverseStringUDF = new ReverseStringUDF();

            String original = "hadoop";
            System.out.println("Original: " + original);

            String reversed = reverseStringUDF.evaluate(original);
            System.out.println("Reversed: " + reversed);

            String nullInput = null;
            System.out.println("Null input reversed: " + reverseStringUDF.evaluate(nullInput));
        }
    }
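Time complexity: evaluate runs once for each row it is applied to, so the total cost is the number of rows times the cost of one call. For string logic like the examples here, one call is usually O(n), where n is the length of the input.
Space complexity depends on what your logic stores; simple UDFs need little more than space for the result.
Common mistake: forgetting to handle null inputs, which can throw a NullPointerException and fail the query.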
Use UDFs when built-in functions do not meet your needs or for reusable custom logic.
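The null-handling mistake mentioned above is easy to reproduce in plain Java. The sketch below is illustrative only: the class NullHandlingSketch and its two static methods are hypothetical stand-ins for the body of an evaluate method, written without Hive so the file runs on its own.

```java
// Hypothetical stand-ins for the body of a UDF's evaluate method.
class NullHandlingSketch {

    // Unsafe: calls a method on the input without checking for null.
    static String unsafeTrim(String input) {
        return input.trim();
    }

    // Safe: passes null through, the same pattern the UDF examples use.
    static String safeTrim(String input) {
        if (input == null) return null;
        return input.trim();
    }

    public static void main(String[] args) {
        System.out.println(safeTrim("  hive  ")); // prints hive
        System.out.println(safeTrim(null));       // prints null

        try {
            unsafeTrim(null);
        } catch (NullPointerException e) {
            // The unchecked version fails as soon as a null value arrives.
            System.out.println("unsafeTrim failed on null input");
        }
    }
}
```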
User-defined functions let you add your own data processing steps in Hadoop.
They must extend the UDF class and implement an evaluate method.
Always handle null inputs and test your UDF with different cases.
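One way to test a UDF with different cases is to exercise its logic directly from a main method before deploying it. The sketch below assumes a null-safe divide along the lines of SafeDivideUDF. The class SafeDivideChecks and its check helper are hypothetical, and the logic lives in a plain static method so the file runs without Hive.

```java
import java.util.Objects;

// Hypothetical test harness; safeDivide mirrors a null-safe divide UDF.
class SafeDivideChecks {

    // Returns null when either input is null or the denominator is zero.
    static Double safeDivide(Double numerator, Double denominator) {
        if (numerator == null || denominator == null || denominator == 0) return null;
        return numerator / denominator;
    }

    // Prints PASS or FAIL for one case and reports whether it matched.
    static boolean check(String label, Double expected, Double actual) {
        boolean ok = Objects.equals(expected, actual);
        System.out.println((ok ? "PASS " : "FAIL ") + label);
        return ok;
    }

    public static void main(String[] args) {
        boolean allPassed = true;
        allPassed &= check("normal values", 2.5, safeDivide(5.0, 2.0));
        allPassed &= check("zero denominator", null, safeDivide(1.0, 0.0));
        allPassed &= check("null numerator", null, safeDivide(null, 2.0));
        allPassed &= check("null denominator", null, safeDivide(1.0, null));
        System.out.println(allPassed ? "All cases passed" : "Some cases failed");
    }
}
```

Listing the normal case next to each edge case makes it obvious when a new branch of logic has no test yet.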