What Is a Container in YARN in Hadoop: Explained Simply
In Hadoop YARN, a container is a resource allocation unit that bundles memory, CPU (virtual cores), and possibly other configured resources for running a specific task. It acts like a small workspace where YARN runs application code, which lets the cluster share its resources efficiently.
How It Works
Think of a container in YARN as a reserved workspace in a shared office. When you have a task to do, YARN allocates a container with the right amount of resources, such as CPU and memory. Your task runs inside that container, separated from other tasks, so it gets what it asked for without interference.
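A request for "the right amount" of resources is not always granted exactly as asked. The Java sketch below models a container's resource bundle as a record and rounds a memory request up to a multiple of a minimum allocation, which is roughly what YARN's CapacityScheduler does by default. The class and method names are hypothetical, and the 1024 MB minimum and 8192 MB maximum stand in for the `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.maximum-allocation-mb` settings; they are assumed values, not read from a real cluster. Requests above the maximum are rejected here, which, as far as we understand, mirrors how YARN treats oversized requests. Records require Java 16 or later.

```java
/**
 * Hypothetical sketch: rounding a container memory request before it is granted.
 * Assumes CapacityScheduler-style rounding up to a multiple of the minimum allocation.
 */
public class ContainerSizing {
    /** A container's resource bundle: memory in MB and virtual cores. */
    record ContainerSize(long memoryMb, int vcores) {}

    /** Rounds value up to the nearest multiple of step (step must be > 0). */
    static long roundUp(long value, long step) {
        return ((value + step - 1) / step) * step;
    }

    /** Rounds the memory request up to a multiple of minMb and rejects requests above maxMb. */
    static ContainerSize normalize(ContainerSize request, long minMb, long maxMb) {
        long memory = Math.max(roundUp(request.memoryMb(), minMb), minMb);
        if (memory > maxMb) {
            throw new IllegalArgumentException(
                    "Requested " + request.memoryMb() + " MB exceeds maximum of " + maxMb + " MB");
        }
        // This sketch only normalizes memory; vcores are passed through unchanged.
        return new ContainerSize(memory, request.vcores());
    }

    public static void main(String[] args) {
        long minMb = 1024; // assumed yarn.scheduler.minimum-allocation-mb
        long maxMb = 8192; // assumed yarn.scheduler.maximum-allocation-mb
        System.out.println(normalize(new ContainerSize(1500, 1), minMb, maxMb));
        // prints ContainerSize[memoryMb=2048, vcores=1]
    }
}
```

With these assumed settings, a 1500 MB request becomes a 2048 MB container, so asking for slightly more than a multiple of the minimum can leave memory unused.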
Each application's ApplicationMaster requests containers from YARN's ResourceManager, whose scheduler grants them based on the cluster's available resources and the application's needs. Each container runs on a NodeManager, the per-machine agent that launches and monitors the containers on its node. This setup lets Hadoop run many tasks in parallel while balancing load and resource use across the cluster.
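To make the split between the ResourceManager and the NodeManagers concrete, here is a deliberately simplified Java model. A hypothetical `ToyPlacement.place` method plays the scheduler: it tracks free memory and vcores per NodeManager and puts each container on the first node with enough room. Real YARN schedulers (Capacity, Fair) also weigh queues, data locality, and fairness, so treat this as an illustration of the bookkeeping, not of YARN's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not YARN's real scheduler): track free capacity on each NodeManager
 * and place each container request on the first node that can hold it.
 */
public class ToyPlacement {
    /** Free capacity on one NodeManager; mutable so placements can consume it. */
    static final class Node {
        final String name;
        long freeMemoryMb;
        int freeVcores;

        Node(String name, long freeMemoryMb, int freeVcores) {
            this.name = name;
            this.freeMemoryMb = freeMemoryMb;
            this.freeVcores = freeVcores;
        }
    }

    /** Returns the chosen node name for each request, or null when no node has room. */
    static List<String> place(List<Node> nodes, long memoryMb, int vcores, int count) {
        List<String> placements = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String chosen = null;
            for (Node node : nodes) {
                if (node.freeMemoryMb >= memoryMb && node.freeVcores >= vcores) {
                    node.freeMemoryMb -= memoryMb; // reserve the container's memory on this node
                    node.freeVcores -= vcores;     // and its virtual cores
                    chosen = node.name;
                    break;
                }
            }
            placements.add(chosen); // null: the request waits until resources free up
        }
        return placements;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("node-1", 4096, 4), new Node("node-2", 2048, 2));
        System.out.println(place(nodes, 2048, 1, 4)); // prints [node-1, node-1, node-2, null]
    }
}
```

With a 4096 MB / 4-vcore node and a 2048 MB / 2-vcore node, three 2048 MB containers fit; the fourth request gets `null`, meaning it would have to wait until a running container finishes.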
Example
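The following Java snippet uses the YARN client API to describe a container's resources and the command it should run. It only prepares the request; submitting it is a separate step.

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class YarnContainerExample {
    public static void main(String[] args) throws Exception {
        // Connect to the ResourceManager using the Hadoop/YARN configuration on the classpath.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        // Ask the ResourceManager for a new application ID.
        YarnClientApplication app = yarnClient.createApplication();

        // Resource bundle for the container: 1024 MB (1 GB) of memory and 1 virtual core.
        Resource capability = Resource.newInstance(1024, 1);

        // Launch context: the command the container will run.
        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(Collections.singletonList("echo Hello from YARN container"));

        // Attach both to the application's submission context.
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("yarn-container-example");
        appContext.setResource(capability);
        appContext.setAMContainerSpec(ctx);

        // Nothing is allocated yet: calling yarnClient.submitApplication(appContext)
        // is what would ask YARN to allocate and launch this container.
        System.out.println("Prepared a container request for " + appContext.getApplicationId()
                + ": 1 GB memory, 1 vcore, command: echo Hello from YARN container");

        yarnClient.stop();
    }
}
```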
When to Use
Every YARN application runs its work in containers, so you rely on them whenever you run distributed tasks on a Hadoop cluster. Giving each task a container of a fixed size (memory and CPU) prevents one job from using too much and slowing others down.
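Keeping one job from hogging a node depends on enforcement, not just allocation. As far as we know, the NodeManager periodically measures each container's process tree and kills containers that exceed their physical memory limit, or their virtual memory limit (the physical limit times `yarn.nodemanager.vmem-pmem-ratio`, 2.1 by default) when those checks are enabled. The sketch below imitates that check with made-up container IDs and usage numbers; `MemoryLimitCheck` and `overLimit` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Simplified sketch (not Hadoop code) of a NodeManager-style memory check:
 * flag every container whose measured memory exceeds its limit.
 */
public class MemoryLimitCheck {
    /** Measured memory of one container's process tree, in MB. */
    record Usage(long physicalMb, long virtualMb) {}

    /** Returns the sorted IDs of containers over their physical or virtual memory limit. */
    static List<String> overLimit(Map<String, Long> limitsMb, Map<String, Usage> usage, double vmemPmemRatio) {
        List<String> toKill = new ArrayList<>();
        for (Map.Entry<String, Long> entry : limitsMb.entrySet()) {
            Usage measured = usage.get(entry.getKey());
            if (measured == null) {
                continue; // no measurement for this container yet
            }
            long pmemLimitMb = entry.getValue();
            double vmemLimitMb = pmemLimitMb * vmemPmemRatio; // virtual limit scales with the physical one
            if (measured.physicalMb() > pmemLimitMb || measured.virtualMb() > vmemLimitMb) {
                toKill.add(entry.getKey());
            }
        }
        Collections.sort(toKill); // deterministic order for printing
        return toKill;
    }

    public static void main(String[] args) {
        Map<String, Long> limits = Map.of("c1", 1024L, "c2", 2048L, "c3", 1024L);
        Map<String, Usage> usage = Map.of(
                "c1", new Usage(1300, 1800),  // over its 1024 MB physical limit
                "c2", new Usage(1500, 3000),  // within both limits
                "c3", new Usage(900, 2500));  // over its 1024 * 2.1 = 2150.4 MB virtual limit
        System.out.println(overLimit(limits, usage, 2.1)); // prints [c1, c3]
    }
}
```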
Real-world uses include MapReduce jobs, where each map and reduce task runs in its own container, and Spark on YARN, where each executor runs in a container. Other big data processing frameworks that run on YARN get the same controlled resource allocation, which keeps tasks from starving each other and the cluster balanced.
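When sizing MapReduce tasks or Spark executors, a quick calculation shows how many containers one node can run at the same time. This hypothetical helper divides a node's capacity (the values of `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`, assumed here to be 16384 and 6) by the container size. The `countVcores` flag reflects a real distinction: with the CapacityScheduler's default memory-only resource calculator, vcores do not limit placement, while the `DominantResourceCalculator` takes both memory and vcores into account.

```java
/**
 * Back-of-the-envelope sketch: how many equally sized containers
 * (for example, map tasks or Spark executors) one NodeManager can run at once.
 */
public class ContainersPerNode {
    static long maxContainers(long nodeMemoryMb, int nodeVcores,
                              long containerMemoryMb, int containerVcores, boolean countVcores) {
        long byMemory = nodeMemoryMb / containerMemoryMb;
        if (!countVcores) {
            return byMemory; // memory-only accounting
        }
        return Math.min(byMemory, nodeVcores / containerVcores); // the scarcer resource wins
    }

    public static void main(String[] args) {
        // Assumed node: 16384 MB of memory and 6 vcores available to containers.
        System.out.println(maxContainers(16384, 6, 2048, 1, false)); // prints 8
        System.out.println(maxContainers(16384, 6, 2048, 1, true));  // prints 6
    }
}
```

With vcores counted, the node runs 6 containers instead of 8, so CPU rather than memory becomes the bottleneck.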
Key Points
- A container is a resource bundle (CPU, memory) for running a task in YARN.
- YARN’s ResourceManager allocates containers in response to ApplicationMaster requests, based on resource availability.
- Containers run on NodeManagers, which manage resources on each node.
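- Containers separate tasks to limit resource conflicts; how strictly depends on the NodeManager configuration (for example, enforcing CPU limits requires cgroups).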
- They enable efficient, parallel processing in Hadoop clusters.