
What Is a Container in YARN in Hadoop: Explained Simply

In Hadoop YARN, a container is a resource allocation unit that bundles CPU, memory, and other resources for running a specific task. It acts like a small workspace where YARN runs application code, managing resources efficiently across the cluster.

How It Works

Think of a container in YARN as a reserved desk in a shared office. When an application has a task to run, YARN allocates a container with a set amount of resources, such as memory and CPU cores. The task runs inside that container on one machine, separate from other tasks, so each task works within its own reserved share instead of competing with its neighbours.

YARN's ResourceManager decides how many containers to grant, and on which nodes, based on the cluster's available resources and the application's needs. Each container runs on a single node, where a NodeManager launches and monitors the containers on that machine. This setup lets Hadoop run many tasks in parallel while spreading load and resource use across the cluster.
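
The allocation flow described above can be sketched as a toy model in plain Java. This is not YARN's scheduler or its API: the `ToyCluster` and `Node` classes and the first-fit placement rule are assumptions made up for illustration. It only shows the idea that a container is a slice of one node's memory and cores, and that a request is refused when no node has enough free capacity.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a cluster: hands out containers first-fit. Not a YARN API. */
class ToyCluster {

    /** One machine in the toy cluster; stands in for a node run by a NodeManager. */
    static class Node {
        final String name;
        int freeMemoryMb;
        int freeVcores;

        Node(String name, int memoryMb, int vcores) {
            this.name = name;
            this.freeMemoryMb = memoryMb;
            this.freeVcores = vcores;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    void addNode(String name, int memoryMb, int vcores) {
        nodes.add(new Node(name, memoryMb, vcores));
    }

    /** Returns the name of the node that received the container, or null if none fits. */
    String allocate(int memoryMb, int vcores) {
        for (Node node : nodes) {
            if (node.freeMemoryMb >= memoryMb && node.freeVcores >= vcores) {
                // Reserve the container's share on this node.
                node.freeMemoryMb -= memoryMb;
                node.freeVcores -= vcores;
                return node.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        cluster.addNode("node-1", 2048, 2);
        cluster.addNode("node-2", 4096, 4);

        System.out.println(cluster.allocate(1024, 1)); // node-1
        System.out.println(cluster.allocate(2048, 2)); // node-2 (node-1 has only 1024 MB left)
        System.out.println(cluster.allocate(8192, 1)); // null (no node has 8 GB free)
    }
}
```

A real ResourceManager uses pluggable schedulers with queues and priorities, so treat the first-fit loop as a teaching simplification rather than a description of how YARN picks nodes.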


Example

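This example uses the YARN client API in Java to set up a container request: the memory and CPU to reserve and the command to run inside the container. In a YARN application, the first container set up this way hosts the ApplicationMaster, which in turn asks for further containers. Running the program requires a Hadoop cluster (or a local pseudo-distributed one) with a reachable ResourceManager.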
java
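import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class YarnContainerExample {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration picks up yarn-site.xml from the classpath.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
            // Ask the ResourceManager for a new application and its submission context.
            YarnClientApplication app = yarnClient.createApplication();
            ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
            appContext.setApplicationName("yarn-container-example");

            // Resources for the application's first container: 1024 MB, 1 virtual core.
            Resource capability = Resource.newInstance(1024, 1);
            appContext.setResource(capability);

            // Command to run inside that container. It only prints and exits;
            // a real ApplicationMaster would register with the ResourceManager
            // and request more containers.
            ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
            ctx.setCommands(Collections.singletonList("echo Hello from YARN container"));
            appContext.setAMContainerSpec(ctx);

            // Hand the request to the ResourceManager.
            yarnClient.submitApplication(appContext);

            System.out.println("Requested container with 1GB memory and 1 CPU core.");
            System.out.println("Container will run command: echo Hello from YARN container");
        } finally {
            // Always release the client's connection to the cluster.
            yarnClient.stop();
        }
    }
}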
Output
Requested container with 1GB memory and 1 CPU core.
Container will run command: echo Hello from YARN container

When to Use

Every application that runs on YARN runs in containers, so you use them whenever you run distributed tasks on a Hadoop cluster. Each container fixes the memory and CPU a task is given, which keeps one job from taking too much and slowing others down.

Real-world uses include running MapReduce jobs, Spark applications, or any other big data processing that needs controlled resource allocation. Sizing containers to match what each task needs helps keep the cluster's load balanced.
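
To make controlled resource allocation concrete, the arithmetic below estimates how many equally sized containers fit on one node. It is a back-of-the-envelope sketch, not a YARN API: the `ContainerMath` class, the `containersPerNode` method, and the sample node size are made up for illustration, and real schedulers apply their own rounding and limits.

```java
/** Rough capacity arithmetic for one node. Hypothetical helper, not part of Hadoop. */
class ContainerMath {

    /** Containers of the given size that fit on a node; the scarcer resource sets the limit. */
    static int containersPerNode(int nodeMemoryMb, int nodeVcores,
                                 int containerMemoryMb, int containerVcores) {
        if (containerMemoryMb <= 0 || containerVcores <= 0) {
            throw new IllegalArgumentException("container size must be positive");
        }
        int byMemory = nodeMemoryMb / containerMemoryMb; // limit if only memory mattered
        int byCores = nodeVcores / containerVcores;      // limit if only CPU mattered
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        // Assumed node: 64 GB of memory and 16 virtual cores offered to YARN.
        System.out.println(containersPerNode(65536, 16, 4096, 1)); // 16
        System.out.println(containersPerNode(65536, 16, 2048, 1)); // 16 (cores run out first)
        System.out.println(containersPerNode(65536, 16, 8192, 1)); // 8 (memory runs out first)
    }
}
```

The second and third calls show why container size matters: asking for less memory than needed to use up the cores gains nothing, while asking for too much leaves cores idle.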

Key Points

  • A container is a resource bundle (CPU, memory) for running a task in YARN.
  • YARN’s ResourceManager allocates containers based on resource availability.
  • Containers run on NodeManagers, which manage resources on each node.
  • Containers isolate tasks to prevent resource conflicts.
  • They enable efficient, parallel processing in Hadoop clusters.

Key Takeaways

  • A container in YARN is a reserved set of resources for running a specific task.
  • Containers isolate tasks to ensure fair and efficient resource use in a Hadoop cluster.
  • ResourceManager and NodeManager work together to allocate and manage containers.
  • Containers are essential for running distributed big data jobs like MapReduce and Spark.
  • Using containers helps keep cluster resources balanced and tasks running smoothly.