Overview - Puma server configuration

What is it?

Puma is a concurrent web server for Ruby applications and the default server for Ruby on Rails. It handles incoming web requests and sends responses back to users. Configuring Puma means setting how it manages threads, workers, and connections to optimize performance and reliability.

Why it matters

Without proper Puma configuration, a Rails app can become slow, unresponsive, or crash under load. Good configuration ensures the app can handle many users at once, use system resources wisely, and recover gracefully from errors. This improves user experience and keeps the app stable in real-world use.

Where it fits

Before learning Puma configuration, you should understand basic Ruby on Rails app structure and how web servers work. After mastering Puma setup, you can explore advanced deployment techniques, monitoring, and scaling Rails apps in production.

Mental Model

Core Idea

Puma configuration controls how many threads and worker processes handle web requests to balance speed, resource use, and reliability.

Think of it like...

Imagine a restaurant kitchen where chefs (workers) and assistants (threads) prepare meals. Configuring Puma is like deciding how many chefs and assistants work together to serve customers quickly without overcrowding the kitchen.

┌───────────────────────┐
│ Puma Server           │
├───────────────────────┤
│ Workers (processes)   │
│   ┌───────────┐       │
│   │ Thread 1  │       │
│   │ Thread 2  │       │
│   │ ...       │       │
│   └───────────┘       │
└───────────────────────┘
Workers handle requests in parallel.
Threads inside workers handle multiple requests concurrently.
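A rough way to read this model: the most requests Puma can be running application code for at one moment is the number of processes times the threads per process. The sketch below is plain Ruby, and max_in_flight is a hypothetical helper name, not a Puma API; it ignores requests still waiting in the Reactor or the socket backlog.

```ruby
# Hypothetical helper (not part of Puma): upper bound on requests that can
# execute application code at the same moment.
#   workers: number of forked worker processes; 0 means Puma's single mode,
#            where one process serves requests without forking.
#   threads: maximum threads per process (the second argument to `threads`).
def max_in_flight(workers:, threads:)
  processes = workers.zero? ? 1 : workers
  processes * threads
end

puts max_in_flight(workers: 4, threads: 5) # prints 20
puts max_in_flight(workers: 0, threads: 5) # prints 5 (single mode, one process)
```

In kitchen terms: four chefs with five assistants each can have at most twenty meals in progress, no matter how many orders are waiting.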

Build-Up - 7 Steps

1

Foundation: What is Puma and its role

Concept: Introducing Puma as a web server for Rails apps and its basic function.

Puma is a server that listens for web requests and sends back responses. It runs your Rails app code when users visit your site. Unlike simpler servers, Puma can handle many requests at once using threads and workers.
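Concretely, Puma talks to your app through Rack: a Rails application, like any Rack app, is an object with a call(env) method that returns a status, headers, and a body. The lambda below is a hypothetical stand-in for a Rails app, and the hand-built env contains only two of the keys Puma would normally fill in from the HTTP request.

```ruby
# Hypothetical stand-in for a Rails app. Rack's contract: call(env) returns
# [status, headers, body], where body responds to each.
hello_app = lambda do |env|
  [200, { "content-type" => "text/plain" }, ["Hello from #{env['PATH_INFO']}"]]
end

# Puma builds the env hash from each incoming request; here it is faked by hand.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/users" }
status, headers, body = hello_app.call(env)

puts status                      # prints 200
puts headers["content-type"]     # prints text/plain
body.each { |chunk| puts chunk } # prints Hello from /users
```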

Result

You understand Puma is the middleman between users and your Rails app, managing multiple requests efficiently.

Knowing Puma's role helps you see why configuring it affects your app's speed and stability.

2

Foundation: Basic Puma configuration file

3

Intermediate: Threads vs workers explained

4

Intermediate: Configuring Puma for production

5

Intermediate: Using environment variables in config

6

Advanced: Handling Puma worker restarts gracefully

7

Expert: Puma internal threading and event loop

Under the Hood

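Puma runs as one or more worker processes; with workers set to 0 (single mode) it serves requests from one process without forking. Each worker has a thread pool. When a request arrives whose data has not been fully received, Puma's Reactor thread buffers it until the request is complete and then hands it to a free worker thread. This lets one process work on many requests concurrently without worker threads sitting idle on slow clients. Preloading the app before forking workers lets them share memory pages with the master copy-on-write, saving RAM. When a worker boots or restarts, hooks such as on_worker_boot reconnect per-process resources like databases.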
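The thread pool idea can be sketched in a few lines of plain Ruby. This is an illustration under simplifying assumptions (no Reactor, no sockets), and ToyThreadPool is a hypothetical class, not part of Puma: three threads share six sleeping "requests", so the work finishes in roughly two rounds instead of six.

```ruby
# Toy thread pool in plain Ruby, NOT Puma's implementation: a fixed number of
# threads take ready-to-run "requests" from a shared queue.
class ToyThreadPool
  def initialize(size)
    @queue = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; :stop tells the thread to exit.
        while (job = @queue.pop) != :stop
          job.call
        end
      end
    end
  end

  def submit(&job)
    @queue << job
  end

  # Queue one :stop per thread after any pending jobs, then wait for all threads.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each(&:join)
  end
end

handled = Queue.new
pool = ToyThreadPool.new(3)
6.times do |i|
  pool.submit do
    sleep 0.1 # stand-in for waiting on a database or external API
    handled << [i, Thread.current]
  end
end
pool.shutdown

results = Array.new(handled.size) { handled.pop }
puts "#{results.size} requests on #{results.map(&:last).uniq.size} threads"
```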

Why designed this way?

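Puma was designed to serve Ruby apps quickly and concurrently. In CRuby, the Global Interpreter Lock (GIL) lets only one thread per process run Ruby code at a time, so Puma combines two tools: multiple worker processes use multiple CPU cores, while threads inside each worker overlap time spent waiting on IO. The Reactor pattern keeps slow network clients from occupying worker threads. Preloading reduces memory use. The alternatives had limits: servers that use only processes need a whole process (and its memory) per concurrent request, and servers that use only threads cannot run Ruby code on more than one core under the GIL.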

┌───────────────────┐
│  Master Process   │
│  (preloads app)   │
└─────────┬─────────┘
          │ forks
┌─────────▼─────────┐
│  Worker Process   │
│ ┌───────────────┐ │
│ │ Reactor       │ │
│ │ thread        │ │
│ └───────┬───────┘ │
│         │ assigns │
│ ┌───────▼───────┐ │
│ │ Worker        │ │
│ │ threads       │ │
│ └───────────────┘ │
└───────────────────┘
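The master/worker split in the diagram can be imitated with Ruby's own Process.fork. This is a hedged, Unix-only sketch, not Puma's code: a constant built in the master stands in for the preloaded app, and PRELOADED_ROUTES is a hypothetical name.

```ruby
# Unix-only sketch of "preload, then fork" (not Puma's code).
# The master builds expensive state once, like preload_app! loading the app.
PRELOADED_ROUTES = (1..10_000).to_h { |i| ["/page/#{i}", "PagesController#show"] }.freeze

reader, writer = IO.pipe
$stdout.flush # avoid duplicating buffered output in the children

pids = Array.new(2) do |n|
  fork do
    reader.close
    # The worker reads state inherited at fork time instead of rebuilding it;
    # the OS shares those memory pages until some process writes to them.
    writer.write("worker #{n} (pid #{Process.pid}) sees #{PRELOADED_ROUTES.size} routes\n")
    writer.close
  end
end

writer.close # the master keeps only the read end
pids.each { |pid| Process.wait(pid) }
reports = reader.readlines(chomp: true)
reader.close
puts reports
```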

Myth Busters - 4 Common Misconceptions

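Question: Does increasing threads always improve Puma performance?

Common Belief: More threads always make Puma faster because it can handle more requests.

Reality: No. Because of the GIL, extra threads only help while requests are waiting on IO such as database or API calls. Beyond that point they add memory use and demand for database connections without adding throughput.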

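Question: Does setting workers to 1 mean Puma is single-threaded?

Common Belief: One worker means Puma runs only one thread and can't handle multiple requests at once.

Reality: No. workers sets the number of processes, not threads. A single worker still has its own thread pool (sized by threads), so it can serve several requests concurrently.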

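Question: Does preload_app! always reduce memory usage?

Common Belief: Using preload_app! always saves memory by sharing code between workers.

Reality: Not always. Workers share the master's memory pages only until they write to them (copy-on-write), so an app that modifies much of its memory after boot shares less. Preloading also means connections opened at boot must be reinitialized in each worker.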

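Question: Can Puma restart workers without affecting users?

Common Belief: Puma restarts workers immediately, dropping any ongoing requests.

Reality: Puma can restart gracefully. In a phased restart it replaces workers one at a time; each old worker finishes its in-flight requests before exiting while the other workers keep serving traffic.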

Expert Zone

1

Puma's thread pool size should account for Ruby's Global Interpreter Lock (GIL): only one thread per process executes Ruby code at a time, but the lock is released during blocking IO, so threads still overlap database and network waits.

2

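Preloading the app before forking workers can cause issues if your app opens connections or starts background threads at boot. After fork, sockets opened in the master are shared by every worker, and only the thread that called fork survives in the child, so such state must be reinitialized per worker.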

3

Puma's Reactor runs on a single thread that watches many connections at once, so it must never block. Application code and other blocking work run on the worker threads, which keeps the Reactor responsive.
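The idea of one non-blocking thread watching many connections can be sketched with Ruby's IO.select. This is a toy, not Puma's code (Puma's actual Reactor is built on the nio4r gem): pipes stand in for client sockets, and a request counts as complete when its headers end with a blank line. The slow client never blocks the loop; it is simply handed off last.

```ruby
# Toy reactor (not Puma's). Each IO.pipe pair stands in for one client socket.
clients = Array.new(3) { IO.pipe } # [read_end, write_end]
clients.each { |_, w| w.sync = true }
buffers = Hash.new { |h, io| h[io] = +"" }
handed_off = [] # request paths passed on to the (imaginary) thread pool

# Two fast clients send complete requests immediately.
clients[0][1].write("GET /a HTTP/1.1\r\n\r\n")
clients[1][1].write("GET /b HTTP/1.1\r\n\r\n")
# A slow client sends its request in two pieces, 0.2 s apart.
slow_client = Thread.new do
  clients[2][1].write("GET /slow HTTP/1.1\r\n")
  sleep 0.2
  clients[2][1].write("\r\n")
end

watching = clients.map(&:first)
until watching.empty?
  readable, = IO.select(watching, nil, nil, 1) # wait up to 1 s for any input
  next unless readable

  readable.each do |io|
    chunk = io.read_nonblock(1024, exception: false)
    next if chunk == :wait_readable
    if chunk.nil? # client closed before finishing its request
      watching.delete(io)
      next
    end

    buffers[io] << chunk
    next unless buffers[io].end_with?("\r\n\r\n") # headers not complete yet

    handed_off << buffers[io].lines.first.split[1] # e.g. "/a"
    watching.delete(io)
  end
end

slow_client.join
clients.flatten.each(&:close)
p handed_off
```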

When NOT to use

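Puma's threads do not help CPU-heavy Ruby work: within one process, the GIL lets only one thread execute Ruby code at a time, so CPU-bound requests gain little from extra threads. For such cases, background job processors like Sidekiq or multi-process architectures are better. For simple apps with very low traffic, a simpler server such as WEBrick may suffice, although WEBrick is no longer bundled with Ruby since 3.0.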
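A quick way to see this on CRuby (MRI) is to time the same work serially and in threads. This is a plain-Ruby sketch: sleep stands in for IO and a summation loop stands in for CPU-heavy work. On CRuby the IO case should speed up roughly fourfold while the CPU case should not; JRuby and TruffleRuby have no GIL, so their CPU numbers can differ.

```ruby
# Compare threads on IO-bound vs CPU-bound work (results differ by Ruby VM).
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

def io_task
  sleep 0.2 # blocking wait: CRuby releases the GIL, like a DB or HTTP call
end

def cpu_task
  (1..1_000_000).sum { |i| i * i } # pure Ruby work: holds the GIL while running
end

def in_threads(count, &work)
  elapsed { Array.new(count) { Thread.new(&work) }.each(&:join) }
end

io_serial    = elapsed { 4.times { io_task } }
io_threaded  = in_threads(4) { io_task }
cpu_serial   = elapsed { 4.times { cpu_task } }
cpu_threaded = in_threads(4) { cpu_task }

printf("IO-bound:  serial %.2fs, 4 threads %.2fs\n", io_serial, io_threaded)
printf("CPU-bound: serial %.2fs, 4 threads %.2fs\n", cpu_serial, cpu_threaded)
```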

Production Patterns

In production, Puma is often run behind a reverse proxy like Nginx for SSL termination and load balancing. Configurations read their settings from environment variables for flexibility. The worker count is commonly set to the number of CPU cores, threads are tuned to the expected IO concurrency, and preload_app! is enabled for memory efficiency. Worker boot hooks (on_worker_boot) re-establish database connections in each newly forked worker.
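One way to keep such a configuration flexible is to compute every setting from environment variables with defaults. The helper below is a hypothetical plain-Ruby sketch, not Puma's API; the variable names WEB_CONCURRENCY, RAILS_MAX_THREADS, PORT, and RAILS_ENV are common conventions rather than requirements, and the worker default of one per CPU core follows the pattern described above.

```ruby
require "etc"

# Hypothetical helper: derive production settings from an ENV-like hash.
# Integer() raises on malformed values instead of silently turning them into 0.
def puma_settings(env = ENV)
  {
    workers: Integer(env.fetch("WEB_CONCURRENCY") { Etc.nprocessors }), # one per CPU core by default
    threads: Integer(env.fetch("RAILS_MAX_THREADS", "5")),
    port: Integer(env.fetch("PORT", "3000")),
    environment: env.fetch("RAILS_ENV", "production")
  }
end

settings = puma_settings({ "WEB_CONCURRENCY" => "4", "PORT" => "8080" })
p settings
```

In config/puma.rb, these values would feed Puma's own directives, for example `workers settings[:workers]`, `threads settings[:threads], settings[:threads]`, `port settings[:port]`, and `environment settings[:environment]`.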

Connections

Operating System Processes and Threads

Puma's workers map to OS processes and threads map to OS threads.

Understanding OS-level processes and threads clarifies how Puma manages concurrency and resource isolation.
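That mapping can be observed from plain Ruby on a Unix system. The sketch below (an illustration, not Puma code) shows that threads in one process see and update the same variable and report the same process ID, while a forked child gets its own copy of memory, so its change never reaches the parent. This is the isolation that separates Puma's workers from one another.

```ruby
# Threads share one process's memory; a forked process gets its own copy (Unix only).
lock = Mutex.new
counter = 0
Array.new(3) { Thread.new { lock.synchronize { counter += 1 } } }.each(&:join)
after_threads = counter # every thread updated the same variable

$stdout.flush
child_pid = fork { counter += 100 } # changes only the child's copy
Process.wait(child_pid)
after_fork = counter # unchanged in the parent

thread_pids = Array.new(3) { Thread.new { Process.pid } }.map(&:value)

puts "after threads: #{after_threads}, after fork: #{after_fork}" # prints "after threads: 3, after fork: 3"
puts "threads share pid #{Process.pid}: #{thread_pids.uniq == [Process.pid]}"
puts "child had its own pid: #{child_pid != Process.pid}"
```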

Event-driven Programming

Puma uses an event loop (Reactor pattern) to handle IO efficiently.

Knowing event-driven design helps understand how Puma handles many connections without blocking.

Restaurant Kitchen Workflow

Puma's workers and threads are like chefs and assistants managing meal orders.

This analogy helps grasp resource allocation and concurrency in server design.

Common Pitfalls

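#1 Setting too many threads, causing memory exhaustion

Wrong approach:

    threads 16, 16
    workers 4

Correct approach:

    threads 5, 5
    workers 4

Root cause: Assuming more threads always improve performance without considering memory limits. With 4 workers, 16 threads each allows up to 64 requests in flight at once, each holding memory and usually a database connection, while the GIL means the extra threads add little throughput for Ruby-heavy requests.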

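#2 Not reconnecting the database when a worker boots

Wrong approach:

    on_worker_boot do
      # no database reconnect
    end

Correct approach:

    on_worker_boot do
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
    end

Root cause: Ignoring that each worker process needs its own database connections after forking; connections opened in the master (for example while preload_app! loads the app) must not be shared between workers.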
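Why the hook is needed can be shown without Rails. This Unix-only sketch uses hypothetical FakeConnection and FakeDB classes (not Active Record) that remember which process opened the connection; the forked child detects that it inherited the master's connection and opens its own, which is the job on_worker_boot does in the correct approach.

```ruby
# Hypothetical connection class (not Active Record's API). It records which
# process opened it, so a forked worker can tell it inherited the master's copy.
class FakeConnection
  attr_reader :owner_pid

  def initialize
    @owner_pid = Process.pid
  end

  def inherited_across_fork?
    owner_pid != Process.pid
  end
end

module FakeDB
  class << self
    attr_accessor :connection
  end

  # Stand-in for the body of an on_worker_boot hook.
  def self.reconnect_if_forked!
    self.connection = FakeConnection.new if connection.inherited_across_fork?
  end
end

FakeDB.connection = FakeConnection.new # opened in the master during "preload"

reader, writer = IO.pipe
$stdout.flush
pid = fork do
  reader.close
  stale_before = FakeDB.connection.inherited_across_fork?
  FakeDB.reconnect_if_forked!
  writer.write("#{stale_before} #{FakeDB.connection.inherited_across_fork?}\n")
  writer.close
end
writer.close
Process.wait(pid)
child_report = reader.read.split
reader.close
p child_report # prints ["true", "false"]: stale before the hook, fresh after
```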

#3 Hardcoding port and threads without environment variables

Wrong approach:

    port 3000
    threads 5, 5

Correct approach:

    port ENV.fetch('PORT') { 3000 }
    threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }.to_i
    threads threads_count, threads_count

Root cause: Lack of flexibility for different deployment environments.

Key Takeaways

Puma is a multi-threaded, multi-worker web server that efficiently handles concurrent Rails requests.

Configuring threads and workers properly balances performance and resource use based on your server and app needs.

Preloading the app before forking workers saves memory but requires careful handling of connections.

Phased restarts and worker boot hooks help deployments proceed without downtime or connection errors.

Understanding Puma's internal Reactor pattern and concurrency model helps optimize and troubleshoot production apps.