
How to Use ProcessPoolExecutor in Python for Parallel Processing

Use concurrent.futures.ProcessPoolExecutor to run functions in parallel across multiple CPU cores by submitting tasks with submit() or mapping with map(). It creates separate processes, making it ideal for CPU-bound tasks.
📐 Syntax

The basic syntax to use ProcessPoolExecutor involves creating an executor object and submitting tasks to it. You can use submit() to schedule a single function call or map() to apply a function to many inputs.

  • ProcessPoolExecutor(max_workers=None): Creates a pool of worker processes. max_workers sets how many processes run in parallel (default is the number of processors reported by os.cpu_count()).
  • submit(fn, *args, **kwargs): Schedules fn to run with given arguments, returns a Future object.
  • map(fn, *iterables): Runs fn on each item from the iterables, returning an iterator that yields results in input order.
  • shutdown(wait=True): Cleans up the executor, waiting for tasks to finish if wait is True.
python
from concurrent.futures import ProcessPoolExecutor

def task(x):
    return x * x

with ProcessPoolExecutor(max_workers=4) as executor:
    future = executor.submit(task, 5)  # Run task(5) in a separate process
    result = future.result()  # Wait and get the result
    print(result)

    results = list(executor.map(task, [1, 2, 3, 4]))  # Run task on multiple inputs
    print(results)
Output
25
[1, 4, 9, 16]
💻 Example

This example shows how to use ProcessPoolExecutor to calculate squares of numbers in parallel. It demonstrates submitting single tasks and mapping a function over a list.

python
from concurrent.futures import ProcessPoolExecutor
import time

def square(n):
    time.sleep(1)  # Simulate a time-consuming task
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=3) as executor:
        # Submit single tasks
        futures = [executor.submit(square, i) for i in range(5)]
        for future in futures:
            print(future.result())

        # Use map to process multiple inputs
        results = list(executor.map(square, range(5, 10)))
        print(results)
Output
0
1
4
9
16
[25, 36, 49, 64, 81]
⚠️ Common Pitfalls

Common mistakes when using ProcessPoolExecutor include:

  • Not protecting the entry point with if __name__ == '__main__': on platforms that spawn worker processes (Windows, and macOS by default), which can cause a RuntimeError or runaway process creation.
  • Passing non-picklable objects (like open files or lambda functions) to worker processes, which raises errors.
  • Expecting shared memory between processes; each process has its own memory space.
  • Not calling shutdown() or using a with block, which can leave worker processes running.
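The separate-memory pitfall is easy to demonstrate: a global mutated inside the workers never changes in the parent process (a minimal sketch; counter and increment are illustrative names):

```python
from concurrent.futures import ProcessPoolExecutor

counter = 0  # global in the parent process

def increment(_):
    global counter
    counter += 1  # mutates the worker's own copy only
    return counter

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as executor:
        list(executor.map(increment, range(4)))
    print(counter)  # still 0: each worker incremented its own copy
```

To share state between processes you need explicit mechanisms such as those in the multiprocessing module, or better, have workers return values and combine them in the parent.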

Example of a wrong and right way to protect the main block:

python
from concurrent.futures import ProcessPoolExecutor

def f(x):
    return x * 2

# Wrong way (may cause issues on Windows)
# executor = ProcessPoolExecutor()
# future = executor.submit(f, 10)
# print(future.result())

# Right way
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        future = executor.submit(f, 10)
        print(future.result())
Output
20
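The picklability pitfall can be checked directly with the standard pickle module, since arguments, return values, and the submitted function all cross the process boundary in pickled form (a minimal sketch; double is an illustrative name):

```python
import pickle

def double(x):  # module-level functions pickle fine: only the name is serialized
    return x * 2

data = pickle.dumps(double)  # succeeds

try:
    pickle.dumps(lambda x: x * 2)  # fails: a lambda has no importable name
    err = None
except Exception as e:
    err = type(e).__name__
print(err)
```

The same failure surfaces when you pass a lambda, nested function, or open file handle to submit() or map(), so defining worker functions at module level is the safe default.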
📊 Quick Reference

| Method | Description |
| --- | --- |
| ProcessPoolExecutor(max_workers=None) | Create a pool of worker processes |
| submit(fn, *args, **kwargs) | Schedule a single function call, returns a Future |
| map(fn, *iterables) | Apply function to each item in iterables, returns results in order |
| shutdown(wait=True) | Clean up the executor, optionally wait for tasks to finish |
| Future.result() | Get the result of the task, blocking if needed |

Key Takeaways

  • Use ProcessPoolExecutor to run CPU-bound tasks in parallel using multiple processes.
  • Always protect your code with if __name__ == '__main__' when using ProcessPoolExecutor on Windows.
  • Submit tasks with submit() for single calls or map() for multiple inputs.
  • Avoid passing non-picklable objects to worker processes.
  • Use the with statement to ensure proper cleanup of processes.