How to Use ProcessPoolExecutor in Python for Parallel Processing
Use concurrent.futures.ProcessPoolExecutor to run functions in parallel across multiple CPU cores by submitting tasks with submit() or mapping with map(). It creates separate worker processes, making it ideal for CPU-bound tasks.
Syntax
The basic syntax to use ProcessPoolExecutor involves creating an executor object and submitting tasks to it. You can use submit() to schedule a single function call or map() to apply a function to many inputs.
- ProcessPoolExecutor(max_workers=None): Creates a pool of worker processes. max_workers sets how many processes run in parallel (defaults to the number of CPU cores).
- submit(fn, *args, **kwargs): Schedules fn to run with the given arguments and returns a Future object.
- map(fn, *iterables): Runs fn on each item from the iterables and returns results in order.
- shutdown(wait=True): Cleans up the executor, waiting for running tasks to finish if wait is True.
```python
from concurrent.futures import ProcessPoolExecutor

def task(x):
    return x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        future = executor.submit(task, 5)  # Run task(5) in a separate process
        result = future.result()           # Wait for and get the result
        print(result)

        results = list(executor.map(task, [1, 2, 3, 4]))  # Run task on multiple inputs
        print(results)
```
Output
25
[1, 4, 9, 16]
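The with block above calls shutdown(wait=True) automatically on exit. If you need to manage the executor's lifetime yourself, you can call shutdown() explicitly instead. A minimal sketch (the double helper is just for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def double(x):
    return x * 2

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=2)
    try:
        answer = executor.submit(double, 21).result()
        print(answer)  # 42
    finally:
        executor.shutdown(wait=True)  # wait for pending tasks, then release the workers
```

Putting shutdown() in a finally block guarantees the worker processes are released even if a task raises.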
Example
This example shows how to use ProcessPoolExecutor to calculate squares of numbers in parallel. It demonstrates submitting single tasks and mapping a function over a list.
```python
from concurrent.futures import ProcessPoolExecutor
import time

def square(n):
    time.sleep(1)  # Simulate a time-consuming task
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=3) as executor:
        # Submit single tasks
        futures = [executor.submit(square, i) for i in range(5)]
        for future in futures:
            print(future.result())

        # Use map to process multiple inputs
        results = list(executor.map(square, range(5, 10)))
        print(results)
```
Output
0
1
4
9
16
[25, 36, 49, 64, 81]
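Iterating over the futures list, as above, returns results in submission order. If you would rather handle each result as soon as its task finishes, the same concurrent.futures module provides as_completed(). A sketch, with sleep times contrived so the fastest task finishes first:

```python
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

def square(n):
    time.sleep(0.2 * n)  # larger inputs take longer
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(square, n) for n in (3, 1, 2)]
        finished = []
        for future in as_completed(futures):  # yields each future as it completes
            finished.append(future.result())
        print(finished)  # completion order, typically [1, 4, 9]
```

Note that the output order reflects completion time, not the order in which the tasks were submitted.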
Common Pitfalls
Common mistakes when using ProcessPoolExecutor include:
- Not protecting the entry point with if __name__ == '__main__':, which can cause infinite process spawning (or a RuntimeError) on Windows and other platforms that use the spawn start method.
- Passing non-picklable objects (such as open files or lambda functions) to worker processes, which raises errors.
- Expecting shared memory between processes; each process has its own memory space.
- Not calling shutdown() or using a with block, which can leave worker processes running.
Example of a wrong and right way to protect the main block:
```python
from concurrent.futures import ProcessPoolExecutor

def f(x):
    return x * 2

# Wrong way (may cause issues on Windows)
# executor = ProcessPoolExecutor()
# future = executor.submit(f, 10)
# print(future.result())

# Right way
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        future = executor.submit(f, 10)
        print(future.result())
```
Output
20
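The picklability pitfall can be demonstrated directly: a module-level function travels to a worker process without trouble, while a lambda cannot be pickled, and the error surfaces when you ask the future for its result. A sketch (the add_one helper is just for illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def add_one(x):
    return x + 1

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        ok = executor.submit(add_one, 1).result()  # module-level functions pickle fine
        try:
            executor.submit(lambda x: x + 1, 1).result()  # lambdas can't be pickled
            lambda_failed = False
        except Exception:  # typically a PicklingError
            lambda_failed = True
        print(ok, lambda_failed)  # 2 True
```

This is why worker functions are defined at module level in every example above: only objects that pickle can cross the process boundary.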
Quick Reference
| Method | Description |
|---|---|
| ProcessPoolExecutor(max_workers=None) | Create a pool of worker processes |
| submit(fn, *args, **kwargs) | Schedule a single function call, returns a Future |
| map(fn, *iterables) | Apply function to each item in iterables, returns results in order |
| shutdown(wait=True) | Clean up the executor, optionally wait for tasks to finish |
| Future.result() | Get the result of the task, blocking if needed |
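Future.result() also accepts an optional timeout in seconds; if the task has not finished in time, it raises concurrent.futures.TimeoutError instead of blocking forever. A small sketch, with the sleep chosen only to trigger the timeout:

```python
import time
from concurrent.futures import ProcessPoolExecutor, TimeoutError

def slow():
    time.sleep(1)
    return 'done'

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(slow)
        timed_out = False
        try:
            future.result(timeout=0.1)  # give up waiting after 0.1 s
        except TimeoutError:
            timed_out = True  # the task keeps running; only this wait is abandoned
        value = future.result()  # block until the task actually finishes
        print(timed_out, value)  # True done
```

A timeout only abandons the wait; it does not cancel the running task.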
Key Takeaways
- Use ProcessPoolExecutor to run CPU-bound tasks in parallel across multiple processes.
- Always protect your entry point with if __name__ == '__main__': on Windows and any other platform that uses the spawn start method.
- Submit tasks with submit() for single calls or map() for multiple inputs.
- Avoid passing non-picklable objects to worker processes.
- Use the with statement to ensure worker processes are cleaned up.