
Python Threading and Multiprocessing: An In-Depth Tutorial

In today’s world of computing, performance and responsiveness are critical. Whether you’re building a web scraper, a data processing pipeline, or a real-time application, the ability to handle multiple tasks efficiently can make or break your software. Python, a versatile and widely used language, offers two primary tools for concurrent task execution: **threading** and **multiprocessing**. But what’s the difference between them? When should you use one over the other? And how do they interact with Python’s unique Global Interpreter Lock (GIL)?

This tutorial demystifies Python’s threading and multiprocessing capabilities. We’ll start with core concepts like concurrency vs. parallelism, dive into the GIL’s role, explore practical examples for both threading and multiprocessing, and compare their use cases. By the end, you’ll be equipped to choose the right tool for your task and avoid common pitfalls.

Table of Contents

  1. Concurrency vs. Parallelism: What’s the Difference?
  2. The Global Interpreter Lock (GIL): Python’s Hidden Constraint
  3. Threading in Python: Lightweight Concurrency
  4. Multiprocessing in Python: True Parallelism
  5. Simpler Concurrency with concurrent.futures
  6. Threading vs. Multiprocessing: When to Use Which?
  7. Best Practices
  8. References

1. Concurrency vs. Parallelism: What’s the Difference?

Before diving into Python’s tools, it’s critical to distinguish between concurrency and parallelism—two often-confused terms:

  • Concurrency: Managing multiple tasks overlapping in time. Tasks may start, run, and complete in any order, but they don’t necessarily execute simultaneously. Think of a chef juggling chopping vegetables, boiling water, and seasoning a dish—tasks are interleaved but not done at the exact same time.

  • Parallelism: Executing multiple tasks simultaneously (e.g., on separate CPU cores). This requires hardware support (multiple cores/processors). Think of two chefs working side-by-side: one chops, the other boils water—tasks run in parallel.

Python supports both, but the choice between threading and multiprocessing depends on whether your task is I/O-bound (waiting for input/output, e.g., network calls, file reads) or CPU-bound (intensive computations, e.g., mathematical modeling).
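
To make the I/O-bound case concrete, here is a minimal sketch that uses time.sleep as a stand-in for waiting on a network or disk. The helper names (simulated_io, run_sequential, run_interleaved) are illustrative and not part of any library:

```python
import threading
import time

def simulated_io(seconds):
    """Stand-in for an I/O-bound call: sleeping uses no CPU, like waiting on a socket."""
    time.sleep(seconds)

def run_sequential(delays):
    """Run each wait one after another and return the elapsed time."""
    start = time.perf_counter()
    for delay in delays:
        simulated_io(delay)
    return time.perf_counter() - start

def run_interleaved(delays):
    """Run each wait in its own thread so the waits overlap in time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=simulated_io, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

delays = [0.2, 0.2, 0.2]
sequential_s = run_sequential(delays)
interleaved_s = run_interleaved(delays)
print(f"Sequential:  {sequential_s:.2f}s")
print(f"Interleaved: {interleaved_s:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the interleaved run takes roughly the longest single delay, because the waits overlap.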

2. The Global Interpreter Lock (GIL): Python’s Hidden Constraint

To understand why threading and multiprocessing behave differently in Python, we must first discuss the Global Interpreter Lock (GIL).

The GIL is a mutex (mutual exclusion lock) in CPython, Python’s reference interpreter, that ensures only one thread executes Python bytecode at a time. (CPython 3.13 added an optional, experimental free-threaded build without the GIL; this tutorial assumes the standard build.) This simplifies memory management (e.g., reference counting) but limits true parallelism for CPU-bound tasks:

  • Effect on Threading: Even with multiple threads, only one can execute Python code at a time. For CPU-bound tasks, this means threading won’t speed up execution (threads take turns running). For I/O-bound tasks, however, a thread releases the GIL while it blocks (e.g., waiting for a network response), so other threads can run in the meantime.

  • Effect on Multiprocessing: Each process gets its own Python interpreter and memory space, bypassing the GIL. Thus, multiprocessing enables true parallelism for CPU-bound tasks.

Key Takeaway: The GIL is a per-interpreter lock. Threads share the same interpreter (and GIL), while processes do not.
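
One way to observe the GIL’s effect is to time a pure-Python, CPU-bound loop run twice in a row versus run in two threads. The sketch below assumes a standard (GIL-enabled) CPython build; count_down and timed are illustrative helpers, and actual timings vary by machine, so no specific numbers are claimed:

```python
import threading
import time

def count_down(n):
    """Pure-Python busy loop: CPU-bound, holds the GIL while running."""
    while n > 0:
        n -= 1

def timed(fn):
    """Return how long fn() takes, in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

N = 2_000_000

def sequential():
    count_down(N)
    count_down(N)

def threaded():
    threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

sequential_s = timed(sequential)
threaded_s = timed(threaded)
print(f"Two runs back to back: {sequential_s:.2f}s")
print(f"Two runs in threads:   {threaded_s:.2f}s")
```

On a GIL-enabled interpreter the two timings are typically similar, since the threads take turns rather than running on separate cores.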

3. Threading in Python: Lightweight Concurrency

Threading is ideal for I/O-bound tasks (e.g., web scraping, API calls) where tasks spend little time using the CPU and much time waiting. Python’s threading module provides a high-level interface for creating and managing threads.

3.1 The threading Module

The threading module simplifies thread management with the Thread class. Here’s a basic workflow:

  1. Define a task (function) to run in a thread.
  2. Create a Thread object, passing the task and arguments.
  3. Start the thread with start().
  4. Join the thread with join() to wait for it to finish (optional).

Example: Basic Thread Creation

import threading
import time

def print_numbers(name, delay):
    """Print numbers 1-5 with a delay."""
    for i in range(1, 6):
        time.sleep(delay)
        print(f"Thread {name}: {i}")

# Create threads
thread1 = threading.Thread(target=print_numbers, args=("A", 1))
thread2 = threading.Thread(target=print_numbers, args=("B", 1.5))

# Start threads
thread1.start()
thread2.start()

# Wait for threads to finish
thread1.join()
thread2.join()

print("Main thread finished.")

Output (order may vary):

Thread A: 1
Thread B: 1
Thread A: 2
Thread A: 3
Thread B: 2
Thread A: 4
Thread B: 3
Thread A: 5
Thread B: 4
Thread B: 5
Main thread finished.

Daemon Threads

By default, the interpreter does not exit until all non-daemon threads have finished. To create a background thread that is stopped abruptly when the program exits, set daemon=True:

daemon_thread = threading.Thread(target=print_numbers, args=("Daemon", 1), daemon=True)
daemon_thread.start()
time.sleep(3)  # Main thread sleeps; daemon runs
print("Main thread exiting (daemon thread may be killed).")
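
Because a daemon thread can be cut off mid-operation, a common alternative is to signal a regular thread to stop. The following is a minimal sketch using threading.Event; the names stop_event, ticks, and background_worker are illustrative:

```python
import threading
import time

stop_event = threading.Event()
ticks = []

def background_worker():
    """Record a tick every 0.05 s until asked to stop."""
    while not stop_event.is_set():
        ticks.append(time.perf_counter())
        stop_event.wait(0.05)  # Sleeps, but wakes early if the event is set

worker = threading.Thread(target=background_worker)
worker.start()

time.sleep(0.2)    # Main thread does other work
stop_event.set()   # Ask the worker to finish its current iteration and exit
worker.join()      # Wait for a clean shutdown

print(f"Worker stopped cleanly after {len(ticks)} ticks.")
```

Unlike a daemon thread, this worker always gets to finish its current iteration before the program exits.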

3.2 Thread Safety and Race Conditions

Threads share the same memory space, so they can access shared variables. This can lead to race conditions—when multiple threads modify a shared resource simultaneously, causing unexpected behavior.

Example: Race Condition

import threading

counter = 0

def increment_counter():
    global counter
    for _ in range(100000):
        counter += 1  # Not thread-safe!

# Create 10 threads
threads = [threading.Thread(target=increment_counter) for _ in range(10)]

# Start threads
for thread in threads:
    thread.start()

# Wait for threads to finish
for thread in threads:
    thread.join()

print(f"Expected counter: 1,000,000. Actual: {counter}")  # Can be less than 1M; recent CPython versions often hide the race, but the code is still unsafe

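Why? The operation counter += 1 isn’t atomic: it reads counter, adds 1, and writes the result back (roughly temp = counter; temp += 1; counter = temp). If a thread switch happens between the read and the write, two threads can read the same value and both write back that value plus one, so one of the increments is lost.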
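
The separate read and write steps can be seen by disassembling the statement with the standard dis module. This is a sketch for inspection only; the exact instruction names differ between CPython versions, so the code only collects them rather than assuming a fixed listing:

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Collect the bytecode instruction names that make up increment()
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]
print(opnames)

# The global is loaded in one instruction and stored in a later one;
# a thread switch between the two is what allows a lost update.
load_index = opnames.index("LOAD_GLOBAL")
store_index = opnames.index("STORE_GLOBAL")
print(f"Load at position {load_index}, store at position {store_index}")
```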

Fixing Race Conditions with Locks

Use threading.Lock to enforce mutual exclusion—only one thread can hold the lock at a time:

import threading

counter = 0
lock = threading.Lock()  # Create a lock

def safe_increment():
    global counter
    for _ in range(100000):
        with lock:  # Acquire lock; release automatically when done
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Expected counter: 1,000,000. Actual: {counter}")  # Now correct!
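
Acquiring the lock on every iteration is correct but adds overhead. One possible refinement, sketched below under the assumption that intermediate values of the counter do not need to be visible to other threads, is to accumulate in a local variable and take the lock once per thread. The function name batched_increment is illustrative:

```python
import threading

counter = 0
lock = threading.Lock()

def batched_increment(iterations):
    """Count locally, then merge into the shared counter under the lock once."""
    global counter
    local_count = 0
    for _ in range(iterations):
        local_count += 1  # Local variable: no other thread can touch it
    with lock:
        counter += local_count  # Single protected update per thread

threads = [threading.Thread(target=batched_increment, args=(100_000,)) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Expected counter: 1,000,000. Actual: {counter}")
```

Each thread now acquires the lock once instead of 100,000 times, while the final total is still protected.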

3.3 Example: Threading for I/O-Bound Tasks

Let’s scrape multiple URLs concurrently to demonstrate threading’s value for I/O-bound tasks. We’ll use requests for HTTP calls and measure execution time.

import threading
import requests
import time

def fetch_url(url):
    """Fetch a URL and return its status code."""
    response = requests.get(url, timeout=10)  # Timeout so a stalled server can't hang the thread
    return f"{url}: {response.status_code}"

# List of URLs to scrape
urls = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.python.org",
    "https://www.stackoverflow.com"
]

# Sequential execution
start = time.time()
for url in urls:
    print(fetch_url(url))
print(f"Sequential time: {time.time() - start:.2f}s")

# Threaded execution
start = time.time()
threads = []
results = []

def thread_task(url):
    results.append(fetch_url(url))

for url in urls:
    thread = threading.Thread(target=thread_task, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

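for result in results:  # Results appear in completion order, not URL order
    print(result)
print(f"Threaded time: {time.time() - start:.2f}s")  # Usually noticeably faster; the speedup depends on network latency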

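Why Faster? While one thread is blocked waiting for a network response, it releases the GIL, so the other threads can issue and wait on their own requests at the same time. The total time approaches that of the slowest request rather than the sum of all requests.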
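
When results from threads need to line up with their inputs, one option is to give each thread its own slot in a pre-sized list. The sketch below replaces the real HTTP call with a hypothetical simulated_fetch (a short sleep) and uses placeholder example.com URLs, so it runs without network access:

```python
import threading
import time

def simulated_fetch(url):
    """Hypothetical stand-in for an HTTP request: waits briefly, then reports success."""
    time.sleep(0.05)
    return f"{url}: ok"

urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]
results = [None] * len(urls)  # One slot per URL keeps results in input order

def worker(index, url):
    results[index] = simulated_fetch(url)  # Each thread writes only its own slot

threads = [
    threading.Thread(target=worker, args=(index, url))
    for index, url in enumerate(urls)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

for result in results:
    print(result)
```

Because no two threads write to the same index, no lock is needed here.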

4. Multiprocessing in Python: True Parallelism

Multiprocessing is ideal for CPU-bound tasks (e.g., data processing, mathematical computations) where tasks require heavy CPU usage. The multiprocessing module spawns separate processes, each with its own Python interpreter and memory space.

4.1 The multiprocessing Module

The multiprocessing module mirrors threading in many ways but uses Process instead of Thread. Key differences:

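  • Processes don’t share memory by default, so ordinary Python objects can’t be corrupted by another process, but sharing data takes explicit effort (and explicitly shared resources such as files or shared memory can still race).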
  • Higher overhead than threads (separate memory spaces, slower startup).

Example: Basic Process Creation

import multiprocessing
import time

def square_numbers(name, numbers):
    """Square a list of numbers and print results."""
    for num in numbers:
        time.sleep(0.5)
        print(f"Process {name}: {num}^2 = {num**2}")

if __name__ == "__main__":  # Required where child processes are spawned (e.g., Windows, macOS)
    # Split work into two processes
    process1 = multiprocessing.Process(
        target=square_numbers,
        args=("A", [1, 2, 3])
    )
    process2 = multiprocessing.Process(
        target=square_numbers,
        args=("B", [4, 5, 6])
    )

    # Start processes
    process1.start()
    process2.start()

    # Wait for processes to finish
    process1.join()
    process2.join()

    print("Main process finished.")
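
The first key difference listed above (no shared memory by default) can be observed directly. In this minimal sketch, a child process appends to a module-level list, and the parent’s copy stays unchanged; the names data and append_in_child are illustrative:

```python
import multiprocessing

data = []  # Module-level list; each process works on its own copy

def append_in_child():
    """Runs in the child process and modifies the child's copy of data."""
    data.append("written by child")
    print(f"Child sees:  {data}")

if __name__ == "__main__":
    child = multiprocessing.Process(target=append_in_child)
    child.start()
    child.join()
    # The child's change never reaches the parent's memory
    print(f"Parent sees: {data}")
```

With threads, the same append would be visible to the main thread; with processes, data must be sent back explicitly.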

4.2 Inter-Process Communication (IPC)

Since processes don’t share memory, use these tools to pass data:

  • Queue: Thread/process-safe FIFO queue for sending data between processes.
  • Pipe: Two-way communication channel between two processes.
  • Shared Memory: Value and Array for sharing primitive data types (e.g., integers, arrays).

Example: Using Queue for IPC

import multiprocessing

def producer(queue):
    """Add items to the queue."""
    for i in range(5):
        queue.put(i)
        print(f"Produced: {i}")

def consumer(queue):
    """Remove items from the queue."""
    while True:
        item = queue.get()
        if item is None:  # Sentinel to exit
            break
        print(f"Consumed: {item}")

if __name__ == "__main__":  # Required where child processes are spawned (e.g., Windows, macOS)
    # Create a queue
    queue = multiprocessing.Queue()

    # Start producer and consumer
    producer_process = multiprocessing.Process(target=producer, args=(queue,))
    consumer_process = multiprocessing.Process(target=consumer, args=(queue,))

    producer_process.start()
    consumer_process.start()

    # Wait for producer to finish
    producer_process.join()

    # Send sentinel to consumer
    queue.put(None)
    consumer_process.join()
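
For the shared-memory option in the list above, here is a minimal sketch using multiprocessing.Value. The helper add_many is illustrative; the lock obtained from get_lock() is needed because the read-modify-write on value is not atomic across processes:

```python
import multiprocessing

def add_many(shared_total, times):
    """Increment a shared integer `times` times, holding its lock for each update."""
    for _ in range(times):
        with shared_total.get_lock():
            shared_total.value += 1

if __name__ == "__main__":
    total = multiprocessing.Value("i", 0)  # "i" = C signed int, initial value 0
    workers = [
        multiprocessing.Process(target=add_many, args=(total, 1_000))
        for _ in range(4)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(f"Shared total: {total.value}")  # 4 workers x 1,000 increments = 4000
```

Value and Array suit small, fixed-type data; for arbitrary Python objects, a Queue or Pipe is usually the simpler choice.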

4.3 Example: Multiprocessing for CPU-Bound Tasks

Let’s compute prime numbers (a CPU-heavy task) using multiprocessing to leverage multiple cores.

import multiprocessing
import time

def is_prime(n):
    """Check if a number is prime."""
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes_in_range(start, end):
    """Count primes between start and end."""
    count = 0
    for num in range(start, end):
        if is_prime(num):
            count += 1
    return count

def process_task(start, end, queue):
    """Count primes in [start, end) in a child process and send back the count."""
    queue.put(count_primes_in_range(start, end))

if __name__ == "__main__":  # Required where child processes are spawned (e.g., Windows, macOS)
    # Define a large range to process
    start_range = 1
    end_range = 1_000_000

    # Sequential execution
    start_time = time.time()
    sequential_count = count_primes_in_range(start_range, end_range)
    print(f"Sequential primes: {sequential_count}")
    print(f"Sequential time: {time.time() - start_time:.2f}s")

    # Multiprocessing execution (split range across CPU cores)
    num_processes = multiprocessing.cpu_count()  # Use all available cores
    chunk_size = (end_range - start_range) // num_processes
    ranges = [
        (start_range + i * chunk_size, start_range + (i + 1) * chunk_size)
        for i in range(num_processes)
    ]
    ranges[-1] = (ranges[-1][0], end_range)  # Last chunk absorbs any remainder

    # Create processes
    start_time = time.time()  # Restart the timer for the parallel run
    processes = []
    results = multiprocessing.Queue()  # To collect results

    # Loop variables are named chunk_start/chunk_end so they don't clobber the timer
    for chunk_start, chunk_end in ranges:
        process = multiprocessing.Process(
            target=process_task,
            args=(chunk_start, chunk_end, results)
        )
        processes.append(process)
        process.start()

    # Collect one result per process, then wait for the processes to exit
    total_primes = 0
    for _ in processes:
        total_primes += results.get()

    for process in processes:
        process.join()

    print(f"Multiprocessing primes: {total_primes}")
    print(f"Multiprocessing time: {time.time() - start_time:.2f}s")  # Speedup depends on core count and process startup cost
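
Splitting a range evenly is easy to get subtly wrong when the size is not a multiple of the worker count. A small helper, sketched here under the name split_range (not a library function), spreads the remainder over the first few chunks so that every number is covered exactly once:

```python
def split_range(start, end, parts):
    """Split the half-open range [start, end) into `parts` contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(end - start, parts)
    chunks = []
    chunk_start = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # First `extra` chunks get one more item
        chunks.append((chunk_start, chunk_start + size))
        chunk_start += size
    return chunks

print(split_range(1, 11, 3))  # [(1, 5), (5, 8), (8, 11)]
```

Each (start, end) pair can be passed directly to a function such as count_primes_in_range, which also treats end as exclusive.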

5. Simpler Concurrency with concurrent.futures

The concurrent.futures module (introduced in Python 3.2) provides a high-level interface for threading and multiprocessing via ThreadPoolExecutor and ProcessPoolExecutor. These abstractions simplify task submission and result handling.

5.1 ThreadPoolExecutor

For I/O-bound tasks, ThreadPoolExecutor manages a pool of worker threads.

Example: Fetch URLs with ThreadPoolExecutor

import concurrent.futures
import requests
import time

def fetch_url(url):
    response = requests.get(url, timeout=10)  # Timeout so a stalled server can't hang a worker
    return f"{url}: {response.status_code}"

urls = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.python.org",
    "https://www.stackoverflow.com"
]

start = time.time()

# Use ThreadPoolExecutor with 4 workers
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Map URLs to fetch_url (returns results in order)
    results = executor.map(fetch_url, urls)

for result in results:
    print(result)

print(f"ThreadPoolExecutor time: {time.time() - start:.2f}s")
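
executor.map re-raises the first exception when its results are iterated, which hides which input failed. When per-task error handling matters, submit combined with as_completed is one alternative. The sketch below uses a hypothetical simulated_fetch and placeholder example.com URLs so it runs without network access:

```python
import concurrent.futures
import time

def simulated_fetch(url):
    """Hypothetical stand-in for an HTTP request; URLs containing 'bad' fail."""
    time.sleep(0.05)
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"{url}: 200"

urls = [
    "https://example.com/a",
    "https://example.com/bad",
    "https://example.com/c",
]
successes = {}
failures = {}

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to its URL so failures can be attributed
    future_to_url = {executor.submit(simulated_fetch, url): url for url in urls}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            successes[url] = future.result()  # Re-raises the task's exception, if any
        except ConnectionError as exc:
            failures[url] = str(exc)

print(f"Succeeded: {sorted(successes)}")
print(f"Failed:    {sorted(failures)}")
```

One failing task no longer prevents the remaining results from being collected.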

5.2 ProcessPoolExecutor

For CPU-bound tasks, ProcessPoolExecutor manages a pool of worker processes.

Example: Prime Counting with ProcessPoolExecutor

import concurrent.futures
import multiprocessing
import time

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def count_primes(start, end):
    return sum(1 for num in range(start, end) if is_prime(num))

if __name__ == "__main__":  # Required where child processes are spawned (e.g., Windows, macOS)
    start_range, end_range = 1, 1_000_000
    num_processes = multiprocessing.cpu_count()
    chunk_size = (end_range - start_range) // num_processes
    ranges = [
        (start_range + i * chunk_size, start_range + (i + 1) * chunk_size)
        for i in range(num_processes)
    ]
    ranges[-1] = (ranges[-1][0], end_range)  # Last chunk absorbs any remainder

    start_time = time.time()

    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Submit one task per chunk and collect the futures
        futures = [executor.submit(count_primes, lo, hi) for lo, hi in ranges]
        # Sum results as each future completes
        total_primes = sum(future.result() for future in concurrent.futures.as_completed(futures))

    print(f"ProcessPoolExecutor primes: {total_primes}")
    print(f"ProcessPoolExecutor time: {time.time() - start_time:.2f}s")
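
ProcessPoolExecutor also offers map, which returns results in input order and can be tidier when each task takes a single argument. The following is a minimal sketch with an illustrative CPU-bound function, sum_of_squares, that accepts its bounds as one tuple:

```python
import concurrent.futures

def sum_of_squares(bounds):
    """CPU-bound stand-in: sum i*i over the half-open range [start, end)."""
    start, end = bounds
    return sum(i * i for i in range(start, end))

if __name__ == "__main__":
    chunks = [
        (0, 250_000),
        (250_000, 500_000),
        (500_000, 750_000),
        (750_000, 1_000_000),
    ]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map sends each tuple to a worker process and yields results in input order
        partial_sums = list(executor.map(sum_of_squares, chunks))
    print(f"Partial sums: {partial_sums}")
    print(f"Total: {sum(partial_sums)}")
```

The worker function must be defined at module level so that it can be pickled and sent to the worker processes.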

6. Threading vs. Multiprocessing: When to Use Which?

| Factor | Threading | Multiprocessing |
|---|---|---|
| Use Case | I/O-bound tasks (e.g., web requests) | CPU-bound tasks (e.g., data processing) |
| Parallelism | No (GIL limits Python execution) | Yes (separate interpreters) |
| Memory Sharing | Shared (use locks for safety) | Not shared (use IPC for communication) |
| Overhead | Low (lightweight threads) | High (separate memory spaces) |
| Debugging | Easier (shared memory) | Harder (separate processes) |

7. Best Practices

  • Avoid Global Variables in Threads: Use function arguments or thread-local storage (threading.local()) to avoid race conditions.
  • Use concurrent.futures for Simplicity: Prefer ThreadPoolExecutor/ProcessPoolExecutor over raw threading/multiprocessing for cleaner code.
  • Limit Process/Thread Count: For threads, avoid creating more than ~1000 (overhead). For processes, use os.cpu_count() to match available cores.
  • Use Locks Sparingly: Overusing locks causes bottlenecks. Design code to minimize shared state.
  • Test Both Approaches: Benchmark threading and multiprocessing for your specific task—real-world performance may surprise you!
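
As an illustration of the first practice, here is a minimal sketch of thread-local storage. Attributes set on a threading.local() object are private to the thread that set them; the names thread_state, observed, and worker are illustrative:

```python
import threading
import time

thread_state = threading.local()  # Attributes set here are private to each thread
observed = {}
observed_lock = threading.Lock()

def worker(label):
    thread_state.label = label   # Visible only inside this thread
    time.sleep(0.01)             # Give the other threads a chance to run
    with observed_lock:          # The shared dict is still guarded by a lock
        observed[label] = thread_state.label

threads = [threading.Thread(target=worker, args=(label,)) for label in ("A", "B", "C")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(observed)
# The main thread never set `label`, so it does not see the workers' values
print(hasattr(thread_state, "label"))
```

Each thread reads back exactly the value it stored, even though all threads use the same thread_state object.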

8. References


By mastering threading and multiprocessing, you’ll unlock Python’s full potential for building fast, responsive applications. Remember: choose threading for I/O, multiprocessing for CPU, and concurrent.futures for simplicity! 🚀