Threading is a programming technique that lets multiple tasks run concurrently within a single program. It is particularly useful for I/O-bound tasks, where the program spends much of its time waiting for input/output operations (such as network requests or disk reads and writes) to complete.
Python provides the threading module as part of its standard library for multi-threading. However, in the standard CPython interpreter the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so threading usually brings little or no speedup for CPU-bound tasks.
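To see the GIL's effect on CPU-bound work, a rough timing sketch like the one below can help. It assumes a standard CPython build with the GIL. On such a build the threaded run typically takes about as long as the sequential one (sometimes longer), because the two threads take turns executing bytecode instead of running in parallel. Exact numbers depend on the machine and interpreter; experimental free-threaded builds of CPython 3.13 and later may behave differently. `count_down` and `N` are illustrative names, not part of any library.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop: there is no I/O wait for threads to overlap.
    while n > 0:
        n -= 1


N = 2_000_000

# Do the work twice, one call after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Do the same amount of work split across two threads.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

If the two timings come out close, the GIL is serializing the threads; for CPU-bound work, the multiprocessing module is the usual standard-library alternative.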
Understanding Threads
A thread is the smallest unit of execution within a process. Threads in the same process share its memory space, so they can read and write the same variables directly. Separate processes, by contrast, do not share memory by default.
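A minimal sketch of that shared memory: several threads append to one module-level list, and the main thread sees every change once they finish. `results` and `record` are illustrative names. A single `list.append` is safe here in CPython, but compound read-modify-write updates generally need a lock.

```python
import threading

results = []  # one list object, visible to every thread in this process


def record(worker_id):
    # Each thread appends to the same list; nothing is copied.
    results.append(worker_id)


threads = [threading.Thread(target=record, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 2]
```

The threads may finish in any order, which is why the list is sorted before printing.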
Python’s threading module allows for the creation and management of threads, offering synchronization primitives like locks and semaphores to coordinate or limit access to resources.
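The sketch below shows both primitives from the threading module. A Lock makes the read-modify-write `counter += 1` atomic, so no increments are lost. A BoundedSemaphore caps how many threads may enter a section at once, here 2. The names `counter`, `slots`, `peak` and `use_resource` are illustrative, and the sleep merely stands in for using a limited resource such as a connection.

```python
import threading
import time

counter = 0
counter_lock = threading.Lock()


def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, two threads could read the same old value.
        with counter_lock:
            counter += 1


slots = threading.BoundedSemaphore(2)  # at most 2 threads inside at once
active = 0
peak = 0
active_lock = threading.Lock()


def use_resource():
    global active, peak
    with slots:
        with active_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.1)  # pretend to use a limited resource
        with active_lock:
            active -= 1


workers = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
workers += [threading.Thread(target=use_resource) for _ in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter)  # 40000
print(peak)     # never more than 2
```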
Implementing Threads in Python
The simplest way to use a Thread is to create it with a target function (and any arguments), call start() to begin running it, and call join() to wait for it to finish.
Here’s an example of a simple Python program with threads:
```python
import threading
import time

def print_square(num):
    time.sleep(1)  # simulate a delay
    print("Square: %s" % (num ** 2))

def print_cube(num):
    time.sleep(1)  # simulate a delay
    print("Cube: %s" % (num ** 3))

t1 = threading.Thread(target=print_square, args=(10,))
t2 = threading.Thread(target=print_cube, args=(10,))

t1.start()  # start thread 1
t2.start()  # start thread 2

t1.join()  # wait until thread 1 is completely executed
t2.join()  # wait until thread 2 is completely executed

print("Completed!")
```
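In the example above, two threads calculate and print the square and cube of a number concurrently. Because their one-second delays overlap, the program finishes in about one second rather than two, and the order of the "Square" and "Cube" lines may vary between runs.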
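One way to check that the delays really overlap is to time the same pattern. This sketch uses a hypothetical `slow_power` helper that combines the two functions from the example; on a typical machine the elapsed time should be close to 1 second, not 2.

```python
import threading
import time


def slow_power(num, exp, label):
    time.sleep(1)  # simulate a delay, as in the example above
    print(f"{label}: {num ** exp}")


start = time.perf_counter()
threads = [
    threading.Thread(target=slow_power, args=(10, 2, "Square")),
    threading.Thread(target=slow_power, args=(10, 3, "Cube")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"Completed in {elapsed:.2f}s")  # roughly 1 second, not 2
```

Running the two calls one after the other without threads would take about 2 seconds, since each sleep would have to finish before the next one starts.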
Use Cases for Threads
Threads are especially useful when a program has many I/O-bound tasks, for example:
- When you’re fetching data from multiple URLs, where you spend a lot of time waiting for server responses.
- When you’re reading or writing many files, where you spend time waiting for disk I/O.
The advantage of threading is that it allows your program to continue doing other work while waiting for these I/O operations to complete.
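A minimal sketch of that idea: a background thread blocks on a simulated I/O call while the main thread keeps doing its own work. `slow_io` and `ticks` are illustrative names, and `time.sleep` stands in for a real blocking read, download or disk write.

```python
import threading
import time


def slow_io():
    # Stand-in for a blocking network or disk operation.
    time.sleep(0.5)


worker = threading.Thread(target=slow_io)
worker.start()

# The main thread keeps working while slow_io is blocked.
ticks = 0
while worker.is_alive():
    ticks += 1
    time.sleep(0.1)  # one small "unit" of other work

worker.join()
print(f"Main thread did {ticks} units of work while waiting")
```

Polling is_alive() like this is only for illustration; in real programs the main thread would simply do its own tasks and call join() when it needs the result.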
Let’s consider a real-world example of fetching data from multiple URLs:
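```python
import threading
import urllib.request

def fetch_url(url):
    try:
        # A timeout keeps one unresponsive server from blocking its thread forever
        with urllib.request.urlopen(url, timeout=10) as response:
            print(f"{url}: {response.status}")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # report the failure instead of letting the thread die with a traceback
        print(f"{url}: failed ({exc})")

urls = [
    "http://www.wikipedia.com",
    "http://www.alibaba.com",
    "http://www.cnn.com",
]

threads = []
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    thread.start()  # begin this request without waiting for earlier ones
    threads.append(thread)

for thread in threads:
    thread.join()  # wait for every request to finish

print("Completed")
```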
This script starts a separate thread for each URL in the list, and each thread fetches its assigned URL. Because all the requests are in flight at the same time, the program does not wait for one response before sending the next request, so the total run time is closer to that of the slowest request than to the sum of all of them.
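A Thread's target cannot hand a return value back to the caller, so a common pattern is to have each thread store its result in a shared dictionary protected by a lock. The sketch below assumes a hypothetical `fake_fetch` function that sleeps instead of touching the network, so it runs offline; the status code it returns is made up.

```python
import threading
import time


def fake_fetch(url):
    # Hypothetical stand-in for urllib.request.urlopen: it sleeps instead of
    # making a request and returns a made-up status code.
    time.sleep(0.5)
    return 200


urls = [
    "http://www.wikipedia.com",
    "http://www.alibaba.com",
    "http://www.cnn.com",
]

results = {}
results_lock = threading.Lock()


def worker(url):
    status = fake_fetch(url)
    with results_lock:  # protect the shared dict
        results[url] = status


start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(results)
print(f"{len(urls)} fetches took {elapsed:.2f}s")  # about 0.5s, not 1.5s
```

The standard library's concurrent.futures.ThreadPoolExecutor packages this pattern (a bounded pool of threads plus returned results) if you prefer not to manage threads by hand.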