Beginning Python Programming — Part 14
An introduction to multi-threading
Bob Roebling
Jun 20
In part 13 of Beginning Python Programming, we covered asyncio and just scratched the surface of asynchronous code.
Today, we are going to keep moving in the direction of async by looking at another method called multi-threading.
If you haven’t read the previous piece, I strongly suggest you read the introduction as a primer.
Let’s zoom out for a second and think about the big picture again.
Process — an individual program that runs to perform work (e.g., Google Chrome, Firefox).
Thread — a worker queue to which a program sends tasks.
Thread Queue — a list of instructions that the processor handles in a first-in-first-out (FIFO) order. (Think of a factory line.)
Stack — all of the expressions you have called that will send an instruction to a thread queue. These are handled in a last-in-first-out (LIFO) order. (Think of how you would tear down a tower of blocks one by one without knocking them over.)
Memory Space — the space in memory used by a single process; all data stored in a memory space can be accessed by the process and all of the threads the process owns.
Let’s dive in.
Threading

While asyncio might be suitable for web servers, sometimes it is not the best tool for the job.
If you refer back to the previous article, asyncio is useful for when you have many connections to multiple resources (aka web server routes).
But what happens when you need a few connections to a single resource, such as manipulating files on a hard drive simultaneously?

Threads can share memory and resources because they belong to (are owned by) the same process.
Processes cannot share memory or resources because they belong to their own memory space.
If you do need to share data between two processes, then you need to store this data in a database or a cache.
This is where the threading library comes into play.
threading spawns additional worker threads within the same process.
Not only can we use threading to perform simultaneous file manipulations, but we can also use it to perform multiple download requests or numerous API calls.
One more thing to cover before we go into an example is something called a race condition.
A race condition occurs when two separate processes or threads try to manipulate the same data at the same time.
When this happens, one process may get unexpected data.
If you read my previous article and ran the last example, you’d see a race condition in the console output.
Some lines seemed to be merged.
While it didn’t crash our program, in many situations a race condition will crash your program or silently corrupt your data.
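The usual defense is a lock around the shared data. Here is a minimal sketch (not from the article) of four threads updating one counter, with a threading.Lock guarding each update; without the lock, concurrent += operations can lose updates:

```python
import threading

counter = 0
lock = threading.Lock()


def increment(n):
    """Add to the shared counter n times, guarding each update with the lock."""
    global counter
    for _ in range(n):
        with lock:  # only one thread may update counter at a time
            counter += 1


threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 — with the lock, no updates are lost
```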
When using the threading library, it is important that you keep track of what is being modified and when.
If it helps, write it down on a sheet of paper.
Let’s look at a simple example.

Here we import threading at the top of the file to use the library.
Then we create a function that returns a list that contains every value between a min and max argument that we pass in.
To better visualize this, we also include a thread argument where we will pass in the name of the thread that is currently running.
In our count function, we first print out that the thread has started.
We create a this_list variable to hold our values in and initialize our index variable to 0.
We then loop through all of the values using our min and max values for the range and insert them into the array while incrementing our index.
Then we print that the thread has completed before appending this_list to the outer results variable for printing later.
Next, we create our two threads and provide each with a target of count.
Target is what we want to do on this thread.
Since count takes three arguments, we also pass in three arguments as a tuple to the args keyword.
This is where the two threads differ.
One counts from zero to ten million while the other counts from zero to one million.
In my testing, these were the ideal numbers to prove that these two threads run at the same time without taking too long to do it.
Feel free to add or remove a zero from each if it is too quick or taking too long on your computer.
Just because we create the threads doesn’t mean they start; we have to call start() on each thread for it to run.
Separating creation from starting also helps you manage race conditions.
Sometimes you might want to enumerate all of the threads you need before starting them all at once (e.g., storing the threads in their own list, then looping through the list calling start() on each).
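That create-first, start-later pattern might look like this (a sketch; the worker function is made up for illustration):

```python
import threading

results = []


def work(name):
    # list.append is atomic in CPython, so no lock is needed for this simple case
    results.append(name)


# create every thread first, then start them together in a second pass
threads = [threading.Thread(target=work, args=(f"worker-{i}",)) for i in range(5)]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print(sorted(results))
```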
Finally, we use thread.join() to block the main thread and ensure we have our results.
Then we print the length of the first list inside of our results list.
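The embedded code for this example did not survive the export; reconstructed from the walkthrough above, it might look like the following (variable names are a best guess at the original):

```python
import threading

results = []


def count(thread_name, min_value, max_value):
    """Build a list of every value between min_value and max_value."""
    print(f"{thread_name} started")
    this_list = []
    index = 0
    for number in range(min_value, max_value):
        this_list.insert(index, number)  # insert at the end while tracking our index
        index += 1
    print(f"{thread_name} finished")
    results.append(this_list)


# one thread counts to ten million, the other to one million
thread_1 = threading.Thread(target=count, args=("Thread 1", 0, 10_000_000))
thread_2 = threading.Thread(target=count, args=("Thread 2", 0, 1_000_000))

thread_1.start()
thread_2.start()

# block the main thread until both workers have finished
thread_1.join()
thread_2.join()

print(len(results[0]))
```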
Please note that if you try to use multi-threading on single-core CPUs, you will not see a performance boost.
Instead, you might see a performance drop from the overhead of a single processor trying to handle the synchronization of your threads.
ThreadPoolExecutor

Sometimes you need to customize how many threads should be used for a task, such as limiting the number of concurrent threads used over a series of tasks.
To illustrate this, look at the following example, which is just a refactor of the example above.

Here we end up creating four different threads to process the same data.
I just decided to use loops to manage the complexity of re-writing the same code over and over again (DRY principle).
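A sketch of that loop-based refactor (reconstructed, since the original gist is missing):

```python
import threading

results = []


def count(thread_name, min_value, max_value):
    """Build a list of every value between min_value and max_value."""
    print(f"{thread_name} started")
    results.append(list(range(min_value, max_value)))
    print(f"{thread_name} finished")


# create four threads in a loop rather than writing each one out by hand (DRY)
threads = [
    threading.Thread(target=count, args=(f"Thread {i}", 0, 1_000_000))
    for i in range(4)
]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print([len(result) for result in results])
```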
Let’s say we have multiple tasks that need to run on a background thread.
Each of these tasks requires some preliminary information that must be completed before the respective task begins.
A basic web crawler would be a great example.
On small websites, we might be able to get by with threading but on large sites, we definitely need to limit the number of threads we are using so we don’t run into performance issues.
You can only go so fast.
Here is a refactor using ThreadPoolExecutor based on an example found here.

We have a few more imports at the top.
We include threading because we need to get the active_count() of threads when our program starts and we need to be able to figure out which thread we are currently using for our task.
We include time because we want to calculate the run time of the application when we adjust how many threads we are using.
Last, we use from concurrent.futures import ThreadPoolExecutor, which allows us to import ThreadPoolExecutor exclusively instead of the entire concurrent.futures module.
This is what we will call to set up an executor later to run all of our tasks.
I start off creating a list of numbers that will contain values between 100,000 and 1,000,000 in increments of 10,000.
We will use this to pass consistent values to each task between runs.
I initially started off using randint from the random library, but realized that one run might get nothing but low values while another might get nothing but high values.
This would skew our results.
We then create our task.
This is similar to the count function above except we pass in the number we want to use as our max number.
We let everyone know we started a task on a new thread and provide the name.
Since the name is ThreadPoolExecutor-x-y, where x is the current thread pool and y is the current thread, I only use the last three characters of the string.
We initialize result to 0 and start our loop to sum all of the numbers.
We print the result before printing that the current_thread() has finished.
Our main function comes next.
In it, we first create an executor object from ThreadPoolExecutor passing in the maximum number of workers we wish to use.
We can also do this by using the named parameter max_workers=10.
If you leave this blank, you end up with the number of processors on the machine multiplied by five (the default in Python 3.5–3.7; Python 3.8 changed it to min(32, os.cpu_count() + 4)).
On my computer, I have 8 logical processors * 5 threads = 40 threads.
Normally, we’d call executor.submit(task) for each task, but in this scenario we are passing a variable to the function, so we use executor.map(task, numbers) to do all of this work for us.
The interpreter first gets the length of numbers, then for each number in numbers it passes the value to the number parameter of task and submits the task using the executor.
We then have our if __name__ == "__main__" statement, which checks whether this file was run directly rather than imported.
When I initially created this program, I didn’t include the next line, which gets the current number of active threads, and my program never finished.
When I printed the number of active threads, I found that I was already using three threads for this program.
Because of this, I needed to get the active number of threads so I could keep the program from running the time elapsed line at the bottom immediately.
Think of this as a basic semaphore, which is used to ensure synchronization of multiple threads.
Python has semaphores built in, but the example is easier to digest this way.
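For reference, the built-in version lives in the threading module. Here is a minimal sketch (not from the article) of limiting concurrency with threading.Semaphore, tracking the peak number of threads inside the guarded section:

```python
import threading
import time

# allow at most two threads into the guarded section at any one time
semaphore = threading.Semaphore(2)
lock = threading.Lock()
active = 0
peak = 0


def task():
    global active, peak
    with semaphore:  # blocks while two other threads already hold the semaphore
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate some work
        with lock:
            active -= 1


threads = [threading.Thread(target=task) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(peak)  # never exceeds 2
```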
Moving down, I get the current time so we have something to calculate the elapsed time against later.
Then I call main() which kicks off the business logic of our application.
Once all of the work is done, we determine the elapsed time in seconds and print it out to the screen.
After some testing with various numbers, I did some data science on the numbers to show a correlation between the number of threads used and the amount of time it took to run.
Run times (in seconds) for the total number of threads used.
In the graph above, you can see how we start at six seconds to run the example above with one thread.
As we increase the thread count, we quickly see our run time shrink.
At four threads, we see a worse time; I ran this example several times to ensure there wasn’t an anomaly, and I kept getting consistent results.
As we continued to increase the number of threads we saw steady boosts in milliseconds before leveling out between 10 and 30 threads.
Jumping up to 50, we see another boost, and at 100 threads, we see the best run time.
Using 100 threads is not ideal, so we should back off and consider something in the range of 30 to 50.
Interestingly enough, when we passed 1,000 threads, our time got worse.
The most likely reason for this is the amount of time it took to spin up all 1000 threads.
More is not better in this situation.
Finally, I used the default number of worker threads by not passing any number to ThreadPoolExecutor(), and we see a time of about 3 seconds.
This is worse than 50 threads, but in our case, it seems to work out just fine.
Depending on how many thread pools I wanted to create, I might use 40–50 as a baseline for splitting up the number of worker threads.
I’d also refer back to this graph to determine the best number of threads to use per pool.
If I needed three pools, I might give each ten worker threads.
If I used 12 pools, I might opt for three threads per pool instead of four because I had a better run time at three.
From another perspective, I could provide more threads to pools that require more work to be done and fewer to those that do basic tasks.
I could also allocate threads by priority, assigning more to tasks that the user has to wait on, while tasks that run as daemons, or background tasks, run with one or two threads.
Referring to the chart above, I could use three threads for all of my background tasks while I could use 5 or 10 for all tasks initiated by the user.
I’d need to ensure that I keep this under a more global limit which could either be 50 or 100.
Summary

We had an introduction to multi-threading in Python using the threading module and the concurrent.futures module.
We briefly looked at how to create and manage threads and how to wait on our threads to finish before continuing the execution of our program.
There are plenty more details about threading and thread pools that I didn’t cover here.
I just wanted you to get your feet wet in threading, so you don’t feel lost while going through the documentation.
Suggested Reading

Threading from the Python docs: threading — Thread-based parallelism.

Concurrent Futures from the Python docs: concurrent.futures — Launching parallel tasks. Don’t worry about multi-processing just yet.
What’s Next?

Multi-processing is next.
Multi-processing is like multi-threading, except the work is spread across multiple processes, which lets us use multiple cores.
We will also cover differences between asyncio, threading, and multi-processing near the end.
It is crucial to understand which one to use for each task.
Until then, keep practicing.