Multi-threading API Requests in Python

Speeding up Python using multi-threading

When making hundreds or thousands of API calls, things can quickly get really slow in a single-threaded application.

No matter how well your own code runs, you’ll be limited by network latency and the response time of the remote server. Making 10 calls with a 1 second response each is maybe OK, but now try 1000. Not so fun.
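To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes every call takes exactly 1 second and that calls run strictly one after another; the function name sequential_seconds is made up for illustration.

```python
def sequential_seconds(num_calls, seconds_per_call=1.0):
    """Estimate total wall-clock time when calls run one after another."""
    return num_calls * seconds_per_call

# Print the estimate for a small, a medium, and a large batch of calls.
for n in (10, 1000, 50000):
    total = sequential_seconds(n)
    print(f"{n} calls: {total:.0f} s (~{total / 3600:.1f} h)")
```

Real response times vary, so treat this as an order-of-magnitude estimate only.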

For a recent project I needed to make almost 50,000 API calls and the script was taking hours to complete. Looking into multi-threading was no longer optional; it was required.

Classic Single Threaded Code

This is the boilerplate way to make an API request and save the contents as a file. The code simply loops through a list of URLs to call and downloads each one as a JSON file giving it a unique name.

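import requests
import uuid

url_list = ['url1', 'url2']  # placeholder URLs

for url in url_list:
    # .content reads the whole body, so stream=True is not needed here
    response = requests.get(url)
    file_name = uuid.uuid1()
    # the with block closes the file even if the write fails
    with open(f'{file_name}.json', 'wb') as f:
        f.write(response.content)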

Multi Threaded Code

For comparison here is the same code running multi-threaded.

import requests
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed

url_list = ['url1', 'url2']  # placeholder URLs

def download_file(url, file_name):
    """Download one URL to <file_name>.json and return the HTTP status code."""
    try:
        response = requests.get(url)
        with open(f'{file_name}.json', 'wb') as f:
            f.write(response.content)
        return response.status_code
    except requests.exceptions.RequestException as e:
        # return the exception so one failed request does not stop the rest
        return e

def runner():
    threads = []  # holds the Future returned by each submit() call
    with ThreadPoolExecutor(max_workers=20) as executor:
        for url in url_list:
            file_name = uuid.uuid1()
            threads.append(executor.submit(download_file, url, file_name))

        # as_completed yields each future as soon as it finishes
        for task in as_completed(threads):
            print(task.result())

runner()

Breaking it down, you first need to import ThreadPoolExecutor and as_completed from concurrent.futures. This module is part of the Python standard library, so there is nothing to install.

Next you must encapsulate your downloading code in its own function. The function download_file does this in the above example; it is called with the URL to download and a file name to use when saving the downloaded contents.

The main part comes in the runner() function. First create an empty list to hold the threads (strictly speaking, the Future objects that submit returns for each task).

threads = []

Then create your pool of threads with your chosen number of workers (threads). This number is up to you, but for most APIs I would not go crazy here, otherwise you risk being blocked by the server. For me 10 to 20 works well.

with ThreadPoolExecutor(max_workers=20) as executor:
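One way to pick a worker count is to time a small batch at a few pool sizes before running the full job. The sketch below does this with a stand-in task that just sleeps for 0.05 seconds instead of calling a real API. The names fake_request and time_batch, the sleep time, and the pool sizes are all assumptions for illustration; real numbers will depend on the server and any rate limits it enforces.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    """Stand-in for an API call: waits like a slow network round trip."""
    time.sleep(0.05)
    return 200

def time_batch(max_workers, num_tasks=20):
    """Run num_tasks fake requests through a pool and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() runs fake_request once per item and collects the results
        results = list(executor.map(fake_request, range(num_tasks)))
    assert len(results) == num_tasks
    return time.perf_counter() - start

for workers in (1, 5, 20):
    print(f"{workers:>2} workers: {time_batch(workers):.2f} s")
```

With a real API the gains usually flatten out at some point, and that is a reasonable place to stop adding workers.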

Next loop through your URL list and append a new thread as shown below. Here it’s clear why you need to encapsulate your download code into a function since the first argument is the name of the function you wish to run in a new thread. The arguments after that are the arguments being passed to the download function.

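You can think of this as making multiple copies of the downloading function and running each one in its own thread. Since each call spends most of its time waiting on the network, the threads overlap those waits instead of queueing behind each other.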

threads.append(executor.submit(download_file, url, file_name))
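To see how submit forwards its arguments, here is a small sketch with a toy function in place of download_file. The function describe and its inputs are hypothetical. The point is only that submit(fn, a, b) schedules fn(a, b) on a worker thread and immediately returns a Future, whose result() gives back the return value.

```python
from concurrent.futures import ThreadPoolExecutor

def describe(url, file_name):
    """Toy stand-in for download_file: no network, just echoes its inputs."""
    return f"{url} -> {file_name}.json"

with ThreadPoolExecutor(max_workers=2) as executor:
    # Schedules describe('url1', 'a') to run in a worker thread.
    future = executor.submit(describe, 'url1', 'a')
    # result() blocks until the call finishes, then returns its value.
    outcome = future.result()

print(outcome)
```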

Finally we print out the return value from each thread (in this case the status code from the API call, or the exception if the request failed).

for task in as_completed(threads):
    print(task.result())
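Because as_completed yields futures in the order they finish, not the order they were submitted, the printed status codes cannot be matched to URLs by position. One common pattern is to keep a dictionary from each future to its URL. The sketch below uses a stand-in fake_download (an assumption, so it runs without a network) in place of download_file, and the delays are invented to force the second URL to finish first.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_download(url, delay):
    """Stand-in for download_file: sleeps, then returns a pretend status code."""
    time.sleep(delay)
    return 200

# Hypothetical inputs: each URL paired with a simulated response time.
jobs = {'url1': 0.2, 'url2': 0.01}

results = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    # Remember which URL each future belongs to.
    future_to_url = {
        executor.submit(fake_download, url, delay): url
        for url, delay in jobs.items()
    }
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        results[url] = future.result()
        print(url, results[url])
```

The same lookup works with the real download function, which makes it easy to collect the URLs that failed and retry them.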

That’s it. Easy to implement and gives a huge speedup. In my case I ended up with this performance.

Time taken: 1357 seconds (22 minutes)
49980 files
1.03 GB

This works out at almost 37 files a second, or about 2,210 files per minute. This is at least a 10x improvement in performance over the single-threaded script.
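The throughput figures can be checked directly from the totals above. The snippet only does the division, and the variable names are chosen for illustration.

```python
files = 49980
seconds = 1357

files_per_second = files / seconds
files_per_minute = files_per_second * 60

print(f"{files_per_second:.1f} files/second")
print(f"{files_per_minute:.0f} files/minute")
```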

The full Python docs are here: https://docs.python.org/3/library/concurrent.futures.html

Join the conversation

10 Comments

AK says:

5th August 2020 at 16:55

Thanks for an excellent article. I was able to get my code working with multiple threads downloading files via API calls.

SP says:

20th August 2020 at 09:14

I am getting syntax error while using this code

1. Bob Peers says:
  
  21st August 2020 at 10:07
  
  Can you show me what the error is?
  
Cem says:

9th September 2020 at 11:33

Hi Bob!

Thanks for this article, this was really super duper helpful!

Just one thing I noticed – you may want to indent:
for task in as_completed(threads):
print(task.result())

This prints the return value to the console as each thread is completed.

Cheers,
C

1. Bob Peers says:
  
  14th September 2020 at 12:26
  
  Hi Cem,
  thanks for this, post is edited 👍
  Cheers,
  Bob
  
Jan says:

14th October 2020 at 21:12

Thanks a lot!
Btw, in line:
threads.append(executor.submit(download_file, url, file_name)
you are missing “)” at the end

1. Bob Peers says:
  
  23rd October 2020 at 16:15
  
  🙄 Thanks again, post is corrected.
  
Mamtha Venkatesh says:

12th November 2020 at 12:51

Hi Bob!!
Wonderful explanation, really helpful but can you tell me if I need to save all api downloaded data in a single json file(like multiple objects in a array within a json file), How can I accomplish this?

Alexander Batishchev says:

10th March 2021 at 08:14

It’s -> its

1. Bob Peers says:
  
  10th March 2021 at 08:23
  
  Fixed 🙏
