Python code speed testing

Often, when writing code, we have to choose between multiple approaches. For example, one option might rely on for loops and dictionaries, while another uses list comprehensions and functional programming concepts. Both approaches yield the correct result, but their execution speeds can vary significantly.

The best solution here is to run performance tests, or at least do some basic benchmarking.

Consider another scenario: a program is already deployed, we know a specific part is running slowly, and we want to optimize it. After writing the new code, we once again need to compare the execution speed of two or three different options.

Python provides the built-in timeit module for this exact purpose, which is designed to test the performance of small code snippets.
There are several ways to use this module, including directly from the command line. I'll show you the approach I use regularly, along with some of its features and common pitfalls.
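
For reference, the command-line form looks roughly like this (the snippet is purely illustrative, not one of our test functions); when run this way, the module picks a sensible number of iterations on its own:

python -m timeit "sum(range(100))"

In this article, though, we'll call timeit from within a script.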

For our example, let's look at two ways to calculate the sum of a list's elements: the first uses a standard for loop, and the second uses the built-in sum() function:

lst = [1, 2, 3, 4, 5]

def f1(numbers):
    total = 0
    for value in numbers:
        total += value
    return total

def f2(numbers):
    return sum(numbers)

The second option is clearly more concise, but does it actually run faster?
To run the tests, you first need to wrap both logic variants inside functions—in this case, f1() and f2(). I've also defined a five-element list at the top of the script.

Next, add the following statement below the functions:

assert f1(lst) == f2(lst)

This assert statement runs both functions once and verifies that their outputs match. If the results are identical, nothing happens, and the program simply continues.

If the results don't match, it raises an AssertionError and halts the script. We add this check to ensure that optimizing a function doesn't actually break its logic. It's common to tweak something and suddenly have a function run twice as fast, but yield the wrong result. It's best to catch these issues right away.
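
For example, if an "optimization" accidentally changed the logic (a hypothetical bug, purely for illustration), the assert would catch it right away:

def f2(numbers):
    return sum(numbers[1:])  # hypothetical bug: skips the first element

assert f1(lst) == f2(lst)  # raises AssertionError, because 15 != 14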

Finally, add the benchmarking code:

import timeit
print(timeit.timeit('f1(lst)', globals=globals()))
print(timeit.timeit('f2(lst)', globals=globals()))

Here, we import the timeit module and call its timeit() function to measure each snippet's execution time.

The function's first parameter is the code snippet we want to test, passed as a string. Essentially, the first call executes f1(lst) and the second executes f2(lst).

Because we are passing the code as a string, timeit needs access to the definitions of f1, f2, and lst. To provide this access, we pass the global namespace using the globals argument. This ensures timeit knows about the functions and variables it needs to run.
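
If you prefer, the setup parameter is an alternative to globals: it takes a string of code that runs once before timing starts, and you can use it to import the needed names. A minimal sketch, assuming the script is executed directly (so its module is __main__):

print(timeit.timeit('f1(lst)', setup='from __main__ import f1, lst'))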

Let's run the script in our IDE:

0.175016834
0.11417374999999999

As you can see, the for loop approach is roughly 1.5 times slower than using sum(). These results are from Python 3.9.

However, if you run the exact same code in Python 3.14, the second function becomes almost twice as fast as the first.

Moreover, both methods run significantly faster overall compared to Python 3.9:

0.10523716604802758
0.0612475840607658

This shows that to maximize performance, you should not only prefer the built-in function but also ensure you're using the latest version of Python.

How does timeit work?

Let's get back to timeit and look at how it actually works. By default, it takes your code snippet, runs it one million times in a row, and calculates the total execution time.

However, one million iterations isn't always practical. Sometimes, code runs so efficiently that even a million passes are nearly instantaneous; in our example, the entire million-run test finished in a fraction of a second for each function.

I prefer aiming for a test duration of roughly 1 to 3 seconds. To achieve this, you can pass the number parameter to timeit to control the exact number of iterations:

print(timeit.timeit('f1(lst)', globals=globals(), number=10_000_000))
print(timeit.timeit('f2(lst)', globals=globals(), number=10_000_000))

After running this updated code, our execution times fall right around the one-second mark:

1.430678042001091
0.9172235419973731
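
If you want more stable numbers, the module also offers timeit.repeat(), which performs the whole measurement several times in a row; taking the minimum of the runs is a common way to reduce noise from background processes. A small sketch (the repeat and number values here are arbitrary):

times = timeit.repeat('f2(lst)', globals=globals(), repeat=5, number=10_000_000)
print(min(times))  # the fastest run is the one least distorted by background load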

Testing Extreme Data Sizes

When testing performance, it's crucial to check how your functions handle data of varying sizes. If your input list might range from 5 to 500 elements, you should definitely benchmark the upper limits. You can do this by generating a larger list at the top of your script using range():

lst = list(range(1, 501))

If you run the benchmark code as-is now, the script will seem to freeze. That's because running 10 million iterations on a 500-element list takes roughly 100 times longer than on a 5-element list.

You'll likely need to manually abort the execution and drop the number parameter back down to 1 million. Often, you have to tweak the number argument a few times to find the sweet spot for execution time—not too long, but not too short.
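
If you'd rather not guess, timeit.Timer has an autorange() method (added in Python 3.6) that keeps increasing the iteration count until a run takes at least 0.2 seconds, then returns the chosen count and the elapsed time. A quick sketch of using it to pick a reasonable number:

timer = timeit.Timer('f1(lst)', globals=globals())
number, elapsed = timer.autorange()  # the iteration count timeit settled on and how long that run took
print(number, elapsed)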

This time, the test completes properly. Notice that with the larger dataset, the second function runs roughly 5 times faster than the first:

7.006657332996838
1.359301874996163

This shows that the scale of the input data matters: both functions have O(N) time complexity, but their constant factors differ (sum() iterates in C, while the for loop executes Python bytecode), so the gap widens as the list grows.
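
A convenient way to see this effect is to benchmark both functions across several input sizes in one loop. A sketch, with arbitrary sizes and iteration count, assuming the loop sits at the top level of the same script so that globals() can see data:

for size in (5, 50, 500):
    data = list(range(1, size + 1))
    t1 = timeit.timeit('f1(data)', globals=globals(), number=100_000)
    t2 = timeit.timeit('f2(data)', globals=globals(), number=100_000)
    print(f"{size:4d} elements: for loop {t1:.3f} s, sum() {t2:.3f} s")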

Using print() Inside timeit

Now, let's see what happens if we replace return with print() inside our test functions:

def f1(numbers):
    total = 0
    for value in numbers:
        total += value
    print(total)

def f2(numbers):
    print(sum(numbers))

Even if we reduce the number of test iterations by a factor of 10, running this code will flood your console with the printed output—100,000 times for each function.

This is the first reason why you shouldn't use print() statements inside performance tests: the sheer volume of console spam makes it impossible to cleanly read the benchmarking results.

The second reason is that calling print() introduces massive I/O overhead.

In other words, the standard output mechanism isn't the actual algorithmic bottleneck you are trying to measure, but pushing text to the console artificially slows down the entire test.

Even if your final production code requires terminal output, always replace it with a return statement when benchmarking.

Testing File I/O with timeit

You also need to be careful when measuring file operations. Disk I/O—reading from and writing to files—is orders of magnitude slower than operations done in RAM. Reading a list strictly in memory is entirely different from reading a file from a disk, and writing to a disk is even more expensive.

When testing file handling operations, you should drastically reduce the number of iterations, normally down to 1,000 runs or even just 100.

For these types of slow operations, 100 passes are usually more than enough to accurately compare different code approaches.
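
As a rough sketch, benchmarking a file write could look something like this; the file name, the amount of data, and the 100-iteration count are all arbitrary choices for illustration:

import timeit

def write_lines(lines):
    # write all lines to a text file on disk in one go
    with open('benchmark_tmp.txt', 'w') as f:
        f.write('\n'.join(lines))

lines = [str(i) for i in range(1_000)]
print(timeit.timeit('write_lines(lines)', globals=globals(), number=100))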

Final Code

# Data
lst = list(range(1, 501))

# Functions
def f1(numbers):
    total = 0
    for value in numbers:
        total += value
    return total

def f2(numbers):
    return sum(numbers)

# Verify that functions yield identical results
assert f1(lst) == f2(lst)

# Benchmark execution speed
import timeit
print(timeit.timeit('f1(lst)', globals=globals(), number=1_000_000))
print(timeit.timeit('f2(lst)', globals=globals(), number=1_000_000))

Author

Nikita Shultais

Professional web developer with 10+ years of commercial development experience. Teacher, author of IT courses and articles.

  • Full-stack developer specializing in Python/Django
  • Author of courses on Python, SQL, and Algorithms
  • Participant in math and programming olympiads
  • Has taught IT skills to over 5,000 students
  • Winner of the Edcrunch Award for educational projects
  • Contributing author for Linux Format magazine