Python Optimizations

Interning

Guillermo Martínez Espina, Jan 28

Have you ever heard about the concept of interning in Python? Interning is the way Python saves memory: when a new variable holds the same integer (within a limited range) or string as a previously created one, Python can make it reference the same object in memory instead of allocating a new one.

We will dig in a bit deeper in a while.

Variables

Before we dig deeper into interning, I would like to recap what a variable is.

A variable is just a pointer to a space in memory where some object is stored.

For instance, if you declare a variable assigning to it an integer, the variable will be just a pointer to the space in memory where that integer is saved.

When you write x = 5, it doesn’t mean that x is literally 5. What happens is that x holds the address where the number 5 is stored.
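The idea that a variable is a reference rather than a container is easiest to see with a mutable object. The snippet below is a minimal sketch; the names first and second are made up for illustration.

```python
# Two names bound to the same list object: a change made through one
# name is visible through the other, because both reference one object.
first = [1, 2, 3]
second = first          # no copy is made; `second` references the same list

second.append(4)

print(first)                    # [1, 2, 3, 4]
print(first is second)          # True: same object
print(id(first) == id(second))  # True: same identity

# Rebinding a name points it at a different object; the original is untouched.
second = [1, 2, 3, 4]
print(first == second)  # True: equal contents
print(first is second)  # False: two distinct list objects
```

The == operator compares contents, while is compares identity, which is the distinction the rest of this article relies on.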

If you write the following code:

    x, y = 5, 500

    # hex(id(x)) returns, in hexadecimal, the memory address the variable points to
    print("The id for x is: {}".format(hex(id(x))))
    print("The id for y is: {}".format(hex(id(y))))

In my case it printed:

    The id for x is: 0x106d7d060
    The id for y is: 0x105897f90

If you run the code above several times, you may notice that the variable x tends to keep the same address on your computer while the variable y does not. The exact values depend on your machine and Python build.

I will explain why this happens in the next section.

So now that we know that variables are pointers to memory addresses, we can take a deeper look at interning and how Python handles it.

Integer interning

Python automatically keeps the most common integers in memory: those in the range [-5, 256]. This is a detail of CPython, the standard interpreter.

Whenever you assign a number in that range to a variable, Python just points the variable to that preallocated object. For instance, if you have this code:

    a = 5
    b = 5
    c = a
    d = b

    print("The id for a is: {}".format(hex(id(a))))
    print("The id for b is: {}".format(hex(id(b))))
    print("The id for c is: {}".format(hex(id(c))))
    print("The id for d is: {}".format(hex(id(d))))

The output will be this one:

    The id for a is: 0x105e13060
    The id for b is: 0x105e13060
    The id for c is: 0x105e13060
    The id for d is: 0x105e13060

As you can see, all 4 variables point to the same address. But what happens if you change the value of one of those variables, like this:

    a = 5
    b = a

    print("The id for a is: {}".format(hex(id(a))))
    print("The id for b is: {}".format(hex(id(b))))

    b = 340
    print("The id for b is: {}".format(hex(id(b))))

It will first print:

    The id for a is: 0x105e13060
    The id for b is: 0x105e13060

Then it will allocate space in memory to store 340 and make b point to that address. In my case the output was the following:

    The id for b is: 0x105525fd0

Because 340 is outside the [-5, 256] range, another variable assigned the value 340 is not guaranteed to point to this address. Python may reuse the object when the same literal appears again in the same piece of compiled code, but a 340 computed at runtime will usually be a separate object.
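The boundary of the small-integer cache can be checked directly. The sketch below assumes CPython; the helper same_object is a hypothetical name introduced here. The values are parsed with int() at runtime so that the compiler cannot merge equal literals into one shared constant.

```python
def same_object(text):
    """Parse `text` twice at runtime and report whether both results are one object."""
    # int() runs at runtime, so the two values are not merged into a single
    # shared constant by the compiler, which would hide the effect of the cache.
    first = int(text)
    second = int(text)
    return first is second

in_cache = same_object("256")       # inside [-5, 256]
outside_cache = same_object("257")  # just outside the range

print("256 shared:", in_cache)        # True on CPython
print("257 shared:", outside_cache)   # False on CPython
```

Other Python implementations are free to behave differently, so code should never depend on the identity of integers.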

String interning

String interning is almost the same as integer interning.

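When a new variable is assigned a string that is exactly the same as an existing interned string, Python gives the new variable the address of the previously created string instead of creating a copy. CPython typically does this automatically for string literals in source code (notably ones that look like identifiers), and equal literals compiled together in the same piece of code are also shared. Strings built at runtime are generally not interned unless you ask for it with sys.intern.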

    a = 'hello_world'
    b = 'hello_world'
    c = 'Hello_world'
    d = 'hello world this is a long long string that python will intern to make the app work faster'
    e = 'hello world this is a long long string that python will intern to make the app work faster'

    print("The id for a is: {}".format(hex(id(a))))
    print("The id for b is: {}".format(hex(id(b))))
    print("The id for c is: {}".format(hex(id(c))))
    print("The id for d is: {}".format(hex(id(d))))
    print("The id for e is: {}".format(hex(id(e))))

This produced in my case the following output:

    The id for a is: 0x107702930
    The id for b is: 0x107702930
    The id for c is: 0x1077029f0
    The id for d is: 0x10764f930
    The id for e is: 0x10764f930

As you can see, when the strings match completely the memory address is the same. The string c differs from a only in its capital H, so it gets its own object.
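Strings that are built while the program runs are a different case from literals. The sketch below, which assumes CPython behavior for the identity checks, builds two equal strings with join and then interns them explicitly with sys.intern from the standard library.

```python
import sys

# Build two equal strings at runtime. join() creates a fresh object each time,
# so these are equal in value but are not the same object.
parts = ["hello", " ", "world"]
a = "".join(parts)
b = "".join(parts)

print(a == b)   # True: same characters
print(a is b)   # False on CPython: two separate objects

# sys.intern returns the canonical copy of the string from the interning table.
a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a_interned is b_interned)  # True: both names reference one object
```

After interning, any further equal string passed through sys.intern resolves to that same object.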

An advantage specific to strings is that, when you know two strings are both interned, you can compare them with the keyword is instead of the == operator.

The difference is that == compares character by character and returns True only if every character of the first string matches the corresponding character of the second, while is just compares memory addresses, which is much faster. Keep in mind that is only tells you whether both names reference the same object, so it gives the wrong answer for equal strings that were not interned.

With small texts you probably wouldn’t notice a difference, but with really large texts you could start seeing a big one.

    import time

    def compare_strings():
        d = 'hello world this is a long long string that python will intern to make the code work faster' * 10000
        e = 'hello world this is a long long string that python will intern to make the code work faster' * 10000
        for i in range(100000):
            if d == e:
                pass

    start = time.perf_counter()
    compare_strings()
    end = time.perf_counter()
    print("Elapsed time {}".format(end - start))

In the previous code we are comparing two strings using ==, and on my computer it took 4.73 seconds. If we know that the strings are the same and that both are interned, we can use the is keyword for the comparison. Strings built at runtime, like these, are not interned automatically, so the version below interns them explicitly with sys.intern:

    import sys
    import time

    def compare_strings():
        # sys.intern makes d and e reference the same object, so `d is e` is True
        d = sys.intern('hello world this is a long long string' * 10000)
        e = sys.intern('hello world this is a long long string' * 10000)
        for i in range(100000):
            if d is e:
                pass

    start = time.perf_counter()
    compare_strings()
    end = time.perf_counter()
    print("Elapsed time {}".format(end - start))

In this case the elapsed time on my computer was just 0.004 seconds, so it does make a huge difference when comparing really long strings.
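To see where the time goes, the two kinds of comparison can be measured side by side with timeit. This is a rough sketch under the assumption of CPython; the variable names and the run count are arbitrary choices, and the timings will vary by machine, so no expected numbers are shown.

```python
import sys
import timeit

base = "hello world this is a long long string" * 10000

# Concatenating two non-empty pieces yields a new object each time, so these
# two strings are equal in value but distinct in identity.
distinct_a = base[:1] + base[1:]
distinct_b = base[:1] + base[1:]

# Interning maps both equal strings to a single shared object.
shared_a = sys.intern(distinct_a)
shared_b = sys.intern(distinct_b)

runs = 1000
t_eq_distinct = timeit.timeit(lambda: distinct_a == distinct_b, number=runs)
t_eq_shared = timeit.timeit(lambda: shared_a == shared_b, number=runs)
t_is_shared = timeit.timeit(lambda: shared_a is shared_b, number=runs)

print("== on distinct objects: {:.6f} s".format(t_eq_distinct))
print("== on interned objects: {:.6f} s".format(t_eq_shared))
print("is on interned objects: {:.6f} s".format(t_is_shared))
```

On CPython the last two measurements are usually close to each other, because string equality can return early when both operands are the same object. The saving therefore comes mostly from interning itself, and == stays correct even when a string turns out not to be interned.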

Conclusion

Interning is a way to optimize a program by making variables reference the same memory address.

String interning is especially helpful because it can speed up comparisons a lot with a simple change, as long as the strings involved are actually interned.

One application of string interning is natural language processing, where you have to evaluate a lot of words and phrases.
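As a small illustration of the natural language processing case, the sketch below interns every token of a sentence so that repeated words share one object. The sample text and variable names are invented for this example.

```python
import sys

text = "the cat sat on the mat and the dog sat on the rug"

# split() produces string objects at runtime; interning each token makes
# every repeated word reference a single shared object.
tokens = [sys.intern(word) for word in text.split()]

vocabulary = set(tokens)
distinct_objects = {id(token) for token in tokens}

print(len(tokens))            # 13 tokens in total
print(len(vocabulary))        # 8 different words
print(len(distinct_objects))  # 8 objects: one per different word
```

With a large corpus, storing one object per distinct word rather than one per occurrence can reduce memory use noticeably, though the actual saving depends on the data.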
