Naked Python listYasufumi TaniguchiBlockedUnblockFollowFollowingJan 14I often use Python, but I really don’t care about the way how Python works internally.
So today I focus on Python list and explain inside implementation of it.
append and list.
pop change the list size dynamically, which make them run fast in O(1) time.
Please note that list.
pop for the last item only takes constant time.
In this post, I’ll show the reason why it has become possible.
We call the data structure like Python list the dynamic array, and call normal array the static array.
This post is structured as follows.
What is dynamic array?The way how dynamic array appends or pops an itemHow to balance append and popI explain these based on MIT’ Table Doubling, Karp-Rabin lecture.
What is dynamic array?Dynamic array resizes its size when we append or pop an item.
Of course, resizing isn’t always the case.
Dynamic array adjusts the balance of append and expand, pop and shrink for efficient resizing.
Dynamic array generates a new array and copies items from an old array when an item fills up an old array.
The figure below shows an example of expanding an array.
I separate append and copy for easily understood.
Both appending an item and copying an item have the same achieve the same effect, thought.
When expanding array size, the efficiency gets worse if you increase only a little or a lot.
As shown in figure above, when you expand size by only one, it’s a waste of operations.
This is because every time you append an item to an old full array, all items in that array have to be copied to a new array.
On the other hand, if you expand size a lot, it’s a waste of space which is empty because no item is stored.
In dynamic array, when you pop an item from the state below, the dynamic array shrink its size.
From the figure above, you can see the space is wasted.
When shrinking the size, the efficiency gets worse if you decrease only a little or a lot.
If you decrease a little, wasted space will increase.
On the other hand, if decreasing a lot, you need to expand the size when you append a new item.
In next section, I’ll explain how dynamic array avoids these wastefulness.
The way how dynamic array appends or pops an item2.
1 Appending an itemFirst, let us consider the situation that appending an item or expanding the size in the dynamic array.
The dynamic array expand its size when it lacks space to append a new item.
The dynamic array use the growth factor to decides the size of new array.
Here the growth factor means how many times the size of the new array compared to the size of the old array.
Each programming language has a different growth factor.
This article of Wikipedia says the growth factor is 2 in C or Golang, and 1.
1125 in Python.
I use 2 as the growth factor in this post for simplicity.
Take a look at the time complexity that you append n items to dynamic array with the growth factor, 2.
I will describe the time complexity appending an item later.
The size of array varies below every time you append an item, assuming the initial array size is 2.
I show the time of appending and copying at the left of the array.
The time complexity of appending n items becomes as follow.
The first half of the equation above stands the cost of appending n items, and the other half stands the cost of copying.
The cost of copying equals the cost of expanding the array size.
The time complexity becomes the total number of operations, because appending an item or copying runs in O(1) time.
The equation below stands the geometric sequence, so we can deform it and the time complexity will be as follow.
I ignore constant terms during deforming.
Finally we get the time complexity of appending n items to a dynamic array as O(n) time.
This means appending items to dynamic array runs in linear time while expanding its size at the same time.
We can think appending an item runs in linear time on average because the time complexity of appending n items is O(n).
This kind of time complexity is called amortized complexity time.
Also, the way to analyze the whole operation rather than each operation is called amortized analysis.
By amortized analysis, list.
append in Python take linear time on average.
In general, we only care about the time to complete a whole task rather than the time of several operations.
This is the reason why we use amortized analysis.
For instance, in the task to store each line in a CSV file to an array, it is more important to store all lines than one line.
2 Popping an itemDynamic array shrink the array size by 1/(growth factor) along with popping an item.
The wasted space will increase if the number of the items is much less against the array size.
This operation correspond to list.
pop in Python.
Following the growth factor in appending an item, we assume dynamic array shrink the array size by half when popping an item.
Popping an item from a dynamic array behaves as bellow, supposing the initial array size as 16.
Thinking the same way as appending an item, the time complexity of popping n items becomes O(n) time.
So amortized time complexity is O(1) time, and popping an item from a dynamic array run in linear time while shrinking the array size at the same time.
How to balance appending and poppingIn section 2, I explain the both behavior of appending and popping separately, we should take the efficiency of executing both operations into account.
Appending and popping an item repeatedly in the below case will increase the wasted operations.
We can handle this problem to adjust the timing of expanding and shrinking.
For example, we shrink the array size by half when the number of items equals to the quarter of the array size.
The expanding stays the same.
Even if you append or pop an item repeatedly, it won’t increase the operation times.
This is because the timing of expanding and shrinking are different.
That’s all for explanation.
In this post, I explained how a dynamic array works of the implementation in Python.
By amortized analysis, Python’ s list.
append and list.
pop run in linear time.
If you are interested in other operations in list or in other data structures in Python, I recommend you to visit TimeComplexity in Python wiki.
Thank you for reading.