How big of a list in Real World Applications would be considered too much?

Question

I understand Python is not inherently a database-specific tool. Still, how big can a list get before it starts to affect the application?

Answer 1 · 2020-06-15T22:02:01Z

That's a great question!

A list is a native in-memory structure, so it can grow until your program runs out of memory. On my desktop computer, with an IDE and 6 Chrome browser tabs open, I could store 47 million integers. On the Treehouse workspace, I could store about 12 million integers before my process was terminated (see memtest.py below).

Fortunately, when Python cannot allocate new memory it raises a MemoryError exception rather than crashing the computer! In some environments, like the Treehouse workspace, a supervisor process kills the Python process once it exceeds its permitted memory limit, so you may see "Killed" instead of an exception.
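To illustrate the exception path, here is a minimal sketch. The function name try_build_list is made up for this example. It assumes the allocation failure reaches Python as a MemoryError; on systems where the OS or a supervisor kills the process first, the except branch never gets a chance to run.

```python
def try_build_list(n):
    """Try to build a list of n zeros; return None if memory runs out."""
    try:
        data = [0] * n
    except MemoryError:
        # Python could not get enough memory for the list.
        return None
    return data


small = try_build_list(1_000)
print(len(small))  # 1000
```

A program that checks for None this way can fall back to processing its data in smaller pieces instead of stopping.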

Some data structures are better suited to numbers than the Python list. A numpy array, for example, is faster and consumes less memory. The Python list is built to hold generic objects, so it has more overhead per element.
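You can get a rough feel for the per-element overhead with the sketch below. To avoid a third-party install, it uses the standard library's array module as a stand-in for a numpy array (that swap is my assumption, not part of the measurements above). The exact numbers depend on your Python version and platform.

```python
import sys
from array import array

n = 100_000
as_list = list(range(n))
as_array = array("q", range(n))  # "q" = signed 64-bit integers

# getsizeof on a list counts only its table of references,
# so add the size of the int objects it points to (a rough estimate).
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list:  {list_bytes / n:.1f} bytes per element")
print(f"array: {array_bytes / n:.1f} bytes per element")
```

The array stores the raw 64-bit values side by side, while the list stores a reference to a full Python int object for each element.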

It is not too difficult to design a data structure that can store even more by using disk as extra storage. Basically, you pick how many values you want in memory at one time, and when the list grows beyond that limit you write the block out to disk. The class stores two values internally: the beginning and end indices of the block currently in memory. If the program asks for an item in that range, you already have it; if not, you read the block that contains it from disk. Deleting and inserting elements in the middle is where you have to start thinking about hash tables...
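The idea above can be sketched roughly as follows. This is only an illustration under some assumptions of mine: it is append-only, it stores blocks as pickle files in a temporary directory, and the class name DiskBackedList and its attributes are invented for the example. It holds at most two blocks in memory: the newest block being filled and the block most recently read back.

```python
import os
import pickle
import shutil
import tempfile


class DiskBackedList:
    """Append-only list that spills full blocks of values to disk."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self.directory = tempfile.mkdtemp(prefix="blocklist_")
        self.length = 0
        self.tail = []           # newest values, not yet written to disk
        self.cache_start = None  # index of the first item in self.cache
        self.cache = []          # block most recently read from disk

    def _path(self, block_number):
        return os.path.join(self.directory, f"block_{block_number}.pkl")

    def append(self, value):
        self.tail.append(value)
        self.length += 1
        if len(self.tail) == self.block_size:
            # The in-memory block is full: write it out and start a new one.
            block_number = (self.length - 1) // self.block_size
            with open(self._path(block_number), "wb") as f:
                pickle.dump(self.tail, f)
            self.tail = []

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        tail_start = self.length - len(self.tail)
        if index >= tail_start:
            return self.tail[index - tail_start]
        block_number = index // self.block_size
        start = block_number * self.block_size
        if self.cache_start != start:
            # The item is not in memory: read its block from disk.
            with open(self._path(block_number), "rb") as f:
                self.cache = pickle.load(f)
            self.cache_start = start
        return self.cache[index - start]


numbers = DiskBackedList(block_size=100)
for i in range(1050):
    numbers.append(i)

first, middle, last = numbers[0], numbers[555], numbers[1049]
print(len(numbers), first, middle, last)  # 1050 0 555 1049

shutil.rmtree(numbers.directory)  # remove the temporary block files
```

Reading items in order is cheap here because each block is loaded once. Jumping around at random forces a disk read on almost every access, which is one reason a real database does this job better.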

But rather than re-invent the wheel, you would want to use a good database program!
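As one possible illustration, Python ships with the sqlite3 module, so you can try a database without installing anything. The table and column names below are made up for the example. The ":memory:" database keeps the demo self-contained; passing a filename instead would store the data on disk.

```python
import sqlite3

# ":memory:" is used only to keep this demo self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (value INTEGER)")
conn.executemany(
    "INSERT INTO numbers (value) VALUES (?)",
    ((i,) for i in range(100_000)),
)
conn.commit()

# The database does the counting and searching, not a Python list.
count, largest = conn.execute(
    "SELECT COUNT(*), MAX(value) FROM numbers"
).fetchone()
print(count, largest)  # 100000 99999
conn.close()
```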

memtest.py

# a very simple memory test
# this program will test the approximate number of integers that
# can be stored in a list using a Treehouse workspace
my_array = []
i = 0
while True:
    i = i + 1
    if i % 10000 == 0:
        # print once for every 10000 iterations
        print(i)
    my_array.append(i)

treehouse:~/workspace $ python memtest.py
10000
20000
30000
40000
50000
... stuff omitted ...
12260000
12270000
12280000
12290000
12300000
Killed
treehouse:~/workspace $

Jeremy Hall

Jeff Muday
