Python Memory Leak Using Binascii, Zlib, Struct, And Numpy
I have a Python script that processes a large amount of data from compressed ASCII. After a short period, it runs out of memory. I am not constructing large lists or dicts.
Solution 1:
Through comments, we figured out what was going on:
The main issue is that variables declared in a for loop are not destroyed once the loop ends. They remain accessible, pointing to the value they received in the last iteration:
>>> for i in range(5):
...     a = i
...
>>> print a
4
So here's what's happening:
- First iteration: The print shows 45MB, which is the memory in use before instantiating byte_array and a.
- The code instantiates those two lengthy variables, making the memory go to 51MB.
- Second iteration: The two variables instantiated in the first run of the loop are still there.
- In the middle of the second iteration, byte_array and a are overwritten by the new instantiation. The initial ones are destroyed, but substituted by equally lengthy variables.
- The for loop ends, but byte_array and a are still accessible in the code and are therefore not destroyed by the second gc.collect() call.
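The sequence above can be observed directly with weak references. This is a minimal sketch, not the original script: Payload is a hypothetical stand-in for the large byte_array buffer, the syntax is Python 3 (the snippets in this answer are Python 2), and the immediate release on rebinding relies on CPython's reference counting.

```python
import weakref


class Payload(object):
    """Hypothetical stand-in for a large per-iteration buffer."""

    def __init__(self, size):
        self.data = bytearray(size)


refs = []
for i in range(2):
    # Rebinding byte_array drops the previous iteration's object,
    # but only after the new one has been built.
    byte_array = Payload(1024 * 1024)
    refs.append(weakref.ref(byte_array))  # a weak ref does not keep it alive

# The first buffer was released when the name was rebound.
first_alive = refs[0]() is not None
# The last buffer is still bound to byte_array after the loop has ended.
last_alive = refs[1]() is not None

print(first_alive, last_alive)
```

Under CPython this should report that the first buffer is gone while the last one is still alive, which matches the memory staying at the higher figure after the loop.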
Changing the code to:
for i in xrange(2):
    [ . . . ]
byte_array = None
a = None
gc.collect()
made the memory reserved by byte_array and a inaccessible, and therefore freed.
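The effect of resetting the name can be checked the same way. Again a sketch under assumptions: Payload is a hypothetical placeholder for the real buffer, the code is written for Python 3, and the behaviour shown is CPython's.

```python
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for a large per-iteration buffer."""

    def __init__(self, size):
        self.data = bytearray(size)


for i in range(2):
    byte_array = Payload(1024 * 1024)

# Weak reference used only to observe whether the object still exists.
probe = weakref.ref(byte_array)
alive_after_loop = probe() is not None

byte_array = None  # drop the last strong reference to the buffer
gc.collect()       # mirrors the answer; only strictly needed for reference cycles
alive_after_reset = probe() is not None

print(alive_after_loop, alive_after_reset)
```

Writing del byte_array instead of assigning None removes the name entirely and releases the object in the same way.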
There's more on Python's garbage collection in this SO answer: https://stackoverflow.com/a/4484312/289011
Also, it may be worth looking at "How do I determine the size of an object in Python?". This is tricky, though: if your object is a list pointing to other objects, what is the size? The sum of the pointers in the list? The sum of the sizes of the objects those pointers point to?
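The two readings of "size" give very different numbers. The sketch below contrasts sys.getsizeof, which counts only the container itself, with deep_sizeof, a hypothetical helper written for this illustration that also follows references inside the built-in containers. It is a rough estimate, not an exact accounting of process memory.

```python
import sys


def deep_sizeof(obj, seen=None):
    """Rough recursive size: obj plus what it references via built-in containers."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size


rows = [bytes(1000) for _ in range(100)]  # 100 separate 1000-byte objects

shallow = sys.getsizeof(rows)  # the list object and its pointers only
deep = deep_sizeof(rows)       # the list plus the 100 bytes objects

print(shallow, deep)
```

The shallow figure stays small because it only covers the list's pointer array, while the deep figure includes the data those pointers lead to.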