
Python Memory Leak Using Binascii, Zlib, Struct, And Numpy

I have a Python script that processes a large amount of data from compressed ASCII files. After a short time it runs out of memory, even though I am not constructing any large lists or dicts.

Solution 1:

Through comments, we figured out what was going on:

The main issue is that variables declared in a for loop are not destroyed once the loop ends. They remain accessible, pointing to the value they received in the last iteration:

>>> for i in range(5):
...     a = i
...
>>> print a
4

So here's what's happening:

  • First iteration: the print shows 45MB, which is the memory used before instantiating byte_array and a.
  • The code then instantiates those two large variables, pushing memory usage up to 51MB.
  • Second iteration: the two variables instantiated in the first pass of the loop are still there.
  • In the middle of the second iteration, byte_array and a are overwritten by the new instantiations. The original objects are destroyed, but they are replaced by equally large ones.
  • The for loop ends, but byte_array and a are still accessible in the code and therefore are not destroyed by the second gc.collect() call; the sketch below reproduces this behavior.
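Here is a minimal sketch that reproduces the pattern described above. It is not the original script: the psutil dependency, the allocation sizes, and the rss_mb helper are illustrative stand-ins for whatever you use to watch the process's memory.

import gc
import psutil  # assumed to be installed; any way of reading the process RSS works

proc = psutil.Process()

def rss_mb():
    # Current resident set size of this process, in MB
    return proc.memory_info().rss // (1024 * 1024)

for i in range(2):
    print("iteration %d starts at    %d MB" % (i, rss_mb()))
    byte_array = bytearray(6 * 1024 * 1024)  # stand-in for the decompressed data
    a = [0] * (1024 * 1024)                  # stand-in for the unpacked values
    print("after instantiating them: %d MB" % rss_mb())
    gc.collect()  # frees nothing: byte_array and a are still referenced

# The loop has ended, but the names still point at the last iteration's
# objects, so this collect() cannot reclaim them either.
gc.collect()
print("after the loop:           %d MB" % rss_mb())

The two print lines inside the loop show the same kind of jump as the 45MB/51MB readings described in the question.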

Changing the code to:

for i in xrange(2):
   [ . . . ]
byte_array = None
a = None
gc.collect()

made the memory reserved by byte_array and a inaccessible and, therefore, freed.
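An equivalent approach, not used in the answer but common in practice, is to move the per-iteration work into a function so that byte_array and a are locals that go out of scope as soon as the function returns. The process_chunk name and the dummy chunk below are purely illustrative:

import gc

def process_chunk(chunk):
    # byte_array and a are locals here; they become unreachable as soon as
    # the function returns, so nothing large outlives the iteration.
    byte_array = bytearray(chunk)
    a = list(byte_array)
    # ... unpack / accumulate whatever small result is actually needed ...
    return len(a)

for i in range(2):
    result = process_chunk(b"\x00" * (6 * 1024 * 1024))
    gc.collect()  # only the small `result` survives each iteration

Rebinding the names to None, del byte_array, a, or a function scope all do the same thing: the last references to the large objects disappear, so they can actually be freed.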

There's more on Python's garbage collection in this SO answer: https://stackoverflow.com/a/4484312/289011

Also, it may be worth looking at How do I determine the size of an object in Python?. This is tricky, though: if your object is a list pointing to other objects, what is its size? Just the list's array of pointers, or the sizes of the objects those pointers refer to as well?
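As a quick illustration of why this is tricky: sys.getsizeof only reports the size of the container itself, not of anything it references. The deep_size helper below is a rough, hypothetical sketch of one way to add both together (it only special-cases dicts and the common sequence types):

import sys

data = ["x" * 1000 for _ in range(100)]
print(sys.getsizeof(data))     # just the list object (its array of pointers)
print(sys.getsizeof(data[0]))  # one of the strings the list points to

def deep_size(obj, seen=None):
    # Rough recursive size: the container plus everything reachable from it.
    if seen is None:
        seen = set()
    if id(obj) in seen:          # don't count shared objects twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_size(item, seen) for item in obj)
    return size

print(deep_size(data))          # the list plus all 100 strings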
