Creating Voxel Volumes Quickly in Python
Creating large chunks of memory is quite tricky in Python because of a number of overheads. I’ve been attempting this myself and have found a fast method that combines a few handy optimisations we would need later on for a Voxel Engine anyway.
Please be aware that the code evolves along the way, so don’t blindly copy and paste the first examples =).
Also, the WordPress theme is cutting off some text. If you copy and paste it, it is still there.
First of all, if you’re not using NumPy, you’re not going to get anywhere very quickly.
The first thing most people will try and do is create a 3D array of Voxel objects.
This works… for a while… until you decide to start testing with a reasonably sized array.
Using objects causes two problems:
- We have to create each object individually, there’s no Python magic to help us.
- We can’t use nice compression methods on Voxel objects, which we need to do in order to manage any reasonably sized volume.
Voxels should be stored as integers. Unless you’re doing something incredibly detailed, a uint8 should suffice. This gives you 255 possible voxel types per volume (plus zero for empty space). Each volume can have its own lookup table, or you can have a single global one.
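As a quick sketch of this idea (the volume size and material names here are just illustrative placeholders, not anything from a real engine):

```python
import numpy as np

# One byte per voxel: 0 means "empty", 1-255 index into a lookup table.
volume = np.zeros((64, 64, 64), dtype=np.uint8)

# A per-volume lookup table mapping voxel values to materials.
# The names are purely hypothetical.
materials = {
    1: "stone",
    2: "dirt",
    3: "grass",
}

volume[0, 0, 0] = 3
print(materials[volume[0, 0, 0]])  # prints "grass"
```

The voxel data itself stays a flat block of integers, so the whole volume remains compressible and cache friendly.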
With that out of the way, let’s proceed with creating our volumes!
3D NumPy array
The most obvious way to generate a volume is to just declare a big array:
import time
import numpy

print "512x512x512 numpy zeros"
startTime = time.time()
array = numpy.zeros( (512, 512, 512), dtype = numpy.uint8 )
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)
We get the following output:
512x512x512 numpy zeros
Time: 0.111999988556
Bytes: 134217728
The first thing you’ll notice is that it took 0.11 seconds to create this. Nothing spectacular, but we’re allocating A LOT of memory here. NumPy is far more efficient than Python lists, so we get exactly the memory usage we asked for: 512 × 512 × 512 = 134,217,728 bytes.
This isn’t going to scale well. And it’s going to be pretty painful to add new volumes on the fly.
Run Length Encoding Object
So let’s use Run Length Encoding to reduce our memory footprint a bit.
There are other compression schemes and further optimisations to RLE that can be used.
Of particular interest is an optimisation to remove RLE encoding of non-repeating values. This will prevent a list like [ 1, 2, 3, 3 ] from becoming [ (1, 1), (1, 2), (2, 3) ], which is a waste.
However, these won’t be covered here.
There is a good implementation of RLE here, but the improved encoding (which uses the above optimisation) does not work for me, so let’s just use the standard RLE encoding.
Here is our initial RLE class:
import itertools
import copy

class RLE( object ):
    def __init__( self ):
        super( RLE, self ).__init__()
        self.rle = []

    def copy( self ):
        rle = RLE()
        rle.rle = copy.copy( self.rle )
        return rle

    def encode( self, values ):
        self.rle = [ (len(list(group)), name) for name, group in itertools.groupby( values ) ]

    def decode( self ):
        return sum( [length * [item] for length, item in self.rle], [] )

    def getValueAt( self, position ):
        # TODO: optimise this
        # decode, extract
        values = self.decode()
        return values[ position ]

    def setValueAt( self, position, value ):
        # decode, set, re-encode
        values = self.decode()
        values[ position ] = value
        return self.encode( values )
Now we convert our volume to the following:
print "512x512x512 RLE encoding"
startTime = time.time()
array = numpy.empty( (512, 512), dtype = numpy.object )
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
rle.encode( values )
for x in xrange( 512 ):
    for y in xrange( 512 ):
        array[ (x, y) ] = rle.copy()
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)
Nothing amazing here, just a 2D array creation and population with copies of our RLE object. We’ve optimised the RLE computation to only be done once, using the handy copy function we added to our class.
The output is the following:
512x512x512 RLE encoding
Time: 1.65600013733
Bytes: 1048576
Well, the memory footprint has dropped from 134,217,728 bytes to 1,048,576: a reduction of over 99%. (Note that nbytes on an object array only counts the stored pointers, not the RLE objects they point to, so the true footprint is somewhat larger.) But the creation time has spiked to 1.6 seconds.
This isn’t going to scale at all!
Let’s time our RLE class to see how long that takes.
print "512 RLE encoding"
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
startTime = time.time()
rle.encode( values )
print "Time: %s" % str(time.time() - startTime)
This gives the output:
512 RLE encoding
Time: 0.0
So a single encoding is negligible.
How about making 512 * 512 copies:
print "512x512 RLE copies"
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
rle.encode( values )
startTime = time.time()
for x in xrange( 512 ):
    for y in xrange( 512 ):
        rle2 = rle.copy()
print "Time: %s" % str(time.time() - startTime)
This gives the output:
512x512 RLE copies
Time: 0.853999853134
So it’s taking almost a second to copy 512×512 RLE objects. That’s pretty bad, but there’s still nearly a second unaccounted for.
This can only be the method we’re using to store our objects into the numpy array.
Let’s simplify it by using the power of numpy:
print "512x512x512 RLE encoding with single assignment"
startTime = time.time()
array = numpy.empty( (512, 512), dtype = numpy.object )
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
rle.encode( values )
array[ : ] = rle.copy()
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)
Now we get:
512x512x512 RLE encoding with single assignment
Time: 0.546000003815
Bytes: 1048576
So we’ve knocked over a second off just by removing the two loops and using a single slice assignment with numpy. But there’s a catch:
array[ (0, 0) ].setValueAt( 0, 9 )
assert array[ (0, 1) ].getValueAt( 0 ) != 9
This results in an assertion error. With the slice assignment we’ve made EVERY entry in the array point to the SAME RLE object. Modifying one modifies them all.
So this is definitely out of the question.
We somehow need to get the speed of the slice assignment, without assigning a pointer to the same object over and over.
Run Length Encoding Value
Let’s try putting the RLE value itself in the array instead of an object. This way we can use the numpy “fill” operation instead of having to iterate and copy the RLE object.
The RLE value takes up our 3rd dimension, so the numpy array is now 2D and we have to extract the 3rd dimension from the RLE values themselves.
print "512x512 numpy object with starting RLE value"
startTime = time.time()
array = numpy.empty( (512, 512), dtype = numpy.object )
values = [ 0 for index in xrange( 512 ) ]
rle = RLE()
rle.encode( values )
rleValues = rle.rle
array.fill( rleValues )
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)
This gives the following output:
512x512 numpy object with starting RLE value
Time: 0.00300002098083
Bytes: 1048576
It takes less time to allocate than the plain NumPy zeros array, and it takes up only 1,048,576 bytes: a 128× reduction, for a homogeneous volume anyway!
Obviously the volume will change, but we should be increasing cache coherency with the reduction.
With this method, I can create a chunk of size 512 x 512 x 512 in 0.003 seconds and 2048 x 2048 x 2048 in 0.04 seconds!!!
So the solution is to store RLE values and not Python objects with RLE values.
We can then manipulate the RLE values with the following functions:
import itertools

# Code modified from:
# http://www.builderau.com.au/program/python/soa/Run-length-encoding-in-Python/0,2000064084,339280649,00.htm

def encode( values ):
    return [ (len(list(group)), name) for name, group in itertools.groupby( values ) ]

def decode( rleValues ):
    return sum( [length * [item] for length, item in rleValues], [] )

def getValueAt( rleValues, position ):
    # TODO: optimise this
    # decode, extract
    values = decode( rleValues )
    return values[ position ]

def setValueAt( rleValues, position, value ):
    # decode, set, re-encode
    values = decode( rleValues )
    values[ position ] = value
    return encode( values )
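To see how these functions are used in practice, here is a minimal sketch of a single voxel read and write against the 2D object array (encode and decode are re-stated so the snippet is self-contained). The key point is that setValueAt returns a NEW run list, which we must store back into the array cell; the other cells keep sharing the original run list, so the aliasing problem from the slice-assignment approach never arises:

```python
import itertools
import numpy as np

def encode(values):
    return [(len(list(group)), name) for name, group in itertools.groupby(values)]

def decode(rle_values):
    return sum([length * [item] for length, item in rle_values], [])

# Build the 512x512 object array; every cell shares one run list to start.
array = np.empty((512, 512), dtype=object)
array.fill(encode([0] * 512))

# Read: decode one column and index into it.
column = decode(array[0, 0])
assert column[10] == 0

# Write: decode, modify, re-encode, and store the NEW run list back.
column[10] = 9
array[0, 0] = encode(column)

# Other cells still reference the original, untouched run list.
assert decode(array[0, 1])[10] == 0
```

Because run lists are replaced rather than mutated, sharing the initial list between all cells is safe.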
So we’ve gotten a massive speed boost AND we’ve massively cut down our memory usage… with an empty volume anyway.
RLE getValueAt / setValueAt
The getValueAt and setValueAt functions are (obviously) horribly unoptimised: they decode the ENTIRE run list just to touch a single value. It is pretty obvious how to optimise this, but I haven’t written the code to do it yet.
Update: I implemented an updated RLE algorithm with proper getValueAt / setValueAt functions and posted the code here. Be aware that I believe this is inferior to the algorithm I mention below.
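One obvious way to avoid the full decode is to walk the run list, skipping whole runs until the target position falls inside one. This is just a minimal sketch of that idea, not the implementation linked above:

```python
def get_value_at(rle_values, position):
    # Skip whole runs until position lands inside one; no full decode needed.
    for length, value in rle_values:
        if position < length:
            return value
        position -= length
    raise IndexError("position out of range")

# [0, 0, 0, 5, 5, 7] encoded as (count, value) pairs.
runs = [(3, 0), (2, 5), (1, 7)]
assert get_value_at(runs, 0) == 0
assert get_value_at(runs, 4) == 5
assert get_value_at(runs, 5) == 7
```

This is still linear in the number of runs, but it avoids materialising the 512-element decoded list for every lookup.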
RLE itself could be better
As an improvement, I’m going to try modifying RLE to store the index of the first repeating value rather than the number of times it repeats. This will allow us to do a simple binary search to perform “get” and “set” operations on the value.
A paper on this is located here.
Update: I’ve posted my implementation of this here.
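The idea can be sketched as follows: store the index at which each run starts instead of its length, then binary-search the start indices with the bisect module. The helper names here are my own illustration, not taken from the linked paper or post:

```python
import bisect

def encode_indexed(values):
    # Store (start_index, value) for each run instead of (length, value).
    runs = []
    for i, v in enumerate(values):
        if not runs or runs[-1][1] != v:
            runs.append((i, v))
    return runs

def get_value_at(runs, position):
    # Binary search the run start indices for the run containing position.
    # In practice the starts list would be kept alongside the runs rather
    # than rebuilt on every call.
    starts = [start for start, _ in runs]
    i = bisect.bisect_right(starts, position) - 1
    return runs[i][1]

values = [0, 0, 0, 5, 5, 7]
runs = encode_indexed(values)  # [(0, 0), (3, 5), (5, 7)]
assert all(get_value_at(runs, i) == v for i, v in enumerate(values))
```

This turns each lookup into O(log n) in the number of runs, instead of the O(n) walk or the full decode.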
Hand allocation out to other processes / threads
You could split this out into a threaded batch call and receive the result once it’s done. However, I’m not sure whether NumPy holds the GIL during allocation; if it does, that would probably prevent this from working.
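A rough sketch of what such a batch call might look like, using a thread pool (the chunk count, worker count, and helper name are arbitrary assumptions; whether the allocations actually overlap depends on the GIL question above):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def make_chunk(shape):
    # Allocate one chunk as a 2D object array of RLE run lists;
    # one (depth, 0) run represents an entirely empty 512-deep column.
    chunk = np.empty(shape, dtype=object)
    chunk.fill([(512, 0)])
    return chunk

# Hand chunk creation out to a pool and collect the results when done.
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = list(pool.map(make_chunk, [(512, 512)] * 8))
```

Since chunk creation with the RLE-value approach already takes only milliseconds, threading this is likely only worthwhile for very large worlds or for overlapping allocation with other work.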
Note: The test times vary if you include all the tests in a single script rather than running them individually. These tests were run individually.