Creating Voxel Volumes Quickly in Python

Creating large chunks of memory quickly is tricky in Python because of a number of overheads. I’ve been attempting this myself and have found a fast method which combines a few handy optimisations we would need later on for a Voxel Engine anyway.

Please be aware that the code evolves along the way, so don’t blindly copy and paste the first examples =).

Also, the WordPress theme is cutting off some text. If you copy and paste it, it is still there.

This post is continued in Position Aware RLE, RLE and Part 2.

First of all, if you’re not using NumPy, you’re not going to get anywhere very quickly.

Voxel Representation

The first thing most people will try and do is create a 3D array of Voxel objects.

This works… for a while… until you decide to start testing with a reasonably sized array.

Using objects causes 2 problems:

  1. We have to create each object individually, there’s no Python magic to help us.
  2. We can’t use nice compression methods on Voxel objects, which we need to do in order to manage any reasonably sized volume.

Voxels should be stored as integers. Unless you’re doing something incredibly detailed, a uint8 should suffice. This gives you 256 possible values, or 255 voxel types if you reserve 0 for empty space. Each volume can have its own look-up table or you can have a single global one.
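As a sketch of the look-up table idea (the MATERIALS mapping and the material names here are made up for illustration):

```python
import numpy

# Hypothetical per-volume look-up table: uint8 voxel value -> material.
# Reserving 0 for empty space leaves 255 usable material ids.
MATERIALS = {
    0: "empty",
    1: "stone",
    2: "dirt",
}

volume = numpy.zeros( (4, 4, 4), dtype = numpy.uint8 )
volume[ 0, 0, 0 ] = 1  # place a stone voxel

print( MATERIALS[ int( volume[ 0, 0, 0 ] ) ] )  # stone
```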

With that out of the way, let’s proceed with creating our volumes!

3D NumPy array

The most obvious way to generate a volume is to just declare a big array:

import numpy
import time

print "512x512x512 numpy zeros"
startTime = time.time()
array = numpy.zeros( (512, 512, 512), dtype = numpy.uint8 )
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)

We get the following output:

512x512x512 numpy zeros
Time: 0.111999988556
Bytes: 134217728

The first thing you’ll notice is that it took 0.11 seconds to create this. Nothing spectacular, but we’re allocating A LOT of memory here. NumPy is more efficient than Python lists, so we get exactly the memory usage we asked for: 134,217,728 bytes.
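The byte count is exactly what you’d expect from one byte per uint8 voxel:

```python
# 512 voxels per side, one byte each:
print( 512 * 512 * 512 )  # 134217728
```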

This isn’t going to scale well. And it’s going to be pretty painful to add new volumes on the fly.

Run Length Encoding Object

So let’s use Run Length Encoding to reduce our memory footprint a bit.

There are other compression schemes and further optimisations to RLE that can be used.

Of particular interest is an optimisation to remove RLE encoding of non-repeating values. This will prevent a list like [ 1, 2, 3, 3 ] from becoming [ (1, 1), (1, 2), (2, 3) ], which is a waste.

However, these won’t be covered here.

There is a good implementation of RLE here, but the improved encoding (which uses the above optimisation) does not work for me. So let’s just use the standard RLE encoding.

Here is our initial RLE class:

import itertools
import copy

class RLE( object ):

    def __init__( self ):
        super( RLE, self ).__init__()
        self.rle = []

    def copy( self ):
        rle = RLE()
        rle.rle = copy.copy( self.rle )
        return rle

    def encode( self, values ):
        self.rle = [ (len(list(group)),name) for name, group in itertools.groupby( values ) ]

    def decode( self ):
        return sum( [length * [item] for length, item in self.rle], [] )

    def getValueAt( self, position ):
        # TODO: optimise this
        # decode, extract
        values = self.decode()
        return values[ position ]

    def setValueAt( self, position, value ):
        # decode, set, re-encode
        values = self.decode()
        values[ position ] = value
        self.encode( values )
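To sanity-check the encode/decode pair before timing anything, here is a quick round trip on the small list from earlier, using the same expressions as RLE.encode and RLE.decode:

```python
import itertools

# The same groupby-based encoding used by RLE.encode above.
values = [ 1, 2, 3, 3 ]
rle = [ (len(list(group)), name) for name, group in itertools.groupby( values ) ]
print( rle )  # [(1, 1), (1, 2), (2, 3)]

# And RLE.decode's expansion reverses it.
decoded = sum( [ length * [ item ] for length, item in rle ], [] )
print( decoded == values )  # True
```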

Now we convert our volume to the following:

print "512x512x512 RLE encoding"
startTime = time.time()
array = numpy.empty( (512, 512), dtype = numpy.object )
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
rle.encode( values )
for x in xrange( 512 ):
    for y in xrange( 512 ):
        array[ (x, y) ] = rle.copy()
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)

Nothing amazing here: just a 2D array creation, populated with copies of our RLE object. We’ve optimised by computing the RLE only once and using the handy copy function we added to our class.

The output is the following:

512x512x512 RLE encoding
Time: 1.65600013733
Bytes: 1048576

Well, the memory footprint has dropped from 134,217,728 bytes to 1,048,576. A reduction of a massive 99%. But the creation time has spiked to 1.6 seconds.

This isn’t going to scale at all!

Let’s time our RLE class to see how long the encoding takes.

print "512 RLE encoding"
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
startTime = time.time()
rle.encode( values )
print "Time: %s" % str(time.time() - startTime)

This gives the output:

512 RLE encoding
Time: 0.0

So a single encoding is negligible.

How about making 512 * 512 copies:

print "512x512 RLE copies"
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
rle.encode( values )
startTime = time.time()
for x in xrange( 512 ):
    for y in xrange( 512 ):
        rle2 = rle.copy()
print "Time: %s" % str(time.time() - startTime)

This gives the output:

512x512 RLE copies
Time: 0.853999853134

So it’s taking almost a second to copy 512×512 RLE objects. That’s pretty bad, but there’s still nearly a second unaccounted for.

This can only be the method we’re using to store our objects into the NumPy array.

Slice Assignment

Let’s simplify it by using the power of NumPy:

print "512x512x512 RLE encoding with single assignment"
startTime = time.time()
array = numpy.empty( (512, 512), dtype = numpy.object )
rle = RLE()
values = [ 0 for index in xrange( 512 ) ]
rle.encode( values )
array[ : ] = rle.copy()
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)

Now we get:

512x512x512 RLE encoding with single assignment
Time: 0.546000003815
Bytes: 1048576

So we’ve knocked over a second off just by removing the two loops and using a single slice assignment with NumPy.

But…

array[ (0, 0) ].setValueAt( 0, 9 )
assert array[ (0, 1) ].getValueAt( 0 ) != 9

This results in an assertion error: we’ve made EVERY entry in the array point to the SAME RLE object. Modifying one modifies them all.

So this is definitely out of the question.

We somehow need to get the speed of the slice assignment, without assigning a pointer to the same object over and over.
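A minimal demonstration of the aliasing, with a throwaway class standing in for RLE:

```python
import numpy

class Dummy( object ):
    pass

array = numpy.empty( (2, 2), dtype = object )
array[ : ] = Dummy()

# Slice assignment broadcast ONE instance into every cell:
print( array[ 0, 0 ] is array[ 1, 1 ] )  # True
```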

Run Length Encoding Value

Let’s try putting the RLE value itself in the array instead of an object. This way we can use the numpy “fill” operation instead of having to iterate and copy the RLE object.

The RLE value takes up our 3rd dimension, so the numpy array is now 2D and we have to extract the 3rd dimension from the RLE values themselves.

print "512x512 numpy object with starting RLE value"
startTime = time.time()
array = numpy.empty( (512, 512), dtype = numpy.object )
values = [ 0 for index in xrange( 512 ) ]
rle = RLE()
rle.encode( values )
rleValues = rle.rle
array.fill( rleValues )
print "Time: %s" % str(time.time() - startTime)
print "Bytes: %s" % str(array.nbytes)

This gives the following output:

512x512 numpy object with starting RLE value
Time: 0.00300002098083
Bytes: 1048576

Beautiful!

It takes less time to allocate than the raw NumPy array and takes up only 1,048,576 bytes: 1/128th the size, with a homogeneous volume!

Obviously the volume will change, but we should be increasing cache coherency with the reduction.

With this method, I can create a chunk of size 512 x 512 x 512 in 0.003 seconds and 2048 x 2048 x 2048 in 0.04 seconds!!!

So the solution is to store RLE values and not Python objects with RLE values.

We can then manipulate the RLE values with the following functions:

import itertools

# Code modified from:
# http://www.builderau.com.au/program/python/soa/Run-length-encoding-in-Python/0,2000064084,339280649,00.htm
def encode( values ):
    return [ (len(list(group)),name) for name, group in itertools.groupby( values ) ]

def decode( rleValues ):
    return sum( [length * [item] for length, item in rleValues], [] )

def getValueAt( rleValues, position ):
    # TODO: optimise this
    # decode, extract
    values = decode( rleValues )
    return values[ position ]

def setValueAt( rleValues, position, value ):
    # decode, set, re-encode
    values = decode( rleValues )
    values[ position ] = value
    return encode( values )
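A quick usage sketch of these free functions (redefined here so the snippet stands alone). Note that setValueAt returns a NEW run list, which must be assigned back into the array cell; mutating the shared list that fill() put in every cell would change every column at once:

```python
import itertools

def encode( values ):
    return [ (len(list(group)), name) for name, group in itertools.groupby( values ) ]

def decode( rleValues ):
    return sum( [ length * [ item ] for length, item in rleValues ], [] )

def setValueAt( rleValues, position, value ):
    values = decode( rleValues )
    values[ position ] = value
    return encode( values )

# An all-zero column of 512 voxels, as stored in each array cell.
column = encode( [ 0 ] * 512 )
print( column )  # [(512, 0)]

# Set voxel 10 and store the NEW run list back (don't mutate the old one).
column = setValueAt( column, 10, 7 )
print( column )  # [(10, 0), (1, 7), (501, 0)]
```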

So we’ve gotten a massive speed boost AND we’ve massively cut down our memory usage… with an empty volume anyway.

Further Work

RLE getValueAt / setValueAt

The getValueAt and setValueAt functions are (obviously) horribly unoptimised and will decode the ENTIRE RLE list for every access. It is pretty obvious how to optimise this, but I haven’t written the code to do it yet.
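One obvious fix, sketched here as a hypothetical getValueAtFast (not from the post): walk the runs accumulating lengths and stop at the run containing the position, so nothing gets decoded.

```python
def getValueAtFast( rleValues, position ):
    # Accumulate run lengths until the run covering `position` is reached.
    start = 0
    for length, item in rleValues:
        if position < start + length:
            return item
        start += length
    raise IndexError( position )

runs = [ (10, 0), (1, 7), (501, 0) ]
print( getValueAtFast( runs, 10 ) )  # 7
print( getValueAtFast( runs, 9 ) )   # 0
```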

Update: I implemented updated RLE algorithm with proper getValueAt / setValueAt functions and posted the code here. Be aware that I believe this is inferior to the algorithm I mention below.

RLE itself could be better

As an improvement, I’m going to try modifying RLE to store the index of the first repeating value rather than the number of times it repeats. This will allow us to do a simple binary search to perform “get” and “set” operations on the value.
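A rough sketch of that idea (the names starts/items are my own, and this leans on the standard-library bisect module): store each run’s starting index instead of its length, then a binary search finds the run for any position.

```python
import bisect

# Index-based RLE for a 512-voxel column: zeros with a 7 at position 10.
values = [ 0 ] * 10 + [ 7 ] + [ 0 ] * 501

starts = []  # starting index of each run
items = []   # value of each run
for index, value in enumerate( values ):
    if not items or items[ -1 ] != value:
        starts.append( index )
        items.append( value )

def getValueAt( position ):
    # The run containing `position` is the last one starting at or before it.
    return items[ bisect.bisect_right( starts, position ) - 1 ]

print( getValueAt( 10 ) )  # 7
print( getValueAt( 11 ) )  # 0
```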

A paper on this is located here.

Update: I’ve posted my implementation of this here.

Hand allocation out to other processes / threads

You could split this out into a threaded batch call and receive the result once it’s done. However, I’m not sure if NumPy would block the GIL, if so, this would probably prevent that from working.

Note: The test times vary if you include all the tests in a single script or run them individually. These tests were run individually.


13 Responses to “Creating Voxel Volumes Quickly in Python”

  1. […] So we created a volume pretty quickly in our previous post. […]

  2. You need a proofreader…

    “We’ll, the memory footprint has dropped from 134,217,728 to 536,870,912. A reduction of 75%.”

    “536,870,912”, 9 digits, is BIGGER than “134,217,728”, also 9 digits. So it’s not a big reduction, but a big increase!

    • Thanks for pointing that out.
      The issue was not that number but the code itself. It was allocating 512x512x512 and inserting RLE of 512 (512 ^ 4) instead of 512×512 with RLE of 512 (512 ^ 3).
      I’ve updated the code and output.

      Thanks,
      Adam

  3. Tom Mitchell Says:

    After following the creation of the initial RLE encoding I get a Traceback of ” TypeError: long() argument must be a string or a number, not ‘RLE’ ” , can’t seem to figure out how I’ve gone wrong :/ Typed everything in word for word!

    Thanks,
    Tom

    • I’m sure it’s a trivial typo on my part.
      I haven’t got the project up at the moment. What line is creating the issue?
      Sorry, I’m in a lazy mood =(

  4. I am considering using a hashmap, or as Python calls it a dictionary.

    When the volume is sparse, it is very efficient.
    And the operations on it are very simple to do.

    To place a single voxel in an empty volume:
    volume = {}
    volume[ (0,0,0) ] = 1

    Ranged operations will be slower though: to do a neighbourhood search, each neighbour needs a hash lookup.

    But the representation is elegant, and the volume is unbounded: it can grow in any direction.

    • It really depends on a lot of values.
      How are you rendering it?
      How smart is your polygonisation algorithm?
      How often do you modify the mesh?
      How big will your volumes be?

      If you’re not updating the mesh much then you may be ok with this approach.

      Even with a single byte for each voxel, you’ll very quickly consume all of your memory with a reasonably sized volume. And a dict will expand WELL beyond a standard array’s size, so you’re talking many multiples of the volume.
      Unless you specify a max at creation, which would just negate ANY optimisation since it will alloc it all at once.

      I don’t know if you can force a dict to use a certain type only, ie int, byte, etc.
      If not, then I’d also be concerned about the dict over-allocating space. I haven’t looked into the dict implementation. It could be a simple address redirect, or it could be an address redirect with an array of storage behind it.
      If it’s the former then you’ll have issues with memory being non-contiguous and you will get cache thrashing as you load data in from all over your allocated RAM.

      If you’re ray casting, you NEED SVO or a similar structure. You just can’t do it without being able to have some method of collision detection that is inherent in the data container.

      RLE would work well with something like a Hilbert curve, but I haven’t tried this.
      All research seems to be going toward SVO, as the Octree is a pretty well proven data structure / algorithm.

      I wouldn’t use the above approach in production, it was just a stepping stone. I would go for either SVO or RLE over a Hilbert curve.
      I think they would be comparable. And the Hilbert curve gives you nearest neighbour more easily, which in SVO requires some nasty logic.

      A dict will get you going quickly, just don’t expect it to keep you going long.

      Good luck, and if you get anywhere, consider releasing the source =)
      Python needs a hand in the game dev department.

      Cheers,
      Adam

    • A friend told me about a different form of SVO. It’s basically the inverse of an SVO.
      Instead of an empty leaf representing empty space, empty regions represent full areas.
      Non empty areas contain the distance to the nearest voxel.

      Ie,
      Null = a solid volume
      1 = nearest filled voxel is 1 away
      2 = nearest filled voxel is 2 away
      etc

      This is incredibly fast for ray casting.
      It’s just a real pain if you want to modify a voxel. There may be some algorithm out there to help with that, but I haven’t looked it up.

      It also prevents you from using multiple materials. At least without some duplication.
      If you wanted multiple materials you could overlay multiple volumes and have 1 material each.
      You could then step the smallest distance of each volume at that point
      E.g. volume 1 says the next voxel is 5 away and volume 2 says the nearest is 1 away, so you step 1.

      Cheers
      Adam

  5. Thank you very much for your effort! I’m trying to learn Python 3, but it seems your code is made for an older Python version. Would you update it, please? That would be cool, because I want to test your code in the Blender Game Engine.

    • I’m not currently using Python 3 due to Pyglet and other dependencies not working in it.

      I also wouldn’t recommend using the code I present in this.
      A better data structure is a Sparse-Voxel Octree.
      They’re not too hard to code. I can upload one I wrote later on when I get time.

      Cheers,
      Adam

  6. Is it just me or is the first code section under the heading “Run Length Encoding Value” wrong as fill will create a reference in each element meaning that if you change any field it changes the field everywhere?

    • This is all old code from when I was learning Python, I wouldn’t be surprised if it’s a bit bollocks.
      Feel free to fix it up and post any corrections.

      Cheers,
      Adam

