Position Aware Run Length Encoding in Python

I mentioned previously wanting to improve the RLE to include position information. I’m sure it’s already got a name, but I’m going to call it Position Aware RLE (PARLE).

Below is the code I’ve written to accomplish this. I’m sure someone can come up with a more concise way of writing it, but this is the best I can do at the moment.

'''
Created on 09/06/2011

@author: adam
'''

# similar to RLE
# except it also includes the start position
# of the value
# this lets us to binary searches for get / set
# we can also quickly get the length of the RLE by looking at the
# last element
# the last element should always be (position, None)

import itertools
import bisect

def encode( values ):
    # group like values
    groups = itertools.groupby( values )

    # add offsets
    encoded = []
    position = 0
    for name, group in groups:
        length = len( list(group) )
        encoded.append( (length, position, name) )
        position += length

    return encoded

def decode( rleValues ):
    return sum( [length * [item] for length, position, item in rleValues], [] )

def setValueAt( rleValues, index, value ):
    """
    # count the values with position <= our index
    groups = [ i for i, v in enumerate(rleValues) if v[ 1 ] <= index ]

    # the length of this list - 1 is the index we need to alter
    groupIndex = len(groups) - 1
    """
    keys = [r[1] for r in rleValues]
    groupIndex = bisect.bisect_left( keys, index )
    if groupIndex >= len( rleValues ):
        groupIndex -= 1
    if rleValues[ groupIndex ][ 1 ] > index:
        groupIndex -= 1

    # we have a number of cases to deal with
    # - merging with previous group if we're at the start and are the same value
    # - merging with next group if we're at the end and are the same value
    # - being at the start of the modified group
    # - being in the middle of the modified group
    # - being at the end of the modified group
    # - the modified group being only a single length

    # to make this easier, lets expand the current group, previous, and next
    # we will modify the value, then run our encoder over it
    # get the groups from our encoded array
    startIndex = groupIndex - 1 if groupIndex > 0 else groupIndex
    endIndex = groupIndex + 1 if groupIndex < len(rleValues) else groupIndex

    # de-code the values from our subset
    #values = decode( rleValues[ startIndex: endIndex + 1 ] )
    values = sum( [length * [item] for length, position, item in rleValues[ startIndex : endIndex + 1]], [] )

    # determine our offset within the decoded values
    groupStart = rleValues[ startIndex ][ 1 ]

    # write our new value into the appropriate location
    values[ index - groupStart ] = value

    # re-encode the groups
    # we need to ensure we re-add our proper start positions so we can't use the
    # existing encode function
    def encodeWithOffset( values, offset ):
        # group like values
        groups = itertools.groupby( values )

        # add offsets
        encoded = []
        position = 0
        for name, group in groups:
            length = len( list(group) )
            encoded.append( (length, position + offset, name) )
            position += length

        return encoded

    encoded = encodeWithOffset( values, offset = groupStart )

    # re-insert the values back into the encoded list
    rleValues[ startIndex : endIndex + 1 ] = encoded

    return rleValues

def getValueAt( rleValues, index ):
    """
    # count the values with position <= our index
    values = [ i for i, v in enumerate(rleValues) if v[ 1 ] <= index ]
    # the length of this list - 1 is the index we need to alter
    return rleValues[ len(values) - 1 ][ 2 ]
    """
    keys = [r[1] for r in rleValues]
    groupIndex = bisect.bisect_left( keys, index )
    if groupIndex >= len( rleValues ):
        groupIndex -= 1
    if rleValues[ groupIndex ][ 1 ] > index:
        groupIndex -= 1
    return rleValues[ groupIndex ][ 2 ]

And here is the test code I’m using. It’s not representative of a typical voxel volume (values are random), but it proves the algorithm.

if __name__ == "__main__":
    import random
    import time

    values = [ 1, 1, 2, 2, 4, 6, 6, 6, 1, 1, ]

    encoded = encode( values )
    print encoded

    decoded = decode( encoded )
    #print decoded

    setValueAt( encoded, 2, 1 )
    decoded = decode( encoded )
    values[ 2 ] = 1
    expectedEncoded = encode( values )
    print "encoded         %s" % str(encoded)
    print "expectedEncoded %s" % str(expectedEncoded)
    print "decoded         %s" % str(decoded)
    print "values          %s" % str(values)
    assert len(values) == len(decoded)
    for index in xrange( len( values ) ):
        assert values[ index ] == decoded[ index ]
    for index in xrange( len( encoded ) ):
        assert expectedEncoded[ index ] == encoded[ index ]

    setValueAt( encoded, 2, 2 )
    decoded = decode( encoded )
    values[ 2 ] = 2
    expectedEncoded = encode( values )
    print "encoded         %s" % str(encoded)
    print "expectedEncoded %s" % str(expectedEncoded)
    print "decoded         %s" % str(decoded)
    print "values          %s" % str(values)
    assert len(values) == len(decoded)
    for index in xrange( len( values ) ):
        assert values[ index ] == decoded[ index ]
    for index in xrange( len( encoded ) ):
        assert expectedEncoded[ index ] == encoded[ index ]

    setValueAt( encoded, 6, 1 )
    decoded = decode( encoded )
    values[ 6 ] = 1
    expectedEncoded = encode( values )
    print "encoded         %s" % str(encoded)
    print "expectedEncoded %s" % str(expectedEncoded)
    print "decoded         %s" % str(decoded)
    print "values          %s" % str(values)
    for index in xrange( len( values ) ):
        assert values[ index ] == decoded[ index ]
    for index in xrange( len( encoded ) ):
        assert expectedEncoded[ index ] == encoded[ index ]

    setValueAt( encoded, 6, 6 )
    decoded = decode( encoded )
    values[ 6 ] = 6
    expectedEncoded = encode( values )
    print "encoded         %s" % str(encoded)
    print "expectedEncoded %s" % str(expectedEncoded)
    print "decoded         %s" % str(decoded)
    print "values          %s" % str(values)
    assert len(values) == len(decoded)
    for index in xrange( len( values ) ):
        assert values[ index ] == decoded[ index ]
    for index in xrange( len( encoded ) ):
        assert expectedEncoded[ index ] == encoded[ index ]

    setValueAt( encoded, 9, 6 )
    decoded = decode( encoded )
    values[ 9 ] = 6
    expectedEncoded = encode( values )
    print "encoded         %s" % str(encoded)
    print "expectedEncoded %s" % str(expectedEncoded)
    print "decoded         %s" % str(decoded)
    print "values          %s" % str(values)
    assert len(values) == len(decoded)
    for index in xrange( len( values ) ):
        assert values[ index ] == decoded[ index ]
    for index in xrange( len( encoded ) ):
        assert expectedEncoded[ index ] == encoded[ index ]

    values = [ 1 for x in xrange( 20 ) ]
    encoded = encode( values )
    setValueAt( encoded, 19, 2 )
    decoded = decode( encoded )
    values[ 19 ] = 2
    expectedEncoded = encode( values )
    print "encoded         %s" % str(encoded)
    print "expectedEncoded %s" % str(expectedEncoded)
    print "decoded         %s" % str(decoded)
    print "values          %s" % str(values)
    assert len(values) == len(decoded)
    for index in xrange( len( values ) ):
        assert values[ index ] == decoded[ index ]
    for index in xrange( len( encoded ) ):
        assert expectedEncoded[ index ] == encoded[ index ]

    values = [ random.randint( 0, 4 ) for x in xrange( 2048 ) ]
    encoded = encode( values )

    for x in xrange( 1000 ):
        offset = random.randint( 0, len(values) - 1)
        newValue = random.randint( 0, 10 )

        startTime = time.time()
        setValueAt( encoded, offset, newValue )
        endTime = time.time() - startTime
        print "Set time %s" % str(endTime)

        decoded = decode( encoded )
        assert decoded[ offset ] == newValue

        startTime = time.time()
        decodedValue = getValueAt( encoded, offset )
        endTime = time.time() - startTime
        print "Get time %s" % str(endTime)

        assert decodedValue == newValue

    print encoded
    print decoded
Advertisements

6 Responses to “Position Aware Run Length Encoding in Python”

  1. […] Two peoples adventures in the world of game development « Amnesia… really Run Length Encoding with positioning information in Python […]

  2. […] Twisted Pair Development Two peoples adventures in the world of game development « Run Length Encoding with positioning information in Python […]

  3. […] So we created a volume pretty quickly Part 1, we also implemented basic RLE and Position Aware RLE (PARLE). […]

  4. I used this in my (possibly not) opensource project. Can I use PARLE for it, or does it have some kind of lycense?

  5. One thing I came up with when I was brainstorming your code was: What if we saved it as a string? Do you think it would increase rendement or just slow things down, I’d gladly receive reply on this.

    • Hi drvanon,

      The code is without license. Do as you like.

      Personally, I would avoid using it in preference to other techniques.
      See my responses to this comment: https://twistedpairdevelopment.wordpress.com/2011/06/09/creating-voxel-volumes-quicky-in-python/#comment-463

      But if you think it’s of help then by all means.

      If you save as a string, you will have to uncompress and recompress the data. I only see a performance loss from doing that.
      Imagine 5000 x zero values. With RLE it’s just (5000,0), but saved as a literal string is 5000s zeros.

      Unless you just mean to save as “(5000,0)”, which should be trivial. Just iterate over each tuple and convert it to a string or just pickle it.
      I haven’t used pickle yet, so I’m not sure how it will represent a tuple. Perhaps it will already write that way.

      If you wanted to write to a human readable file I’d write it as 1 line per tuple. Ie:
      10,0
      1,1,
      1,2,
      15,0

      etc.

      Cheers,
      Adam

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: