Sunday, July 5, 2009

ICFP Contest 2009

The ICFP 2009 programming contest started a couple weeks earlier than last year, and unfortunately I only found out about it as it was ending. The challenge is to solve as best as possible in 72 hours a problem that can't be brute forced in 72 hours. I love this challenge; I've been participating in the contest (in python) since 2002 and even left EuroPython early last year just to compete. I don't do writeups every year but I did for 2007, 2004, and 2002. This year I didn't participate officially but I did tackle the program after the fact.

The best resource for writeups and implementations is the FUN team page. They did their entry in python as did an amazingly large number of contestants this year. Their page has links to a dozen writeups, a sub-reddit, and some good pages on the maths of orbital mechanics.

The contest is a great chance to try out new tools and methodologies. I tried Unit Testing for the first time during an ICFP weekend (many moons ago) and I've been sold ever since. If Unit Testing helps you finish a time-limited competition faster then how could it possibly hurt during normal dev? I've also used the contest to try out pair programming, Test Driven Development, and pyrex [I won't talk about those more unless someone asks].

This year I used the contest as a chance to try out the ctypes module and Pygame gui library. More on that below, but first to the ICFP problem:

Problem Defined

This year, like many past years, the challenge was to implement a virtual machine that runs the binaries provided by the organizers. The VM binaries simulated a bunch of orbital mechanics. You then had to write programs that interacted with the simulations to push a satellite from one orbit to another, meet up with other satellites in orbit, and more complicated variations on the theme.
My strategy was the same as in past years:
  1. Read the problem description, write a reference implementation in pure python, and test the hell out of the reference implementation.
    [actual time: 1 hour]

  2. Use that to write a visualizer and explore the problem mechanics and horribly underspecified written problem description.
    [actual time
    [actual time: 2 hours. I had to install pygame and read the docs first]

  3. Write a work-alike in screaming fast C so as much time as possible can be spent solving the problem instead of waiting on simulations to finish.
    [actual time: 3 hours. I had to learn the ctypes module and fool around with it.

The VM was very simple; it consisted of add/subtract/multiply/copy/if-test-else operations. Knocking out a working and tested version took about an hour. The tests were important because even my 150-line python implementation had a couple bugs in it. They also exposed some really shitty bugs in the written problem description. For instance a table listed 10 bits in one opcode as the 'imm' value but it should have been 3 bits of value with 7 bits of padding. The unit tests picked up bad values and a re-examination of the problem description led me to footnote 1.5 which said the value is 3-bits (why didn't they update the table too? no idea).
The nature of the VM lent itself to a fast C implementation. The executable cannot change itself so if the 30th opcode adds memory locations 101 and 102 it always adds memory locations 101 and 102. It never does a conditional jump or anything else funky. Writing a C version of the inner loop was almost as simple as adding print statements to the python version and then throwing the output at gcc. To understand the VM a little better (and test it even more) I added my own opcodes that did asserts and wrote a self-test binary that was a translation of my python unit tests into VM code. I could be confident in the C translation because it ran the self-check the same as the python version. The C version runs 1000 times faster than the pure python version, which is nice. Oddly, adding -Ox compiler flags makes the self test completely shit the bed; I say odd because the program is extremely deterministic so I guessed it would optimize nicely.

Writing a visualizer early (in pygame) was invaluable. The first draft just drew the current orbit, the target orbit, and the satellite's current position (state was printed to stdout). I assigned the arrow keys to manipulate the thrusters and discovered another bug in the spec -- setting thrust dx/dy points the thrusters in that direction so if you want to increase your speed in direction X you need to fire in direction negative X. Playing with the visualizer also answered some other ambiguities in the spec -- the simulation keeps track of your relative position to the Earth but it really means your relative position to the CENTER of the Earth adn not the surface. That isn't just important to know it also makes all the maths much easier.

Solving the level 1 problems was easy. Included in the problem description was maths for calculating thrust vectors for moving an object from one orbit to another. The maths, however, were for doing it using a minimum of fuel but the score for your solution is maximized by doing it as quickly as possible and using all your available fuel. Thanks to the first and simplier solutions I understood enough orbital mechanics to know you always wanted to fire thrusters either perpendicular or parallel to the tangent of the orbit at your current position. Thanks to the screaming fast C implementation I could brute force how much and which way to fire to get a high scoring solution. Having a fast state push/pop was a huge advantage too. Because the simulator works in discrete 1-second intervals the "real" mathematic solution is off by a little; so instead you want to do things a little early or a little later than the "ideal." That discrete solution is easily brute forced once you have the "real" solution.

Writing the ctypes/so library was interesting. The VM is specified as having a program execution area (a list of opcode, arg1, arg2 tuples), a memeory area (a finite array of doubles), a boolean status flag, a double score value, and a couple short double arrays for IO. Because the program loop is deterministic it goes away when you translate it to C. Then you are left with a bunch of double arrays of known size, one boolean, and one other double. The organizers left a big hint that you could implement this as a single array of doubles by leaving the first two values in one of the arrays undefined. So the obvious solution is to stick the one special double and the boolean flag in there (as a double) and then just concat all the arrays-of-doubles together. The max combined size is finite and under 3000 * sizeof(double). Pre-allocating a big array of these and memcpy'ing them for push/pop of state becomes dirt cheap. As a bonus it makes the ctypes interface stupid simple too because the struct is just a single array of 3000 doubles. Kinda: python doesn't have a native double type so ctypes converts to float. In order to get and set the raw double values of the VM I made the ctypes definition a union of 3000 doubles and 3000 unsigned long longs; when deciding what to do floats were close enough but when initializing the VM data or writing the trace I could set/get the 8-byte ulonglongs (hurray for c's type ignorance!).


In the past if I wanted C speed I always hand rolled CPython extension modules. It is a little bit of extra work but the speed is unbeatable (2x faster than pyrex in my experience). ctypes is so useful I don't ever think I'll ever hand-write an extension module again. The only stdlib module that currently uses ctypes is uuid, but I expect many new modules to use ctypes instead of doing it the hard way.

The VM operates on doubles but python only has a native float type. I was very tempted to create a new core datatype by copying Objects/floatobject.c and search/replace'ing every 'float' to 'double' but because I was using ctypes I opted for the quicker and simpler casting of those doubles a ulonglongs when I needed to set/get. I bet Numpy has a way to deal with all this but I'd already hit my limit on new-tools-learned-per-hour. I understand raw C

ctypes makes easy things easy and hard things possible. If your .so (*NIX .dll) has functions that take an int/void and return an int/void you don't even need to provide a prototype -- it just works. So instead of writing a full featured Python/C wrapper for my basic datatype (an array of 64bit values) I just wrote 6 lines of python that defined the struct layout and then a C library that had a bunch of manipulation functions that returned 1/0 success/failure values and 100 lines of python that mapped property names to assignments/reads of the memory chunk. It was much less work for 90% of the speed. It also meant it was easy to apply the unit tests for my pure-python solution to the hybrid solution because the only difference was the underlying storage - a dict for the pure python and an C-array for the ctypes wrapped .so.


In the past my goto-GUI has been Tkinter. I've been using Tk every since reading Learning Python in 2001 which is half a python book and half a Tk book. I've accumulated a personal library of Tk elements that do everything from menus to graph plotting to shape drawing. I threw it all out for this year for pygame and ended up with a very decent visualizer that was just a couple hundred lines of python code. I won't be going back to Tk for graphical GUIs in the future (I still like it for text).

[More later]

ps, I use "maths" plural like the British commonwealths simply because I like it; I lived in Australia for a year and it grew on me. However, you won't hear me singularizing "sports" or saying someone is "in hospital" - they are "in the hospital." English is the best language ever because what is legal is whatever works. To paraphrase a variously attributed quote: "English doesn't borrow from other languages. It takes them down dark alleys, bashes them on the head, and rifles through their pockets." Some people will tell you English has a giant number of rules, but really that someone is just trying to make sense of a system where anything goes.


kushal said...

CPython's float type uses C doubles, as the Library Reference mentions.

Jack Diederich said...

@kushal Well I'll be a son of a bitch. A cursory look at Objects/floatobject.c does suggest everything is done with doubles instead of floats. All the "float_" names are historical fictions.

Please do submit a bug because that will get it fixed faster than waiting for me to commit a renaming.

Jack Diederich said...

bug posted as item 6427. It will either be addressed or not by our phenoms with more experience in search/replace than me.

PhilipBober said...

The quote about english is from James Nicoll.

Morris said...

Thanks so much for the article, quite useful material.
metal building