Jack Diederich's Python Blog: 2009

Wednesday, December 30, 2009

A Glimpse of PyCon in Your Hometown

As in past years local Python user groups are hosting PyCon speakers to give their talks in a dress rehearsal before PyCon. It is good for the speakers, good for the local user groups, and good for PyCon (more polished talks! I did a teardown/rewrite of my talk last year based on Boston PIG feedback). Here are three dates I know of with locations and speakers:

Toronto, Feb 16, Linux Caffe

* Leigh Honeywell: Think Globally, Hack Locally - Teaching Python in Your Community
* Greg Wilson: What We've Learned From Building Basie
* Mike Fletcher: Debating 'til Dawn: Topics to keep you up all night

Boston, Jan 20, Microsoft NERD center

* Francesco Pierfederici: Python for Large Astronomical Data Reduction and Analysis Systems
* Jack Diederich: Python's Dusty Corners
* Ned Batchelder: Tests and Testability
* [added] Antonio Rodriguez: preview of his Keynote

Boston, Feb 3, Microsoft NERD center

* Peter Portante: Demystifying Non-Blocking and Asynchronous I/O
* Glyph Lefkowitz: Turtles All The Way Down: Demystifying Deferreds, Decorators, and Declarations
* Edward Abrams: DJing in Python: Audio processing fundamentals

Seattle, Jan 30, Paul Allen Center

[added from Seattle PIG in the comments]
* Speakers TBD

The Toronto announcement is here and though Boston hasn't been officially announced it was organized by Ned Batchelder this year (he organized last year's "PyCon on the Charles" too).

I'm happy to see Boston back up in the rankings with six talks this year. I intend to fully [ab]use my slot by briefly highlighting all the "dusty corners" (all the __method__ methods and object protocols) and then, based on feedback, expanding on just a subset for PyCon proper. The opportunity to see talks beforehand is a boon; you can see a talk NOW and free up a slot during PyCon and you can also see talks that you would otherwise skip altogether (I saw a great talk last year that I wouldn't have picked based on the printed program's blurb). And of course you can get ahead on your "hallway track" by catching up with people a month ahead of time.

Monday, October 5, 2009

Cthulhu and Python

Andrew Kuchling broached the topic on twitter so I think that makes it fair game.

For those of you who don't know, H.P. Lovecraft wrote the short horror story The Call of Cthulhu in 1926. The story only became popular much later, and only in recent decades has it become every sci-fi writer's rite of passage to embellish and extend the genre. It is to those authors what The Aristocrats joke is to stand-up comedians - everyone tries to take the same idea and make it their own. Like The Aristocrats, the Cthulhu Mythos is widely pursued because it is open-ended (inviting mutations) and nobody gets sued for riffing on it.

There are some sci-fi authors that routinely include Python in their fiction. Charlie Stross is one of those; as it happens at a PyCon I was chatting with a developer who was extolling Stross's work (his girlfriend happened to be DoD and working on a counter virus predicted by Stross and which Stross named in his book after a 1967 experimental Swedish soft-core movie). I had recently read and enjoyed Stross's short variation on the Cthulhu mythos A Colder War (in which Ollie North and Reagan damn us all to Hell using not nukes, but instead weakly godlike beings) and I've been a Stross fan ever since.

Charlie has featured python-3000 in his last couple books, the first of which was put out when py3.0 was just a joking reference. His politics are plain enough (though Scottish he's more of a Fabian Socialist) but his writing is tight so the occasional implausibilities (python dominates the world; the US economy becomes third world in ten years) are forgivable.

I would have twittered all that, but the medium doesn't allow it.

Wednesday, September 30, 2009

My PyCon 2010 Talk

My talk is tentatively titled "Python's Dusty Corners."* It will be a brief overview of all the features in python that you don't need to know about right up until the moment you do. The list includes how comparisons work, descriptors, iterators, context managers, namespaces, else clauses on for/while loops (suggested by Hettinger), and whatever else you can suggest in the comments (please do!). The narrative of the talk is that these are features that you don't need and/or shouldn't use in your day-to-day code but that you need to keep in the back of your mind because other people's code and the stdlib do use them. As Alex Martelli pointed out in his wonderful talk Abstractions as Leverage you can't successfully function at one level of abstraction if you don't know what is going on at the next level down. This talk is a whirlwind tour of the next level down.

I'm honored to be an invited speaker this year. This just means the program committee has pre-approved any talk I give instead of going through the normal program committee proposal process. I had some stem-winding conversations with friends about what this means and what purpose it serves. Firstly it is a flattering inducement to get prior popular speakers to speak again, and possibly no more than that. Secondly it gives speakers a chance to do a talk that might not make it through the normal approval process. I knew a person in college** whose motto was "There is a fine line between being The Man and being That Guy." Imagine a Venn diagram with barely overlapping circles labeled "Good Ideas" and "Bad Ideas"; being "The Man" is the thin overlap between the two, and committees are very good at avoiding any idea that is anywhere close to the "Bad Ideas" region, let alone one that is actually in it. Having invited speakers is a way for the committee to include those ideas with minimal risk by inviting people who have a proven track record and hoping they don't screw up.

That said my talk is pretty safe and certainly would have made it through the normal process. I would love to give a talk I thought was in the dangerous "The Man" zone but I haven't the foggiest idea of what that talk would be. Err, I have some idea but none long enough to be a proper talk. For lightning talks I'll be preparing "I love graphs" (I do, and I have the graphs to prove it), "The Physics of Bowling Balls" (waaay more interesting than you would guess), and my always-threatened-never-done talk "PyAsshole: Simulating a partial information, non-trump, drinking card game in Python."

* As much as I liked the title I proposed on twitter, it wouldn't help the conference (or me, or anyone really) to have a talk titled "Strange Python Shit" on the program.
** He looked suspiciously similar to me, but with hair down to his shoulders and an eyebrow ring (lay off, it was the early 90s).

Your Talk Proposal Here

As it turns out one of the marginal items that gets trimmed in a down economy is conference jaunts [Q: who would have guessed? A: everybody]. PyCon 2010 needs talks, so if you have something interesting to say to a few hundred people this is your chance. I didn't volunteer for the program committee this year so I don't know the exact numbers but the acceptance rate is going to be much higher than the 50% for past years.

This is a great opportunity to practice your chops; I got my start when PyCon was 300 people and talks had a 90% acceptance rate. The last few years PyCon has had 1000+ people and even the smallest talk room gets 200 people. If you have anything interesting to say, this is your chance to say it.

NB, hopefully PyCon won't have to do what other conferences routinely do and say "we've extended the deadline, but this time we mean it!" For fuck's sake the conference is in Atlanta in February - I'll be happily golfing during the time I'm not speaking.

NB, by "golfing" I also mean "going to the shooting range" and "bowling" as weather dictates.

ENB: Anna Ravenscroft has a tidy short list of bogus reasons why you can't give a talk. The title is pointed at women but the excuses are universal.

Friday, August 21, 2009

Looking For a New Gig

I'm a Boston based python developer looking for a new gig, full time or consulting. Here are my resume highlights:

Co-founded a web analytics & marketing company (Psynchronous Communications), and grew it to 1M+ in annual revenues.

Python core developer, author of Class Decorators.

PyCon, EuroPython, and PyCon UK speaker. (video of my PyCon2009 talk).

Hired, managed, and led small teams of developers in the rapid development of web applications.

10+ years at web startups using a LAMP stack. 7+ years in Python. 10+ years C/C++.

OpenAir.com company foosball champion.

The full "brag sheet" is available on request (which omits that last item due to space restrictions).

Jack Diederich
email: jackdied@gmail.com
cell: 617-821-1734

PS I've posted my contact information in several public places over the years but it is still invisible to google. Hopefully repetition will fix that.

Sunday, July 12, 2009

Dominoes

Playing dominoes is hard. Like playing card games there are many games you can play with dominoes and last night I was exposed to Muggins/Fives. Muggins has a points system and unlike the kiddie version of dominoes that I grew up with is not a game of chance (think War versus Bridge). The big tipoffs that it is a hard problem are that A) it is a partial information game [you can only see the board and your own hand] and B) grown men play against each other for money [I'm told the variant we were playing is very popular in the Bahamas].

The first thing I did was look for AI research papers on solving dominoes. After an hour of feeding terms to search engines I can say: there is no research. This is strange because Checkers (a boring perfect information game) was still interesting enough to researchers that it was solved only 10 years ago. People don't bet on checkers so the fact that dominoes is a research orphan left me intrigued. Of course there's a personal angle too: If it was an easy problem to solve I could solve it and then travel while grifting strangers out of their money; If it was a hard problem to solve (like Bridge or Go) then I'd add it to my list of "fun things to play."

In Muggins you get points when all of the exposed end tiles add up to a multiple of five. If you go out first you also get bonus points equal to the sum of the pips in your opponents' hands.
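
The two scoring rules can be sketched in a few lines. This is an illustration and not code from dominoes.py: the names `muggins_score` and `going_out_bonus` are made up here, and the sketch assumes (the post doesn't say) that a scoring play is worth the total of the exposed ends itself.

```python
def muggins_score(exposed_ends):
    """Points for a play, given the pip counts showing on the exposed ends.

    Assumption: a total that is a multiple of five scores that total,
    anything else scores nothing.
    """
    total = sum(exposed_ends)
    return total if total > 0 and total % 5 == 0 else 0


def going_out_bonus(opponent_hands):
    """Bonus for going out first: every pip left in the opponents' hands."""
    return sum(a + b for hand in opponent_hands for a, b in hand)


print(muggins_score([4, 6]))   # ends total 10, a multiple of five
print(muggins_score([3, 6]))   # ends total 9, so no points
print(going_out_bonus([[(6, 6), (1, 2)], [(0, 3)]]))
```

How doubles on the ends are counted is left to the caller, which passes in whatever pip counts are showing.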

After a couple hours I had a working 150 line simulator and two simple strategies: play any legal move (dumb strategy) and play the highest scoring legal move (less dumb). I use the term "working" loosely. A number of nefarious bugs related to non-randomness lurked in the code. These caused the last player added to win ties in scoring, who goes first, and some other spots. You can view the final dominoes.py source here.

To implement the strategies I went for the simplest thing that could possibly work: generator coroutines. They have a simple interface and they keep state so even somewhat complicated strategies are possible without writing a big interface class. You just write a function with a couple breakpoints and everything just works. Here is the generator for "play the first move that is legal"

def dumb_player(player_no, board, hand):
    ''' play the first domino that is legal '''
    yield None # signal that we are setup
    while True:
        for a, b in pairs_and_reverse(hand): # (a,b)+(b, a)
            try:
                board.play(player_no, a, b)
                # if we get here it is a legal play
                yield None
                break
            except IllegalPlay: pass
        else:
            # draw a possibly legal play
            board.draw(player_no)
    return

The boilerplate of setup and try/except IllegalPlay was identical across all the simple strategies, so I refactored so that the strategy is just a scoring function applied to all legal moves. The simple strategy becomes ("a" and "b" are the two ends of the domino, "a" is the side that matches the board and "b" is the open end left after play):


def dumb_player(board, player_no, a, b):
    ''' randomly choose a legal play '''
    return None # all plays are equal

.. and the strategy for playing the highest scoring move possible is


def score_player(board, player_no, a, b):
    ''' always plays the immediately highest scoring tile '''
    sc = board.play(player_no, a, b)
    board.undo()
    return sc

Add a function that does a round-robin tourney of all scoring functions 5000 times each and you have a quick fitness test.
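
A rough sketch of what such a tourney function could look like. It is not the one in dominoes.py: `round_robin` and its `play_game(a, b)` argument are hypothetical names, with `play_game` standing in for the simulator and assumed to return 0 if the first strategy won and 1 if the second did.

```python
import itertools
import random
from collections import Counter


def round_robin(strategies, play_game, rounds=5000):
    """Play every pair of strategies `rounds` times and tally the wins.

    `strategies` maps a name to a scoring function. `play_game(a, b)` is a
    hypothetical stand-in for the simulator: 0 means `a` won, 1 means `b` won.
    """
    wins = Counter({name: 0 for name in strategies})
    pairs = itertools.combinations(strategies.items(), 2)
    for (name_a, a), (name_b, b) in pairs:
        for i in range(rounds):
            # Alternate seating so neither side is always the first argument.
            if i % 2 == 0:
                winner = (name_a, name_b)[play_game(a, b)]
            else:
                winner = (name_b, name_a)[play_game(b, a)]
            wins[winner] += 1
    return wins


if __name__ == "__main__":
    rng = random.Random(0)

    def coin_flip_game(a, b):
        # Placeholder game: ignores the strategies and picks a winner at random.
        return rng.randrange(2)

    fake = {"dumb_player": None, "score_player": None, "score_blocker6": None}
    print(round_robin(fake, coin_flip_game, rounds=100).most_common())
```

Alternating the seating is there because of the ordering bugs mentioned earlier: if position matters at all, each strategy should get each seat equally often.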

Well, it turns out dominoes is easy (or at least simple). This blew away all my assumptions; if you look at the source I have a dozen scoring functions, some of which consider the pips on the already played tiles and the secret pips in the player's hand. The best of them beats the very simple "score_player" just 50.5% of the time. 50.5% is a solid money maker if you are playing blackjack against the house at hundreds of hands an hour, but peanuts if you are playing 10 games of dominoes an hour against someone who is using the simple score_player strategy.

The source is out there and understandable (500 lines, 300 of which are short strategy functions). If anyone can consistently beat my best attempt "score_blocker6" then post the source in the comments and I'll buy you a beer next PyCon.

Sunday, July 5, 2009

ICFP Contest 2009

The ICFP 2009 programming contest started a couple weeks earlier than last year, and unfortunately I only found out about it as it was ending. The challenge is to solve as best as possible in 72 hours a problem that can't be brute forced in 72 hours. I love this challenge; I've been participating in the contest (in python) since 2002 and even left EuroPython early last year just to compete. I don't do writeups every year but I did for 2007, 2004, and 2002. This year I didn't participate officially but I did tackle the program after the fact.

The best resource for writeups and implementations is the FUN team page. They did their entry in python as did an amazingly large number of contestants this year. Their page has links to a dozen writeups, a sub-reddit, and some good pages on the maths of orbital mechanics.

The contest is a great chance to try out new tools and methodologies. I tried Unit Testing for the first time during an ICFP weekend (many moons ago) and I've been sold ever since. If Unit Testing helps you finish a time-limited competition faster then how could it possibly hurt during normal dev? I've also used the contest to try out pair programming, Test Driven Development, and pyrex [I won't talk about those more unless someone asks].

This year I used the contest as a chance to try out the ctypes module and Pygame gui library. More on that below, but first to the ICFP problem:

Problem Defined

This year, like many past years, the challenge was to implement a virtual machine that runs the binaries provided by the organizers. The VM binaries simulated a bunch of orbital mechanics. You then had to write programs that interacted with the simulations to push a satellite from one orbit to another, meet up with other satellites in orbit, and more complicated variations on the theme.
My strategy was the same as in past years:

Read the problem description, write a reference implementation in pure python, and test the hell out of the reference implementation.
[actual time: 1 hour]

Use that to write a visualizer and explore the problem mechanics and horribly underspecified written problem description.
[actual time: 2 hours. I had to install pygame and read the docs first]

Write a work-alike in screaming fast C so as much time as possible can be spent solving the problem instead of waiting on simulations to finish.
[actual time: 3 hours. I had to learn the ctypes module and fool around with it.]

The VM was very simple; it consisted of add/subtract/multiply/copy/if-test-else operations. Knocking out a working and tested version took about an hour. The tests were important because even my 150-line python implementation had a couple bugs in it. They also exposed some really shitty bugs in the written problem description. For instance a table listed 10 bits in one opcode as the 'imm' value but it should have been 3 bits of value with 7 bits of padding. The unit tests picked up bad values and a re-examination of the problem description led me to footnote 1.5 which said the value is 3-bits (why didn't they update the table too? no idea).
The nature of the VM lent itself to a fast C implementation. The executable cannot change itself so if the 30th opcode adds memory locations 101 and 102 it always adds memory locations 101 and 102. It never does a conditional jump or anything else funky. Writing a C version of the inner loop was almost as simple as adding print statements to the python version and then throwing the output at gcc. To understand the VM a little better (and test it even more) I added my own opcodes that did asserts and wrote a self-test binary that was a translation of my python unit tests into VM code. I could be confident in the C translation because it ran the self-check the same as the python version. The C version runs 1000 times faster than the pure python version, which is nice. Oddly, adding -Ox compiler flags makes the self test completely shit the bed; I say odd because the program is extremely deterministic so I guessed it would optimize nicely.
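
The "print statements, then gcc" trick looks roughly like the sketch below. The instruction format `(op, dest, src1, src2)` and the names `emit_c` and `run_py` are invented for illustration and are not the contest's real encoding; the if-test-else operation is left out to keep it short.

```python
# Hypothetical instruction format: (op, dest, src1, src2), all memory indices.
C_TEMPLATES = {
    "add": "mem[{d}] = mem[{a}] + mem[{b}];",
    "sub": "mem[{d}] = mem[{a}] - mem[{b}];",
    "mul": "mem[{d}] = mem[{a}] * mem[{b}];",
    "copy": "mem[{d}] = mem[{a}];",
}


def emit_c(program, func_name="step"):
    """Translate a fixed instruction list into straight-line C source."""
    lines = ["void {0}(double *mem) {{".format(func_name)]
    for op, d, a, b in program:
        lines.append("    " + C_TEMPLATES[op].format(d=d, a=a, b=b))
    lines.append("}")
    return "\n".join(lines)


def run_py(program, mem):
    """Reference interpreter for the same program, used to cross-check."""
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
        "copy": lambda x, y: x,
    }
    for op, d, a, b in program:
        mem[d] = ops[op](mem[a], mem[b])
    return mem


program = [("add", 0, 101, 102), ("mul", 1, 0, 0), ("copy", 2, 1, 0)]
print(emit_c(program))
```

Because the program never jumps, the interpreter loop disappears entirely: each instruction becomes one line of C with its memory indices baked in as constants.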

Writing a visualizer early (in pygame) was invaluable. The first draft just drew the current orbit, the target orbit, and the satellite's current position (state was printed to stdout). I assigned the arrow keys to manipulate the thrusters and discovered another bug in the spec -- setting thrust dx/dy points the thrusters in that direction so if you want to increase your speed in direction X you need to fire in direction negative X. Playing with the visualizer also answered some other ambiguities in the spec -- the simulation keeps track of your relative position to the Earth but it really means your relative position to the CENTER of the Earth and not the surface. That isn't just important to know; it also makes all the maths much easier.

Solving the level 1 problems was easy. Included in the problem description was maths for calculating thrust vectors for moving an object from one orbit to another. The maths, however, were for doing it using a minimum of fuel but the score for your solution is maximized by doing it as quickly as possible and using all your available fuel. Thanks to the first and simpler solutions I understood enough orbital mechanics to know you always wanted to fire thrusters either perpendicular or parallel to the tangent of the orbit at your current position. Thanks to the screaming fast C implementation I could brute force how much and which way to fire to get a high scoring solution. Having a fast state push/pop was a huge advantage too. Because the simulator works in discrete 1-second intervals the "real" mathematical solution is off by a little; so instead you want to do things a little early or a little later than the "ideal." That discrete solution is easily brute forced once you have the "real" solution.

Writing the ctypes/so library was interesting. The VM is specified as having a program execution area (a list of opcode, arg1, arg2 tuples), a memory area (a finite array of doubles), a boolean status flag, a double score value, and a couple short double arrays for IO. Because the program loop is deterministic it goes away when you translate it to C. Then you are left with a bunch of double arrays of known size, one boolean, and one other double. The organizers left a big hint that you could implement this as a single array of doubles by leaving the first two values in one of the arrays undefined. So the obvious solution is to stick the one special double and the boolean flag in there (as a double) and then just concat all the arrays-of-doubles together. The max combined size is finite and under 3000 * sizeof(double). Pre-allocating a big array of these and memcpy'ing them for push/pop of state becomes dirt cheap. As a bonus it makes the ctypes interface stupid simple too because the struct is just a single array of 3000 doubles. Kinda: python doesn't have a native double type so ctypes converts to float. In order to get and set the raw double values of the VM I made the ctypes definition a union of 3000 doubles and 3000 unsigned long longs; when deciding what to do floats were close enough but when initializing the VM data or writing the trace I could set/get the 8-byte ulonglongs (hurray for C's type ignorance!).
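
A minimal sketch of that union and of push/pop by block copy, written for Python 3. The names `VMState`, `snapshot`, and `restore` are invented here, and the copy is done from the Python side with ctypes.memmove, where the contest code did its memcpy inside the C library.

```python
import ctypes

VM_SLOTS = 3000  # upper bound on the combined state size quoted above


class VMState(ctypes.Union):
    """One flat block of 8-byte cells, viewable as doubles or raw integers."""
    _fields_ = [
        ("as_double", ctypes.c_double * VM_SLOTS),
        ("as_bits", ctypes.c_ulonglong * VM_SLOTS),
    ]


def snapshot(state):
    """Push: copy the whole state into a fresh block with one memmove."""
    saved = VMState()
    ctypes.memmove(ctypes.addressof(saved), ctypes.addressof(state),
                   ctypes.sizeof(VMState))
    return saved


def restore(state, saved):
    """Pop: overwrite the live state with a previously saved block."""
    ctypes.memmove(ctypes.addressof(state), ctypes.addressof(saved),
                   ctypes.sizeof(VMState))


vm = VMState()
vm.as_double[0] = 1.5
print(hex(vm.as_bits[0]))   # the raw 8 bytes behind 1.5
saved = snapshot(vm)
vm.as_double[0] = -2.0
restore(vm, saved)
print(vm.as_double[0])
```

In CPython a float object wraps a C double, so reads through `as_double` should not lose precision; the integer view mainly helps when the exact 8 bytes matter, as when writing a trace file.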

ctypes

In the past if I wanted C speed I always hand rolled CPython extension modules. It is a little bit of extra work but the speed is unbeatable (2x faster than pyrex in my experience). ctypes is so useful I don't think I'll ever hand-write an extension module again. The only stdlib module that currently uses ctypes is uuid, but I expect many new modules to use ctypes instead of doing it the hard way.

The VM operates on doubles but python only has a native float type. I was very tempted to create a new core datatype by copying Objects/floatobject.c and search/replace'ing every 'float' to 'double' but because I was using ctypes I opted for the quicker and simpler casting of those doubles to ulonglongs when I needed to set/get. I bet Numpy has a way to deal with all this but I'd already hit my limit on new-tools-learned-per-hour. I understand raw C.

ctypes makes easy things easy and hard things possible. If your .so (*NIX .dll) has functions that take an int/void and return an int/void you don't even need to provide a prototype -- it just works. So instead of writing a full featured Python/C wrapper for my basic datatype (an array of 64bit values) I just wrote 6 lines of python that defined the struct layout and then a C library that had a bunch of manipulation functions that returned 1/0 success/failure values and 100 lines of python that mapped property names to assignments/reads of the memory chunk. It was much less work for 90% of the speed. It also meant it was easy to apply the unit tests for my pure-python solution to the hybrid solution because the only difference was the underlying storage - a dict for the pure python and a C array for the ctypes wrapped .so.
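
The property-name mapping might look something like the sketch below. `VMView`, `_slot_property`, and the slot layout in `SLOTS` are made up for illustration; the real indices came from the problem spec and the real wrapper sat on top of the .so rather than a bare ctypes array.

```python
import ctypes


class VMView(object):
    """Expose named slots of a flat double array as ordinary attributes."""

    # Hypothetical slot layout; the real indices come from the problem spec.
    SLOTS = {"score": 0, "fuel": 1, "x": 2, "y": 3}

    def __init__(self, size=3000):
        self.mem = (ctypes.c_double * size)()   # zero-initialised C array


def _slot_property(index):
    """Build a property that reads and writes one cell of `mem`."""
    def getter(self):
        return self.mem[index]

    def setter(self, value):
        self.mem[index] = value

    return property(getter, setter)


# Attach one property per named slot.
for _name, _index in VMView.SLOTS.items():
    setattr(VMView, _name, _slot_property(_index))

vm = VMView()
vm.fuel = 10000.0
vm.x += 6.557e6
print(vm.fuel, vm.mem[1], vm.mem[2])
```

Swapping the ctypes array for a dict gives the pure-python storage with the same attribute interface, which is what lets one set of unit tests cover both versions.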

pygame

In the past my go-to GUI has been Tkinter. I've been using Tk ever since reading Learning Python in 2001 which is half a python book and half a Tk book. I've accumulated a personal library of Tk elements that do everything from menus to graph plotting to shape drawing. I threw it all out for this year for pygame and ended up with a very decent visualizer that was just a couple hundred lines of python code. I won't be going back to Tk for graphical GUIs in the future (I still like it for text).

[More later]

ps, I use "maths" plural like the British commonwealths simply because I like it; I lived in Australia for a year and it grew on me. However, you won't hear me singularizing "sports" or saying someone is "in hospital" - they are "in the hospital." English is the best language ever because what is legal is whatever works. To paraphrase a variously attributed quote: "English doesn't borrow from other languages. It takes them down dark alleys, bashes them on the head, and rifles through their pockets." Some people will tell you English has a giant number of rules, but really that someone is just trying to make sense of a system where anything goes.

Tuesday, April 7, 2009

HOWTO: get useful information out of the buildbot

The CPython core has a raft of machines that do nothing but pull updates from subversion (the code repository) and run the unit tests. You can see the full and somewhat cryptic list of all the boxes and their status on the buildbot webpage. I had to relearn how to read all the output because I had failing tests that only failed on other people's boxen. So here's the HOWTO.

Find your branch
Ten minutes after your checkin reload the buildbot page and find the machines running the branch you checked into. The machines are titled with the codebase being run, currently either "trunk" (aka 2.7), 2.6 (the maintenance branch), 3.0 (another maintenance branch), or 3.x (the py3k trunk). The other words in the name are some combination of hardware, operating system, and compiler.

Open a bunch of tabs
Each vertical column below the name is a time series of builds and statuses with the most recent at the top. The items are either Green (completed, OK), Red (completed, catastrophe), or Yellow (either still running, ambiguous success, or informational). Open a tab by clicking on the "Build NNN" links on all the machines running the branch you care about. Your checkin is listed in the leftmost column so only pick builds that start above (afterwards) that checkin. Then wait an hour or two. [what does the build number mean? I have no idea but I'm guessing the Nth build for that machine]

Check the builds
Most of the builds should have finished so go ahead and reload all the tabs for the individual machines. If the build is still in progress you can tell by the giant header that says "Build In Progress." If it is done you will see a series of little headers and links. Each header is for the different stages: update from svn, run ./configure, recompile the source, and run the test suite. The link titled "stdio" after each of these should be renamed more plainly "view ./configure output," "view test output" etc. This is what you want to see.

Find the output you care about
Search to find the tests and failures that apply to you. Especially on the trunk there may be failures that aren't your fault. Someone else's checkin might even be causing an abort before your stuff even gets run. If your stuff works, great! If not..

Checkin, rinse, and repeat
Based on the output you may need to make another checkin and let all the buildbots run again. If the failure isn't verbose enough then you will have to checkin some debugging output and wait for them to run again.

.. and that's all there is to it.

telnetlib progress

The first item in my Fixing Telnetlib TODO ("#1 test the hell out of telnet") is nearly done. The unit tests now test IAC handling and SB negotiation in addition to the read_* methods. As a bonus it looks like I fixed all the race conditions in the read tests 'cause the buildbots are going greener. (aside: did you know about Queue.join()? I didn't, very handy).
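
For anyone else who hadn't met Queue.join(), here is a small sketch of how it gets used. It is written for Python 3, where the module is spelled `queue`, and it is not taken from test_telnetlib; the `worker` function and the doubling are placeholders.

```python
import queue
import threading

work = queue.Queue()
results = []


def worker():
    while True:
        item = work.get()
        try:
            results.append(item * 2)
        finally:
            # Every get() needs a matching task_done() or join() never returns.
            work.task_done()


threading.Thread(target=worker, daemon=True).start()

for n in range(5):
    work.put(n)

work.join()   # blocks until task_done() has been called once per put()
print(sorted(results))
```

The appeal for tests is that join() waits for the work to be finished instead of sleeping and hoping.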

The only remaining nit is that the SB data tests are creating an uncollectable GC cycle. The Telnet object has a reference to the negotiation callback. The negotiation callback needs to call telnetob.read_sb_data() to get at the SB data. So I have a nego_collector class that looks like


class nego_collector(object):
    def __init__(self, sb_getter=None):
        self.seen = ''
        self.sb_getter = sb_getter  # cycle, this is a Telnet.read_sb_data bound method
        self.sb_seen = ''

    def do_nego(self, sock, cmd, opt):
        self.seen += cmd + opt
        if cmd == tl.SE and self.sb_getter:
            sb_data = self.sb_getter()
            self.sb_seen += sb_data

The nego_collector either needs to keep a weakref to the function or we have to break the cycle manually. Consider this just another crufty corner in telnetlib.
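
The weakref option could be sketched like this. It is an illustration and not the fix that went into the test suite: `FakeTelnet` is a hypothetical stand-in for telnetlib.Telnet, the check for the SE command is dropped, and `weakref.WeakMethod` needs Python 3.4 or later.

```python
import gc
import weakref


class FakeTelnet(object):
    """Hypothetical stand-in for telnetlib.Telnet, just enough to show the cycle."""

    def __init__(self):
        self.option_callback = None

    def read_sb_data(self):
        return "sb-data"


class WeakNegoCollector(object):
    def __init__(self, sb_getter=None):
        self.sb_seen = ""
        # Hold the bound method weakly so collector -> Telnet is not a strong edge.
        self.sb_getter = weakref.WeakMethod(sb_getter) if sb_getter else None

    def do_nego(self, sock, cmd, opt):
        getter = self.sb_getter() if self.sb_getter else None
        if getter is not None:   # None once the Telnet object is gone
            self.sb_seen += getter()


tn = FakeTelnet()
nego = WeakNegoCollector(tn.read_sb_data)
tn.option_callback = nego.do_nego   # Telnet -> collector (strong reference)
nego.do_nego(None, "cmd", "opt")

tn_ref = weakref.ref(tn)
gc.disable()   # on CPython, reference counting alone should now free it
del tn
print(tn_ref() is None)
gc.enable()
```

A plain weakref.ref to a bound method would not work here, because the bound method object is created fresh on each attribute access and dies immediately; WeakMethod exists to cover that case.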

[woops]. I spoke too soon. Not all the buildbots are passing so I now have a machine running the telnetlib tests in an infinite loop with the CPU heavily loaded. Hopefully I can smoke out the remaining race conditions locally. If not I'll have to sign up to use the Snakebite testing farm.
[later] Fixed. Almost certainly. We now allow a margin of error of 100% (a whopping 0.3 seconds) in our timing assertions and we do fewer of them.

Saturday, April 4, 2009

Speaking about Speaking

AMK's talk How to Give a Python Talk is very informative; you should watch it even if you aren't planning on giving a talk. Why should you watch it? Partly because it gives you an idea of what goes into a talk and partly because it demystifies giving a talk enough that it might prompt you into giving one. Lots of solid advice.

Andrew's talk itself is a nice illustration of some of his points. No one would mistake Andrew for a motivational speaker; you don't walk away from that talk with an inexplicable need to buy what he's selling and given the audience you might actually be pissed off if you thought he was trying to sell something. (talk->content != NULL) ? Good_talk : Bad_talk. PyCon attendees care more about red meat than glitter and are very forgiving on presentation if the red meat is there.

How I do it: What I do to prepare has heavy overlap with what Andrew recommends. Practice is king. When I step on the stage I'm not nervous per se, but when speaking in front of a large audience I do tend to read the slides much more than I talk about them in practice. So my rule of thumb is to practice a talk where I spend three minutes per slide knowing that I'll drop most of my segues and only spend one minute live talking per slide. Figure out your own constant and practice against that. I was amazed at Ned Batchelder's talk because the video of his talk matched so closely with his text explication of his slides. The prepared text is almost 1-to-1 which I personally just can't do.

Narrative, Narrative, Narrative: Pick a theme and stick with it. If you don't talk to your premise once every couple minutes then you have failed. My talk was "Class Decorators: Radically Simple" and I tried to say on every example that a decorator was a callable that took one argument and returned something. Raymond Hettinger's talk was "Easy AI in Python" and he started and finished every example emphasizing that a novice could do it. Alex Martelli's talk was "Abstractions as Leverage" and he introduced every slide with a quote from a very dead (and sometimes white) male who had made the same point back when writing was a novelty. It seems odd but part of your job as a speaker is to repeat yourself, repeatedly.

Don't drink coffee: This sucks, but you can't drink your normal amount of coffee before your talk. I was hoping to drink a few cups and balance it out with a bloody mary but my talk was in the AM and the hotel bar wasn't open. Instead I drank only a little coffee so I wouldn't be humming on stage. I'm told Beta Blockers work to suppress the nerves (symphony orchestras use them) but I haven't tried it myself.

Practice is free and Plentiful: It is a not-so-secret fact that user groups, PIGs, and even Cons are starved for presenters. My most recent talk started as a lightning talk and then I gave it at a local user's group and a couple Cons that had 90%+ acceptance rates before giving it at PyCon. Practice is good and the opportunities for practice are many.

You already know something to talk about: At the Boston PIG talk-dry-run (all the PyCon presenters gave their talk to 30 people a week before they gave it to 300+) I spent the first five minutes talking about talking. You do know something you can do a talk about and it sounds like "what is something I wish I knew about one year ago?" It's that easy. Try one or three ideas on the local group as a lightning talk and then grow the best one into a proper talk proposal.

It isn't complicated, see you with a speaker's badge next year!

Small test_telnetlib progress

My first patch of test_telnetlib is up. It tests most of the guarantees that the various Telnet.read_* methods make (I'm sure I missed a couple). The only problem is that every single test theoretically has a race condition. In actual practice the chances of a race are 0.0%, but theoretically it isn't sound. I posted it as a patch (as opposed to just committing it) to see if anyone has an opinion.

For the next round of tests I'll be writing unit tests for the out-of-band negotiations parser.

Friday, April 3, 2009

PyCon Errata

Old and New Faces: It was good to see everyone, too many names to mention. That includes all the other Boston pythoneers who I tend to see just once a year and in a city not named "Boston." There is never enough time to talk to everyone but I did try. I also did my usual thing which is to purposely eat lunch with no one I know [it's my fifth PyCon so this rule has been relaxed to "as few people I know as possible"]. A few mentions: somehow I'd never met Jesse Noller before (despite many PyCons and him being in Boston); Georg Brandl made it over to the US for PyCon for the first time; I didn't run into Martin Blais until day five when he was sitting next to me at sprints; a sixteen year old (who is senior to me on py-dev) thanked me for contributing a patch; and David Mertz (whom I had never met in person) ran up, introduced himself, and disappeared into the ether (far too brief: I have to invite him over for dinner or something).

Limited Excess In a down economy attendance and freebies were also down. Almost no speakers ended their talk with an "and we're hiring!" slide as opposed to the past standard of 100%. To my shock and horror I actually had to pay for most of my own dinners and drinks. CCP/EVE Online was a standout in this respect [If you're wondering how a company in Iceland can afford to be generous remember that their subscribers pay in dollars and euros, not kronas].

EVE Fan-Fest I learned about EVE Fan-Fest not from the CCP guys but from a husband/wife team of players. 1500+ gamers descend on Reykjavik annually. This is such a large number of extra people for a country of 300k that the conference has to be closely coordinated with the government, hotels, and airlines. The mind reels.

Code Blindness By the end of sprints I was suffering from the geek equivalent of snow blindness. Throughout sprints I traded bug reports, emails, and checkins with Hiro Yamamoto (the "John Smith" of Japan). He'd miss something and I'd whargarbl his name under my breath. I'd miss something and know he was grumbling half way across the world. I pretty clearly lost that battle when I committed a patch that checked to see if unsigned longs were less than zero (oh sure, the compiler can optimize it out, but still..). Which reminds me, I still need to revert that.

We have a prodigy on our critical path. Python's release manager is Benjamin Peterson and Benjamin is sixteen years old. On the internet nobody knows you're a dog and in open source no one cares if you're in High School. He gets stuff done, end of story. There is a small amount of cognitive dissonance involved, but not much. For instance he gave me an attaboy for a patch I submitted last year - and while I have shoes that are older than he is - he sincerely meant it as a compliment and I took it as such. He's good people to have around - though if he gets a driver's license or a girlfriend we're in a spot of trouble. [I talked to his mother only briefly but she treated his hobby as casually as if he was on a sports team.]

Benjamin is not without precedent. Our now somewhat older prodigy is named Georg Brandl. The idea of prolonged adolescence is pretty new in cultural terms (less than 60 years old). Both men are sterling illustrations that when you treat "kids" like adults, they behave like adults (heck, they were adults in the first place but just not acknowledged as so). Let's have more of this please.

Twitter Twitter was the breakout story of the year at PyCon. I've peeked at it several times but never seen the point. I'm so old school I still refer to IM as "talk." Twitter was nowhere to be seen last year but this year it was pervasive. Sure, most of the tweets were mindless blather but they fill the mindless blather niche very well. "bourbon in the Kennedy room" is useful when broadcast but not the kind of thing you'd send an email about. Michael Foord (aka voidspace) gained 50 followers a day during the conference. I have reluctantly broken down and signed up too. Oddly one of my first tweets was answering the question "do I need stitches for this?" which is something I know much about (I had a very full childhood and I have the scars to prove it).

My Talk Video of my talk Class Decorators: Radically Simple is now online. I was pleased with my performance until I saw the video. Thankfully attendees care more about content than presentation because there are a dozen things I would like to do over; I don't have a future as a motivational speaker. I have done a talk on that same topic several times now and this time was a giant rewrite. The night before I was in bed by midnight but tossed and turned. I ended up giving up and rewriting large portions until 5am. I slept for three hours and what you see was me looking at the slides for the second time. All the ridiculous example slides were what people [unsolicited!] came up and told me is what made class decorators "click" for them. Go figure.

There is a raft of little things I would change about the presentation. Unfortunately I won't ever give it again so I'll have to apply them to my next talk (after I think one up). Bloused shirt? gone, starch that thing and make sure it is tucked in. Conversational voice? gone, I have a separate speaker's voice and I didn't use it (lack of sleep?). USB remote slide dongle? gone, I spent as much time aiming the laser pointer at the screen as I did talking to the room. Wireless mike? keep, standing at the podium sucks [I lucked out - I was in the only room that had a wireless mike and I only got to use it because I asked].

Oh, and the perennial "pause between sentences." For the first five minutes I talked like I was reading a teleprompter. There isn't much you can do about this other than practice.

[and then some more errata]

International As I've mentioned before PyCon is the inverse of EuroPython in that it is 75% American and 25% European (eyeball numbers: I'd love to see hard data on this). The speakers list is somewhat more static because there is a subset of people who go to conventions for fun (myself included). To confuse things further there are a number of Americans who weren't born here and some "Americans" who are American but not in name (Alex Martelli is still Italian for sentimental reasons despite living in and literally marrying into America).

Martelli's Slides Alex Martelli's slides are immediately recognizable because he uses the same background and the same quirky font on all of them, always. I got the scoop from Anna Ravenscroft (a sometimes PyCon speaker and AKA Mrs Alex Martelli). He is fond of the background and font because they remind him of a blackboard. No one has complained so that's all there is to it.

Sprints are Magic Two days of sprints generated the same amount of python-checkin traffic as a regular month. Questions are just so much cheaper in person than in email that it couldn't be otherwise. Raise your hand and say "can anyone tell me about [interface]" and you get an answer. Person-to-person social pressures also lead to quicker bug resolution. Jesse Noller said something like "I assigned a pickle functools bug to you while you were in the can, it seemed up your alley." It wasn't up my alley but a few hours later I had read the pickle docs and checked in a patch to make functools.partial instances pickle-able.
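For context, this is roughly what that fix makes possible. A minimal sketch, assuming a Python 3 interpreter that includes the change; the name parse_binary is made up for illustration.

```python
import functools
import pickle

# A partial that pre-binds a keyword argument of the builtin int().
parse_binary = functools.partial(int, base=2)

# Round-trip it through pickle; the copy keeps the wrapped
# callable and the bound keyword arguments.
restored = pickle.loads(pickle.dumps(parse_binary))

print(restored("101"))
```

The wrapped callable still has to be picklable itself (a builtin or a module-level function), so a partial around a lambda would still fail.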

Fixing telnetlib

During the PyCon sprints I re-assigned all open (and unclaimed) telnetlib bugs to myself. The biggest longstanding complaint about telnetlib is that non-trivial negotiations aren't possible because the negotiation callback is very bare bones. The biggest problem with telnetlib is that there is almost no test suite - which is why some bugs have been open for seven years. So my priorities are first to test the hell out of telnetlib and second to improve negotiation.

The negotiation problem is clearest when dealing with two-way communications like NAWS (Negotiate About Window Size). The first time the server asks DO NAWS the client can reply WILL NAWS and include its current window size. The current negotiation callback supports this just fine. But when the client resizes its window it needs to be able to tell the server, which means Telnet needs a hook for a pending negotiations queue. And forget about the STATUS code which asks the other end of the connection to say what options it thinks have been negotiated - the current Telnet has no notion of state.
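To make the NAWS exchange concrete, here is a sketch of the bytes involved, based on RFC 854 and RFC 1073 and written for Python 3. The helper names (will_naws, naws_update) are hypothetical and not part of telnetlib; only the byte values are the standard Telnet codes.

```python
import struct

# Standard Telnet command and option codes (RFC 854, RFC 1073).
IAC = bytes([255])   # Interpret As Command
SB = bytes([250])    # subnegotiation begin
SE = bytes([240])    # subnegotiation end
WILL = bytes([251])
NAWS = bytes([31])


def will_naws():
    """Reply to the server's DO NAWS: the client agrees to report its size."""
    return IAC + WILL + NAWS


def naws_update(width, height):
    """Build the subnegotiation that reports a window size.

    Width and height are 16-bit big-endian values. Any 0xFF byte in the
    payload must be doubled so it is not mistaken for IAC.
    """
    payload = struct.pack("!HH", width, height)
    payload = payload.replace(IAC, IAC + IAC)
    return IAC + SB + NAWS + payload + IAC + SE


print(naws_update(80, 24))
```

The second function is the part a one-shot callback cannot express: it has to be sent at some arbitrary later time, whenever the window is resized.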

Below are the raw TODO and research notes I put together in a few hours at sprints. I used google code search to find some of the attempts to fix telnetlib by either subclassing it or writing a semi-compatible Telnet-alike from scratch (these are harder to grep for, for obvious reasons). The RFCs section marks each RFC as Must/Will/Won't implement. "Must implement" means core stuff for the Telnet class, "Will implement" means the telnetlib should include a negotiation implementation for that RFC, and "Won't implement" means it won't (because the RFC is either archaic or otherwise unused in the wild). The BUGS list includes all open bugs and the closed bugs I want to revisit or double-check.

---- TESTING TELNETLIB ----
* Testing
- test the read_* guarantees
- test timeouts (already implemented?)
- test the sb handling
* make real negotiation possible
* add real timeout and prompt exceptions
* make Telnet objects context managers
* process_rawq is a train wreck. Make sure we do something compatible but less icky.
* figure out where the hell they found all those constants.
* Why is chr(17)/"\021" blindly filtered out of the stream?
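
[sketch for the "context managers" item above. FakeTelnet is a hypothetical stand-in, not telnetlib.Telnet; the only assumption is that the real object has a close() method.]

```python
class FakeTelnet(object):
    """Hypothetical stand-in for a Telnet-like object with a close() method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # let any exception propagate to the caller


with FakeTelnet() as tn:
    pass  # talk to the server here

# The connection is closed even when the body raises.
try:
    with FakeTelnet() as failing:
        raise ValueError("simulated protocol error")
except ValueError:
    pass
```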

---- BUGS ----

OPEN

http://bugs.python.org/issue5188
telnetlib process_rawq buffer handling is confused

http://bugs.python.org/issue2550
SO_REUSEADDR doesn't have the same semantics on Windows as on Unix

http://bugs.python.org/issue1360221
telnetlib expect() and read_until() do not time out properly

http://bugs.python.org/issue1252001
Issue with telnetlib read_until not timing out

http://bugs.python.org/issue1049450
Solaris: EINTR exception in select/socket calls in telnetlib

http://bugs.python.org/issue708007
TelnetPopen3, TelnetBase, Expect split
[THIS, a rewrite of telnetlib. Mine for good stuff]

http://bugs.python.org/issue1678077
improve telnetlib.Telnet so option negotiation becomes easier

http://bugs.python.org/issue1772788
chr(128) in u'only ascii' -> TypeError with misleading msg

http://bugs.python.org/issue1737737
telnetlib.Telnet does not process DATA MARK (DM)

http://bugs.python.org/issue1772794
Telnetlib dosn't accept u'only ascii'

CLOSED

http://bugs.python.org/issue2451
No way to disable socket timeouts in httplib, etc.

http://bugs.python.org/issue822974
Telnet.read_until() timeout parameter misleading

http://bugs.python.org/issue630829
telnetlib.py: don't block on IAC and enhancement

http://bugs.python.org/issue723312
ability to pass a timeout to underlying socket

http://bugs.python.org/issue1520081
telnetlib.py change to ease option handling.

http://bugs.python.org/issue664020
telnetlib option subnegotiation fix

http://bugs.python.org/issue723364
terminal type option subnegotiation in telnetlib

---- RFCs ----

http://en.wikipedia.org/wiki/Telnet
Wikipedia lists all the relevant RFCs at the bottom.

[--FORMAT--]
URL
Short Description
Will/Won't implement
[--FORMAT--]

http://www.iana.org/assignments/telnet-options
List of officially assigned option codes
Must implement.

http://tools.ietf.org/html/rfc854
(1983) Telnet protocol definition.
Must implement.

http://tools.ietf.org/html/rfc855
(1983) Telnet negotiation.
Must implement.

http://tools.ietf.org/html/rfc856
(1983) Telnet binary protocol.
Won't implement. This was obviated by Kermit, Zmodem, and the like.

http://tools.ietf.org/html/rfc857
(1983) Telnet ECHO negotiation.
Will implement.

http://tools.ietf.org/html/rfc858
(1983) Suppress Go-Ahead. Negotiate suppression of "your turn" messages for full duplex connections.
Won't implement.

http://tools.ietf.org/html/rfc859 (Obsoletes http://tools.ietf.org/html/rfc651)
(1983) Telnet status. Ask other party to retransmit what they think the current negotiated options are.
Will implement.

http://tools.ietf.org/html/rfc860
(1983) Timing mark. A work around for servers that can't read the socket as fast as people type (!!!).
Won't implement.

http://tools.ietf.org/html/rfc861
(1983) negotiating about negotiating
Probably won't. Doubtful this is still in effect.

http://tools.ietf.org/html/rfc885
(1983) End-of-Record code.
Might, I have a vague recollection that this is used as a prompt sigil.

http://tools.ietf.org/html/rfc1073
(1988) NAWS (Negotiate About Window Size)
Will implement.

http://tools.ietf.org/html/rfc1079
(1988) Baud rate negotiation
Won't implement.

http://tools.ietf.org/html/rfc1091 (Obsoletes http://tools.ietf.org/html/rfc930)
(1989) Terminal type negotiation
Will implement.

http://tools.ietf.org/html/rfc1184 (Obsoletes http://tools.ietf.org/html/rfc1116)
(1990) Telnet linemode nego. Basically save packets by being less interactive.
Won't implement.

http://tools.ietf.org/html/rfc1372 (Obsoletes http://tools.ietf.org/html/rfc1080)
(1992) Terminal flow control. Local terminal stuff.
Won't implement.

http://tools.ietf.org/html/rfc2217
(1997) SLIP-lite protocol for sharing a modem.
Won't implement.

http://tools.ietf.org/html/rfc2946
(2000) Telnet Encryption nego.
Won't implement (does anyone actually use this?)

http://tools.ietf.org/html/rfc4777
(2006) IBM iSeries hardware telnet extensions.
Won't implement (strangely, the RFC argues against implementing itself)

---- Alternate Implementations ----
[found using google code search]
a hacky ECHO negotiator

subclass-and-patch NAWS negotiator

a from-scratch wrapper

a from-scratch reimplementation w/ better (but unpythonic) negotiating.

Tuesday, March 31, 2009

PyCon Organizers: Doug loves numbers and numbers love Doug

The PyCon organizers struggle to make each conference better than the last. As I mentioned in another post they did a bang up job on logistics this year. One metric they track is speaker popularity. It is a bit fuzzy because hot topics vary and individual speakers can do well on one topic while sucking at another. But the organizers do try to sift out the best.

Speaker Data

This year generated a bonanza of speaker data. The online talk schedule was all interact-y and allowed attendees to plan and print their preferred talks ahead of time. During the conference itself the back of every room had a pile of poker chips and three buckets: Green, Yellow, and Red. The idea being that everyone drops a chip in the Good/Neutral/Bad bucket as they walk out at the end of each talk [hopefully they leave at the end].

Doug Knows Data

If you were thinking the stats are weak and there are many ways to game the numbers you are very right and very wrong. Doug Napoleone loves data even more than I do so he's doing regressions like nobody's business. Doug has a post up explaining the raw data and the problems associated with turning the raw stuff into usable numbers. He does speech recognition software as his day job so he knows statistics backwards and forwards.

Is This Pythonic? Conclusion

I couldn't leave well enough alone and continued to refactor the code from the Is This Pythonic? Open Space. The final version I submitted as a patch to the original project. The biggest change was not in apply_all (as I assumed) but in writing a new chunk of code that sucks all the ugly and special cases from the rest of the code and puts it in one place. I don't know if there is a pattern name for this but it tends to happen at boundaries. Pretty print functions are usually ugly for instance, and for good reason - your only other choice is to ugly up the core.

So here is FetchAccumulator. It sits at the boundary just above the database calls and returns tidy, regular data to its callers.


class FetchAccumulator(object):
  def __init__(self, sql, args=None, fetch_per=-1, limit=-1):
    self.results = []
    self.sql = sql
    self.args = args
    self.fetch_per = fetch_per
    self.limit = limit
    return

  def fetch(self, cursor):
    cursor.execute(self.sql, self.args)
    if self.fetch_per == 1:
      # fetchone() returns None when there is no row; normalize to an empty tuple
      results = cursor.fetchone() or ()
      assert len(results) <= 1, results
    elif self.limit > 0:
      results = cursor.fetchmany(self.limit)
      assert len(results) <= self.limit, (len(results), self.limit)
    else:
      results = cursor.fetchall()

    if not any(results): # code smell: skip empty or all-falsy results
      return

    self.results.extend(results)
    self.limit -= len(results)

    if not self.limit: # we fetched our limit
      raise DoneApply()

  def __iter__(self):
    return iter(self.results)

This makes the other functions much, much simpler. Here are four database query functions that use FetchAccumulator. Seventy lines are now twenty.


class ShardCursor(cursor.BaseCursor):
    def selectOne(self, sql, args=None):
        accum = FetchAccumulator(sql, args, fetch_per=1, limit=1)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

    def selectMany(self, sql, args=None, size=-1):
        accum = FetchAccumulator(sql, args, limit=size)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

    def selectAll(self, sql, args=None):
        accum = FetchAccumulator(sql, args)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

    def countOne(self, sql, args=None):
        accum = FetchAccumulator(sql, args, fetch_per=1)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

Of course these functions now have their own code smell -- they only vary in their accumulator so they could be collapsed into a single function. That would require refactoring all the calling code which is a bigger project than I wanted to take on.

The apply_all function grew a proper exception to allow callers to bail out of the loop early.


class DoneApply(Exception): pass

def apply_all(shards, func):
  for shard in shards:
    db = shard.establishConnection()
    try:
      cursor = db.cursor()
      func(cursor)
    except DoneApply:
      break
    finally:
      db.close()

I'll omit the unit tests. The original project had no unit tests for this code so I had to write some to make sure my refactoring wasn't breaking anything.
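
To give a flavor of what such a test can look like without a real cluster, here is a self-contained sketch. It is not the test suite from the patch: FakeShard and LimitedFetch are hypothetical stand-ins (in-memory SQLite databases and a cut-down accumulator), while valid_shards and apply_all follow the versions shown in these posts.

```python
import sqlite3


class DoneApply(Exception):
    pass


class FakeShard(object):
    """Hypothetical shard: each connection is a fresh in-memory SQLite db."""

    def __init__(self, values, next=None):
        self.values = values
        self.next = next
        self.connections = 0

    def establishConnection(self):
        self.connections += 1
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE t (n INTEGER)")
        db.executemany("INSERT INTO t VALUES (?)", [(v,) for v in self.values])
        return db


def valid_shards(shard):
    while shard:
        yield shard
        shard = shard.next


def apply_all(shards, func):
    for shard in shards:
        db = shard.establishConnection()
        try:
            func(db.cursor())
        except DoneApply:
            break
        finally:
            db.close()


class LimitedFetch(object):
    """Cut-down accumulator: gather rows until `limit` rows are collected."""

    def __init__(self, sql, limit):
        self.sql = sql
        self.limit = limit
        self.rows = []

    def fetch(self, cursor):
        cursor.execute(self.sql)
        rows = cursor.fetchmany(self.limit)
        self.rows.extend(rows)
        self.limit -= len(rows)
        if self.limit <= 0:
            raise DoneApply()


last = FakeShard([5, 6])
first = FakeShard([1, 2], FakeShard([3, 4], last))

accum = LimitedFetch("SELECT n FROM t ORDER BY n", limit=3)
apply_all(valid_shards(first), accum.fetch)

assert accum.rows == [(1,), (2,), (3,)]
assert last.connections == 0  # the loop stopped before the third shard
```

The second assertion is the interesting one: it checks that raising DoneApply really does keep apply_all from connecting to shards it no longer needs.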

Sunday, March 29, 2009

Is This Pythonic?

Moshe Zadka and I did an Open Space titled "Is This Pythonic?" where we took someone else's code and reworked it to be cleaner. The code we worked on was cursors.py from the PyShards project.

[Originally Steve Holden and Raymond Hettinger were going to host it (they've done it before) but Steve bowed out and Raymond decided to go downtown with his girl]

Here is the selectOne function in its original form.


def selectOne(self, sql, args=None):
  results = []
  shard = self._shard;
  while shard != None and len(results) == 0:
    db = shard.establishConnection()
    cursor = db.cursor()
    cursor.execute(sql, args)
    res = cursor.fetchone()
    if res != None:
      results.extend(res)
    cursor.close ()
    db.close ()
    shard = shard.next
  return results

The code mixes a bunch of conceptual actions in one big blob. It is walking a linked list* of shards. It acquires a resource (making it harder to test) but doesn't safely release it in a try/finally. It builds up a list of results, and finally returns it. That's a lot of things for one function to be doing at once.

Below was the first cut. Each action is broken into a separate function. Because there are many functions almost like this one we can even reuse those parts.

* The linked list should just be a list, but that's a bigger refactoring.


def valid_shards(shard):
  ''' walk the shards linked list, yielding the items '''
  while shard:
    yield shard
    shard = shard.next

def apply_all(shards, func):
  ''' for each shard connect to the database, create a cursor, and pass it to func '''
  for shard in shards:
    db = shard.establishConnection()
    try:
      cursor = db.cursor()
      yield func(cursor)
    finally:
      db.close()

def selectOne(self, sql, args=None):
  ''' execute sql on each shard, returning the first row (if any) on each shard'''
  def fetchone(cursor):
    cursor.execute(sql, args)
    return cursor.fetchone()

  results = apply_all(valid_shards(self._shard), fetchone)
  return filter(None, results)

So each function has a little job and does it in a straightforward way. Because the module has many methods that are almost like selectOne() we should be able to reuse those parts. So we gave it a try on selectMany()


def selectMany(self, sql, args=None, size=None):
        results = []
        stillToFetch = size
        shard = self._shard;
        while shard != None and stillToFetch > 0:
            db = shard.establishConnection()
            cursor = db.cursor()
            cursor.execute(sql, args)
            res = cursor.fetchmany(stillToFetch)
            if res != None:
                results.extend(res)
                stillToFetch = stillToFetch - len(res)
            cursor.close ()
            db.close ()
            shard = shard.next
        return results

SelectMany has an extra wrinkle that SelectOne doesn't in that it will stop early if it gets enough result rows. The apply_all function doesn't have a hook for stopping early so we have to kludge one into the function we pass in. Here is the first draft that has a big code smell. Raising StopIteration will do the right thing but it won't if the implementation changes.


def selectMany(self, sql, args=None, size=None):
  limit = [size]
  def fetchmany(cursor):
    cursor.execute(sql, args)
    res = cursor.fetchmany(limit[0])
    limit[0] = limit[0] - len(res)
    if size is not None and limit[0] <= 0:
      raise StopIteration
    return res
  
  for results in apply_all(valid_shards(self._shard), fetchmany):
    for result in results:
      yield result

This code would be much cleaner with Python 3's 'nonlocal' statement, and much, much cleaner with the proposed 'yield from' syntax (PEP 380). So let's pretend that 'nonlocal' and 'yield from' are available.


def selectMany(self, sql, args=None, size=None):
  def fetchmany(cursor):
    nonlocal size
    cursor.execute(sql, args)
    res = cursor.fetchmany(size)
    size -= len(res)
    if size and size <= 0: # our bug just became more obvious!
      raise StopIteration
    return res
  
  for results in filter(None, apply_all(valid_shards(self._shard), fetchmany)): # bug fixed!
    yield from results

[I fixed the missing filter bug in that one too]
Let's fix that size bug and raise a specific exception so our code is safe even if the implementation of apply_all changes.


def selectMany(self, sql, args=None, size=None):
  class LimitReached(Exception): pass
  def fetchmany(cursor):
    nonlocal size
    if size is not None and size <= 0: # bug fixed!
      raise LimitReached
    cursor.execute(sql, args)
    if size is None: # no limit requested: take everything from this shard
      return cursor.fetchall()
    res = cursor.fetchmany(size)
    size -= len(res)
    return res

  try:
    for results in filter(None, apply_all(valid_shards(self._shard), fetchmany)):
      yield from results
  except LimitReached:
    pass

Yuck. That might be more correct but now the code smell is stinking up the room. What we need to do is stuff more smarts into apply_all().
To Be Continued...
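
A footnote on why the StopIteration draft is fragile. Under PEP 479 (the default behavior since Python 3.7) a StopIteration that escapes inside a generator body is converted to RuntimeError, so the kludge breaks even if apply_all itself never changes. A minimal sketch with a stripped-down apply_all (no database; the helper names are illustrative only):

```python
def apply_all(items, func):
    # Same shape as the generator version of apply_all, minus the database.
    for item in items:
        yield func(item)


def stop_at_three(n):
    if n >= 3:
        raise StopIteration  # the kludge: hope the generator treats this as "done"
    return n


class LimitReached(Exception):
    pass


def limit_at_three(n):
    if n >= 3:
        raise LimitReached
    return n


# On Python 3.7 and later the kludge surfaces as RuntimeError.
try:
    list(apply_all(range(10), stop_at_three))
    kludge_outcome = "stopped quietly"
except RuntimeError:
    kludge_outcome = "RuntimeError"

# A dedicated exception passes through the generator untouched.
collected = []
try:
    for value in apply_all(range(10), limit_at_three):
        collected.append(value)
except LimitReached:
    pass

print(kludge_outcome, collected)
```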

PyCon Organizers

The organizers did a bang up job this year. With the addition of a green room for speaker prep (coffee available all day w/ your speaker badge) and walkie-talkies for the organizers everything went smoothly. I heard a couple gripes about the Wifi but I haven't had any problems myself.

The hitches are with the venue and are the usual complaints. They charge $25/day for wifi in the hotel rooms. We don't have projectors in the open space rooms because they want $600 per projector per day. Coffee and beverage service gets torn down and put back up repeatedly so they can charge each time. Annoying.

The videos for the conference are already being posted. This is an amazing feat - each talk has three video and several audio channels that have to be spliced together. I'll link when mine goes up.

Thanks guys!

Open Spaces Board is teh Funny

Someone got drunk and clever and backfilled yesterday's Open Spaces schedule board with fictional talks [typos mine, I was touch typing]. Another guy is making a panorama photo, I'll add a link when he sends it to me.

Teach Me Ian Bicking.
Settings.py: Why sysadmins love editing your .py files.
A.N.U.S.: Plugabble Sphincters
Traversal: URL mapping is for the west.
Forking: because arguing is too hard.
Djylons: Let's make it happen.
Fulton v Rossum: Cage Match ($10)
GROIN: Come see how it works.
Zope: Making the simple IMPOSSIBLE.
Plone: Making Zope unreadable.
Acquisition Algebra: Fultonian mind fuck & other OOPSLA oddities.
Tic Tac Toe: Learn how to play, learn the secret strategies.
Tresling: Arm wrestling + Tetris, let us teach you!
Wheels: let's try them square (w/ pic of a trapazoid).
Niagra Planning: Waterfall 2.0 session
Catastophe Planning: Waterfall 3.0 session
Play: Rubix cube with a brown belt.
Reality: My hairy twisted pony.
Pickle: Love/Hate.
Let's get the hell out of Rosemont and find something decent to eat.
re Rosemont: STEAK.
ISO 9000: The future of python?

Saturday, March 28, 2009

PyCon Day2

Class Decorators: Radically Simple

I gave my talk today, slides are available on the PyCon website (the ppt version might be crap - I exported from OpenOffice). I added two pages of speaker's notes at the top that answer some questions (and whargarbls) and some errata.

The final talk only shares a few slides with the PyConUK version and maybe none with the original EuroPython version. I went to bed at midnight like a good boy but tossed and turned thinking of the talk until 2am. At that point I gave up and rewrote a big chunk of the talk (which had already been rewritten since Boston two weeks ago). I finished around 5am and then sacked out for 3 hours. Somehow it managed to come in at the perfect length of 25 minutes (+5 for Q/A).

Now that I have this talk licked, I'll be retiring it. I have about 8 months to come up with an idea for next year.
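
For anyone who missed the talk, the core idea fits in a few lines: a class decorator is just a callable that takes a class and returns a class. A minimal sketch; the registry example and its names are illustrative, not taken from the slides.

```python
registry = {}


def register(cls):
    """Class decorator: record the class by name, then return it unchanged."""
    registry[cls.__name__] = cls
    return cls


@register
class Spam(object):
    pass


@register
class Eggs(object):
    pass


print(sorted(registry))
```

The decorator runs once, at class definition time, which is why it works well for registration and other one-time bookkeeping.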

Talks

[updated as I go to them]
Manfred Schwendinger: "Google App Engine: How to survive in Google's Ecosystem"
This was a detailed description of how his particular application uses cloud services (they use both Amazon EC2 and Google AppEngine). It was interesting but very detail oriented (we do this like this, and that like that). I'll be downloading the slides for future reference.

Bob Ippolito: "Drop ACID and think about data"
Bob's talk was about using non-ACID (basically non-SQL) storage. A good overview of why you, the web developer, probably don't care about the things ACID databases do well and don't care about the things alternate data stores (key-value, column-based, "persistent eventually") do badly. It's a good trip through all the available alternatives. An intro pitch about each class of stores and then a quick overview of the major implementations. This talk was packed.

Ned Batchelder: "Whirlwind Excursion through Writing a C Extension"
A good primer for writing modules and types in C. It's a massive subject and Ned did a good job of showing one of everything. I wasn't at the Boston meetup where he previewed his talk so I was glad to make this one.

Alex Martelli: "Abstractions as Leverage"
A typical Martelli talk, which is to say very good. His sonorous voice would tame rabid badgers. To say he's erudite doesn't begin to cover it - in 20 slides he used quotes from blogs, the American Journal of Psychiatry ("I recommend everyone subscribes"), and of course lots of dead people including a Chinese sword fighting manual (what, no Clausewitz?). Here are some of the bullets (full slides http://aleax.it/python_abst.pdf)
* to use an abstraction well you need to understand at least two layers below it
* (Spolsky's Law) "All Abstractions leak", which is to say all abstractions lie.
* ex/ NFS isn't a local file system. It can be useful to treat it like one but if you don't know how it works you are going to get burned sooner or later.
* You can be a good python programmer without understanding how it is implemented but you can never be a great one.
* You can't write a good abstraction unless you know the layers above too -- how it will actually be used.

Friday, March 27, 2009

PyCon Day 1

Today is the first day of the conference proper. The most popular talk (measured by the online talk planner widget) was canceled. Titled "Designing Applications with Non-Relational Databases" I was sure to go, but alas the speaker canceled for reasons unknown.

This year there is a "Green Room" for speakers and conference volunteers. I wasn't expecting it to be green but I was hoping for a lounge. Instead it is a purely functional ops area - the network is run from here. It has power strips, a test projector, and free coffee. It might not have a wet bar but it is still a nice perk.

Talks

[updated as I go to them]
Brett Cannon "How Python is Developed." It was a good overview of core python development pitched at newbies. He sketched out the basic bug and feature cycles, how to [eventually] get core commit privs, etc. It was mainly an informational session so it included lots of links to the existing documentation (some of which was written by Brett).
Jesse Noller "Introduction to Multiprocessing in Python." 'Multiprocessing' is a module that lets you do .. multiprocessing in python. I only knew vaguely what it did before. Now I know kinda what it does. I might know more but I was busy refactoring some itertools types [see below].
Raymond Hettinger "Easy AI in Python." A ramble through several different problems with code of the solvers. The point is to show how easy it is to solve most problems. So easy that a kid could literally do it (part of the talk was about why kids should do it). I missed most of this one because I was hacking [see below] but I'd seen it before so I didn't mind.

Balls

I should read python-dev more regularly. It turns out Hettinger went and implemented fast-C permutations, combinations, and cartesian product in the itertools module. You know, just like the probstat module I wrote. That old code is pretty un-pythonic (I wrote it in my inbetween stage so it is a generic lib with both python and perl wrappings). I had a mostly finished rewrite that was CPython from the ground up and - surprise! - it looks almost identical to Raymond's. Almost, I spun out the iterator into a separate object so the base object could have a len (iterators aren't allowed to support len). His doesn't have random access but that is one of the things no one used on mine so I was going to drop it anyway.
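
For reference, a quick sketch of the three itertools functions in question, plus the len() limitation mentioned above. The inputs are arbitrary examples.

```python
import itertools

# Ordered pairs without repetition, unordered pairs, and a cartesian product.
perms = list(itertools.permutations("abc", 2))
combos = list(itertools.combinations("abc", 2))
pairs = list(itertools.product("ab", [0, 1]))

# The itertools objects are plain iterators, so len() is not supported.
try:
    len(itertools.permutations("abc", 2))
    has_len = True
except TypeError:
    has_len = False

print(len(perms), len(combos), len(pairs), has_len)
```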

Python Language Summit

[this is about yesterday, I'm getting caught up]

Yesterday was the Python Language Summit. 40+ of the core developers of all the python implementations (CPython, Jython, IronPython) met to discuss stuff. The meeting was five hours in 1.25 hour chunks. Morning topics included goals for future releases, the timing of future releases, and what processes we need to change, if any. Afternoon topics were how to share more stuff (tests and benchmarks) across implementations, and how to combine the various setup/packaging projects into one.

Meeting face-to-face is always easier than using a mailing list. The conversations are synchronous, latency is low, and decorum is higher. The meeting could have used a dose of Robert's Rules of Order - it wasn't always clear when we had reached consensus so topics drifted until even tangents had been exhausted. Some quick pronouncements might have been better.

What decisions were made? I'm not really sure. Python 2.7 might be the last in the 2.x series, or it might not. New libraries are more likely to get backported than new core language features. If you were expecting to continue on the 2.x series until a magic day when you found yourself writing 3.x code you will be waiting forever. There was some interest in writing a 3to2 source converter that mirrors the 2to3 tool. Because 3.x has stronger semantics (only one obvious way to do it) a 3to2 tool is theoretically easier to write; but a 3to2 tool might also target many 2.x versions so "easier" is still a lot of work.

The different implementations will be sharing more tests and benchmarks in the future. Probably. There was general agreement we should but it will be up to individuals to make it so (as always). Ditto for packaging - everyone agreed that the different tools should combine but the devil is in the details - so the discussion is being moved to the packaging-SIG list.

Wednesday, March 25, 2009

In Chicago

All checked in at the Crowne Plaza (where most of the core devs seem to be staying). Took the hotel shuttle over with Glyph and Brian Dorsey. Glyph and I had different flights from BOS to ORD but arrived at the same time. Go figure.

The talk of the town so far is Unladen Swallow - a google effort to replace the CPython byte code machine with LLVM. Goal #1 "Produce a version of Python at least 5x faster than CPython." Holy S**t.

Tuesday, March 24, 2009

PyCon Tomorrow

I arrive in the afternoon on the 25th and will be there through April 1st (leaving in the PM).

Tip: DO NOT bring a heavy coat unless you are from a sunshine state. Chicago sounds cold but like last year it is supposed to be warmer in Chicago than Boston (55F+ during the day, 35F+ at night). Last year I stepped off the airplane wearing an overcoat and it turned warm into sweltering hot.

No mustache again this year .. and maybe never again. While it was fun so was being a longhair with an eyebrow ring in the early 90s (dude, it was the early 90s). Many things are fun, once.

I have business cards this year. Plain $0.25/ea Kinkos cards and not the $2.00/pop custom wonders I had at my tech/marketing company. Last year I had neither and felt naked.

Consider the above an announcement that I'm officially back in the job market; tomorrow will be two years exactly that I've been on vacation. My bowling game has improved greatly, my golf game not as much (though to be fair I've been bowling for 2 years and golfing for 25). It was nice to be available to travel to every birthday/baptism/marriage/funeral/etc and go abroad on a whim but it does get old (and the pay stinks). Mainly I'm looking forward to working with a team again. Personal projects are fun but it ain't the same.

Wednesday, March 18, 2009

Afterward: PyCon on the Charles

The Boston mini-PyCon went well. The Beta House was at max capacity with 30+ in attendance. I met Jesse Noller for the first time; I'm not sure how this managed to be a first because he's a python dev and has been at the last few PyCons. I'd bet there are pictures on flickr that have both of us in frame.

Noller's talk was a thousand foot view of the plethora of multiprocessing/concurrent/messaging/you-name-it frameworks in python. He compared the current proliferation to the mess of competing web and ORM frameworks of two years ago. Sounds about right.

Taylor's talk was about Reinteract, his spreadsheetish interactive python shell. I expected to sleep through this but the app is actually interesting. It falls somewhere between IPython and Resolver One in functionality. I'm sure he and the Resolver guys will have lots to talk about.

My talk went OK. Half the slides are new (again) and I really like the individual slides, but in the rewrite it lost its narrative (the "Radically Simple" of the title). I loved the idea of jumping in on the second slide with an example titled "Why this is Cool" but it fell flat because the first slide didn't explain the pains and foibles of doing without class decorators. I also need to reinsert the longer explanation of what decorators (and metaclasses and mixins) are. The PyCon crowd will be more savvy than the Cambridge user's group but not that much more savvy. A few people said "that looks really cool but I have no idea WTF you are talking about." My talk is flagged as "advanced" in the program but beginner/intermediate/advanced is ignored by attendees (and usually isn't on the printed schedule).

Bruce Eckel will not be giving a keynote as he broke his leg badly while skiing. This is a mixed blessing for me because Bruce's keynote was about class decorators and metaclasses. Good for me because people won't skip my talk for his keynote. Bad for me because I wanted to see his talk and was looking forward to talking to him. Oh, and now I have a useless T-shirt that says "Bruce Eckel Stole My Talk" on the front and "I Stole Bruce Eckel's Talk" on the back.

Ned Batchelder gave his PyCon talk A Whirlwind Excursion through Python C Extensions at the previous meetup. Do click the link, it includes his slides interspersed with his own commentary. I did a similar talk titled "Writing Your Own Python Types in C" a couple years ago so Ned & I traded notes. One of the slides is a nod to our conversation. It isn't as egoboo as the time Guido gave me a full slide w/ attribution in his "State of Python" address, but I'll take it.

Tuesday, March 3, 2009

PyCon on the Charles

Ned Batchelder has organized a preview/practice session of all the Boston based pythoneers' talks. Ned gave his talk last month at the Cambridge meetup, and March 18th there will be a three hour session with Jesse Noller, Owen Taylor, and me at the Beta House. Talks start at 6:30pm. If you haven't been to a PyCon it is a chance to see a mini version of one. I'll even bring the beer.

Talks are:
Noller: Concurrency and Distributed Computing with Python Today
Taylor: Reinteract: a better way to interact with Python
Diederich: Class Decorators: Radically Simple

[fixed] I misattributed Taylor's talk to someone else.

Sunday, March 1, 2009

Python Language Summit

[cross posted from my non-python blog with some minor edits]

I just got my invitation to the Python language summit. I had planned on going anyway, so the invitation just makes it less awkward for everyone involved. The summit overlaps with the tutorial days of PyCon because, well, by definition if you belong in the summit you don't belong in the tutorials [tutorial instructors can suck it].

The summit is interesting because it is unusual* - open source events are usually open access too. The agenda is wide open and is roughly focused on standardization and the future, whatever that is. The invitees are the core developers of all the implementations of Python: regular Python (aka "CPython"), Jython (Java), and IronPython (C#).

I'm not sure what the purpose of the invites is, other than to convey weight. To make the list is to ask "who are the 50 people on the planet who would actually want to come?" Griefers and trolls would be bounced regardless [come to think of it, maybe I was invited because I have no compunctions about bouncing griefers and trolls] so perhaps the invitations are meant to discourage the well meaning but clueless. Every convention has at least a few of those - does anyone remember the "callable None" guy from a few years back? "well meaning but clueless" doesn't begin to describe him.

* It is not without precedent. See the Reykjavik sprint. [my favorite bit from those posts "Cod jerky smells a bit like feet but tastes OK. Dried shark smells a lot like feet and tastes exactly like dried asshole."]

Friday, February 13, 2009

PSA: Wireless Keyboards

The batteries in your wireless keyboard don't die at arbitrarily long intervals. You put your cellphone close to the receiver at arbitrary intervals and then move it while changing the batteries.

And by "you" I mean "me."

Thursday, January 22, 2009

Look Who's [not] Talking at PyCon

The schedule is up and it looks good. It is a mix of conference warriors plus some new blood; likewise the talks are a mix of old standards and new topics (some of the new talks are by conference warriors and vice versa).

The familiar names include Brett Cannon (cpython), Jim Baker (jython), and Michael Foord (ironpython). There is a host of names that might be missing or might not - I can't recall if they do talks every year - but there is no Norwitz, Warsaw, Holden, or Martelli. Noticeably absent is Raymond Hettinger who has given several talks per con at several cons a year. Noticeably present is EVE Online in the form of Richard Tew and Krisjan; with the Icelandic Krona where it is I don't know how they can afford a taxi let alone air fare.

On the new list is Bill Gribble. Bill Gribble gets a free beer for having one of the best names ever (if his Mother shows up she can collect for her work, instead). I bet none of his friends ever start a story "I was having lunch with my friend Bill..." But instead always "I was having lunch with my friend Bill Gribble..." I don't know Gribble from a hole in the wall but I'll make a point of changing that.

[update] Talked to Raymond and he'll be attending, at least (barring life's usual caveats).
[updateder] As Brett and Ivan mention in the comments there is now a tier of invited talks that includes many of the "missing" conference regulars. Glad to have em, the "hall track" is my favorite and many on the invited speakers list make it so.

Sunday, January 18, 2009

Post's Machine

An undergrad CompSci major, Shriphani Palakodety, posted an implementation of Post's Machine. Post's machine is a simple computer that has separate data and execution storage. Here's my implementation and my advice to Shriphani - Keep it simple! When working on hard problems post-graduation you'll find that code gets complicated of its own volition. The job of the writer is to fight it at every turn (the next guy to come along will appreciate the effort).

This version replaces the two custom containers with python dicts and moves all the complicated code to the pretty printer. A collections.defaultdict might be a slightly better choice for the 'infinite tape' that starts as all zeros. A nice thing about defaultdicts is that reading a position stores its default, so once the head position has been read min() and max() never see an empty dict.
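As a rough sketch of the defaultdict idea (not the code from the post; the variable names here are made up for illustration), an all-zeros tape could look like this:

```python
from collections import defaultdict

# Hypothetical stand-in for the 'infinite tape': unseen positions read as 0.
tape = defaultdict(int)

tape[3] = 1        # mark position 3
blank = tape[-2]   # reading an unseen position returns 0 and stores the key

# The tape now has two keys, so min() and max() have something to work with.
positions = sorted(tape)
lo, hi = min(tape), max(tape)
```

One caveat: min() and max() still raise ValueError on a defaultdict that has never been touched, so the head position needs to be read or written once before the bounds are computed.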

I also did a python3.0 version which was nearly identical except for a 'with' on the file open and print-as-function. I thought the advanced tuple unpack syntax would help in the case where a too-short tuple is padded and then the padding discarded. Disappointingly, it didn't.


# args might be a two or three item list
a, b, c = (args + [0])[:3]

# python3 syntax
a, b, c, *ignore = args + [0]

The *ignore argument demands to be read as opposed to the [:3] trimmer on the end which keeps the low profile it deserves.
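A small Python 3 sketch comparing the two spellings side by side. The function names pad_slice and pad_star are hypothetical, and list(args) is only there so that either a list or a tuple can be passed in:

```python
def pad_slice(args):
    # Pad with a default of 0, then trim back to exactly three items.
    a, b, c = (list(args) + [0])[:3]
    return a, b, c

def pad_star(args):
    # Pad with a default of 0 and let *ignore soak up any leftover padding.
    a, b, c, *ignore = list(args) + [0]
    return a, b, c

short = pad_slice(['1', 'mark'])        # padding is used
full = pad_star(['1', 'mark', '5'])     # padding is discarded
```

Both give the same three values for two or three item inputs; the difference is only in how loudly the throwaway part announces itself.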

Here's the code for my Post Machine

def parse(lines):
  program = {}
  for line in lines:
    pos, action, jump = [p.strip() for p in line.split(',') + ['0']][:3]
    program[pos] = action, jump
  return program

def execute(program):
  tape = {}
  tape_pos = 0
  action = None
  action_pos = '0'

  while action != 'exit':
    action, action_pos = program[action_pos]

    pretty_tape(tape, tape_pos)

    if action == '<':
      tape_pos -= 1
    elif action == '>':
      tape_pos += 1
    elif action == 'mark':
      tape[tape_pos] = 1
    elif action == 'unmark':
      tape[tape_pos] = 0

  return tape, tape_pos

def pretty_tape(tape, tape_pos):
  if not tape:
    tape = {0:0}
  min_pos = min(tape_pos, min(tape), 0)
  max_pos = max(tape_pos, max(tape), 4)

  parts = []
  for pos in range(min_pos, max_pos + 1):
    val = tape.get(pos, 0)
    if pos == tape_pos:
      parts.append('[%d]' % val)
    else:
      parts.append('%d' % val)

  print '.....', ', '.join(parts), '.....'

if __name__ == '__main__':
  program = parse(open('post.txt'))
  execute(program)
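The post doesn't show post.txt, so here is a guess at what a program might look like, inferred from parse(): one "position, action, jump" line per instruction, with the jump defaulting to '0' when omitted. The SAMPLE text and the run() helper are assumptions for illustration; run() is a Python 3 restatement of execute() with the pretty printing left out.

```python
# Hypothetical program: mark two adjacent cells, then stop.
SAMPLE = """\
0, mark, 1
1, >, 2
2, mark, 3
3, exit
"""

def parse(lines):
    program = {}
    for line in lines:
        # A missing jump target is padded with '0' and any extra is trimmed.
        pos, action, jump = [p.strip() for p in line.split(',') + ['0']][:3]
        program[pos] = action, jump
    return program

def run(program):
    # Same loop as execute() above, minus the call to pretty_tape().
    tape = {}
    tape_pos = 0
    action = None
    action_pos = '0'
    while action != 'exit':
        action, action_pos = program[action_pos]
        if action == '<':
            tape_pos -= 1
        elif action == '>':
            tape_pos += 1
        elif action == 'mark':
            tape[tape_pos] = 1
        elif action == 'unmark':
            tape[tape_pos] = 0
    return tape, tape_pos

program = parse(SAMPLE.splitlines())
tape, tape_pos = run(program)
```

With this sample the machine ends with cells 0 and 1 marked and the head sitting on cell 1.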
