Tuesday, March 31, 2009

PyCon Organizers: Doug loves numbers and numbers love Doug

The PyCon organizers struggle to make each conference better than the last. As I mentioned in another post they did a bang up job on logistics this year. One metric they track is speaker popularity. It is a bit fuzzy because hot topics vary and individual speakers can do well on one topic while sucking at another. But the organizers do try to sift out the best.


Speaker Data


This year generated a bonanza of speaker data. The online talk schedule was all interact-y and allowed attendees to plan and print their preferred talks ahead of time. During the conference itself the back of every room had a pile of poker chips and three buckets: Green, Yellow, and Red. The idea being that everyone drops a chip in the Good/Neutral/Bad bucket as they walk out at the end of each talk [hopefully they leave at the end].


Doug Knowns Data


If you were thinking the stats are weak and there are many ways to game the numbers you are very right and very wrong. Doug Napoleone loves data even more than I do so he's doing regressions like nobodies business. Doug has a post up explaining the raw data and the problems associated with turning the raw stuff into usable numbers. He does speech recognition software as his day job so he knowns statistics backwards and forwards.

Is This Pythonic? Conclusion

I couldn't leave well enough alone and continued to refactor the code from the Is This Pythonic? Open Space. The final version I submitted as a patch to the original project. The biggest change was not in apply_all (as I assumed) but in writing a new chunk of code that sucks all the ugly and special cases from the rest of the code and puts it in one place. I don't know if there is a pattern name for this but it tends to happen at boundaries. Pretty print functions are usually ugly for instance, and for good reason - your only other choice is to ugly up the core.

So here is FetchAccumulator. It sits at the boundry just above the database calls and returns tidy, regular data to its callers.

class FetchAccumulator(object):
def __init__(self, sql, args=None, fetch_per=-1, limit=-1):
self.results = []
self.sql = sql
self.args = args
self.fetch_per = fetch_per
self.limit = limit
return

def fetch(self, cursor):
cursor.execute(self.sql, self.args)
if self.fetch_per == 1:
results = cursor.fetchone()
assert len(results) <= 1, results
elif self.limit > 0:
results = cursor.fetchmany(self.limit)
assert len(results) <= self.limit, (len(results), self.limit)
else:
results = cursor.fetchall()

if not results or not filter(None, results): # code smell
return

self.results.extend(results)
self.limit -= len(results)

if not self.limit: # we fetched our limit
raise DoneApply()
return

def __iter__(self):
return iter(self.results)

This makes the other functions much, much simpler. Here are four database query functions that use FetchAccumulator. Seventy lines are now twenty.

class ShardCursor(cursor.BaseCursor):
def selectOne(self, sql, args=None):
accum = FetchAccumulator(sql, args, fetch_per=1, limit=1)
apply_all(valid_shards(self._shard), accum.fetch)
return accum

def selectMany(self, sql, args=None, size=-1):
accum = FetchAccumulator(sql, args, limit=size)
apply_all(valid_shards(self._shard), accum.fetch)
return accum

def selectAll(self, sql, args=None):
accum = FetchAccumulator(sql, args)
apply_all(valid_shards(self._shard), accum.fetch)
return accum

def countOne(self, sql, args=None):
accum = FetchAccumulator(sql, args, fetch_per=1)
apply_all(valid_shards(self._shard), accum.fetch)
return accum

Of course these functions now have their own code smell -- they only vary in their accumulator so they could be collapsed into a single function. That would require refactoring all the calling code which is a bigger project than I wanted to take on.

The apply_all function grew a proper exception to allow callers to bail out of the loop early.

class DoneApply(Exception): pass

def apply_all(shards, func):
for shard in shards:
db = shard.establishConnection()
try:
cursor = db.cursor()
func(cursor)
except DoneApply:
break
finally:
db.close()


I'll omit the unit tests. The original project had no unit tests for this code so I had to write some to make sure my refactoring wasn't breaking anything.

Sunday, March 29, 2009

Is This Pythonic?

Moshe Zadka and I did an Open Space titled "Is This Pythonic?" where we took someone else's code and reworked it to be cleaner. The code we worked on was cursors.py from the PyShards project.


[Originally Steve Holden and Raymond Hettinger were going to host it (they've done it before) but Steve bowed out and Raymond decided to go downtown with his girl]


Here is the selectOne function in it's original form.


def selectOne(self, sql, args=None):
results = []
shard = self._shard;
while shard != None and len(results) == 0:
db = shard.establishConnection()
cursor = db.cursor()
cursor.execute(sql, args)
res = cursor.fetchone()
if res != None:
results.extend(res)
cursor.close ()
db.close ()
shard = shard.next
return results

The code mixes a bunch of conceptual actions in one big blob. It is walking a linked list* of shards. It acquires a resource (making it harder to test) but doesn't safely release it in a try/finally. It builds up a list of results, and finally returns it. That's a lot of things for one function to be doing at once.


Below was the first cut. Each action is broken into a separate function. Because there are many functions almost like this one we can even reuse those parts.


* The linked list should just be a list, but that's a bigger refactoring.


def valid_shards(shard):
''' walk the shards linked list, yielding the items '''
while shard:
yield shard
shard = shard.next

def apply_all(shards, func):
''' for each shard connect to the database, create a cursor, and pass it to func '''
for shard in shards:
db = shard.establishConnection()
try:
cursor = db.cursor()
yield func(cursor)
finally:
db.close()

def selectOne(self, sql, args):
''' execute sql on each shard, returning the first row (if any) on each shard'''
def fetchone(cursor):
return curser.fetchone(sql, args)

results = apply_all(valid_shards(self._shard), fetchone)
return filter(None, results)

So each function has a little job and does it in a straghtforward way. Because the module has many methods that are almost like selectOne() we should be able to reuse those parts. So we gave it a try on selectMany()

def selectMany(self, sql, args=None, size=None):
results = []
stillToFetch = size
shard = self._shard;
while shard != None and stillToFetch > 0:
db = shard.establishConnection()
cursor = db.cursor()
cursor.execute(sql, args)
res = cursor.fetchmany(stillToFetch)
if res != None:
results.extend(res)
stillToFetch = stillToFetch - len(res)
cursor.close ()
db.close ()
shard = shard.next
return results

SelectMany has an extra wrinkle that SelectOne doesn't in that it will stop early if it gets enough result rows. The apply_all function doesn't have a hook for stopping early so we have to kludge one into the function we pass in. Here is the first draft that has a big code smell. Raising StopIteration will do the right thing but it won't if the implementation changes.

def selectMany(self, sql, args=None, size=None):
limit = [size]
def fetchmany(cursor):
res = cursor.fetchmany(sql, args)
limit[0] = limit[0] - len(res)
if size is not None and limit[0] <= 0:
raise StopIteration

for results in apply_all(valid_shards(self), fetchmany):
for result in results:
yield result

This code would be much cleaner in python2.6, and much much cleaner in 2.7 (the dev trunk). So let's pretend that 'nonlocal' and 'yield from' are available.

def selectMany(self, sql, args=None, size=None):
def fetchmany(cursor):
nonlocal size
res = cursor.fetchmany(sql, args)
size -= len(res)
if size and size <= 0: # our bug just became more obvious!
raise StopIteration

for results in filter(None, apply_all(valid_shards(self), fetchmany)): # bug fixed!
yield from results

[I fixed the missing filter bug in that one too]
Let's fix that size bug and raise a specific exception so our code is safe even if the implementation of apply_all changes.

def selectMany(self, sql, args=None, size=None):
class LimitReached(Exception): pass
def fetchmany(cursor):
nonlocal size
if size is not None and size <= 0: # bug fixed!
raise LimitReached
res = cursor.fetchmany(sql, args)
size -= len(res)

try:
for results in filter(None, apply_all(valid_shards(self), fetchmany)):
yield from results
except LimitReached:
pass

Yuck. That might more correct but now the code smell is stinking up the room. What we need to do is stuff more smarts into apply_all().
To Be Continued...

PyCon Organizers

The organizers did a bang up job this year. With the addition of a green room for speaker prep (coffee available all day w/ your speaker badge) and walkie-talkies for the organizers everything went smoothly. I heard a couple gripes about the Wifi but I haven't had any problems myself.

The hitches are with the venue and are the usual complaints. They charge $25/day for wifi in the hotel rooms. We don't have projectors in the open space rooms because they want $600/per projector per day. Coffee and beverage service gets torn down and put back up repeated so they can charge each time. Annoying.

The videos for the conference are already being posted. This is an amazing feat - each talk has three video and several audio channels that have to be spliced together. I'll link when mine goes up.

Thanks guys!

Open Spaces Board is teh Funny

Someone got drunk and clever and backfilled yesterdays Open Spaces schedule board with fictional talks [typos mine, I was touch typing]. Another guy is making a panorama photo, I'll add a link when he sends it to me.

Teach Me Ian Bicking.
Settings.py: Why sysadmins love editing your .py files.
A.N.U.S.: Plugabble Sphincters
Traversal: URL mapping is for the west.
Forking: because arguing is too hard.
Djylons: Let's make it happen.
Fulton v Rossum: Cage Match ($10)
GROIN: Come see how it works.
Zope: Making the simple IMPOSSIBLE.
Plone: Making Zope unreadable.
Acquisition Algebra: Fultonian mind fuck & other OOPSLA oddities.
Tic Tac Toe: Learn how to play, learn the secret strategies.
Tresling: Arm wrestling + Tetris, let us teach you!
Wheels: let's try them square (w/ pic of a trapazoid).
Niagra Planning: Waterfall 2.0 session
Catastophe Planning: Waterfall 3.0 session
Play: Rubix cube with a brown belt.
Reality: My hairy twisted pony.
Pickle: Love/Hate.
Let's get the hell out of Rosemont and find something decent to eat.
re Rosemont: STEAK.
ISO 9000: The future of python?

Saturday, March 28, 2009

PyCon Day2

Class Decorators: Radically Simple


I gave my talk today, slides are available on the PyCon website (the ppt version might be crap - I exported from OpenOffice). I added two pages of speaker's notes at the top that answer some questions (and whargarbls) and some eratta.

The final talk only shares a few slides with the PyConUK version and maybe none with the original EuroPython version. I went to bed at midnight like a good boy but tossed and turned thinking of the talk until 2am. At that point I gave up and rewrote a big chunk of the talk (which had already been rewritten since Boston two weeks ago). I finished around 5am and then sacked out for 3 hours. Somehow it managed to come in at the perfect length of 25 minutes (+5 for Q/A).

Now that I have this talk licked, I'll be retiring it. I have about 8 months to come up with an idea for next year.

Talks


[updated as I go to them]
Manfred Schwendinger: "Google App Engine: How to survive in Google's Ecosystem"
This was a detailed description of how his particular application uses cloud services (they use both Amazon EC2 and Google AppEngine). It was interesting but very detail oriented (we do this like this, and that like that). I'll be downloading the slides for future reference.

Bob Ippolito: "Drop ACID and think about data"
Bob's talk was about using non-ACID (basically non-SQL) storage. A good overview of why you, the web developer, probably don't care about the things ACID databases do well and don't care about the things alternate data stores (key-value, column-based, "persistent eventually") do badly. It's a good trip through all the available alternatives. A intro pitch about each class of stores and then a quick overview of the major implementations. This talk was packed.

Ned Betchelder: "Whirlwind Excursion thrhough Writing a C Extension"
A good primer for writing modules and types in C. It's a massive subject and Ned did a good job of showing one of everything. I wasn't at the Boston meetup where he previewed his talk so I was glad to make this one.

Alex Martelli: "Abstractions as Leverage"
A typical Martelli talk, which is to say very good. His sonorous voice would tame rabid badgers. To say he's erudite doesn't begin to cover it - in 20 slides he used quotes from blogs, the American Journal of Psychriatry ("I recommend everyone subscribes"), and of course lots of dead people including a Chinese sword fighting manual (what, no Clausewitz?). Here are some of the bullets (full slides http://aleax.it/python_abst.pdf)
* to use an abstraction well you need to understand at least two layers below it
* (Splosky's Law) "All Abstractions leak", which is to say all abstractions lie.
* ex/ NFS isn't a cloal file system. It can be useful to treat it like one but if you don't know how it works you are going to get burned sooner or later.
* You can be a good python programmer without understanding how it is implemented but you can never be a great one.
* You can't write a good abstraction unless you know they layers above too -- how it will actually be used.

Friday, March 27, 2009

PyCon Day 1

Today is the first day of the conference proper. The most popular talk (measured by the online talk planner widget) was canceled. Titlted "Designing Applications with Non-Relational Databases" I was sure to go, but alas the speaker canceled for reasons unknown.

This year there is a "Green Room" for speakers and conference volunteers. I wasn't expecting it to be green but I was hoping for a lounge. Instead it is a purely functional ops area - the network is run from here. It has power strips, a test projector, and free coffee. It might not have a wet bar but it is still a nice perk.

Talks


[updated as I go to them]
Brett Cannon "How Python is Developed." It was a good overview of core python development pitched at newbies. He sketched out the basic bug and feature cycles, how to [eventually] get core commit privs, etc. It was mainly an informational session so it included lost of links to the existing documentation (some of which was written by Brett).
Jess Noller "Introduction to Multiprocessing in Python." 'Multiprocessing' is a module that lets you do .. multiprocessing in python. I only new vaguely what it did before. Now I now kinda what it does. I might know more but I was busy refactoring some itertools types [see below].
Raymond Hettinger "Easy AI in Python." A ramble through several different problems with code of the solvers. The point is to show how easy it is to solve most problems. So easy that a kid could literally do it (part of the talk was about why kids should do it). I missed most off this one because I was hacking [see below] but I'd seen it before so I didn't mind.

Balls


I should read python-dev more regularly. It turns out Hettinger went and implemented fast-C permutations, combinations, and cartesian product in the itertools module. You know, just like the probstat module I wrote. That old code is pretty un-pythonic (I wrote it in my inbetween stage so it is a generic lib with both python and perl wrappings). I had a mostly finished rewrite that was CPython from the ground up and - suprise! - it looks almost identical to Raymond's. Almost, I spun out the iterator into a separate object so the base object could have a len (iterators aren't allowed to support len). His doesn't have random access but that is one of the things no one used on mine so I was going to drop it anyway.

Python Language Summit

[this is about yesterday, I'm getting caught up]

Yesterday was the Python Language Summit. 40+ of the core developers of all the python implementations (CPython, Jython, IronPython) met to discuss stuff. The meeting was five hours in 1.25 hour chunks. Morning topics included goals for future releases, the timing of future releases, and what processes we need to change, if any. Afternoon topics were how to share more stuff (tests and benchmarks) across implementations, and how to combine different the various setup/packaging projects into one.

Meeting face-to-face is always easier than using a mailing list. The conversations are synchronous, latency is low, and decorum is higher. The meeting could have used a dose of Robert's Rules of Order - it wasn't always clear when we had reached consensus so topics drifted until even tangents had been exhausted. Some quick pronouncements might have been better.

What decisions were made? I'm not really sure. Python 2.7 might be the last in the 2.x series, or it might not. New libraries are more likely to get backported than new core language features. If you were expecting to continue on the 2.x series until a magic day when you found yourself writing 3.x code you will be waiting forever. There was some interest in writing a 3to2 source converter that mirrors the 3to2 tool. Because 3.x has stronger semantics (only one obvious way to do it) a 3to2 tools is theoretically easier to write; but a 3to2 tool might also target many 2.x versions so "easier" is still a lot of work.

The different implementations will be sharing more tests and benchmarks in the future. Probably. There was general agreement we should but it will be up to individuals to make it so (as always). Ditto for packaging - everyone agreed that the different tools should combine but the devil is in the details - so the discussion is being moved to the packaging-SIG list.

Wednesday, March 25, 2009

In Chicago

All checked in at the Crown Plaza (where most of the core devs seem to be staying). Took the hotel shuttle over with Glyph and Brian Dorsey. Glyph and I had different flights from BOS to ORD but arrived at the same time. Go figure.

The talk of the town so far is Unladen Swallow - a google effort to replace the CPython byte code machine with LLVM. Goal #1 "Produce a version of Python at least 5x faster than CPython." Holy S**t.

Tuesday, March 24, 2009

PyCon Tomorrow

I arrive in the afternoon on the 25th and will be there through April 1st (leaving in the PM).

Tip: DO NOT bring a heavy coat unless you are from a sunshine state. Chicago sounds cold but like last year it is supposed to be warmer in Chicago than Boston (55F+ during the day, 35F+ at night). Last year I stepped off the airplane wearing an overcoat and it turned warm into sweltering hot.

No mustache again this year .. and maybe never again. While it was fun so was being a longhair with an eyebrow ring in the early 90s (dude, it was the early 90s). Many things are fun, once.

I have business cards this year. Plain $0.25/ea Kinkos cards and not the $2.00/pop custom wonders I had at my tech/marketing company. Last year I had neither and felt naked.

Consider the above an announcement that I'm officially back in the job market; tomorrow will be two years exactly that I've been on vacation. My bowling game has improved greatly, my golf game not as much (though to be fair I've been bowling for 2 years and golfing for 25). It was nice to be available to travel to every birthday/baptism/marriage/funeral/etc and go abroad on a whim but it does get old (and the pay stinks). Mainly I'm looking forward to working with a team again. Personal projects are fun but it ain't the same.

Wednesday, March 18, 2009

Afterward: PyCon on the Charles

The Boston mini-PyCon went well. The Beta House was at max capacity with 30+ in attendance. I met Jesse Noller for the first time; I'm not sure how this managed to be a first because he's a python dev and has been at the last few PyCons. I'd bet there are pictures on flickr that have both of us in frame.

Noller's talk was a thousand foot view of the plethora of multiprocessing/concurrent/messaging/you-name-it frameworks in python. He compared the current proliferation to the mess of competing web and ORM frameworks of two years ago. Sounds about right.

Taylor's talk was about Reinteract, his spreadsheetish interactive python shell. I expected to sleep through this but the app is actually interesting. It falls somewhere between IPython and Resolver One in functionality. I'm sure he and the Resolver guys will have lots to talk about.

My talk went OK. Half the slides are new (again) and I really like the individual slides, but in the rewrite it lost its narrative (the "Radically Simple" of the title). I loved the idea of jumping in on the second slide with an example titled "Why this is Cool" but it fell flat because the first slide didn't explain the pains and foibles of doing without class decorators. I also need to reinsert the longer explanation of what decorators (and metaclasses and mixins) are. The PyCon crowd will be more savvy than the Cambridge user's group but not that much more savvy. A few people said "that looks really cool but I have no idea WTF you are talking about." My talk is flagged as "advanced" in the program but beginner/intermediate/advanced is ignored by attendees (and usually isn't on the printed schedule).

Bruce Eckel will not be giving a keynote as he broke his leg badly while skiing. This is a mixed blessing for me because Bruce's keynote was about class decorators and metaclasses. Good for me because people won't skip my talk for his keynote. Bad for me because I wanted to see his talk and was looking forward to talking to him. Oh, and now I have a useless T-shirt that says "Bruce Eckel Stole My Talk" on the front and "I Stole Bruce Eckel's Talk" on the back.

Ned Bachelder gave his PyCon talk A Whirlwind Excursion through Python C Extensions at the previous meetup. Do click the link, it includes his slides interspersed with his own commentary. I did a similar talk titled "Writing Your Own Python Types in C" a couple years ago so Ned & I traded notes. One of the slides is a nod to our conversation. It isn't as egoboo as the time time Guido gave me full slide w/ attribution in his "State of Python" address, but I'll take it.

Tuesday, March 3, 2009

PyCon on the Charles

Ned Batchelder has organized a preview/practice session of all the Boston based pythoneer's talks. Ned gave his talk last month at the Cambridge meetup, and March 18th there will be a three hour session with Jesse Noller, Owen Taylor, and me at the Beta House. Talks start at 6:30pm. If you haven't been to a PyCon it is a chance to see a mini version of one. I'll even bring the beer.

Talks are:
Noller: Concurrency and Distributed Computing with Python Today
Taylor: Reinteract: a better way to interact with Python
Diederich: Class Decorators: Radically Simple

[fixed] I misattributed Taylor's talk to someone else.

Sunday, March 1, 2009

Python Language Summit

[cross posted from my non-python blog with some minor edits]

I just got my invitation to the Python language summit. I had planned on going anyway, so the invitation just makes it less awkward for everyone involved. The summit overlaps with the tutorial days of PyCon because, well, by definition if you belong in the summit you don't belong in the tutorials [tutorial instructors can suck it].


The summit is interesting because it is unusual* - open source events are usually open access too. The agenda is wide open and is roughly focused on standardization and the future, whatever that is. The invitees are the core developers of all the implementations of Python: regular Python (aka "CPython"), Jython (Java), and IronPython (C#).


I'm not sure what the purpose of the invites is, other than to convey weight. To make the list is to ask "who are the 50 people on the planet who would actually want to come?" Griefers and trolls would be bounced regardless [come to think of it, maybe I was invited because I have no compunctions about bouncing griefers and trolls] so perhaps the invitations are meant to discourage the well meaning but clueless. Every convention has at least a few of those - does anyone remember the "callable None" guy from a few years back? "well meaning but clueless" doesn't begin to describe him.



* It is not without precedent. See the the Reykjavik sprint. [my favorite bit from those posts "Cod jerky smells a bit like feet but tastes OK. Dried shark smells a lot like feet and tastes exactly like dried asshole."]