Tuesday, March 31, 2009

PyCon Organizers: Doug loves numbers and numbers love Doug

The PyCon organizers struggle to make each conference better than the last. As I mentioned in another post, they did a bang-up job on logistics this year. One metric they track is speaker popularity. It is a bit fuzzy because hot topics vary and individual speakers can do well on one topic while sucking at another. But the organizers do try to sift out the best.


Speaker Data


This year generated a bonanza of speaker data. The online talk schedule was all interact-y and allowed attendees to plan and print their preferred talks ahead of time. During the conference itself the back of every room had a pile of poker chips and three buckets: Green, Yellow, and Red. The idea being that everyone drops a chip in the Good/Neutral/Bad bucket as they walk out at the end of each talk [hopefully they leave at the end].


Doug Knows Data


If you were thinking the stats are weak and there are many ways to game the numbers, you are very right and very wrong. Doug Napoleone loves data even more than I do, so he's doing regressions like nobody's business. Doug has a post up explaining the raw data and the problems associated with turning the raw stuff into usable numbers. He does speech recognition software as his day job, so he knows statistics backwards and forwards.

Is This Pythonic? Conclusion

I couldn't leave well enough alone and continued to refactor the code from the Is This Pythonic? Open Space. The final version I submitted as a patch to the original project. The biggest change was not in apply_all (as I assumed) but in writing a new chunk of code that sucks all the ugly and special cases from the rest of the code and puts it in one place. I don't know if there is a pattern name for this but it tends to happen at boundaries. Pretty print functions are usually ugly for instance, and for good reason - your only other choice is to ugly up the core.

So here is FetchAccumulator. It sits at the boundary just above the database calls and returns tidy, regular data to its callers.

class FetchAccumulator(object):
    def __init__(self, sql, args=None, fetch_per=-1, limit=-1):
        self.results = []
        self.sql = sql
        self.args = args
        self.fetch_per = fetch_per
        self.limit = limit
        return

    def fetch(self, cursor):
        cursor.execute(self.sql, self.args)
        if self.fetch_per == 1:
            results = cursor.fetchone()
            # fetchone() returns None when the shard has no matching row
            assert results is None or len(results) <= 1, results
        elif self.limit > 0:
            results = cursor.fetchmany(self.limit)
            assert len(results) <= self.limit, (len(results), self.limit)
        else:
            results = cursor.fetchall()

        if not results or not filter(None, results): # code smell
            return

        self.results.extend(results)
        self.limit -= len(results)

        if not self.limit: # we fetched our limit
            raise DoneApply()
        return

    def __iter__(self):
        return iter(self.results)

This makes the other functions much, much simpler. Here are four database query functions that use FetchAccumulator. Seventy lines are now twenty.

class ShardCursor(cursor.BaseCursor):
    def selectOne(self, sql, args=None):
        accum = FetchAccumulator(sql, args, fetch_per=1, limit=1)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

    def selectMany(self, sql, args=None, size=-1):
        accum = FetchAccumulator(sql, args, limit=size)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

    def selectAll(self, sql, args=None):
        accum = FetchAccumulator(sql, args)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

    def countOne(self, sql, args=None):
        accum = FetchAccumulator(sql, args, fetch_per=1)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum

Of course these functions now have their own code smell -- they only vary in their accumulator so they could be collapsed into a single function. That would require refactoring all the calling code which is a bigger project than I wanted to take on.
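Here is a rough sketch of what that collapse might look like. It is only an illustration under stated assumptions, not the patch that was submitted: the `select` method name, the trimmed-down FetchAccumulator (limit handling only, no fetch_per special case), and the FakeShard/FakeCursor stand-ins are all hypothetical and exist just so the example runs without a database.

```python
class DoneApply(Exception):
    """Raised by a callback to stop apply_all early."""


class FakeCursor(object):
    """Hypothetical stand-in for a DB-API cursor over an in-memory row list."""
    def __init__(self, rows):
        self._rows = list(rows)
        self._pending = []

    def execute(self, sql, args=None):
        self._pending = list(self._rows)  # ignore the SQL, "match" every row

    def fetchmany(self, size):
        batch, self._pending = self._pending[:size], self._pending[size:]
        return batch

    def fetchall(self):
        batch, self._pending = self._pending, []
        return batch


class FakeShard(object):
    """Hypothetical linked-list node that doubles as its own connection."""
    def __init__(self, rows, next=None):
        self.rows, self.next = rows, next

    def establishConnection(self):
        return self

    def cursor(self):
        return FakeCursor(self.rows)

    def close(self):
        pass


def valid_shards(shard):
    while shard:
        yield shard
        shard = shard.next


def apply_all(shards, func):
    for shard in shards:
        db = shard.establishConnection()
        try:
            func(db.cursor())
        except DoneApply:
            break
        finally:
            db.close()


class FetchAccumulator(object):
    """Simplified for the sketch: only the limit handling is kept."""
    def __init__(self, sql, args=None, limit=-1):
        self.results, self.sql, self.args, self.limit = [], sql, args, limit

    def fetch(self, cursor):
        cursor.execute(self.sql, self.args)
        rows = cursor.fetchmany(self.limit) if self.limit > 0 else cursor.fetchall()
        self.results.extend(rows)
        self.limit -= len(rows)
        if not self.limit:  # we fetched our limit
            raise DoneApply()

    def __iter__(self):
        return iter(self.results)


class ShardCursor(object):
    def __init__(self, shard):
        self._shard = shard

    def select(self, sql, args=None, **accum_options):
        """The single function: callers pick the behaviour via accumulator options."""
        accum = FetchAccumulator(sql, args, **accum_options)
        apply_all(valid_shards(self._shard), accum.fetch)
        return accum


cur = ShardCursor(FakeShard([(1,), (2,)], FakeShard([(3,), (4,)])))
everything = list(cur.select("SELECT n FROM t"))
first_three = list(cur.select("SELECT n FROM t", limit=3))
```

The trade-off is the one already mentioned: every caller of selectOne, selectMany, selectAll and countOne would have to switch to passing options instead.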

The apply_all function grew a proper exception to allow callers to bail out of the loop early.

class DoneApply(Exception): pass

def apply_all(shards, func):
    for shard in shards:
        db = shard.establishConnection()
        try:
            cursor = db.cursor()
            func(cursor)
        except DoneApply:
            break
        finally:
            db.close()


I'll omit the unit tests. The original project had no unit tests for this code so I had to write some to make sure my refactoring wasn't breaking anything.
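For a feel of what tests at this boundary can look like, here is a minimal sketch using hand-rolled fakes instead of a real database. These are not the omitted tests; FakeShard and FakeConnection are hypothetical names that are not part of PyShards, and apply_all is repeated only so the block runs on its own.

```python
import unittest


class DoneApply(Exception):
    pass


def apply_all(shards, func):
    for shard in shards:
        db = shard.establishConnection()
        try:
            cursor = db.cursor()
            func(cursor)
        except DoneApply:
            break
        finally:
            db.close()


class FakeConnection(object):
    """Hypothetical connection that records whether close() was called."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    def cursor(self):
        return self.name  # the "cursor" is just the shard name

    def close(self):
        self.closed = True


class FakeShard(object):
    """Hypothetical shard that remembers the connection it handed out."""
    def __init__(self, name):
        self.name = name
        self.connection = None

    def establishConnection(self):
        self.connection = FakeConnection(self.name)
        return self.connection


class ApplyAllTest(unittest.TestCase):
    def test_visits_every_shard_and_closes(self):
        shards = [FakeShard("a"), FakeShard("b")]
        seen = []
        apply_all(shards, seen.append)
        self.assertEqual(seen, ["a", "b"])
        self.assertTrue(all(s.connection.closed for s in shards))

    def test_done_apply_stops_early(self):
        shards = [FakeShard("a"), FakeShard("b")]

        def stop_at_once(cursor):
            raise DoneApply()

        apply_all(shards, stop_at_once)
        self.assertTrue(shards[0].connection.closed)
        self.assertIsNone(shards[1].connection)  # never connected

    def test_other_errors_still_close(self):
        shard = FakeShard("a")

        def boom(cursor):
            raise ValueError("boom")

        self.assertRaises(ValueError, apply_all, [shard], boom)
        self.assertTrue(shard.connection.closed)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyAllTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```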

Sunday, March 29, 2009

Is This Pythonic?

Moshe Zadka and I did an Open Space titled "Is This Pythonic?" where we took someone else's code and reworked it to be cleaner. The code we worked on was cursors.py from the PyShards project.


[Originally Steve Holden and Raymond Hettinger were going to host it (they've done it before) but Steve bowed out and Raymond decided to go downtown with his girl]


Here is the selectOne function in its original form.


def selectOne(self, sql, args=None):
    results = []
    shard = self._shard;
    while shard != None and len(results) == 0:
        db = shard.establishConnection()
        cursor = db.cursor()
        cursor.execute(sql, args)
        res = cursor.fetchone()
        if res != None:
            results.extend(res)
        cursor.close ()
        db.close ()
        shard = shard.next
    return results

The code mixes a bunch of conceptual actions in one big blob. It is walking a linked list* of shards. It acquires a resource (making it harder to test) but doesn't safely release it in a try/finally. It builds up a list of results, and finally returns it. That's a lot of things for one function to be doing at once.


Below was the first cut. Each action is broken into a separate function. Because there are many functions almost like this one we can even reuse those parts.


* The linked list should just be a list, but that's a bigger refactoring.


def valid_shards(shard):
    ''' walk the shards linked list, yielding the items '''
    while shard:
        yield shard
        shard = shard.next

def apply_all(shards, func):
    ''' for each shard connect to the database, create a cursor, and pass it to func '''
    for shard in shards:
        db = shard.establishConnection()
        try:
            cursor = db.cursor()
            yield func(cursor)
        finally:
            db.close()

def selectOne(self, sql, args=None):
    ''' execute sql on each shard, returning the first row (if any) on each shard'''
    def fetchone(cursor):
        cursor.execute(sql, args)
        return cursor.fetchone()

    results = apply_all(valid_shards(self._shard), fetchone)
    return filter(None, results)
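To watch the three pieces cooperate without a real shard cluster, here is a small sketch that uses in-memory SQLite databases. The Shard class, the users table, and the free function select_one are assumptions made up for this illustration; only valid_shards and apply_all follow the first cut.

```python
import sqlite3


class Shard(object):
    """Hypothetical shard: a throwaway in-memory SQLite db plus a 'next' link."""
    def __init__(self, names, next=None):
        self.names = names
        self.next = next

    def establishConnection(self):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE users (name TEXT)")
        db.executemany("INSERT INTO users VALUES (?)", [(n,) for n in self.names])
        return db


def valid_shards(shard):
    """Walk the linked list, yielding each shard."""
    while shard:
        yield shard
        shard = shard.next


def apply_all(shards, func):
    """Connect to each shard, pass a cursor to func, and always close."""
    for shard in shards:
        db = shard.establishConnection()
        try:
            yield func(db.cursor())
        finally:
            db.close()


def select_one(first_shard, sql, args=()):
    """First row (if any) from each shard; shards with no rows are dropped."""
    def fetchone(cursor):
        cursor.execute(sql, args)
        return cursor.fetchone()

    results = apply_all(valid_shards(first_shard), fetchone)
    return [row for row in results if row is not None]


# Three shards; the middle one is empty.
chain = Shard(["alice"], Shard([], Shard(["carol", "dave"])))
rows = select_one(chain, "SELECT name FROM users ORDER BY name")
print(rows)
```

Because apply_all is a generator, no shard is touched until the result is iterated, and each connection is closed before the next one is opened.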

So each function has a little job and does it in a straightforward way. Because the module has many methods that are almost like selectOne() we should be able to reuse those parts. So we gave it a try on selectMany().

def selectMany(self, sql, args=None, size=None):
    results = []
    stillToFetch = size
    shard = self._shard;
    while shard != None and stillToFetch > 0:
        db = shard.establishConnection()
        cursor = db.cursor()
        cursor.execute(sql, args)
        res = cursor.fetchmany(stillToFetch)
        if res != None:
            results.extend(res)
            stillToFetch = stillToFetch - len(res)
        cursor.close ()
        db.close ()
        shard = shard.next
    return results

selectMany has an extra wrinkle that selectOne doesn't: it stops early once it has enough result rows. The apply_all function doesn't have a hook for stopping early so we have to kludge one into the function we pass in. Here is the first draft, which has a big code smell. Raising StopIteration does the right thing today, but only because apply_all happens to be a generator; it won't if the implementation changes.

def selectMany(self, sql, args=None, size=None):
    limit = [size]
    def fetchmany(cursor):
        cursor.execute(sql, args)
        res = cursor.fetchmany(limit[0])
        limit[0] = limit[0] - len(res)
        if size is not None and limit[0] <= 0:
            raise StopIteration
        return res

    for results in apply_all(valid_shards(self._shard), fetchmany):
        for result in results:
            yield result
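The fragility of that kludge is easy to demonstrate on later Pythons. The sketch below assumes Python 3.7 or newer, where PEP 479 turns a StopIteration that escapes a generator body into a RuntimeError. The names apply_all_lazy and stop_after_two are hypothetical and stand in for apply_all and the fetchmany callback.

```python
def apply_all_lazy(items, func):
    """Generator in the style of apply_all: yields func(item) for each item."""
    for item in items:
        yield func(item)


def stop_after_two(n):
    if n >= 2:
        raise StopIteration  # the kludge: try to end the caller's loop
    return n


try:
    collected = list(apply_all_lazy(range(5), stop_after_two))
    outcome = "stopped quietly"
except RuntimeError:
    # PEP 479: StopIteration raised inside a generator becomes RuntimeError
    collected = None
    outcome = "RuntimeError"

print(outcome)
```

So the loop no longer ends quietly; a dedicated exception that the caller catches explicitly does not depend on how the iteration is implemented.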

This code would be much cleaner with 'nonlocal' (new in Python 3.0) and 'yield from' (still only a proposal, PEP 380). Neither is in python2.6 or 2.7 (the dev trunk), so let's pretend that they are available.

def selectMany(self, sql, args=None, size=None):
    def fetchmany(cursor):
        nonlocal size
        cursor.execute(sql, args)
        res = cursor.fetchmany(size)
        size -= len(res)
        if size and size <= 0: # our bug just became more obvious!
            raise StopIteration
        return res

    for results in filter(None, apply_all(valid_shards(self._shard), fetchmany)): # bug fixed!
        yield from results

[I fixed the missing filter bug in that one too]
Let's fix that size bug and raise a specific exception so our code is safe even if the implementation of apply_all changes.

def selectMany(self, sql, args=None, size=None):
    class LimitReached(Exception): pass
    def fetchmany(cursor):
        nonlocal size
        if size is not None and size <= 0: # bug fixed!
            raise LimitReached
        cursor.execute(sql, args)
        res = cursor.fetchmany(size)
        size -= len(res)
        return res

    try:
        for results in filter(None, apply_all(valid_shards(self._shard), fetchmany)):
            yield from results
    except LimitReached:
        pass

Yuck. That might be more correct but now the code smell is stinking up the room. What we need to do is stuff more smarts into apply_all().
To Be Continued...

PyCon Organizers

The organizers did a bang up job this year. With the addition of a green room for speaker prep (coffee available all day w/ your speaker badge) and walkie-talkies for the organizers everything went smoothly. I heard a couple gripes about the Wifi but I haven't had any problems myself.

The hitches are with the venue and are the usual complaints. They charge $25/day for wifi in the hotel rooms. We don't have projectors in the open space rooms because they want $600 per projector per day. Coffee and beverage service gets torn down and put back up repeatedly so they can charge each time. Annoying.

The videos for the conference are already being posted. This is an amazing feat - each talk has three video and several audio channels that have to be spliced together. I'll link when mine goes up.

Thanks guys!

Open Spaces Board is teh Funny

Someone got drunk and clever and backfilled yesterday's Open Spaces schedule board with fictional talks [typos mine, I was touch typing]. Another guy is making a panorama photo; I'll add a link when he sends it to me.

Teach Me Ian Bicking.
Settings.py: Why sysadmins love editing your .py files.
A.N.U.S.: Pluggable Sphincters
Traversal: URL mapping is for the west.
Forking: because arguing is too hard.
Djylons: Let's make it happen.
Fulton v Rossum: Cage Match ($10)
GROIN: Come see how it works.
Zope: Making the simple IMPOSSIBLE.
Plone: Making Zope unreadable.
Acquisition Algebra: Fultonian mind fuck & other OOPSLA oddities.
Tic Tac Toe: Learn how to play, learn the secret strategies.
Tresling: Arm wrestling + Tetris, let us teach you!
Wheels: let's try them square (w/ pic of a trapezoid).
Niagara Planning: Waterfall 2.0 session
Catastrophe Planning: Waterfall 3.0 session
Play: Rubix cube with a brown belt.
Reality: My hairy twisted pony.
Pickle: Love/Hate.
Let's get the hell out of Rosemont and find something decent to eat.
re Rosemont: STEAK.
ISO 9000: The future of python?

Saturday, March 28, 2009

PyCon Day2

Class Decorators: Radically Simple


I gave my talk today; slides are available on the PyCon website (the ppt version might be crap - I exported from OpenOffice). I added two pages of speaker's notes at the top that answer some questions (and whargarbls) and some errata.

The final talk only shares a few slides with the PyConUK version and maybe none with the original EuroPython version. I went to bed at midnight like a good boy but tossed and turned thinking of the talk until 2am. At that point I gave up and rewrote a big chunk of the talk (which had already been rewritten since Boston two weeks ago). I finished around 5am and then sacked out for 3 hours. Somehow it managed to come in at the perfect length of 25 minutes (+5 for Q/A).

Now that I have this talk licked, I'll be retiring it. I have about 8 months to come up with an idea for next year.

Talks


[updated as I go to them]
Manfred Schwendinger: "Google App Engine: How to survive in Google's Ecosystem"
This was a detailed description of how his particular application uses cloud services (they use both Amazon EC2 and Google AppEngine). It was interesting but very detail oriented (we do this like this, and that like that). I'll be downloading the slides for future reference.

Bob Ippolito: "Drop ACID and think about data"
Bob's talk was about using non-ACID (basically non-SQL) storage. A good overview of why you, the web developer, probably don't care about the things ACID databases do well and don't care about the things alternate data stores (key-value, column-based, "persistent eventually") do badly. It's a good trip through all the available alternatives. An intro pitch about each class of stores and then a quick overview of the major implementations. This talk was packed.

Ned Batchelder: "Whirlwind Excursion through Writing a C Extension"
A good primer for writing modules and types in C. It's a massive subject and Ned did a good job of showing one of everything. I wasn't at the Boston meetup where he previewed his talk so I was glad to make this one.

Alex Martelli: "Abstractions as Leverage"
A typical Martelli talk, which is to say very good. His sonorous voice would tame rabid badgers. To say he's erudite doesn't begin to cover it - in 20 slides he used quotes from blogs, the American Journal of Psychiatry ("I recommend everyone subscribes"), and of course lots of dead people including a Chinese sword fighting manual (what, no Clausewitz?). Here are some of the bullets (full slides: http://aleax.it/python_abst.pdf)
* to use an abstraction well you need to understand at least two layers below it
* (Spolsky's Law) "All Abstractions leak", which is to say all abstractions lie.
* ex/ NFS isn't a local file system. It can be useful to treat it like one but if you don't know how it works you are going to get burned sooner or later.
* You can be a good python programmer without understanding how it is implemented but you can never be a great one.
* You can't write a good abstraction unless you know the layers above too -- how it will actually be used.

Friday, March 27, 2009

PyCon Day 1

Today is the first day of the conference proper. The most popular talk (measured by the online talk planner widget) was canceled. Titled "Designing Applications with Non-Relational Databases", it was one I was sure to go to, but alas the speaker canceled for reasons unknown.

This year there is a "Green Room" for speakers and conference volunteers. I wasn't expecting it to be green but I was hoping for a lounge. Instead it is a purely functional ops area - the network is run from here. It has power strips, a test projector, and free coffee. It might not have a wet bar but it is still a nice perk.

Talks


[updated as I go to them]
Brett Cannon "How Python is Developed." It was a good overview of core python development pitched at newbies. He sketched out the basic bug and feature cycles, how to [eventually] get core commit privs, etc. It was mainly an informational session so it included lots of links to the existing documentation (some of which was written by Brett).
Jesse Noller "Introduction to Multiprocessing in Python." 'Multiprocessing' is a module that lets you do .. multiprocessing in python. I only knew vaguely what it did before. Now I know kinda what it does. I might know more but I was busy refactoring some itertools types [see below].
Raymond Hettinger "Easy AI in Python." A ramble through several different problems with code of the solvers. The point is to show how easy it is to solve most problems. So easy that a kid could literally do it (part of the talk was about why kids should do it). I missed most of this one because I was hacking [see below] but I'd seen it before so I didn't mind.

Balls


I should read python-dev more regularly. It turns out Hettinger went and implemented fast-C permutations, combinations, and cartesian product in the itertools module. You know, just like the probstat module I wrote. That old code is pretty un-pythonic (I wrote it in my in-between stage so it is a generic lib with both python and perl wrappings). I had a mostly finished rewrite that was CPython from the ground up and - surprise! - it looks almost identical to Raymond's. Almost: I spun out the iterator into a separate object so the base object could have a len (iterators aren't allowed to support len). His doesn't have random access but that is one of the things no one used on mine so I was going to drop it anyway.
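For reference, this is what the three itertools functions look like in use. The Permutations wrapper at the end is a hypothetical sketch of the split described above (a sized object that hands out fresh iterators); it is not probstat's actual API.

```python
import itertools
from math import factorial

perms = list(itertools.permutations("abc", 2))   # ordered pairs, no repeats
combos = list(itertools.combinations("abc", 2))  # unordered pairs
pairs = list(itertools.product("ab", [0, 1]))    # cartesian product


class Permutations(object):
    """Hypothetical sized wrapper: knows its length, iterates via itertools."""
    def __init__(self, items, r=None):
        self.items = tuple(items)
        self.r = len(self.items) if r is None else r

    def __len__(self):
        n, r = len(self.items), self.r
        # number of r-length permutations of n items: n! / (n - r)!
        return factorial(n) // factorial(n - r) if r <= n else 0

    def __iter__(self):
        # a new iterator each time, so the object can be looped over repeatedly
        return itertools.permutations(self.items, self.r)


p = Permutations("abc", 2)
print(len(p), list(p) == perms)
```

Keeping the length on the container and the state on the iterator means len() never has to consume anything.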