Friday, November 03, 2006

More Unit Testing


bhearsum@wesley:~/projects/mozilla/buildbot/buildbot-bonsai/buildbot/test$ trial test_bonsaipoller.py

Ran 9 tests in 0.061s

PASSED (successes=9)


So I've finished my first set of test cases, rejoice!

When I started writing them I thought it would be really easy, I didn't expect to spend more than a few hours on them. Boy, was I wrong. It's been over a week since I started and I probably spent 8 to 10 hours total on a 181 line file. I received help and advice along the way from Rob Helmer for which I am much appreciative.

Problems I encountered

  • My original code was more or less untestable

  • Python regular expressions were confusing

  • My original code used a mixture of exception and return values for error reporting

  • Comparing two objects by their data, _not_ their references



Very quickly I decided to do rewrite of the BonsaiPoller module. It was untestable, confusing, and just plain ugly. I had been planning to do a rewrite becausue of the ugliness alone, so it wasn't hard to reach this decision. This also gave me a chance to attempt some test-driven development. I was looking at _some_ of the old code while writing the test cases but by the end the BonsaiPoller and BonsaiParser worked quite a little differently internally. For reasons of simplicity I decided to keep the interface the same.

I found much of the test case writing very tedious. Just inputing all the data I needed was a chore. I thought it might be easier to input one "good" piece of data and base all of the broken ones off of it. This worked well enough while using the replace() method to do simple substituition but as soon as I needed regular expressions I was in a world of hurt. For some reason the Python developers seem to have decided that they don't want regular expressions as a built-in part of the language. For the life of me I don't know why. I was stuck carrying around a 're' (regular expression) object for most of the regular expressions I used. Compared to how they work in Perl it's a complete pain in the ass. Observe:
Python:

import re

data = "<blah><ci><f></f></ci></blah>"
myre = re.compile("<ci.*></ci.*>", re.DOTALL, re.MULTILINE)
newdata = re.sub(myre, "", data)

Perl:

$data = "<blah><ci><f></f></ci></blah>"
($newdata = $data) =~ s/<ci.*><\/ci>//;


Not such a big deal when doing one or two, like I was, but if you're doing heavy text parsing this would be an ugly nighmare.

In my original code I used a lot of 'return True' and 'return False' to indicate when there was no more data. Seeing as python is object oriented and throws lots of exceptions I wanted to be consistent. This made much of my code a _lot_ cleaner and it has a nicer "feel" to it. There's one part I'm still not happy with though.
Here's the code when I was using True/False:

data = BonsaiResult()
while self._nextCiNode():
ci = CiNode()
ci.log = self._getLog()
ci.who = self._getWho()
ci.date = self._getDate()
while self._nextFileNode():
fn = FileNode()
fn.revision = self._getRevision()
fn.filename = self._getFilename()
ci.files.append(fn)

data.nodes.append(ci)

return data

And with exceptions:

nodes = []
try:
while self._nextCiNode():
files = []
try:
while self._nextFileNode():
files.append(FileNode(self._getRevision(),
self._getFilename()))
except NoMoreFileNodes:
pass
except InvalidResultError:
raise
nodes.append(CiNode(self._getLog(), self._getWho(),
self._getDate(), files))

except NoMoreCiNodes:
pass
except InvalidResultError, EmptyResult:
raise

return BonsaiResult(nodes)


I tried to think of a way to use exceptions cleanly with while loops but drew blanks. If anyone can think of a way to improve the above code please let me know! The function works fine, however, so I shouldn't worry so much.

The last problem I ran into was comparison of my BonsaiResult objects. I didn't have this problem before writing the test cases because there was no point where I needed to compare them! This part wasn't too difficult once I figured out what I had to do -- but that took awhile. I was considering pulling the __cmp__() method before creating a diff but I don't think it hurts to leave it in.

Overall I am very pleased with my test cases and new BonsaiPoller module. As soon as I get the energy I will be submitting it to Brian.

3 comments:

Giles said...

Your comment about Python regular expressions isn't very convincing. The "honest this won't scale up" doesn't wash. Regex approaches en-mass don't look good en-mass. You are better off using parsing tools.

Also for you loops you should really check out generator functions. Best way of handling loops by far.

Ben Hearsum said...

Yeah, after re-reading that post I can tell you that it mostly just whining about annoyances.

I've never heard of generator functions before, I'll take a look, thanks.

Anonymous said...

Unlimited Earnings Potential - http://1greatfuture.com

Our company is rapidly growing and offers you an extraordinary income helping others succeed. The primary requirement is to follow up on client inquiries and point them in the right direction. It is stress free, rewarding and straightforward work.

For complete details: http://1greatfuture.com


(Please feel free to delete this post if you don't want it on your blog. Thanks for the informative blog and opportunity to post.)