Friday, November 03, 2006

More Unit Testing

bhearsum@wesley:~/projects/mozilla/buildbot/buildbot-bonsai/buildbot/test$ trial

Ran 9 tests in 0.061s

PASSED (successes=9)

So I've finished my first set of test cases, rejoice!

When I started writing them I thought it would be really easy, I didn't expect to spend more than a few hours on them. Boy, was I wrong. It's been over a week since I started and I probably spent 8 to 10 hours total on a 181 line file. I received help and advice along the way from Rob Helmer for which I am much appreciative.

Problems I encountered

  • My original code was more or less untestable

  • Python regular expressions were confusing

  • My original code used a mixture of exception and return values for error reporting

  • Comparing two objects by their data, _not_ their references

Very quickly I decided to do rewrite of the BonsaiPoller module. It was untestable, confusing, and just plain ugly. I had been planning to do a rewrite becausue of the ugliness alone, so it wasn't hard to reach this decision. This also gave me a chance to attempt some test-driven development. I was looking at _some_ of the old code while writing the test cases but by the end the BonsaiPoller and BonsaiParser worked quite a little differently internally. For reasons of simplicity I decided to keep the interface the same.

I found much of the test case writing very tedious. Just inputing all the data I needed was a chore. I thought it might be easier to input one "good" piece of data and base all of the broken ones off of it. This worked well enough while using the replace() method to do simple substituition but as soon as I needed regular expressions I was in a world of hurt. For some reason the Python developers seem to have decided that they don't want regular expressions as a built-in part of the language. For the life of me I don't know why. I was stuck carrying around a 're' (regular expression) object for most of the regular expressions I used. Compared to how they work in Perl it's a complete pain in the ass. Observe:

import re

data = "<blah><ci><f></f></ci></blah>"
myre = re.compile("<ci.*></ci.*>", re.DOTALL, re.MULTILINE)
newdata = re.sub(myre, "", data)


$data = "<blah><ci><f></f></ci></blah>"
($newdata = $data) =~ s/<ci.*><\/ci>//;

Not such a big deal when doing one or two, like I was, but if you're doing heavy text parsing this would be an ugly nighmare.

In my original code I used a lot of 'return True' and 'return False' to indicate when there was no more data. Seeing as python is object oriented and throws lots of exceptions I wanted to be consistent. This made much of my code a _lot_ cleaner and it has a nicer "feel" to it. There's one part I'm still not happy with though.
Here's the code when I was using True/False:

data = BonsaiResult()
while self._nextCiNode():
ci = CiNode()
ci.log = self._getLog()
ci.who = self._getWho() = self._getDate()
while self._nextFileNode():
fn = FileNode()
fn.revision = self._getRevision()
fn.filename = self._getFilename()


return data

And with exceptions:

nodes = []
while self._nextCiNode():
files = []
while self._nextFileNode():
except NoMoreFileNodes:
except InvalidResultError:
nodes.append(CiNode(self._getLog(), self._getWho(),
self._getDate(), files))

except NoMoreCiNodes:
except InvalidResultError, EmptyResult:

return BonsaiResult(nodes)

I tried to think of a way to use exceptions cleanly with while loops but drew blanks. If anyone can think of a way to improve the above code please let me know! The function works fine, however, so I shouldn't worry so much.

The last problem I ran into was comparison of my BonsaiResult objects. I didn't have this problem before writing the test cases because there was no point where I needed to compare them! This part wasn't too difficult once I figured out what I had to do -- but that took awhile. I was considering pulling the __cmp__() method before creating a diff but I don't think it hurts to leave it in.

Overall I am very pleased with my test cases and new BonsaiPoller module. As soon as I get the energy I will be submitting it to Brian.


Giles said...

Your comment about Python regular expressions isn't very convincing. The "honest this won't scale up" doesn't wash. Regex approaches en-mass don't look good en-mass. You are better off using parsing tools.

Also for you loops you should really check out generator functions. Best way of handling loops by far.

Ben said...

Yeah, after re-reading that post I can tell you that it mostly just whining about annoyances.

I've never heard of generator functions before, I'll take a look, thanks.

Anonymous said...

Unlimited Earnings Potential -

Our company is rapidly growing and offers you an extraordinary income helping others succeed. The primary requirement is to follow up on client inquiries and point them in the right direction. It is stress free, rewarding and straightforward work.

For complete details:

(Please feel free to delete this post if you don't want it on your blog. Thanks for the informative blog and opportunity to post.)

Free Webmail Program said...

Webmail program for the major free email sites -

My Mail 1.0 is configured to work with AOL, Gmail, Hotmail, Linuxmail, and Yahoo. With My Mail 1.0 you get the benefit of premium services without having to pay site fees. My Mail 1.0 completely automates the process of sending and receiving mail from the major sites, saving you time and trouble.

My Mail 1.0 eliminates the need to visit web sites to send and receive mail, which increases the speed of sending and receiving email by over 80%, even if they do not offer what is known as POP3, IMAP and SMTP. My Mail 1.0's look is also fully customizable. One you use it, you'll never want to go back to the web site again to get your mail.

For complete details:

(Please feel free to delete this post if you don't want it on your blog. Thanks for the informative blog and opportunity to post.)