
At some point, every programmer has to dive into another programmer’s code, and ends up wondering what the hell the previous guy was thinking. “Why did he do this? It could’ve been so much simpler, if he weren’t an idiot…”

Idiocy aside, there are often good reasons why small, simple ideas become 4,000-line behemoth functions. I’ll give you one example we spent months wrestling to the ground:

Most of our applications are very date sensitive, and one of the things we constantly need to know is how many days there are between two given dates. Since any function that we wrote would be called hundreds of thousands of times during the normal daily execution of the program, the programmer strove for something that was simple (since everyone would rely on it and read it often) and fast (which was a classic example of premature optimization, but generally not that big a deal).

His first pass at the function looked like this:

function daysBetween(Date d1, Date d2) {
    // compute milliseconds in a day as: 24 hours * 60 minutes * 60 seconds * 1000 milliseconds
    int millisInDay = 24 * 60 * 60 * 1000;
    int delta = d1.time() - d2.time();
    return int(delta / millisInDay);
}

So, for the most part, this function is decent enough. It’s rather fast (the longest part taking place in d1.time()) and small enough that it’s very simple to read. All it’s doing is taking the two given dates, subtracting one from the other, and dividing by the number of milliseconds in an average day. Four lines of code gives you something that is fast, elegant, and wrong.

It didn’t take long for us to start finding the first few problems. They were pretty simple, both to find and to fix. The first was that the function was only working if you passed in dates in the right order. Since the function subtracts the second date from the first, if the latter date is listed first, you start getting things like having -7 days between consecutive Sundays, which could be said to be mathematically correct, but still isn’t very useful.
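A rough Python port of that first pass makes the ordering problem easy to see. This is only a sketch, not the original code: the name days_between_naive is made up, and the dates are pinned to midnight UTC (an assumption) so the result doesn't depend on the machine's local timezone.

```python
from datetime import datetime, timezone

# 24 hours * 60 minutes * 60 seconds * 1000 milliseconds
MILLIS_IN_DAY = 24 * 60 * 60 * 1000


def days_between_naive(d1, d2):
    """Port of the first-pass pseudocode: subtract, divide, truncate."""
    delta_ms = (d1.timestamp() - d2.timestamp()) * 1000
    return int(delta_ms / MILLIS_IN_DAY)  # int() truncates toward zero, like a cast


# Two consecutive Sundays, both at midnight UTC (an assumption for this example).
first_sunday = datetime(2007, 11, 4, tzinfo=timezone.utc)
second_sunday = datetime(2007, 11, 11, tzinfo=timezone.utc)

print(days_between_naive(second_sunday, first_sunday))  # 7
print(days_between_naive(first_sunday, second_sunday))  # -7
```

Swap the arguments and the sign flips, which is exactly the "-7 days between Sundays" result.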

I could take you through each of the iterations we went through, incrementally improving the code, but I suspect you can guess most of them (using an absolute value for delta, using a round instead of an int cast…). What really ended up killing us (and the inherent elegance of the function) was two things.
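Those two fixes can be sketched in Python like this. The 23-hour example is my own illustration of why rounding beats truncating, not a case taken from the original code: in US Eastern time, midnight on March 11th, 2007 to midnight on March 12th is only 23 hours because the clocks spring forward. The function name is hypothetical, and the offsets (EST = UTC-5, EDT = UTC-4) are hard-coded to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

MILLIS_IN_DAY = 24 * 60 * 60 * 1000

# Fixed offsets instead of a timezone database lookup (an assumption).
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))


def days_between_rounded(d1, d2):
    """Order-independent day count: absolute value, then round."""
    delta_ms = abs(d1.timestamp() - d2.timestamp()) * 1000
    return round(delta_ms / MILLIS_IN_DAY)


before = datetime(2007, 3, 11, tzinfo=EST)  # local midnight before spring forward
after = datetime(2007, 3, 12, tzinfo=EDT)   # next local midnight, 23 hours later

# What the original int cast would have produced for the same pair.
truncated = int((after.timestamp() - before.timestamp()) * 1000 / MILLIS_IN_DAY)

print(truncated)                            # 0
print(days_between_rounded(before, after))  # 1
print(days_between_rounded(after, before))  # 1
```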

1. Having the number of days between two dates isn’t actually that helpful. 99 times out of 100, we found that as soon as we called our daysBetween() function, we were turning around and calling some second function to tell us what days are between day X and day Y. It eventually seemed to be a lot smarter to have one function that just returned an array of all days between a given two, and then you could get the bit of extra information as a one off of the more useful array (by looking at array.length…)

2. Timezones.

Timezones? Yes, unfortunately. After we modified our function to return an array of dates between Date X and Date Y, we found an odd situation where, occasionally, one of two things would happen: we’d either get one day fewer than we expected, or we’d get the right number of days, but only because we’d get one date twice.

Or, more accurately, I should say that our customers kept getting these situations - we didn’t. Our code passed all of our unit tests, our QA group signed off on it, our UAT group signed off on it, and our collection of beta customers had no troubles. It wasn’t really until we were out in production that we started having trouble (this is probably one of those cases that shows why testing is, at best, a risk mitigation strategy, but that is a point for another day…)

For a blog post, this is getting rather long, so I won’t go through the details of how we found the bug (imagine a group of us looking here), but it all comes down to this line:

int millisInDay = 24 * 60 * 60 * 1000;

Counting days in milliseconds is problematic in a couple of ways - the main one being that a sidereal day is actually about 86,164,091 milliseconds long instead of 86,400,000, but that wasn’t the problem. The problem was Daylight Saving Time. In the United States, we mostly follow DST (well, except for Arizona and, until 2006, most of Indiana - which has caused us other problems in the past). The most recent time the clock flipped was at 0200 on November 4th, 2007. When our software needed to compare dates, we always used dates based off of the same time of day: midnight. So, when November 4th rolled around (or more accurately, when we compared a set of dates that included November 4th) we were mostly okay.
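To put a number on it, the snippet below measures the US Eastern day of November 4th, 2007 from local midnight to local midnight. It is a sketch with hard-coded offsets (EDT = UTC-4 before the change, EST = UTC-5 after) rather than a real timezone database lookup, and the variable names are mine.

```python
from datetime import datetime, timedelta, timezone

MILLIS_IN_DAY = 24 * 60 * 60 * 1000

# Fixed offsets standing in for US Eastern time (an assumption).
EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))

start = datetime(2007, 11, 4, tzinfo=EDT)  # local midnight, still on DST
end = datetime(2007, 11, 5, tzinfo=EST)    # next local midnight, back on standard time

elapsed_ms = int((end - start).total_seconds() * 1000)

print(elapsed_ms)                       # 90000000, i.e. 25 hours
print(elapsed_ms - MILLIS_IN_DAY)       # 3600000, one extra hour
print(int(elapsed_ms / MILLIS_IN_DAY))  # 1
```

The day really is an hour too long, but dividing 25 hours by 24 and truncating still gives 1.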

Other countries, however, change over to DST at midnight, so in a few cases (Brazil, I’m looking at you here), when you cross over certain dates, some days will get double counted. Add 24 hours (86,400 seconds) to November 3rd at midnight, and you should get November 4th at midnight, but since November 4th at midnight rolled back to November 3rd at 2300, you get two November 3rds in the list. Simply put, there are dozens of days of the year (since not everyone switches DST on or off on the same day) where one day is not 86,400 seconds long.
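The double counting is easy to reproduce. The sketch below models a made-up zone that falls back from UTC-2 to UTC-3 at local midnight on November 4th, matching the scenario described above. The offsets, the transition rule, and the helper names are all assumptions for illustration, not real Brazilian timezone data.

```python
from datetime import datetime, timedelta, timezone

DST = timezone(timedelta(hours=-2))       # hypothetical summer offset
STANDARD = timezone(timedelta(hours=-3))  # hypothetical standard offset

# The instant the clocks fall back: local midnight on Nov 4th, in DST terms.
FALL_BACK = datetime(2007, 11, 4, tzinfo=DST)


def to_local(instant):
    """Convert an instant to wall-clock time in the hypothetical zone."""
    return instant.astimezone(DST if instant < FALL_BACK else STANDARD)


def dates_by_adding_24h(start, count):
    """List local dates by repeatedly adding a fixed 24 hours."""
    return [to_local(start + timedelta(hours=24 * i)).date() for i in range(count)]


start = datetime(2007, 11, 2, tzinfo=DST)  # local midnight on Nov 2nd
dates = dates_by_adding_24h(start, 4)
for day in dates:
    print(day)
# 2007-11-02
# 2007-11-03
# 2007-11-03
# 2007-11-04
```

Four steps, four entries, but only three distinct dates: the third step lands on 2300 of November 3rd instead of midnight on the 4th.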

Our final version of the daysBetween() function is 232 lines long, all of them necessary, but none of them particularly fun to look at. Listed the way I have here, the fixes that live between those two braces seem pretty trivial, but each one was hard fought, and more importantly, hard won. We have an application that works marginally slower than before, but we know that it will work correctly (at least w/r/t dates) in Brazil and Western Australia and Southern Spain and even, begrudgingly enough, Arizona.
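The 232-line function is not reproduced here, and the sketch below is not it. It only illustrates one common way to sidestep the millisecond problem: step through calendar dates instead of adding fixed 24-hour blocks. The function name and the choice of Python's date type are assumptions.

```python
from datetime import date, timedelta


def dates_between(start, end):
    """Return every calendar date from start to end inclusive, in order."""
    if end < start:
        start, end = end, start  # accept the dates in either order
    span = (end - start).days    # whole calendar days, no clock time involved
    return [start + timedelta(days=i) for i in range(span + 1)]


days = dates_between(date(2007, 11, 5), date(2007, 11, 2))
for day in days:
    print(day)
# 2007-11-02
# 2007-11-03
# 2007-11-04
# 2007-11-05

print(len(days) - 1)  # 3 days between the two endpoints
```

Because a date carries no time of day or UTC offset, there is no 23- or 25-hour day to trip over. The cost is that callers have to decide up front which timezone's calendar a timestamp belongs to, and that decision is where a lot of the real-world complexity goes.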

Yes, that function is now almost 80 times larger than it previously was, which means it’s at least 10 to 15 times harder for a programmer to read, learn, and understand, but that’s the way of software. I’ve seen hundreds of enhancements that have increased the size of an application codebase by 15% or more, but I’ve never seen one that shrinks a codebase by more than a few hundred lines. You only really have three choices: watch your program grow until you can no longer keep track of it all, force it to stay a specific size and lose a certain number of kangaroo-friendly customers, or watch the whole endeavour die, because no one wants what you’ve written anyway.

We, obviously, chose option 1.

