Newsflash: debugging parallel programs ain’t easy

I ran into a situation recently where I was asked to debug a legacy C# program that was crashing due to multiple threads trying to write to the same file at the same time. I was asked because I was the last guy to modify it, so I guess I had no room to complain. I focused on the changes that I had made, trying to figure out how the heck I could have introduced the bug – my changes weren’t anywhere close to the source of the crash!

Then it hit me – my changes were a bunch of refactoring to make the code faster. The bug was always there; we were just more likely to hit it after my changes, since each thread now executes in less time. I probably should have guessed right away – there was a shared resource that was not being handled properly – but I was blinded by my assumption that my changes had to have introduced the bug. I guess that’s one moral of the story. (And now that I think about it, in the past *I* have been the guy who introduced a parallelism-related bug that someone had to fix later.)
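For the curious, here is a minimal sketch of what "handling the shared resource properly" can look like in C#. This is not the legacy code or the actual fix. The names `SharedLog`, `Append`, and `WriteInParallel` are made up for illustration, and it assumes every writer lives in the same process and goes through the one helper.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SharedLog
{
    // A single lock object guards every write to the shared file.
    private static readonly object FileLock = new object();

    // Hypothetical helper: all threads must call this instead of
    // touching the file directly, or the lock protects nothing.
    public static void Append(string path, string line)
    {
        lock (FileLock)
        {
            // Only one thread at a time opens, writes, and closes the file.
            File.AppendAllText(path, line + Environment.NewLine);
        }
    }

    // Hypothetical driver: many workers append to the same file in parallel,
    // then the number of lines written is returned.
    public static int WriteInParallel(string path, int workers)
    {
        File.WriteAllText(path, string.Empty);
        Parallel.For(0, workers, i => Append(path, "worker " + i + " finished"));
        return File.ReadAllLines(path).Length;
    }
}
```

Take the `lock` out and the same loop can fail intermittently with an `IOException`, because two threads try to open the file at once. The faster each thread gets to the write, the more often that happens, which is the effect my refactoring had.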

Another lesson is perhaps that it’s unwise to screw around with multicore parallelism unless you know what you are doing. Say you’ve got 4 cores, and let’s say that you get a 3x speedup out of them (which is often pretty generous). Many times I would rather be 3x slower but completely reliable and avoid random crashes. Microsoft’s Task Parallel Library is kind of cool, but kind of dangerous. I’m not sure how often it’s really helpful.

Author: natebrix

Follow me on twitter at @natebrix.

3 thoughts on “Newsflash: debugging parallel programs ain’t easy”

  1. From similar experiences, I think I’d always come to somebody else’s multi-threaded code with a large dose of scepticism and pretty much check every line of code to make sure that it was thread-safe.

    This is one reason why I have moved to using readonly backing fields and readonly collections rather than using auto-properties. It means you can go into a class and see immediately that its state after construction is thread-safe.

    The same would obviously be true if I was given some code to parallelise.

    1. Regarding read-only collections, the other option is to use the concurrent collections that come with the task parallel library, and remove the need to write error-prone code to avoid concurrency problems in most situations.
      Oh, and this is also a good use case for F#! I have spent some time recently writing an algorithm that uses the TPL extensively, used it from both C# and F# – and noticed that it was much easier not to do anything silly in F#, because you have to go out of your way to mutate a variable by reference…
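
As a rough illustration of the two suggestions in this thread, the sketch below pairs a class built on readonly backing fields and a read-only collection with a concurrent collection from the TPL era (`ConcurrentDictionary`). The `Job` and `JobRunner` names are hypothetical and do not come from either commenter's code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// State is fixed at construction: readonly backing fields, no setters,
// and a read-only view over a private copy of the inputs.
public sealed class Job
{
    private readonly string _name;
    private readonly IReadOnlyList<int> _inputs;

    public Job(string name, IEnumerable<int> inputs)
    {
        _name = name;
        _inputs = inputs.ToList().AsReadOnly(); // defensive copy
    }

    public string Name { get { return _name; } }
    public IReadOnlyList<int> Inputs { get { return _inputs; } }
}

public static class JobRunner
{
    // Hypothetical example: each job writes its result into a concurrent
    // dictionary, so no hand-written locking is needed around the writes.
    // Assumes job names are unique.
    public static ConcurrentDictionary<string, int> SumAll(IEnumerable<Job> jobs)
    {
        var totals = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(jobs, job => totals[job.Name] = job.Inputs.Sum());
        return totals;
    }
}
```

Because a `Job` cannot change after construction, it is safe to read from any thread. The only shared mutable state is the dictionary, and that type is designed for concurrent writers.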
