How Not to Unit Test

Dave Cheney started a Twitter discussion about when a unit test is not a unit test.

I added this comment:

It’s also not a unit test if it mocks out everything that it touches. Then it’s just testing whether the computer it’s running on is powered up or not… and we know the answer to that.

Mark McKenzie replied to ask, “Do you have any kind of red line for what constitutes ‘mocks out everything’?”

My answer won’t fit in a tweet, so here’s a blog post:

Well, Mark, I don’t have a specific answer but I can show you an example.

If you see a function like this…

def foo(x, y, z):
  thing1(x)
  thing2(y)
  thing3(z)
  thing4()

…and the unit test mocks out thing1(), thing2(), thing3(), and thing4()… I wouldn’t call that a unit test. I’d call that a test that verifies whether the computer is powered on. foo() has no logic of its own, so once every call is mocked, the only way the test can fail is if it doesn’t run.
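To make the problem concrete, here is a minimal sketch of such a test using Python's `unittest.mock`. The bodies of `thing1()` through `thing4()` are hypothetical stand-ins (the example above never defines them), and they are deliberately broken here to show that the test passes anyway. Patching via `globals()` is only a convenience for a single-file example; in a real project you would more likely patch by module path.

```python
from unittest import mock


# Hypothetical stand-ins: deliberately broken so we can see whether the
# test below would ever notice.
def thing1(x):
    raise RuntimeError("thing1 is broken")


def thing2(y):
    raise RuntimeError("thing2 is broken")


def thing3(z):
    raise RuntimeError("thing3 is broken")


def thing4():
    raise RuntimeError("thing4 is broken")


def foo(x, y, z):
    thing1(x)
    thing2(y)
    thing3(z)
    thing4()


def test_foo_mocks_everything():
    # Replace every collaborator with a Mock for the duration of the call.
    fakes = {name: mock.Mock() for name in ("thing1", "thing2", "thing3", "thing4")}
    with mock.patch.dict(globals(), fakes):
        # Every line of foo() executes, so a coverage tool counts foo() as
        # fully covered, yet nothing about its real behavior is checked.
        foo(1, 2, 3)
```

The test passes even though all four helpers raise when called for real. It would keep passing no matter how those helpers changed, which is why the coverage number it produces says nothing about whether the code works.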

Sadly, when I was at Google I had a terrible manager who didn’t understand this. In fact, he would complain that my code-coverage stats were usually around 60% and point to someone who wrote terrible tests like the one above. “Why can’t you be more like that guy?”

That person’s stats were in the 90-99% range. He was a low performer who didn’t understand what he was doing, or why people use unit tests. However, he knew that if he mocked out everything and got high code-coverage statistics, it made his boss happy, so he kept doing it.

My stats were lower but my tests were better. I took a balanced approach: writing extensive unit tests for the code that needed them, especially algorithmically challenging code or code that had to handle many edge cases. I always wrote a test for any bug before I fixed the bug, and left the unit test there to prevent regressions.
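As a rough illustration of that approach, here is a sketch built around a hypothetical `chunk()` helper (not code from any real project). It has a few edge-case tests plus one regression test of the kind that gets written to reproduce a bug before the fix and then stays in the suite. The bug described in the comment is invented for the example.

```python
import unittest


def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


class ChunkTest(unittest.TestCase):
    def test_even_split(self):
        self.assertEqual(chunk([1, 2, 3, 4], 2), [[1, 2], [3, 4]])

    def test_empty_input(self):
        self.assertEqual(chunk([], 3), [])

    def test_size_larger_than_input(self):
        self.assertEqual(chunk([1, 2], 5), [[1, 2]])

    def test_rejects_non_positive_size(self):
        with self.assertRaises(ValueError):
            chunk([1], 0)

    def test_regression_keeps_trailing_partial_chunk(self):
        # Hypothetical bug: an earlier version dropped the last, shorter
        # chunk. This test was written first to reproduce the failure and
        # is kept so the bug cannot quietly return.
        self.assertEqual(chunk([1, 2, 3], 2), [[1, 2], [3]])
```

Nothing is mocked here, so each test fails if the behavior it describes breaks. Saved to a file, the suite can be run with `python -m unittest`.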

His tests basically did nothing. When I did code reviews of his PRs, I’d try to point out the tests that didn’t add value, or suggest other tests he could add that would test interesting edge cases, but he couldn’t understand why a test was bad if it increased the code coverage score, or why more than one test was needed for a piece of code.

Ironically that low performer is still at Google and I’m not.

(Comments enabled in case the bad manager or low performer wishes to respond.)