Saturday, June 29, 2013

Pain killing the pingponger's disease

(or how we got our CI builds green again)

Everyone (or at least almost everyone I know) writing Selenium or WebDriver tests knows it's not easy to keep all these tests stable. Most of the time they run perfectly fine on your development machine, only to start failing after a while in your CI build (and I'm not talking about regressions here).

I've felt this pain on quite a few projects already, and it's not always easy to find what's causing your tests to fail. On my current project, we are once again stuck with pingpong playing WebDriver tests (for those who don't know the term: sometimes they pass and sometimes they don't).

Of course your tests should be reproducible: consistently passing or consistently failing. And that should be your first step in solving the problem: find the cause of the instability in your test and fix it. Unfortunately, when WebDriver tests are involved, this can sometimes be hard to achieve. The WebDriver framework depends heavily on references to elements, and those references can become stale. When AJAX calls are involved, this becomes a real headache-causer. I won't go too deep into all the possible causes of instability, since that's not the main subject of this blog.
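To give just one concrete illustration of the stale-reference problem, here is a hedged sketch (not code from our project; the URL and element id are made up):

WebDriver driver = new FirefoxDriver();
driver.get("http://example.com/orders"); // hypothetical page
WebElement submitButton = driver.findElement(By.id("submit"));

// ... an AJAX call re-renders the form here ...

submitButton.click(); // may now throw StaleElementReferenceException

// A common mitigation: wait and re-locate the element right before using it.
WebElement freshButton = new WebDriverWait(driver, 10)
    .until(ExpectedConditions.elementToBeClickable(By.id("submit")));
freshButton.click();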

So, how do you make your build less dependent on some hiccup of your CI system? On our current project we didn't have an auto-commit for about three months because of pingponging WebDriver tests. Our 'solution' was: when a build had run and some tests failed, run them locally, and if they pass, tag the build manually. It cost us a lot of time that could have been better spent on more fun stuff.

While trying to find a solution for this problem, we had this idea: maybe we could just ignore the pingpong-playing WebDriver tests by excluding them from the main build and running them separately. That way our main build would no longer depend on the vagaries of our tests and we would have more auto-commits again, but we would introduce the risk that one of the pingpong-playing tests fails for the right reasons this time. When deploying this strategy, one could ask whether we shouldn't throw away the pingpong tests entirely, since we would be completely ignoring them anyway.

Then we came up with another solution, which turned out to be our salvation. What is this magic drug we are using? It's quite simple, actually: we created our own (extended) version of the JUnit runner we used before and let it retry pingponging tests. To accomplish this, we mark pingpong tests with the @PingPong annotation, using a maxNumberOfRetries property to define how many times a test should be retried (the default is one). The @PingPong annotation can be used both at method and at class level, to mark a single test or all tests in a test class as playing pingpong.
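The annotation itself could be as simple as this (a sketch based on the description above; the exact definition may differ):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Usable on a single test method or on a whole test class.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
public @interface PingPong {

  // How many extra attempts a failing test gets.
  int maxNumberOfRetries() default 1;
}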

An example of a test class using the @PingPong annotation looks like this.

@RunWith(MyVeryOwnTestRunnerRetryingPingPongers.class)
public class MyTest {

  @PingPong(maxNumberOfRetries = 2)
  @Test
  public void iWillSucceedWhenSuccessWithinFirstThreeTries() {
    // ...
  }
}

With MyVeryOwnTestRunnerRetryingPingPongers defined like this.

public class MyVeryOwnTestRunnerRetryingPingPongers extends SomeTestRunner
    implements StatementFactory {

  // SomeTestRunner stands for whatever JUnit runner you used before,
  // for example BlockJUnit4ClassRunner.
  public MyVeryOwnTestRunnerRetryingPingPongers(Class<?> aClass)
      throws InitializationError {
    super(aClass);
  }

  @Override
  protected Statement methodBlock(FrameworkMethod frameworkMethod) {
    // Wrap every test so pingpong tests can be retried.
    return new StatementWrapperForRetryingPingPongers(frameworkMethod, this);
  }

  @Override
  public Statement createStatement(FrameworkMethod frameworkMethod) {
    // Gives the wrapper access to the original, unwrapped statement.
    return super.methodBlock(frameworkMethod);
  }
}

You still need the implementation of StatementWrapperForRetryingPingPongers. You can find this one here.
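For completeness, a minimal sketch of what that wrapper could look like (this is my assumption; the linked implementation may differ):

import org.junit.runners.model.FrameworkMethod;
import org.junit.runners.model.Statement;

// The runner above implements this interface, giving the wrapper access
// to the original, unwrapped statement (each type in its own file).
public interface StatementFactory {
  Statement createStatement(FrameworkMethod frameworkMethod);
}

public class StatementWrapperForRetryingPingPongers extends Statement {

  private final FrameworkMethod frameworkMethod;
  private final StatementFactory statementFactory;

  public StatementWrapperForRetryingPingPongers(
      FrameworkMethod frameworkMethod, StatementFactory statementFactory) {
    this.frameworkMethod = frameworkMethod;
    this.statementFactory = statementFactory;
  }

  @Override
  public void evaluate() throws Throwable {
    int retries = maxNumberOfRetries();
    Throwable lastFailure = null;
    // One regular attempt plus up to maxNumberOfRetries extra attempts.
    for (int attempt = 0; attempt <= retries; attempt++) {
      try {
        statementFactory.createStatement(frameworkMethod).evaluate();
        return; // the test passed, no (further) retries needed
      } catch (Throwable failure) {
        lastFailure = failure;
      }
    }
    throw lastFailure;
  }

  private int maxNumberOfRetries() {
    // @PingPong on the method wins over @PingPong on the class;
    // tests without the annotation are not retried at all.
    PingPong annotation = frameworkMethod.getAnnotation(PingPong.class);
    if (annotation == null) {
      annotation = frameworkMethod.getMethod().getDeclaringClass()
          .getAnnotation(PingPong.class);
    }
    return annotation == null ? 0 : annotation.maxNumberOfRetries();
  }
}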

I am aware this is just a pain killer, but it's one that helped us get our builds green again and buys us extra time to fix our unstable pingpong tests more thoroughly.

Please let me know if this post was helpful to you. What do you think about our solution to our instability problem?

Thursday, January 17, 2013

When mocking builders... (using Mockito)

Most of you already know I'm not really keen on mock testing. It gets even worse when mocking builders. As a matter of fact, I'm not really crazy about builders either (so now I find myself looking for a better way to use two of the things I always try to avoid). At least not in production code. Builders in tests are great for quickly creating full-fledged domain objects. For production code, however, always ask yourself whether you wouldn't be better off using a factory.

Anyway, in case you are using builders and you want to mock them out, don't do it this way (my example is overly simplistic, but that's not the point here):

  @Test
  public void createLogbookRecordIsNotReallyTested() {
    LogbookRecordBuilder builderMock = mock(LogbookRecordBuilder.class);
    when(builderMock.withType(any(LogbookRecordType.class)))
      .thenReturn(builderMock);
    when(builderMock.withPersonName(anyString()))
      .thenReturn(builderMock);
    when(builderMock.withAccountNumber(anyString()))
      .thenReturn(builderMock);
    when(builderMock.build()).thenReturn(LOG_RECORD);

    LogbookRecord result = client.createLogbookRecord(builderMock);
    assertThat(result).isEqualTo(LOG_RECORD);
  }

Of course we were testing this immensely vital production code:

  public LogbookRecord createLogbookRecord(LogbookRecordBuilder builder) {
    return builder
      .withType(LogbookRecordType.DEPOSIT)
      .withPersonName(personName)
      .withAccountNumber(accountNumber)
      .build();
  }

What's wrong with the test? Well, it actually tests nothing. Ok, I'm exaggerating a bit: it only tests that the build method was called and that its result is returned.
So you could change your production code into this:

  public LogbookRecord createLogbookRecord(LogbookRecordBuilder builder) {
    return builder.build();
  }

and your test would still work just fine. The reason, of course, is that you are only stubbing your with-calls; the test is not forcing them to be called.
To fix that, we'll add verify calls to check whether the right methods are called on your builder:

  @Test
  public void createLogbookRecordIsTestedWithStubbingAndInOrderVerifications() {
    LogbookRecordBuilder builderMock = mock(LogbookRecordBuilder.class);
    when(builderMock.withType(any(LogbookRecordType.class)))
      .thenReturn(builderMock);
    when(builderMock.withPersonName(anyString()))
      .thenReturn(builderMock);
    when(builderMock.withAccountNumber(anyString()))
      .thenReturn(builderMock);
    when(builderMock.build()).thenReturn(LOG_RECORD);

    LogbookRecord result = client.createLogbookRecord(builderMock);
    assertThat(result).isEqualTo(LOG_RECORD);

    InOrder inOrder = Mockito.inOrder(builderMock);
    inOrder.verify(builderMock).withType(LogbookRecordType.DEPOSIT);
    inOrder.verify(builderMock).withPersonName(PERSONNAME);
    inOrder.verify(builderMock).withAccountNumber(ACCOUNTNUMBER);
    inOrder.verify(builderMock).build();
    verifyNoMoreInteractions(builderMock);
  }

I don't really like this approach because of the DRY principle (you're stubbing and verifying the very same methods). Did you also notice the inOrder checking of the method calls? I'm not really proud of it, but that way I make sure the build method was called last.

One solution to stop repeating yourself might be to use a default answer (ReturnSelfWhenTypesMatchAnswer) for your mocked builder, which simply returns the mock itself whenever the method's return type is a supertype of the mocked builder (so all your with-methods return the mocked builder). That way your test looks like this:

  @Test
  public void createLogbookRecordIsTestedUsingDefaultAnswer() {
    LogbookRecordBuilder builderMock = 
      mock(LogbookRecordBuilder.class, new ReturnSelfWhenTypesMatchAnswer());
    when(builderMock.build()).thenReturn(LOG_RECORD);

    LogbookRecord result = client.createLogbookRecord(builderMock);
    assertThat(result).isEqualTo(LOG_RECORD);

    InOrder inOrder = Mockito.inOrder(builderMock);
    inOrder.verify(builderMock).withType(LogbookRecordType.DEPOSIT);
    inOrder.verify(builderMock).withPersonName(PERSONNAME);
    inOrder.verify(builderMock).withAccountNumber(ACCOUNTNUMBER);
    inOrder.verify(builderMock).build();
    verifyNoMoreInteractions(builderMock);
  }

The implementation of ReturnSelfWhenTypesMatchAnswer is quite simple:

import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

public class ReturnSelfWhenTypesMatchAnswer implements Answer<Object> {

  @Override
  public Object answer(InvocationOnMock invocation) throws Throwable {
    // Return the mock itself whenever it fits the declared return type,
    // so all fluent with-methods keep the chain going.
    Class<?> returnType = invocation.getMethod().getReturnType();
    if (returnType.isInstance(invocation.getMock())) {
      return invocation.getMock();
    }
    return null;
  }
}

The question I keep asking myself: why didn't anybody test the code like this?

  @Test
  public void createLogbookRecordIsTestedTheRightWayWithoutMocks() {
    LogbookRecordBuilder builder = new LogbookRecordBuilder();

    LogbookRecord result = client.createLogbookRecord(builder);

    assertThat(result.getType()).isEqualTo(LogbookRecordType.DEPOSIT);
    assertThat(result.getAccountNumber()).isEqualTo(ACCOUNTNUMBER);
    assertThat(result.getPersonName()).isEqualTo(PERSONNAME);
  }

For me, the only reasonable answer to the question above would be: because our builder is doing incredibly complex stuff which we don't want to re-test. Did you also notice that the last test is the shortest and most readable version of all, without copy-pasting any production code?

Ok, so I ended up discouraging you from writing mock tests yet again. My original intention, however, was to show that using a default answer for mocks can sometimes be a more elegant solution than stubbing lots of stuff.

Monday, July 9, 2012

1 + 1 = 3. The flaws of strict (j)unit testing.

Sometimes extreme is good, sometimes it isn't. I like eXtreme Programming a lot: applying best programming practices to the extreme to make exceptional software. However, I believe you sometimes need to be pragmatic. When you apply unit testing to the extreme, I think you miss out on important feedback.

The concept that 1 + 1 = 3 is especially relevant when writing object-oriented code. Two or more classes working together often reinforce each other's behavior and the result of their cooperation is more than just the sum of their individual parts.

What I am trying to discourage is blindly applying strict unit testing to the extreme: some people believe every object other than the object under test should be mocked or stubbed.
The basic idea is sound: you don't want to test things twice. However, I think people who only write tests this way to test their entire systems are ignoring important issues with this style of testing. I am pleading for what I call Behavioral Unit Testing (BUT): unit tests should test the actual result of the code and stop focusing on implementation, in the end expressing as much business value as possible.

Changing behavior of one class can have impact on another class using this class
The worst nightmare of every committed developer is breaking code without noticing it. Suppose you have a Person class with a getMainAddress method that returns null if no main address is set. Changing its behavior so that it throws a NoAddressFoundException impacts other classes using this method. When those classes are tested using mocks, you won't notice you broke something by changing the behavior of the Person class. However, if you use real Person instances in the other classes' tests and you wrote tests for persons without a main address, you will notice that the changed behavior of the Person class impacts the other classes as well, and you can fix your tests accordingly.
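To make this concrete, here is a hedged sketch with hypothetical AddressLabelPrinter, Person and Address classes. First the class under test:

public class AddressLabelPrinter {

  // Written against the old contract: getMainAddress() returns null
  // when no main address is set.
  public String printLabel(Person person) {
    Address mainAddress = person.getMainAddress();
    return mainAddress == null ? "no address known" : mainAddress.asLabel();
  }
}

And two tests for it, one with a mock and one with a real Person:

  @Test
  public void mockedPersonHidesTheBehaviorChange() {
    Person personMock = mock(Person.class);
    // The stub freezes the old behavior, so this test stays green even
    // after getMainAddress() starts throwing NoAddressFoundException.
    when(personMock.getMainAddress()).thenReturn(null);

    assertThat(new AddressLabelPrinter().printLabel(personMock))
      .isEqualTo("no address known");
  }

  @Test
  public void realPersonRevealsTheBehaviorChange() {
    // A real instance without a main address: this test starts failing the
    // moment getMainAddress() throws instead of returning null, which is
    // exactly the feedback you want.
    assertThat(new AddressLabelPrinter().printLabel(new Person()))
      .isEqualTo("no address known");
  }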

Behavioral Unit Tests better express business value than mock tests
I assume everyone agrees that testing behavior is more desirable and valuable than testing your implementation. Besides, when writing a BUT, you are far more likely to be able to discuss the expected behavior of your test with business people, reinforcing the ubiquitous language throughout your team. Another advantage of testing behavior is that you think about the problem you're solving in two ways: what the solution should be (this is your test) and how to get to the solution (the implementation). That way you reduce the chance of false positives: passing tests while the implementation is wrong.

Mock tests provide less feedback
When writing mock tests to the extreme, you run the risk of expecting an algorithm to work without really testing it thoroughly. Never been convinced your algorithm would work, only to find out otherwise? Then come over to our team and stop writing tests, it's such a waste of time ;-)

Strict mock tests are not very refactor-proof
This one is obvious to me: if you have a mock test and you refactor your domain code, chances are you need to change your mock test too (unless you just renamed some methods). When you use real domain objects, you don't need to change any test (assuming you only refactored without changing behavior). Since I see tests as a safety net, I feel a lot safer having behavior tests than I would if I only had mock tests. A special case of this: when mock tests fail, developers tend to fix them by just adding or modifying a verify (I've added/changed it in the code, so now I'll add/change it in my test). Consequently, people think less about why tests are failing; they just fix tests by doing in the test what they already did in the code.

Cucumber and FitNesse
When I tell people I prefer behavior tests over mock tests, they always start worrying about the immense overhead that would be introduced by fancy tools such as Cucumber and FitNesse. Actually, I'm not talking about those tools at all. They surely add extra value by offering visual tooling for business people, but that's not my plan. I am just saying you can write the same test in JUnit as you would in Cucumber or FitNesse, with the same business value, without the fancy GUI, but also without all of the overhead.
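For example, a plain JUnit test can read almost like a Cucumber scenario. A sketch, with a hypothetical Account class and euros() helper:

  @Test
  public void depositingMoneyIncreasesTheBalance() {
    // given an account holding 100 euros
    Account account = new Account(euros(100));

    // when 50 euros are deposited
    account.deposit(euros(50));

    // then the balance is 150 euros
    assertThat(account.getBalance()).isEqualTo(euros(150));
  }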

Conclusion
There will always be people saying that testing things twice is bad because of the DRY principle. I can tell those people in all honesty: I'd rather have ten tests failing for the same reason than zero failing tests while I broke my implementation.

Other people tell me they use mocks to hide complexity occurring behind the scenes of a cooperating object. I think this is a valid remark. In that case, make sure you have a solid idea of the API of your cooperating object and, moreover, be aware when this API changes and adapt your tests accordingly. By API I mean not only the method signatures, but also exception handling, how it deals with unexpected parameter values, and so on.

There are also some other cases where I think the use of mocks in unit tests are justified:
  • checking delegation (see the sketch after this list)
  • checking corner cases that are really hard to simulate (for example system errors, transaction timeouts, ...)
  • stubbing repositories, factories, very complex calculations, ...
  • checking notifications sent to loosely coupled components
  • interactions with other systems
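
For the first case, checking delegation, a mock really is the natural fit. A sketch with hypothetical LogbookService and LogbookRecordRepository classes:

  @Test
  public void savingALogbookRecordDelegatesToTheRepository() {
    LogbookRecordRepository repositoryMock = mock(LogbookRecordRepository.class);
    LogbookService service = new LogbookService(repositoryMock);

    service.save(LOG_RECORD);

    // The service adds no behavior of its own, so verifying the
    // delegation is all there is to test.
    verify(repositoryMock).save(LOG_RECORD);
  }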

Always look at your concrete situation, because there's no silver bullet. However, when you have a choice, try to reduce the number of mocks, since you don't want to (re)write your tests with each refactoring. Furthermore, with mocks there's always a chance of introducing non-obvious blind spots in your unit tests.