Saturday, June 29, 2013

Pain killing the pingponger's disease

(or how we got our CI builds green again)

Everyone (at least almost everyone I know) writing Selenium or webdriver tests knows it's not easy to keep all these tests stable. Most of the time, they run perfectly fine on your development machine and start failing after a while in your CI build (and I'm not talking about regression here).

I've felt this pain on quite a few projects already, and it's not always easy to find what's causing your tests to fail. On my current project, we again have some pingpong-playing webdriver tests (for those who don't know the term: it means they sometimes pass and sometimes fail).

Of course your tests should be reproducible, consistently passing or failing. And that should be your first step in solving the problem: find the cause of the instability in your test and fix it. Unfortunately, when webdriver tests are involved, this can sometimes be hard to achieve. The webdriver framework depends heavily on references to elements that can become stale. When AJAX calls are involved, this can become a real headache-causer. I won't go too deep into all the possible causes of instability, since that is not the main subject of this blog.

So, how do you make your build less dependent upon some hiccup of your CI system? At our current project we didn't have an auto-commit for about 3 months because of pingponging webdriver tests. Our 'solution' was: when a build had run and some tests failed, run them locally, and if they passed, tag the build manually. It cost us a lot of time that could have been better spent on more fun stuff.

While trying to find a solution for this problem, we had this idea: maybe we could just ignore the pingpong-playing webdriver tests by excluding them from the main build and running them separately. In that case our main build is no longer dependent upon the vagaries of our tests. That way we would have more auto-commits again, but we would introduce the risk that one of the pingpong-playing tests fails for the right reasons this time. When deploying this strategy, one could ask whether you shouldn't throw the pingpong tests away entirely, since you would be completely ignoring them.

Then, we came up with another solution which turned out to be our salvation. What is this magic drug we are using? It's quite simple actually: we created our own (extended) version of the JUnit runner we used before and let it retry pingponging tests. To accomplish this, we mark pingpong tests with the @PingPong annotation, using a maxNumberOfRetries property to define how many times the tests should be retried (default is one). The @PingPong annotation can be used both at method and class level to mark a single test or all tests in a test class as playing pingpong.
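The post doesn't show the annotation's source, but given the description (readable at runtime by the custom runner, allowed on both test methods and test classes, retry count defaulting to one), a plausible shape for @PingPong is:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch of the @PingPong annotation described above:
// RUNTIME retention so the runner can read it via reflection,
// usable on a single test method or on a whole test class,
// with one retry by default.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface PingPong {
  int maxNumberOfRetries() default 1;
}
```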

An example of a test class using the @PingPong annotation looks like this.

@RunWith(MyVeryOwnTestRunnerRetryingPingPongers.class)
public class MyTest {

  @Test
  @PingPong(maxNumberOfRetries = 2)
  public void iWillSucceedWhenSuccessWithinFirstThreeTries() {
    // test code that occasionally fails for the wrong reasons
  }
}

With MyVeryOwnTestRunnerRetryingPingPongers defined like this.

public class MyVeryOwnTestRunnerRetryingPingPongers extends SomeTestRunner
    implements StatementFactory {

  public MyVeryOwnTestRunnerRetryingPingPongers(Class<?> aClass)
      throws InitializationError {
    super(aClass);
  }

  @Override
  protected Statement methodBlock(FrameworkMethod frameworkMethod) {
    return new StatementWrapperForRetryingPingPongers(frameworkMethod, this);
  }

  @Override
  public Statement createStatement(FrameworkMethod frameworkMethod) {
    return super.methodBlock(frameworkMethod);
  }
}

You still need the implementation of StatementWrapperForRetryingPingPongers.
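The original implementation isn't reproduced here, but its essence is a retry loop around the wrapped statement. A minimal, self-contained sketch of that loop, using a plain functional interface in place of JUnit's Statement so the logic stands on its own:

```java
// Sketch of the retry logic at the heart of such a wrapper.
// One initial run plus maxNumberOfRetries retries; if any attempt
// passes we stop, otherwise we rethrow the last failure.
class RetryLoopSketch {

  interface TestStatement {
    void evaluate() throws Throwable; // same shape as JUnit's Statement.evaluate()
  }

  static void runWithRetries(TestStatement statement, int maxNumberOfRetries)
      throws Throwable {
    Throwable lastFailure = null;
    for (int attempt = 0; attempt <= maxNumberOfRetries; attempt++) {
      try {
        statement.evaluate();
        return; // the test passed: stop retrying
      } catch (Throwable t) {
        lastFailure = t; // remember the failure and try again
      }
    }
    throw lastFailure; // all attempts failed: report the last failure
  }

  public static void main(String[] args) throws Throwable {
    int[] calls = {0};
    // A statement that fails twice, then succeeds: passes with 2 retries.
    runWithRetries(() -> {
      calls[0]++;
      if (calls[0] < 3) throw new AssertionError("ping pong!");
    }, 2);
    System.out.println("passed after " + calls[0] + " attempts");
    // prints "passed after 3 attempts"
  }
}
```

The real wrapper would read maxNumberOfRetries from the @PingPong annotation on the method (or its class) and recreate the statement via the StatementFactory for each attempt.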

I am conscious this is just a pain killer, but it's one that helped us get our builds green again and gives us extra time to fix our unstable pingpong tests more thoroughly.

Please let me know if this post was helpful to you. What do you think about our solution to our instability problem?

7 comments:

  1. Is it a solution?

    You call it a painkiller yourself, so it's more about covering up the symptoms than healing the disease.

    Still, I can't say I can come up with a better plan, and perhaps masking the symptoms is all we actually need in this situation.

    In the long run, however, I think finding the cause of instability might be more beneficial.

    1. It's not a long-term solution, as you said, but it helped us regain the time we were constantly losing while releasing. And we released a lot in the last couple of weeks (approximately twice a week). But you are completely correct that it's not a solution in the long term.

  2. At VDAB, we use some tricks to make the tests run more predictably:

    - after webdriver loads a page, we disable all asynchronous jQuery animations (such as fade in, fade out, etc):

    $.fx.off = true;

    - in addition, we wait for all Ajax requests to have ended before doing assertions on a page. Some Javascript code monitors the start and end of all Ajax requests, and webdriver sleeps until all Ajax requests have ended.

    1. We used to have this, but I heard people claim it doesn't work for us. I don't know why though... It seems like a good solution to me. Are you using a counter to keep track of the pending Ajax requests? Or what strategy do you use? And how do you make sure this counter is raised and lowered at the right times? The only reason I can think of why it doesn't work for us could be this case:
      - type some text in text field A using webdriver, which moves focus to the next element (B)
      - then click a button (first triggering the Ajax call behind element B)
      When you try to click the button using webdriver, two things happen in your browser: an event occurs on element B, which might trigger an Ajax call, and the button is clicked. There is no check on the number of pending Ajax calls between these two things happening in your browser. In your Java code you are only doing one thing, so you are unaware of what is happening inside your browser.

    2. We use Prototype (and not jQuery) to execute Ajax requests. Yes, we're old-fashioned :-) Prototype has a property Ajax.activeRequestCount. So in our webdriver tests, when we know an Ajax request might be running, we poll this activeRequestCount until it reaches 0.

      jQuery seems to have a badly-documented property that indicates the number of pending Ajax requests: $.active
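      In a webdriver test that polling comes down to a wait loop. A minimal self-contained sketch, where the hypothetical activeRequestCount supplier stands in for something like executeScript("return Ajax.activeRequestCount") or jQuery's $.active:

```java
import java.util.function.IntSupplier;

// Sketch: poll a pending-Ajax-request counter until it reaches 0,
// or give up after timeoutMillis.
class AjaxWaitSketch {

  static boolean waitForAjax(IntSupplier activeRequestCount, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (activeRequestCount.getAsInt() == 0) {
        return true; // no Ajax requests pending: safe to assert on the page
      }
      Thread.sleep(50); // back off briefly before polling again
    }
    return false; // still busy after the timeout
  }

  public static void main(String[] args) throws InterruptedException {
    int[] pending = {3};
    // Each poll "completes" one request, so the counter hits 0 in time.
    boolean done = waitForAjax(() -> pending[0] > 0 ? pending[0]-- : 0, 1000);
    System.out.println(done); // prints "true"
  }
}
```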

  3. What kind of problems are you running into when the tests fail?
    We used to have a lot of problems with StaleElementReferenceExceptions, but managed to get rid of them entirely.
    If you have those, let me know, we might have a good solution for you.

  4. I think a better solution is to divide your tests into two groups: a white list containing only stable tests, and a black list with flaky tests. And then steadily investigate and move tests from the black list to the white list.

    The problem with re-running a failing test is that if it passes after the re-run, you can't say for sure that your product is working well. So such tests are useless anyway.