Introducing WebDriver

Friday, May 8, 2009 at 4:45 PM

WebDriver is a clean, fast framework for automated testing of webapps. Why is it needed? And what problems does it solve that existing frameworks don't address?

Take Selenium, for example. This popular and well-established testing framework is a wonderful tool that provides a handy unified interface that works with a large number of browsers, and allows you to write your tests in almost every language you can imagine (from Java or C# through PHP to Erlang!). It was one of the first Open Source projects to bring browser-based testing to the masses, and because it's written in JavaScript it's possible to quickly add support for new browsers that might be released.

Like every large project, it's not perfect. Selenium is written in JavaScript, which leads to a significant weakness: browsers impose a pretty strict security model on any JavaScript that they execute in order to protect a user from malicious scripts. Examples of where this security model makes testing harder are when trying to upload a file (IE prevents JavaScript from changing the value of an INPUT file element) and when trying to navigate between domains (because of the same origin policy).
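
To make the cross-domain restriction concrete, here is a minimal sketch of the kind of comparison a same origin check performs: two URLs share an origin only when scheme, host and port all match. The class and method names (OriginCheck, sameOrigin, effectivePort) are made up for illustration, the sketch assumes absolute http or https URLs, and real browsers apply more nuanced rules than this.

```java
import java.net.URI;

// Illustrative only: a simplified same origin comparison, not browser code.
class OriginCheck {

    // Assumption: only http and https are considered, with their default ports.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // Two URLs share an origin when scheme, host and port are all equal.
    static boolean sameOrigin(String a, String b) {
        URI first = URI.create(a);
        URI second = URI.create(b);
        return first.getScheme().equalsIgnoreCase(second.getScheme())
                && first.getHost().equalsIgnoreCase(second.getHost())
                && effectivePort(first) == effectivePort(second);
    }
}
```

Under this simplified rule, a script served from http://www.google.com/ may work with http://www.google.com/search but not with http://mail.google.com/, which is why a JavaScript-based test runner struggles to follow a user from one domain to another.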

Additionally, being a mature product, the API for Selenium RC has grown over time, and as it has done so it has become harder to understand how best to use it. For example, it's not immediately obvious whether you should be using "type" instead of "typeKeys" to enter text into a form control. Although it's a question of aesthetics, some find the large API intimidating and difficult to navigate.

WebDriver takes a different approach to solve the same problem as Selenium. Rather than being a JavaScript application running within the browser, it uses whichever mechanism is most appropriate to control the browser. For Firefox, this means that WebDriver is implemented as an extension. For IE, WebDriver makes use of IE's Automation controls. By changing the mechanism used to control the browser, we can circumvent the restrictions placed on the browser by the JavaScript security model. In those cases where automation through the browser isn't enough, WebDriver can make use of facilities offered by the Operating System. For example, on Windows we simulate typing at the OS level, which means we are more closely modeling how the user interacts with the browser, and that we can type into "file" input elements.

With the benefit of hindsight, we have developed a cleaner, Object-based API for WebDriver, rather than follow Selenium's dictionary-based approach. A typical example using WebDriver in Java looks like this:

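import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

// Create an instance of WebDriver backed by Firefox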
WebDriver driver = new FirefoxDriver();

// Now go to the Google home page
driver.get("http://www.google.com");

// Find the search box, and (ummm...) search for something
WebElement searchBox = driver.findElement(By.name("q"));
searchBox.sendKeys("selenium");
searchBox.submit();

// And now display the title of the page
System.out.println("Title: " + driver.getTitle());

Looking at the two frameworks side-by-side, we found that the weaknesses of one are addressed by the strengths of the other. For example, whilst WebDriver's approach to supporting browsers requires a lot of work from the framework developers, Selenium can easily be extended. Conversely, Selenium always requires a real browser, yet WebDriver can make use of an implementation based on HtmlUnit which provides lightweight, super-fast browser emulation. Selenium has good support for many of the common situations you might want to test, but WebDriver's ability to step outside the JavaScript sandbox opens up some interesting possibilities.

These complementary capabilities explain why the two projects are merging: Selenium 2.0 will offer WebDriver's API alongside the traditional Selenium API, and we shall be merging the two implementations to offer a capable, flexible testing framework. One of the benefits of this approach is that there will be an implementation of WebDriver's cleaner APIs backed by the existing Selenium implementation. Although this won't solve the underlying limitations of Selenium's current JavaScript-based approach, it does mean that it becomes easier to test against a broader range of browsers. The reverse is also true: we'll be emulating the existing Selenium APIs with WebDriver. This means that teams can make the move to WebDriver's API (and Selenium 2) in a managed and considered way.
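
The idea of one API being backed by the other's implementation can be sketched as a simple adapter. Everything below is hypothetical and written only for illustration: CommandApi stands in for a Selenium-style interface whose commands take string locators, ObjectApi and Element stand in for a WebDriver-style interface, and RecordingBackend replaces a real browser so that the delegation can be observed. None of these are the project's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a command-style API driven by string locators.
interface CommandApi {
    void type(String locator, String value);
    void click(String locator);
}

// Hypothetical stand-in for an object-style API where elements are objects.
interface Element {
    void sendKeys(String text);
    void click();
}

interface ObjectApi {
    Element findElementByName(String name);
}

// Adapter: exposes the object-style API but delegates to a command-style backend.
class CommandBackedObjectApi implements ObjectApi {
    private final CommandApi backend;

    CommandBackedObjectApi(CommandApi backend) {
        this.backend = backend;
    }

    @Override
    public Element findElementByName(String name) {
        // The element remembers its locator and forwards each call to the backend.
        final String locator = "name=" + name;
        return new Element() {
            @Override
            public void sendKeys(String text) {
                backend.type(locator, text);
            }

            @Override
            public void click() {
                backend.click(locator);
            }
        };
    }
}

// Records each command instead of driving a browser.
class RecordingBackend implements CommandApi {
    final List<String> log = new ArrayList<>();

    @Override
    public void type(String locator, String value) {
        log.add("type " + locator + " " + value);
    }

    @Override
    public void click(String locator) {
        log.add("click " + locator);
    }
}
```

A test written against ObjectApi never sees the command strings, so the backend can later be swapped without touching the test. An adapter in the opposite direction, offering CommandApi on top of an object-style backend, would follow the same pattern.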

If you'd like to give WebDriver a try, it's as easy as downloading the zip files, unpacking them and putting the JARs on your CLASSPATH. For the Pythonistas out there, there's also a version of WebDriver for you, and a C# version is waiting in the wings. The project is hosted at http://webdriver.googlecode.com, and, like any project on Google Code, is Open Source (we're using the Apache 2 license). If you need help getting started, the project's wiki contains useful guides, and the WebDriver group is friendly and helpful (something which makes me feel very happy).

So that's WebDriver: a clean, fast framework for automated testing of webapps. We hope you like it as much as we do!

11 comments:

mikeal said...

This is very interesting and I'll be digging into this code over the next few weeks.

I am a little concerned about the continuing assertion that testing cross domain is a limitation of the approach Selenium has taken; it's not. Windmill has the same Proxy hack approach and has worked cross site for over a year now using a series of forwarding hacks, which Selenium RC could do provided someone took the time to write it.

This approach side steps the content security model but I'd like to see what debugging WebDriver tests is like since the event simulation is another layer removed from content.

skugg said...

Gosh, it sounds great!

Hi Simon :)

greis said...

And what about the name? Will it continue as "Selenium", change to "WebDriver", or maybe both together as "Selenium WebDriver"?

MrAjax said...

The comparison to Selenium is helpful. But how does WebDriver compare to iMacros? http://wiki.imacros.net/Selenium

For me, it seems WebDriver stands exactly in the middle between these two options?

Can it control Ajax and Flash applets?

Mark said...

Looks very interesting. Although I can see it being used to 'grab' data from other sites. It's the ultimate page-scraper!

Lbordea said...

Very interesting - it's a good market to get into. I shall have a look over it. By the way, looking forward to the C# version :)

Frank said...

The example code for WebDriver looks pretty similar to code that uses WatiN. Any comparison between the two available?

Simon Stewart said...

@mikeal: Testing cross domain is one of the drawbacks of the approach taken by selenium (and windmill too) because that approach is constrained by the JS security model. Writing a custom proxy is certainly one way of solving this problem but it doesn't work around other limitations imposed by the JS security model.

Simon Stewart said...

@MrAjax: I'd not seen iMacros before, and it looks interesting. After a quick look through the site, the major differences between that and webdriver are that webdriver is platform independent (running on OS X, Linux and Windows), supports Java (though perhaps I missed this reference?), and has nascent Opera support.

In addition, the equivalent of "DirectScreen technology" (which we call "native events") is exposed to the user without requiring the browser window to take focus and without needing to specify screen coordinates. It also happens automatically, without the user needing to do anything special; it's how clicking, drag and drop and typing work on IE. Admittedly, we haven't finished rolling this out to all browsers on all platforms, but we're working on it :)

Simon Stewart said...

@greis The name of the merged project will be Selenium 2.

The interface names won't change, though, so you can start writing tests with the WebDriver API and when Selenium 2 is released, just change the JAR file being used.

Simon Stewart said...

@Frank I'm a big fan of WatiN: it's a lovely project and I admire some of the ways that they've solved some thorny problems.

WatiN and WebDriver take very similar approaches, especially on Windows. The major differences are:

* WebDriver is designed to be cross-platform, cross-browser and to support multiple languages (not just Java, but also C#, Python and Ruby). WatiN is designed for use within .NET, which limits its usability on different platforms (though perhaps Mono may help here).

* WatiN's support for handling dialogs is better than WebDriver's, at least for now.

* WebDriver's API is less granular than WatiN's. WebDriver only has a "WebElement" interface, whereas WatiN has a subclass for many HTML elements. Which is preferable is a matter of taste.

* WebDriver uses native events where possible (that is, sending window messages) whereas WatiN appears to simulate events on the DOM. While this is fine most of the time, it does mean that some corner cases are better handled in WebDriver. For example, what happens when a user clicks on an element that's totally covered by another? By synthesizing the event on the DOM, WatiN would click on the lower element, whereas WebDriver would click on the upper (as a user would).
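
The covered-element case in the last point can be modelled with a toy example. This is only a sketch of the idea, not how either tool is implemented: Box and ClickModel are hypothetical names, elements are plain rectangles with a stacking order, and a "native" click is assumed to land at the centre of the target element.

```java
import java.util.List;

// Hypothetical model of a page element: a rectangle with a stacking order.
class Box {
    final String id;
    final int x, y, width, height, zIndex;

    Box(String id, int x, int y, int width, int height, int zIndex) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
        this.zIndex = zIndex;
    }

    boolean contains(int px, int py) {
        return px >= x && px < x + width && py >= y && py < y + height;
    }
}

class ClickModel {

    // Synthesized event: dispatched straight to the chosen element,
    // regardless of anything drawn on top of it.
    static String synthesizedClick(Box target) {
        return target.id;
    }

    // Native-style event: the click lands at a point on screen (here the
    // centre of the target), and the topmost element at that point receives it.
    static String nativeClick(List<Box> page, Box target) {
        int cx = target.x + target.width / 2;
        int cy = target.y + target.height / 2;
        Box hit = null;
        for (Box box : page) {
            if (box.contains(cx, cy) && (hit == null || box.zIndex > hit.zIndex)) {
                hit = box;
            }
        }
        return hit == null ? null : hit.id;
    }
}
```

With a button fully covered by an overlay that has a higher zIndex, synthesizedClick reports the button while nativeClick reports the overlay, which matches what a real user would hit.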