Moon: the Swiss Army Knife for Browser Automation

Published in Aerokube, Oct 28, 2020 (10 min read)

Hi there,

Over the past few years we have talked a lot about efficient Selenium infrastructure (the full list of articles can be found on our web site). Three months ago we announced full support for Playwright, a new and emerging browser automation technology. Today we would like to show you a third popular browser automation approach: Chrome Developer Tools Protocol.

Every modern browser has a so-called developer toolbar, a graphical user interface intended mainly for web site developers. This toolbar lets you analyze in detail what happens when you open a web site. For example, you can see how every piece of HTML is rendered and which HTTP and WebSocket requests are sent, debug JavaScript, inspect cookies and local storage, and much more. Google Chrome, the most popular browser, and other browsers based on Chromium (e.g. Opera, Vivaldi, Brave, Yandex Browser and recent versions of Microsoft Edge) all have such a toolbar.

Internally this toolbar is an independent web application that communicates with the main Chrome process using a protocol called simply Chrome Developer Tools Protocol, or CDP. Initially the main Chrome process could handle CDP commands coming from only one source (also called a client): the developer toolbar. Since Chrome 63 it has been possible to send commands to the browser from multiple clients simultaneously. Not only the developer toolbar but any external program can now interact with the browser and use any of the powerful features provided by the toolbar. In other words, Chrome Developer Tools Protocol is now another way to automate the browser.

Why do we need one more browser automation approach in a world where Selenium is the de facto standard? Simply because web application development is evolving fast, and Selenium already lacks some features that could dramatically simplify automated testing of new-generation web apps. For example, for single-page web applications it seems obvious to want full control over network requests executed in the background, or to manipulate page layout directly, but these features are not present in the Selenium protocol standard.

CDP protocol internals

First of all, let's quickly get familiar with Chrome Developer Tools Protocol itself. Unlike Selenium, where every command (opening the browser, navigating to a page, clicking a button and so on) is an independent HTTP request, CDP sends all commands through one long-lived web socket connection. There is no need to spend time establishing a TCP connection or sending redundant HTTP headers for every command, so everything works faster. In addition to being a faster communication channel, a web socket is a full-duplex connection: not only can the user send a command to the browser, but the browser can also notify the user on its own about events occurring on the tested pages.

All information in CDP is transferred in JSON format. There are two types of interaction between the user and the browser: request-response and subscribing to events. In request-response mode everything works similarly to Selenium. The user sends a command, e.g. take a screenshot, and then waits for the result of its execution from the browser. A typical request body looks like the following:

{
    "id":12345, 
    "method":"Page.captureScreenshot",
    "params":{   }
}

It always contains a request identifier (typically an auto-incrementing number chosen by the client), a method name and an optional set of parameters. For example, when taking a screenshot, params can contain the screenshot output format or quality. For every request you will sooner or later receive a response:

{
    "id":12345, 
    "result":{   }
}

The response body contains an id field with exactly the same value as the request id, and a result section with the command execution result. For example, when you request a screenshot this section contains the base64-encoded screenshot bytes. To understand why every request and response has an id field, remember that all CDP messages, including event notifications, are transferred through the same network connection. Additionally, Chrome Developer Tools Protocol is asynchronous: the user can receive events from the browser and responses to commands in any order. For example, the response to request 2 can arrive before the response to request 1. Numeric identifiers are therefore needed to tell which response corresponds to which request.
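The id-based matching described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not code from any real client library: the CdpClient class and the sendRaw callback (standing in for a web socket's send function) are names made up for this example.

```javascript
// Minimal sketch: match asynchronous CDP responses to requests by "id".
// CdpClient and sendRaw are hypothetical names used only for this illustration.
class CdpClient {
  constructor(sendRaw) {
    this.sendRaw = sendRaw;   // function that writes a string to the web socket
    this.nextId = 1;          // auto-incrementing request identifier
    this.pending = new Map(); // id -> { resolve, reject } of the waiting caller
  }

  // Send a command and return a promise for its "result" section.
  send(method, params = {}) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.sendRaw(JSON.stringify({ id, method, params }));
    });
  }

  // Call this for every message received from the browser.
  // Returns true if the message was a response to one of our requests.
  handleMessage(raw) {
    const message = JSON.parse(raw);
    const waiter = this.pending.get(message.id);
    if (!waiter) return false; // an event or an unknown id: not handled here
    this.pending.delete(message.id);
    if (message.error) waiter.reject(new Error(message.error.message));
    else waiter.resolve(message.result);
    return true;
  }
}
```

Because each caller waits on its own promise, responses may arrive in any order without being mixed up.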

The second type of interaction in CDP is subscribing to browser events. In this mode the user first allows the browser to send the desired types of events, e.g. when a network request is sent or a JavaScript error occurs. When such an event occurs, the user automatically receives a JSON message with the event type and additional information, like this:

{
    "method":"Page.frameStartedLoading",
    "params":{
        "frameId": "3310BE4384D4A1B366794A36A8A58FC3"
    }
}
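Since responses and events share one connection, a client has to tell them apart: responses carry an id, while event notifications carry a method and no id. A minimal sketch of such routing could look like this; the CdpEventRouter class is a hypothetical helper invented for this example.

```javascript
// Minimal sketch: route incoming CDP messages to event handlers by method name.
// CdpEventRouter is a hypothetical helper, not part of any real client library.
class CdpEventRouter {
  constructor() {
    this.handlers = new Map(); // event name -> array of callbacks
  }

  // Register a callback for an event such as "Page.frameStartedLoading".
  on(eventName, callback) {
    const list = this.handlers.get(eventName) || [];
    list.push(callback);
    this.handlers.set(eventName, list);
  }

  // Returns true if the message was an event, false otherwise.
  dispatch(raw) {
    const message = JSON.parse(raw);
    // Responses carry an "id"; event notifications carry a "method" and no "id".
    if ('id' in message || typeof message.method !== 'string') return false;
    for (const callback of this.handlers.get(message.method) || []) {
      callback(message.params);
    }
    return true;
  }
}
```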

Information about all requests, responses and events available in Chrome Developer Tools Protocol is stored in the Chromium sources in JSON format. Copies of these files can be found in the ChromeDevTools/devtools-protocol GitHub repository. From these JSON source files the Chrome development team automatically generates human-readable protocol documentation.

If you take a look at the protocol documentation, you will notice that the protocol is divided into sections called domains, e.g. Browser, Network, Page, Runtime and so on. Every domain roughly corresponds to one of the features of the Chrome developer toolbar. For example, the Network domain contains everything related to network requests sent by the browser, and Runtime corresponds to the JavaScript execution runtime. Inside every domain you will find at least two subsections: methods and events.

Methods correspond to the types of requests you can send to the browser in request-response mode. Every method definition lists the possible request parameters as well as the response data type.

Events, in turn, correspond to the various events you can subscribe to.

By default the browser does not send any events to the user. To start receiving events you need to call the enable method of the respective domain, e.g. Page.enable for the Page domain or Network.enable for the Network domain.
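As a rough illustration of the enable step, here is how it might look with a raw CDP session in Puppeteer (the library is introduced just below). The logFrameLoads helper and its log parameter are hypothetical names for this sketch; the page argument is assumed to be a Puppeteer Page object.

```javascript
// Minimal sketch: enable the Page domain and listen for one of its events
// through Puppeteer's raw CDP session. logFrameLoads is a hypothetical helper.
async function logFrameLoads(page, log = console.log) {
  // Open a raw CDP session attached to the page's target.
  const session = await page.target().createCDPSession();

  // Register the listener first so that no event is missed.
  session.on('Page.frameStartedLoading', ({ frameId }) => {
    log(`frame started loading: ${frameId}`);
  });

  // Without this call the browser sends no Page events to this session.
  await session.send('Page.enable');
  return session;
}
```

In a test this would typically be called as await logFrameLoads(page) before navigating with page.goto().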

It is certainly interesting to dive into protocol internals. However, our final goal is efficient browser automation, which is impossible without ready-to-use client libraries. Fortunately, CDP features are now available in every mainstream programming language. You can find a list of open-source libraries for Java, Python, C#, PHP, Go and other languages in the awesome-chrome-devtools repository. One of the most popular client libraries based on Chrome Developer Tools Protocol, maintained by Google, is Puppeteer. Originally developed as a Node.js library, Puppeteer has since been ported to Python, Rust and C#. Unlike the majority of CDP client libraries, Puppeteer provides a higher-level abstraction on top of CDP rather than just generated types and methods. An example Puppeteer test taking a screenshot looks like the following:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://aerokube.com');
  await page.screenshot({path: 'example.png'});

  await browser.close();
})();

Parallel execution of CDP tests

One of the main issues with Chrome Developer Tools Protocol is that, for historical reasons, it does not define a method for launching the browser. For web application development this is not a problem: one browser is sufficient. In automated testing, by contrast, we have to run dozens or hundreds of browsers in parallel to get faster feedback, and this is where the lack of a browser startup method becomes important. The majority of CDP-compatible libraries provide an API for launching the browser, like puppeteer.launch() in the Puppeteer example above. This method simply finds the Chrome executable, starts it and then connects to the respective web socket to send CDP commands to the browser. Although launching multiple Chrome processes (especially in headless mode) on the same machine lets you run tests faster, you are still limited by the computing resources (CPUs and memory) of that machine. If you need to launch more Chrome instances in parallel than one machine can handle, this approach will not work.

In our previous articles we have already talked about Moon, our flagship browser automation solution that runs everything in a Kubernetes or OpenShift cluster. In Moon release 1.7.0 we finally implemented the ability to execute parallel tests remotely using Chrome Developer Tools Protocol. As with Playwright, to create a remote CDP session on a Moon instance available at the moon.example.com host you only have to change one line in your code.

// Local execution
const browser = await puppeteer.launch();

// Remote execution with Moon
const browser = await puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'});

By updating the browserWSEndpoint parameter value you can choose which Chrome version to launch, whether to record a video of the running browser, and whether to run the browser in headless mode. For every connection to this web socket Moon creates a new browser instance, which lets you run as many browsers in parallel as your Kubernetes cluster allows. When your code disconnects from the web socket, the browser instance is automatically removed.
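Because every connection gets its own browser, running tests in parallel comes down to opening several connections at once. The following is a minimal sketch of that idea, not an official Moon or Puppeteer utility: runInParallel is a hypothetical helper, and connect is assumed to be any function that returns a Puppeteer-like browser object.

```javascript
// Minimal sketch: run several jobs in parallel, one remote browser per job.
// runInParallel is a hypothetical helper; connect is any function returning a
// Puppeteer-like browser, e.g. () => puppeteer.connect({ browserWSEndpoint }).
async function runInParallel(connect, jobs) {
  return Promise.all(jobs.map(async (job) => {
    const browser = await connect(); // each connection gets its own browser
    try {
      return await job(browser);
    } finally {
      await browser.close(); // always close, even if the job throws
    }
  }));
}
```

With Puppeteer, connect could be () => puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'}) and every job an async function that receives the browser, opens a page and runs one test.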

Real use cases

Now that we know how to run any number of Chrome sessions in parallel with Chrome Developer Tools Protocol, let's take a look at some real-life usage scenarios. We'll use Puppeteer in the examples, but the same features are available in all programming languages. We'll also use the puppeteer-core module instead of puppeteer: everything will be running remotely, and the puppeteer module additionally downloads a Chromium browser binary that we do not need.

Printing and mocking network requests

The first topic is working with network requests. CDP makes it easy to print all HTTP and web socket requests sent by the browser. To get this information you need to subscribe to the respective events in the Network domain: Network.requestWillBeSent, Network.loadingFailed, Network.responseReceived. In Puppeteer, subscribing to the Network.requestWillBeSent event is wrapped in syntactic sugar like this:

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'});
  const page = await browser.newPage();

  page.on('request', req => console.log(req.url())); // Network.enable method is called automatically by Puppeteer under the hood

  await page.goto('https://example.com');

  await browser.close();
})();

This code only prints the network requests sent by the browser, without showing their outcome. To get detailed information about request status code and duration you have to subscribe to other events like requestfinished and requestfailed using the same pattern.
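A sketch of that pattern might look as follows. It covers only the status code and the failure reason, not the duration. The attachRequestLogging helper and its log parameter are hypothetical names, and page is assumed to be a Puppeteer Page object.

```javascript
// Minimal sketch: log the outcome of every request using Puppeteer page events.
// attachRequestLogging is a hypothetical helper name.
function attachRequestLogging(page, log = console.log) {
  // Emitted when a request completes successfully.
  page.on('requestfinished', (request) => {
    const response = request.response(); // may be null if no response was received
    const status = response ? response.status() : 'unknown';
    log(`${status} ${request.method()} ${request.url()}`);
  });

  // Emitted when a request fails, e.g. on a timeout or connection error.
  page.on('requestfailed', (request) => {
    const failure = request.failure(); // { errorText } or null
    log(`FAILED ${request.method()} ${request.url()} ${failure ? failure.errorText : ''}`);
  });
}
```

Calling attachRequestLogging(page) before page.goto() should be enough to see one line per finished or failed request.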

Detailed information about the network requests being sent can dramatically simplify test debugging. However, a better approach for test stability is to mock the most important requests. Chrome Developer Tools Protocol allows emulating a request failure, manipulating request and response headers and even completely replacing the response body. This feature is called request interception. Everything related to request interception resides in the Fetch domain and its Fetch.requestPaused event. When you subscribe to this event, every request whose URL matches the desired pattern is "paused", allowing you to decide what to do with it. Puppeteer example code looks like this:

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'});
  const page = await browser.newPage();

  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (request.url() === "http://example.com/api/method") {
        request.respond({
            status: 200,
            contentType: 'application/json; charset=utf-8',
            body: '{"key": "mocked-value"}'
        });
    } else {
        request.continue();
    }
  });

  await page.goto('https://example.com');

  await browser.close();
})();
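Besides replacing the response body, the same handler can cover two of the other options mentioned above: emulating a request failure and changing request headers. The sketch below is only an illustration; handleRequest, the '/api/' URL check and the x-test-run header are hypothetical choices made for this example.

```javascript
// Minimal sketch: a request handler that blocks images, adds a header to API
// calls and lets everything else through. handleRequest, the '/api/' check and
// the x-test-run header are hypothetical and used only for this illustration.
function handleRequest(request) {
  if (request.resourceType() === 'image') {
    // Emulate a network failure for this request.
    return request.abort('failed');
  }
  if (request.url().includes('/api/')) {
    // Forward the request with one extra header.
    return request.continue({
      headers: { ...request.headers(), 'x-test-run': '1' },
    });
  }
  // Let all other requests pass unchanged.
  return request.continue();
}
```

It would be registered the same way as in the example above: await page.setRequestInterception(true) followed by page.on('request', handleRequest).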

Printing console messages and JavaScript exceptions

It is a common situation that new changes deployed to a testing environment completely break it because of a bug in JavaScript. On the screenshots you see a blank screen or broken HTML, and it can take a lot of time to reproduce the bug. Fortunately, with CDP you can easily print any console messages and exceptions that JavaScript outputs to the remote browser console:

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'});
  const page = await browser.newPage();

  page
    .on('console', message => console.log(`${message.type()} ${message.text()}`))
    .on('pageerror', ({ message }) => console.log(message));

  await page.goto('https://example.com');

  await browser.close();
})();

Taking element screenshots

Besides screenshots of the entire page, Puppeteer on top of CDP can out of the box take a screenshot of any single page element located by a CSS selector:

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'});
  const page = await browser.newPage();

  await page.goto('https://example.com');

  await page.waitForSelector('#some-element');
  const element = await page.$('#some-element');
  await element.screenshot({path: 'element-screenshot.png'});

  await browser.close();
})();

CSS and DOM manipulation

With Chrome Developer Tools Protocol it's straightforward to manipulate the DOM tree and CSS styles. You can also set a custom browser viewport size to test web application responsiveness, set the user agent to open the version adapted for mobile platforms, or even preview the print version (so-called @media print):

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.connect({timeout: 0, browserWSEndpoint: 'ws://moon.example.com:4444/cdtp/chrome'});
  const page = await browser.newPage();

  await page.setViewport({
    width: 640,
    height: 480,
    deviceScaleFactor: 1,
  });
  await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1');
  await page.emulateMediaType('print');

  await page.goto('https://example.com');

  await page.addStyleTag({content: 'body { background: red; }'});

  // Run the assignment inside the page: an ElementHandle has no innerHTML property
  await page.$eval('#id', (element) => {
    element.innerHTML = 'new element contents';
  });

  await browser.close();
})();

Conclusion

In this article we discussed in detail what Chrome Developer Tools Protocol is, why it can be useful in your automated testing scenarios and how to run an unlimited number of CDP-backed tests with Moon. With native support for unlimited parallel execution of Selenium, Playwright and CDP tests, Moon is now a true Swiss Army knife for browser automation. If you have any questions related to one of these protocols, don't hesitate to contact us at support@aerokube.com. We also deliver the same features in our new product, Moon Cloud, which we will describe in detail in our next article.