JonBlog
Thoughts on website ideas, PHP and other tech topics, plus going car-free
Robust and parallelisable browser tests with PHP, GeckoDriver, and Symfony Panther

Introduction

I am currently building a software project whose development focus is a service layer that is not web-based. However, to visualise and iterate upon the work, I decided that I needed a simple web interface. Although this contains almost no business logic, even the simplest of CRUD websites can exhibit regressions, and since test automation is kinda my bag, I have built a simple PHP-based browser test system.

There are a few features that make this system worth writing about, particularly in the way tests are kept isolated from each other. This ensures that tests do not accidentally become dependent on the prior state left by other tests, and it gives me the ability to turn on parallelisation with nearly no extra engineering.

Here’s a tech run-down to whet the reader’s appetite:

  • PHP 8.1 – might bump up to 8.2, but this is pretty modern
  • SQLite – the techniques here would work with any database, but I like using SQLite for proof of concept work
  • Migrations system – I use something custom, but anything from Doctrine, Laravel, Phinx etc would be fine here. The main thing is that the app can build its own database without needing manual intervention
  • PHPUnit – I keep meaning to try alternatives like Pest, but honestly, this is the grand-daddy
  • Symfony Panther – a PHP library to talk to any of Selenium, GeckoDriver, and ChromeDriver
  • Docker Compose – to split the app and the browser into separate containers
  • GeckoDriver – the native WebDriver for Firefox
  • ParaTest – a library to parallelise tests

Considerations

There are a few constraints that I wanted to apply to this project.

The first one was that I didn’t want to build the test browser system in the same container as the app. This would have created an excessively large image that contained far more stuff than is needed to run the app in production. So at an early stage I wanted this to be two containers sitting on the same virtual network. (At a late stage I also considered a test container that inherits from a production container – see later in this article).

To understand the second issue, the reader needs to be aware that GeckoDriver is a listener binary that needs to run when the browser tests are running. This accepts remote control messages from Panther (over TCP) and converts them into a protocol that the browser can understand.

To my surprise, it turns out that a GeckoDriver instance does not permit more than one browser to run (and the authors say this behaviour is specification-compliant). This can cause problems with tests requiring browser-to-browser effects (not relevant for my app) or running tests concurrently (definitely something I want).

Now this is easy if one is running everything (the app, the web server, GeckoDriver, and Firefox) in a single container – every instance of a Panther browser creates its own GeckoDriver, and this approach allows any number of parallel Firefox sessions to co-exist (within available RAM). The solution when splitting over containers is to pre-start a fleet of GeckoDrivers in the remote container, on a sequence of TCP ports, and then add some logic in the tests to connect to the right port.
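
The port-selection logic can be sketched as a small pure function. This is a minimal sketch rather than the project's actual code: the function name `geckoPortForToken` and the `$fleetSize` parameter are hypothetical, and it assumes the listeners sit on consecutive ports starting at 4444. The bounds check is there because a worker token larger than the fleet would otherwise point at a port with no listener behind it.

```php
<?php

declare(strict_types=1);

/**
 * Maps a 1-based worker token to the TCP port of a pre-started
 * GeckoDriver listener. Hypothetical helper, not part of Panther
 * or ParaTest.
 */
function geckoPortForToken(int $token, int $basePort = 4444, int $fleetSize = 6): int
{
    // Reject tokens that have no matching listener in the fleet
    if ($token < 1 || $token > $fleetSize) {
        throw new OutOfRangeException(
            sprintf('Worker token %d is outside the fleet of %d drivers', $token, $fleetSize)
        );
    }

    // Token 1 maps to the base port, token 2 to the next one, and so on
    return $basePort + $token - 1;
}
```

With the defaults, worker 1 gets port 4444 and worker 6 gets port 4449, so the number of parallel test processes should not exceed the number of listeners started in the browser container.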

Finally, I want each test to use its own database, to achieve convincing test isolation. This is interesting to achieve in practice, as a test usually consists of a sequence of page visits, clicks, types, form submits, etc. Thus the system to do this has to work over a browsing session, rather than for every atomic browser operation.

Test isolation

The way browser tests are isolated from each other is interesting enough that I wanted to give it some substantive treatment. An endpoint was added to the app to notify it of the database name. This is not a production feature, and indeed it is set up so that it only works in the test environment, but purists may bristle at this approach, since it violates the theory that production images should not contain any test code.

The notification of the database name from the test to the application can be seen in this sequence diagram:

Sequence of steps across the tests, the browser, and the application:

  1. The test bootstrap runs and creates a session with a GeckoDriver
  2. The test reads its parallelisation index from an environment variable set by ParaTest (1…n) and contacts the corresponding driver
  3. GeckoDriver creates a browsing session
  4. In the test bootstrap, create a random database name and use the migration system to create the tables and other objects
  5. Still in the test bootstrap, make a test-only call to the app to set the database name
  6. GeckoDriver receives this request and calls the app
  7. The app receives this call, starts a PHP session, and sets the database name, as long as it has not already been done in the current session
  8. Now a test can start with a local URL visit command
  9. GeckoDriver sends this command on
  10. The app renders the page using the database name in the session
  11. The test clicks on a page element (e.g. a link)
  12. GeckoDriver sends this command on as a page visit
  13. The app renders the new page using the database specified in the session
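
The database creation in step 4 could look something like the following. This is a minimal sketch under stated assumptions: the helper names `randomDatabasePath` and `createTestDatabase` and the `test_` file prefix are hypothetical, and a plain list of SQL statements stands in for the real migration system. It also assumes the `pdo_sqlite` extension is available.

```php
<?php

declare(strict_types=1);

/**
 * Builds a unique SQLite file path for one test (hypothetical helper).
 */
function randomDatabasePath(string $directory): string
{
    // 8 random bytes give a 16-character hexadecimal suffix
    $suffix = bin2hex(random_bytes(8));

    return rtrim($directory, '/') . '/test_' . $suffix . '.sqlite';
}

/**
 * Creates the database file and applies the schema. The list of SQL
 * statements is a stand-in for a real migration system.
 *
 * @param string[] $migrations
 */
function createTestDatabase(string $path, array $migrations): PDO
{
    // SQLite creates the file on first connection
    $pdo = new PDO('sqlite:' . $path);
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    foreach ($migrations as $sql) {
        $pdo->exec($sql);
    }

    return $pdo;
}
```

In a PHPUnit test case, something like this would typically be called from `setUp()`, with the generated file deleted again in `tearDown()`.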

Code

At some point I may publish a demo of this work. For now here is some code to point readers in the right direction.

Firstly the Docker code for the browser container:

    FROM instrumentisto/geckodriver

    WORKDIR /root

    COPY docker/install/gecko.sh .
    RUN chmod +x /root/gecko.sh

    CMD ["/root/gecko.sh"]

    EXPOSE 4445
    EXPOSE 4446
    EXPOSE 4447
    EXPOSE 4448
    EXPOSE 4449

    # This removes the entrypoint from the parent
    ENTRYPOINT []

Here is the start script (“gecko.sh”) for the listeners:

    #!/bin/sh

    geckodriver --binary=/opt/firefox/firefox --log=debug --host=0.0.0.0 --port=4444 &
    geckodriver --binary=/opt/firefox/firefox --log=debug --host=0.0.0.0 --port=4445 &
    geckodriver --binary=/opt/firefox/firefox --log=debug --host=0.0.0.0 --port=4446 &
    geckodriver --binary=/opt/firefox/firefox --log=debug --host=0.0.0.0 --port=4447 &
    geckodriver --binary=/opt/firefox/firefox --log=debug --host=0.0.0.0 --port=4448 &
    # Note that the last one is blocking
    geckodriver --binary=/opt/firefox/firefox --log=debug --host=0.0.0.0 --port=4449

Next is a PHP snippet to create a parallelised Panther client. I use a custom FirefoxManager here to stop a local geckodriver instance from being started (for now, the relevant code is just commented out). You can see here that TEST_TOKEN from ParaTest is used to decide which of the six parallel instances to send the test commands to.

Note at the end we use an endpoint to set the database name, just once per session.

    /**
     * This arrangement allows us to connect to a remote geckodriver. The
     * host address here is the dynamic DNS name made available by Docker
     * Compose.
     */
    protected function getClient(): Symfony\Component\Panther\Client
    {
        // This is a sequential test number from ParaTest
        $token = getenv('TEST_TOKEN') ? (int) getenv('TEST_TOKEN') : 1;

        $client = new Symfony\Component\Panther\Client(
            new MyApp\FirefoxManager(
                '/project/tests/browser/bin/geckodriver',
                null,
                [
                    'host' => $this->getGeckoAddress(),
                    'port' => 4444 - 1 + $token,
                ]
            ),
            null
        );

        // We notify the app of the database name (URL-encoded in case it contains special characters)
        $client->get($this->getBaseUrl() . '/db-init.php?dbname=' . rawurlencode($this->dbName));

        return $client;
    }

The endpoint to notify the app of the database name will vary depending on the structure of the app, but here’s mine for interest:

    <?php
    
    /**
     * This is a file that only works in the test environment,
     * and is used to reset the location of the database file.
     */
    
    session_start();
    $envName = getenv('ENVIRONMENT');
    $oldDbName = $_SESSION['dbname'] ?? null;
    $envOk = $changed = false;
    if ($envName === 'test') {
        $envOk = true;
        if (!$oldDbName) {
            $dbName = $_GET['dbname'] ?? null;
            if ($dbName) {
                $changed = true;
                $_SESSION['dbname'] = $dbName;
            }
        }
    }
    
    echo json_encode([
        'env_ok' => $envOk,
        'changed_ok' => $changed,
    ]);

When the main app is visited, it will start the PHP session system, and pick up the database name if it exists.
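
How the app picks up the name will depend on its structure. Here is a minimal sketch, assuming a hypothetical helper called `resolveDatabaseName` and a hypothetical default database name; neither comes from the project itself. The session override is only honoured in the test environment, mirroring the check in the endpoint above.

```php
<?php

declare(strict_types=1);

/**
 * Chooses the database for the current request. The session override
 * is honoured only in the test environment (hypothetical helper).
 *
 * @param array<string, mixed> $session Typically $_SESSION
 */
function resolveDatabaseName(array $session, string $environment, string $default): string
{
    $override = $session['dbname'] ?? null;

    // Only a non-empty string set during a test session replaces the default
    if ($environment === 'test' && is_string($override) && $override !== '') {
        return $override;
    }

    return $default;
}
```

In the app's bootstrap this might be called straight after `session_start()`, for example as `resolveDatabaseName($_SESSION, (string) getenv('ENVIRONMENT'), 'app.sqlite')`, where `app.sqlite` is a placeholder for the normal database.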

Future improvements

I could explore the possibility of running tests in an inherited container rather than a sibling container. This would simplify quite a lot of the issues I bumped into here, particularly in needing to set up a fleet of always-running GeckoDriver instances on the browser container. It would certainly be interesting to try it, though at some point one has to reject the temptation of another side project!

Another impact of starting always-running GeckoDriver instances remotely is that Panther needs a customisation not to start this binary in the app container. It does not permit this very well currently, so I had to use some customised copies of Panther classes. I will raise a GitHub issue to see if the maintainers would be interested in an option to ask Panther not to start the driver listener.
