# Buildbot Testing Configuration Files
The files in this directory control how tests are run on the
[Chromium buildbots](https://www.chromium.org/developers/testing/chromium-build-infrastructure/tour-of-the-chromium-buildbot).
In addition to specifying what tests run on which builders, they also specify
special arguments and constraints for the tests.
## Adding a new test suite?
The bar for adding new test suites is high. New test suites add extra linking
time on the builders and require sending more binaries around to the swarming
bots.
This is especially onerous for suites such as browser_tests (more than 300MB
as of this writing). Unless there is a compelling reason to have a standalone
suite, include your tests in existing test suites. For example, all
InProcessBrowserTests should be in browser_tests. Similarly, any unit tests in
components should be in components_unittests.
## A tour of the directory
[tests in starlark]: /infra/config/targets#tests-in-starlark
* <builder_group\>.json -- test configuration json files. These are used to
configure what tests are run on what builders, in addition to specifying
builder-specific arguments and parameters. They are autogenerated, mainly
using the generate_buildbot_json tool in this directory.
* [generate_buildbot_json.py](./generate_buildbot_json.py) -- generates most of
the buildbot json files in this directory, based on data contained in the
waterfalls.pyl, test_suites.pyl, and test_suite_exceptions.pyl files.
* [waterfalls.pyl](./waterfalls.pyl) -- describes the bots on the various
waterfalls, and which test suites they run. By design, this file can only refer
(by name) to test suites that are defined in test_suites.pyl.
* [mixins.pyl](./mixins.pyl) -- describes reusable bits of configuration that
can be used to modify the expansion of tests from waterfalls.pyl into the
generated test specs. This file isn't actually used when generating the files in
this directory; instead, the generator uses the one produced from Starlark (see
below). This copy needs to exist here so that the targets json files in the
ANGLE repo can still be generated.
* [test_suite_exceptions.pyl](./test_suite_exceptions.pyl) -- describes
exceptions to the test suites, for example excluding a particular test from
running on one bot. The goal is to have very few or no exceptions, which is why
this information is factored into a separate file.
* [trybot_analyze_config.json](./trybot_analyze_config.json) -- used to provide
exclusions to
[the analyze step](https://www.chromium.org/developers/testing/commit-queue/chromium_trybot-json)
on trybots.
* [filters/](./filters/) -- filters out tests that shouldn't be
run in a particular mode.
* [check.py](./check.py) -- makes sure the buildbot configuration json
satisfies certain criteria.
*** note
**NOTE:** this directory has been updated to get non-builder-specific
information (mixins, test suites, variants and binary information) from files
generated by Starlark. The following files are read from
[/infra/config/generated/testing](/infra/config/generated/testing) when
generating the .json files in this directory. Other uses of this script will
contain hand-written versions of these files in the same directory as their
waterfalls.pyl and test_suite_exceptions.pyl. See [here][tests in starlark] for
information on files that have been migrated.
***
* [test_suites.pyl](/infra/config/generated/testing/test_suites.pyl) --
describes the test suites that are referred to by waterfalls.pyl. A test suite
describes groups of tests that are run on one or more bots.
* [mixins.pyl](/infra/config/generated/testing/mixins.pyl) -- describes reusable
bits of configuration that can be used to modify the expansion of tests from
waterfalls.pyl into the generated test specs.
* [variants.pyl](/infra/config/generated/testing/variants.pyl) -- describes
reusable bits of configuration that can be used to expand a single test suite
into multiple test specs so that a test can be run under multiple
configurations.
* [gn_isolate_map.pyl](/infra/config/generated/testing/gn_isolate_map.pyl) --
maps Ninja build target names to GN labels. Allows for certain overrides to get
certain test targets to work with GN (and properly run when isolated).
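For illustration, an entry in this file pairs a test target's name with its GN
label and launcher type. The entry below is a representative sketch only; the
generated file defines the actual targets and the valid `type` values.

```python
# Representative gn_isolate_map.pyl entry (sketch only).
'base_unittests': {
  'label': '//base:base_unittests',
  'type': 'console_test_launcher',
},
```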
## How the files are consumed
### Buildbot configuration json
Logic in the
[Chromium recipe](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipes/chromium.py)
looks up each builder for each builder group, and the test generators in
[chromium_tests/generators.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/generators.py)
parse the data into structures defined in
[chromium_tests/steps.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/steps.py).
## Making changes
The majority of the JSON files in this directory are autogenerated. The "how to
use" section below describes the main tool, `generate_buildbot_json.py`, which
manages most of the waterfalls. It's not possible to hand-edit the JSON
files; presubmit checks forbid doing so.
Note that trybots mirror regular waterfall bots, with the mapping defined either
in
[trybots.py](https://chromium.googlesource.com/chromium/tools/build/+/HEAD/recipes/recipe_modules/chromium_tests/trybots.py)
or in the bots' `mirrors = ` attribute in their //infra/config/ definitions.
This means that, as of
[5af7340b](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/),
if you want to edit
[linux-wayland-rel](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/infra/config/subprojects/chromium/try/tryserver.chromium.linux.star#280),
you actually need to edit
[Linux Tests (Wayland)](https://chromium.googlesource.com/chromium/src/+/5af7340b4eb721380944ebc70ee28c44f21f0740/testing/buildbot/waterfalls.pyl#4895).
### Trying the changes on trybots
You should be able to try build changes that affect the trybots directly (for
example, adding a test to linux-rel should show up immediately in
your tryjob). Non-trybot changes have to be landed manually :(.
## Capacity considerations when editing the configuration files
When adding tests or bumping timeouts, care must be taken to ensure the
infrastructure has capacity to handle the extra load. This is especially true
for the established
[Chromium CQ builders](https://chromium.googlesource.com/chromium/src/+/HEAD/infra/config/generated/cq-builders.md),
as they operate under strict execution requirements. Make sure to get a resource
owner or a member of Chrome Browser Infra to sign off that there is both builder
and swarmed test shard capacity available. The suggested process for adding new
test suites to the CQ builders is to:
1. File a bug if one isn't already on file for the addition of the tests, assign
it to yourself, and apply the `Infra>Client>Chrome` component.
1. Add the test in both "post-submit only" and "experimental" mode (a combined
pyl sketch of these two settings follows this list):
- Post-submit only mode will make the test run on post-submit bots, but not
on pre-submit bots (a.k.a. CQ bots). This can be achieved by adding the
`'ci_only': True` line to the test's definition in the pyl files here.
([Example.](https://chromium.googlesource.com/chromium/src/+/79ed7956/testing/buildbot/test_suite_exceptions.pyl#934))
See the [infra glossary](../../docs/infra/glossary.md) for the distinction
between a pre-submit and post-submit builder.
- Experimental mode will prevent the test's failures from turning the build
red. This can be achieved by adding the
`'experiment_percentage': 100` line to the test's definition in the pyl
files here.
([Example.](https://chromium.googlesource.com/chromium/src/+/79ed7956/testing/buildbot/test_suite_exceptions.pyl#888))
1. After about one day's worth of builds have passed, examine the results of
the test on the affected post-submit builders. If they're green with little
to no flakes, it can be promoted out of experimental. If there's more than
a handful of flakes (e.g. 1 or more per day), then the test needs to be
de-flaked before moving on. Once that's done, it can then be moved out of
experimental and you can proceed to the next step.
1. After a sufficient amount of time (suggest 2 weeks), examine the results of
the test on the affected post-submit builders to determine the amount of
regressions they're catching. Note: unless the new test is providing unique
info/artifacts (e.g. stack traces, log files) that pre-existing tests lack,
exclude any regressions that _other_ tests also caught. We're only interested
in the regressions that these new tests catch alone.
1. If the new tests aren't excessively flaky (use
[this dashboard](http://shortn/_gP9pAC2IS3) to verify) and if they catch a
sufficient number of regressions over that trial period, then they can be
promoted to the CQ. To do so, see the steps below.
**Note:** The precise number of regressions that need to be caught depends on
the runtime of the tests. A large suite like browser_tests would need to
catch multiple per week, while a much smaller one need not catch as many. If
you're unsure if your tests meet the cutoff, proceed with the following steps
and specify how many regressions were caught in the justification of the
resource request. Depending on resources, the resource owners may not approve
the request, in which case see step #5.
1. Calculate the amount of machine resources needed for the tests. Googlers
can use [this dashboard](http://shortn/_X75IFjffFk) to determine the
amount of bots required by comparing it to a similar suite on the same
builder. Do this for each CQ builder and each suite that's being added.
1. File a [resource request](http://go/file-chrome-resource-bug) for the
required amount of machines. Make sure to specify the correct type of bots
needed (Linux, Windows, Android emulator, Android device, etc).
1. If/when the request is approved and the resources have been deployed, you
can remove the `'ci_only': True` line from the definitions here to start
running the tests on the CQ.
1. If the new tests _don't_ catch regressions sufficiently frequently, then they
don't provide a high-enough signal to warrant running on the CQ.
Consequently, they should remain in post-submit only with a comment
explaining why. This can be revisited if things change.
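For step 2 above, here is a minimal sketch of what the "post-submit only" and
"experimental" settings could look like in test_suite_exceptions.pyl, using a
hypothetical test name and bot name:

```python
# Hypothetical test_suite_exceptions.pyl entry; test and bot names are made up.
'my_new_gtests': {
  'modifications': {
    'Linux Tests': {
      'ci_only': True,                # run post-submit only, not on CQ/try bots
      'experiment_percentage': 100,   # failures won't turn the build red
    },
  },
},
```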
If your change doesn't affect the CQ but is expected to increase utilization in
the testing pools by any more than 5 VMs or 50 CPU cores per hour, it will still
need to be approved via a resource request. Consult the
[dashboard](http://shortn/_nyyTPgDJtF) linked above to calculate the resource
usage of a test change. See http://go/i-need-hw for the steps involved in
getting the approval.
## How to use the generate_buildbot_json tool
### Test suites
#### Basic test suites
The [test_suites.pyl](./test_suites.pyl) file describes groups of tests that run
on bots -- both waterfalls and trybots. In order to specify that a test like
`base_unittests` runs on a bot, it must be put inside a test suite. This
organization helps enforce sharing of test suites among multiple bots.
An example of a simple test suite:
```python
'basic_chromium_gtests': {
  'base_unittests': {},
}
```
If a bot in [waterfalls.pyl](./waterfalls.pyl) refers to the test suite
`basic_chromium_gtests`, then that bot will run `base_unittests`.
The test's name is usually both the build target and how the test appears
in the steps that the bot runs. However, this can be overridden using dictionary
arguments like `test`; see below.
The dictionary following the test's name can contain multiple entries that
affect how the test runs. Generally speaking, these are copied verbatim into the
generated JSON file. Commonly used arguments include:
* `args`: an array of command line arguments for the test.
* `ci_only`: a boolean value (`True`|`False`) indicating whether the test
should only be run post-submit on the continuous (CI) builders, instead
of running both post-submit and on any matching pre-submit / CQ / try builders.
This flag should be set rarely, usually only temporarily to manage capacity
concerns during an outage.
* `description`: a string to describe the test suite. The text will be shown on
Milo.
* `swarming`: a dictionary of Swarming parameters. Note that these will be
applied to *every* bot that refers to this test suite. It is often more useful
to specify the Swarming dimensions at the bot level, in waterfalls.pyl. More
on this below.
* `can_use_on_swarming_builders`: if set to False, disables running this
test on Swarming on any bot.
* `idempotent`: if set to False, prevents Swarming from deduplicating the task
against an identical earlier run and reusing its results. See
[task deduplication] for more info.
* `experiment_percentage`: an integer indicating that the test should be run
as an experiment in the given percentage of builds. Tests running as
experiments will not cause the containing builds to fail. Values should be
in `[0, 100]` and will be clamped accordingly.
* `android_swarming`: Swarming parameters to be applied only on Android bots.
(This feature was added mainly to match the original handwritten JSON files,
and further use is discouraged. Ideally it should be removed.)
Arguments specific to GTest-based and isolated script tests:
* `test`: the target to build and run, if different from the test's name. This
allows the same test to be run multiple times on the same bot with different
command line arguments or Swarming dimensions, for example.
There are other arguments specific to other test types (script tests, JUnit
tests); consult the generator script and test_suites.pyl for more details and
examples.
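Putting several of these arguments together, a hypothetical basic suite could
look like the following; the flag, shard, and target names are illustrative
only.

```python
# Hypothetical basic suite showing common per-test arguments.
'example_gtests': {
  'base_unittests': {
    'args': [
      '--enable-some-feature',        # extra command line arguments
    ],
    'swarming': {
      'shards': 2,                    # applied on every bot using this suite
    },
  },
  'base_unittests_no_field_trials': {
    'test': 'base_unittests',         # reuse the same build target...
    'args': [
      '--disable-field-trial-config', # ...but run it with different arguments
    ],
  },
},
```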
### Compound test suites
#### Composition test suites
One level of grouping of test suites is composition test suites. A
composition test suite is an array whose contents must all be names of
individual test suites. Composition test suites *may not* refer to other
composition or matrix compound test suites. This restriction is by design.
First, adding multiple levels of indirection would make it more difficult to
figure out which bots run which tests. Second, having only one minimal grouping
construct motivates authors to simplify the configurations of tests on the bots
and reduce the number of test suites.
An example of a composition test suite:
```python
'common_gtests': {
  'base_unittests': {},
},

'linux_specific_gtests': {
  'ozone_x11_unittests': {},
},

# Composition test suite
'linux_gtests': [
  'common_gtests',
  'linux_specific_gtests',
],
```
A bot referring to `linux_gtests` will run both `base_unittests` and
`ozone_x11_unittests`.
#### Matrix compound test suites
Another level of grouping of basic test suites is the matrix compound test
suite. A matrix compound test suite is a dictionary, composed of references to
basic test suites (key) and configurations (value). Matrix compound test suites
have the same restrictions as composition test suites, in that they *cannot*
reference other composition or matrix test suites. Configurations defined for
a basic test suite in a matrix test suite are applied to each test in the
referenced basic test suite. "variants" is currently the only configuration key
supported in matrix compound suites.
Matrix compound test suites also support basic test suites without "variants".
So if you want a compound test suite in which some of the basic test suites have
"variants" and others don't, define it as a matrix compound test suite.
##### Variants
"variants" is a top-level key introduced for matrix compound suites, designed
to allow targeting a test against multiple variants. Each variant supports args,
mixins and swarming definitions. When variants are defined, args, mixins and
swarming aren't specified at the same level.
Args, mixins, and swarming configurations that are defined by both the test
suite and variants are merged together. Args and mixins are lists, and thus are
appended together. Swarming configurations follow the same merge process -
dimension sets are merged via the existing dictionary merge behavior, and other
keys are appended.
**identifier** is a required key for each variant. The identifier is used to
make the test name unique. Each test generated from the resulting .json file
is identified uniquely by name; thus, the identifier is appended to the test
name in the format: "test_name" + "_" + "identifier"
For example, iOS requires running a test suite against multiple devices. If we
have the following variants.pyl:
```python
{
  'IPHONE_X_13.3': {
    'args': [
      '--platform',
      'iPhone X',
      '--version',
      '13.3'
    ],
    'identifier': 'iPhone_X_13.3',
  },
  'IPHONE_X_13.3_DEVICE': {
    'identifier': 'device_iPhone_X_13.3',
    'swarming': {
      'dimensions': {
        'os': 'iOS-iPhone10,3'
      },
    }
  },
}
```
and the following test_suites.pyl:
```python
{
  'basic_suites': {
    'ios_eg2_tests': {
      'basic_unittests': {
        'args': [
          '--some-arg',
        ],
      },
    },
  },
  'matrix_compound_suites': {
    'ios_tests': {
      'ios_eg2_tests': {
        'variants': [
          'IPHONE_X_13.3',
          'IPHONE_X_13.3_DEVICE',
        ]
      },
    },
  },
}
```
we can expect the following output:
```
{
  'args': [
    '--some-arg',
    '--platform',
    'iPhone X',
    '--version',
    '13.3'
  ],
  'merge': {
    'args': [],
    'script': 'some/merge/script.py'
  },
  'name': 'basic_unittests_iPhone_X_13.3',
  'test': 'basic_unittests'
},
{
  'args': [
    '--some-arg'
  ],
  'merge': {
    'args': [],
    'script': 'some/merge/script.py',
  },
  'name': 'basic_unittests_device_iPhone_X_13.3',
  'swarming': {
    'dimensions': {
      'os': 'iOS-iPhone10,3'
    },
  },
  'test': 'basic_unittests'
}
```
### Waterfalls
[waterfalls.pyl](./waterfalls.pyl) describes the waterfalls, the bots on those
waterfalls, and the test suites which those bots run.
A bot can specify a `swarming` dictionary including `dimensions`. These
parameters are applied to all tests that are run on this bot. Since most bots
run their tests on Swarming, this is one of the mechanisms that dramatically
reduces redundancy compared to maintaining the JSON files by hand.
A waterfall is a dictionary containing the following:
* `name`: the waterfall's name, for example `'chromium.win'`.
* `machines`: a dictionary mapping machine names to dictionaries containing bot
descriptions.
Each bot's description is a dictionary containing the following:
* `additional_compile_targets`: if specified, an array of compile targets to
build in addition to those for all of the tests that will run on this bot.
* `test_suites`: a dictionary optionally containing any of these kinds of
tests. The value is a string referring either to a basic or composition test
suite from [test_suites.pyl](./test_suites.pyl).
* `gtest_tests`: GTest-based tests (or other kinds of tests that
emulate the GTest-based API), which can be run either locally or
under Swarming.
* `isolated_scripts`: Isolated script tests. These are bundled into an
isolate, invoke a wrapper script from src/testing/scripts as their
top-level entry point, and are used to adapt to multiple kinds of test
harnesses. These must implement the
[Test Executable API](//docs/testing/test_executable_api.md) and
can also be run either locally or under Swarming.
* `junit_tests`: (Android-specific) JUnit tests. These are not run
under Swarming.
* `scripts`: Legacy script tests living in src/testing/scripts. These
also are not (and usually cannot be) run under Swarming. These
types of tests are strongly discouraged.
* `swarming`: a dictionary specifying Swarming parameters to be applied to all
tests that run on the bot.
* `os_type`: the type of OS this bot tests.
* `skip_cipd_packages`: (Android-specific) when True, disables emission of the
`'cipd_packages'` Swarming dictionary entry. Not commonly used; further use is
discouraged.
* `skip_merge_script`: (Android-specific) when True, disables emission of the
`'merge'` script key. Not commonly used; further use is discouraged.
* `skip_output_links`: (Android-specific) when True, disables emission of the
`'output_links'` Swarming dictionary entry. Not commonly used; further use is
discouraged.
* `use_swarming`: can be set to False to disable Swarming on a bot.
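As a rough sketch (the waterfall, bot, and suite names below are made up, and
the exact Swarming keys should follow whatever the real files use), a
waterfalls.pyl entry looks something like this:

```python
# Illustrative waterfalls.pyl entry; names and dimension values are made up.
{
  'name': 'chromium.example',
  'machines': {
    'Example Linux Tests': {
      'os_type': 'linux',
      'swarming': {
        'dimensions': {
          'os': 'Ubuntu-22.04',
          'cpu': 'x86-64',
        },
      },
      'test_suites': {
        'gtest_tests': 'linux_gtests',
      },
    },
  },
},
```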
### Test suite exceptions
[test_suite_exceptions.pyl](./test_suite_exceptions.pyl) contains specific
exceptions to the general rules about which tests run on which bots described in
[test_suites.pyl](./test_suites.pyl) and [waterfalls.pyl](./waterfalls.pyl).
In general, the design should be to have no exceptions. Roughly speaking, all
bots should be treated identically, and ideally, the same set of tests should
run on each. In practice, of course, this is not possible.
The test suite exceptions can only be used to _remove tests from a bot_, _modify
how a test is run on a bot_, or _remove keys from a test's specification on
a bot_. The exceptions _can not_ be used to add a test to a bot. This
restriction is by design, and helps prevent taking shortcuts when designing test
suites which would make the test descriptions unmaintainable. (The number of
exceptions needed to describe Chromium's waterfalls in their previous
hand-maintained state has already gotten out of hand, and a concerted effort
should be made to eliminate them wherever possible.)
The exceptions file supports the following options per test:
* `remove_from`: a list of bot names on which this test should not run.
Currently, bots on different waterfalls that have the same name can be
disambiguated by appending the waterfall's name: for example, `Nougat Phone
Tester chromium.android`.
* `modifications`: a dictionary mapping a bot's name to a dictionary of
modifications that should be merged into the test's specification on that
bot. This can be used to add additional command line arguments, Swarming
parameters, etc.
* `replacements`: a dictionary mapping bot names to dictionaries of field
names to dictionaries of key/value pairs to replace. If the given value is
`None`, then the key will simply be removed. For example:
```python
'foo_tests': {
  'Foo Tester': {
    'args': {
      '--some-flag': None,
      '--another-flag': 'some-value',
    },
  },
}
```
would remove `--some-flag` and replace whatever value `--another-flag` was
set to with `some-value`. Note that passing `None` only works if the flag
being removed either has no value or is in the `--key=value` format. It does
not work if the key and value are two separate entries in the args list.
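The `remove_from` and `modifications` options follow the same general shape.
Here is a sketch with made-up test and bot names:

```python
# Illustrative test_suite_exceptions.pyl entry; test and bot names are made up.
'foo_unittests': {
  'remove_from': [
    # Appending the waterfall name disambiguates bots with identical names.
    'Nougat Phone Tester chromium.android',
  ],
  'modifications': {
    'Foo Tester': {
      'args': [
        '--some-extra-flag',
      ],
    },
  },
},
```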
### Order of application of test changes
A test's final JSON description comes from the following, in order:
* The dictionary specified in [test_suites.pyl](./test_suites.pyl). This is
used as the starting point for the test's description on all bots.
* The specific bot's description in [waterfalls.pyl](./waterfalls.pyl). This
dictionary is merged into the test's dictionary. For example, the bot's
Swarming parameters will override those specified for the test.
* Any exceptions specified per-bot in
[test_suite_exceptions.pyl](./test_suite_exceptions.pyl). For example, any
additional command line arguments will be merged in here. Any Swarming
dictionary entries specified here will override both those specified in
test_suites.pyl and waterfalls.pyl.
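Conceptually, and ignoring the list-appending and Swarming-dictionary merging
that the real generator performs, this ordering amounts to something like the
sketch below; the function and variable names are purely illustrative.

```python
import copy

# Oversimplified sketch of the merge order. generate_buildbot_json.py performs
# a deeper merge (args lists are appended, Swarming dimensions are merged, etc.).
def final_test_spec(suite_entry, bot_swarming, exception_mods):
    spec = copy.deepcopy(suite_entry)                     # 1. test_suites.pyl definition
    spec.setdefault('swarming', {}).update(bot_swarming)  # 2. bot-level Swarming from waterfalls.pyl
    spec.update(exception_mods)                           # 3. per-bot exceptions applied last
    return spec
```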
### Tips when making changes to the bot and test descriptions
In general, the only specialization of test suites that _should_ be necessary is
per operating system. If you add a new test to the bots and find yourself adding
lots of exceptions to exclude the test from all bots of one particular type
(like Android, Chrome OS, etc.), here are options to consider:
* Look for a different test suite to add it to -- such as one that runs
everywhere except on that OS type.
* Add a new test suite that runs on all of the OS types where your new test
should run, and add that test suite to the composition test suites referenced
by the appropriate bots.
* Split one of the existing test suites into two, and add the newly created test
suite (including your new test) to all of the bots except those which should
not run the new test.
If adding a new waterfall, or a new bot to a waterfall, *please* avoid adding
new test suites. Instead, refer to one of the existing ones that is most similar
to the new bot(s) you are adding. There should be no need to continue
over-specializing the test suites.
If you see an opportunity to reduce redundancy or simplify test descriptions,
*please* consider making a contribution to the generate_buildbot_json script or
the data files. Some examples might include:
* Automatically doubling the number of shards on Debug bots, by describing to
the tool which bots are debug bots. This could eliminate the need for a lot of
exceptions.
* Specifying a single hard_timeout per bot, and eliminating all per-test
timeouts from test_suites.pyl and test_suite_exceptions.pyl.
* Merging some test suites. When the generator tool was written, the handwritten
JSON files were replicated essentially exactly. There are many opportunities
to simplify the configuration of which tests run on which bots. For example,
there's no reason why the top-of-tree Clang bots should run more tests than
the bots on other waterfalls running the same OS.
`dpranke`, `jbudorick` or `kbr` will be glad to review any improvements you make
to the tools. Thanks in advance for contributing!
[task deduplication]: https://chromium.googlesource.com/infra/luci/luci-py/+/HEAD/appengine/swarming/doc/Detailed-Design.md#task-deduplication