# What’s Up With Tests
This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 4, a 2022 video discussion between [Sharon ([email protected])
and Stephen
([email protected])](https://www.youtube.com/watch?v=KePsimOPSro).
The transcript was automatically generated by speech-to-text software. It may
contain minor errors.
---
Testing is important! What kinds of tests do we have in Chromium? What are they
all about? Join in as Stephen, who led Chrome's involvement in web platform
tests, tells us all about them.
Notes:
- https://docs.google.com/document/d/1SRoNMdPn78vwZVX7YzcdpF4cJdHTIV6JLGiVC2dJUaI/edit
---
00:00 SHARON: Hello, everyone, and welcome to "What's Up With That," the series
that demystifies all things Chrome. I'm your host, Sharon. And today we're
talking testing. Within Chrome, there are so many types of tests. What are they
all? What's the difference? What are the Chromium-specific quirks? Today's
guest is Stephen. He previously led Chrome's involvement in web platform tests.
Since then, he's worked on rendering, payments, and interoperability. As a fun
aside, he's one of the first people I met who worked on Chrome and is maybe
part of why I'm here today. So welcome, Stephen.
00:33 STEPHEN: Well, thank you very much for having me, Sharon, I'm excited to
be here.
00:33 SHARON: Yeah, I'm excited to have you here. So today, we're in for maybe
a longer episode. Testing is a huge topic, especially for something like
Chrome. So grab a snack, grab a drink, and let's start. We'll start with what
are all of the things that we have testing for in Chrome. What's the purpose of
all these tests we have?
00:51 STEPHEN: Yeah. It's a great question. It's also an interesting one
because I wanted to put one caveat on this whole episode, which is that there
is no right answer in testing. Testing, even in the literature, never mind in
Chromium itself, is not a solved problem. And so you'll hear a lot of different
opinions. People will have different thoughts. And I'm sure that no matter how
hard we try, by the end of this episode, our inbox will be filled with angry
emails from people being like, no, you are wrong. So all of the stuff we're
saying here today is my opinion, albeit I'll try and be as useful as possible.
But yeah, so why do we test was the question, right? So there's a lot of
different reasons that we write tests. Obviously, correctness is the big one.
You're writing some code, you're creating a feature, you want it to be correct.
Other reasons we write them, I mean, tests can be useful as a form of
documentation in itself. If you're ever looking at a class and you're like,
what does - why is this doing this, why is the code doing this, the test can
help inform that. They're also useful for - a recurring topic of this podcast -
security. Tests can be very useful for security. Often when we have a
security bug, we go back and we write what are called regression tests, so at
least we try and never do that security failure again. And then there are other
reasons. We have tests for performance. We have tests for - our launch process
uses tests. There's lots and lots of reasons we have tests.
02:15 SHARON: Now that you've covered all of the different reasons why we test,
how do we do each of these types of tests in Chromium? What are the test types
we have?
02:27 STEPHEN: Yeah. So the main test types we have in Chromium are unit tests,
browser tests, what we call web tests, and then there's a bunch of more
specialized ones, performance tests, testing on Android, and of course manual
testing.
02:43 SHARON: We will get into each of these types now, I guess. The first type
of test you mentioned is unit tests. Why don't you tell us a quick rundown of
what unit tests are. I'm sure most people have encountered them or heard of
them before. But just a quick refresher for those who might not.
02:55 STEPHEN: Yeah, absolutely. So as the name implies, a unit test is all
about testing a unit of code. And what a unit is, is not very well defined. But
you can usually think of it as just a class, a file, a small isolated component
that doesn't have to talk to all the other bits of the code to work. Really,
the goal is on writing something that's testing just the code under test - so
that new method you've added or whatever. And it should be quick and easy to
run.
03:22 SHARON: So on the screen now we have an example of a pretty typical unit
test we see in Chrome. So there's three parts here. Let's go through each of
them. So the first type - the first part of this is `TEST_P`. What is that
telling us?
03:38 STEPHEN: Yeah. So that is - in Chromium we use a unit testing framework
called Google test. It's very commonly used for C++. You'll see it all over the
place. You can go look up documentation. The test macros, that's what this is,
are essentially the hook into Google test to say, hey, the thing that's coming
here is a test. There are three types. There is plain `TEST`, which just says,
here is a function; it is a test function. `TEST_F` says that you basically
have a wrapper class. It's often called a test fixture, which can do some
common setup across multiple different tests, common teardown, and that sort of
thing. And finally, `TEST_P` is what we call a parameterized test. And what
this means is that the test can take some input parameters, and it will run the
same test with each of those values. Very useful for things like when you want
to test a new flag. What happens if the flag is on or off?
04:34 SHARON: That's cool. And a lot of the things we're mentioning for unit
test also apply to browser test, which we'll cover next. But the
parameterization is an example of something that carries over to both. So
that's the first part. That's the `TEST_P`, the macro. What's the second part,
PendingBeaconHostTest? What is that?
04:54 STEPHEN: Yeah. So that is the fixture class, the test container class I
was talking about. So in this case, we're assuming that in order to write a
beacon test, whatever that is, they have some set up, some teardown they need
to do. They might want to encapsulate some common functionality. So all you
have to do to write one of these classes is, you declare a C++ class and you
subclass from the Google test class name.
05:23 SHARON: So this is a `TEST_P`, but you mentioned that this is a fixture.
So are fixture tests a subset of parameterized tests?
05:35 STEPHEN: Parameterized tests are a subset of fixture tests - is that the
right way around to put it? All parameterized tests are fixture tests. Yes.
05:41 SHARON: OK.
05:41 STEPHEN: You cannot have a parameterized test that does not have a
fixture class. And the reason for that is how Google test works under the
covers: it passes those parameters to your test class. You will have to
additionally extend from the `testing::WithParamInterface`. And that says, hey,
I'm going to take parameters.
06:04 SHARON: OK. But not all fixture tests are parameterized tests.
06:04 STEPHEN: Correct.
06:04 SHARON: OK. And the third part of this, SendOneOfBeacons. What is that?
06:10 STEPHEN: That is your test name. Whatever you want to call your test,
whatever you're testing, put it here. Again, naming tests is as hard as naming
anything. A lot of yak shaving, finding out what exactly you should call the
test. I particularly enjoy when you see test names that themselves have
underscores in them. It's great.
06:30 SHARON: Uh-huh. What do you mean by yak shaving?
06:35 STEPHEN: Oh, also known as painting a bike shed? Bike shed, is that the
right word? Anyway, generally speaking -
06:40 SHARON: Yeah, I've heard -
06:40 STEPHEN: arguing about pointless things because at the end of the day,
most of the time it doesn't matter what you call it.
06:46 SHARON: OK, yeah. So I've written this test. I've decided it's going to
be parameterized. I've come up with a test fixture for it. I have finally named
my test. How do I run my tests now?
06:57 STEPHEN: Yeah. So all of the tests in Chromium are built into different
test binaries. And these are usually named after the top level directory that
they're under. So we have `components_unittests`, `content_unittests`. I think
the Chrome one is just called `unit_tests` because it's special. We should
really rename that. But I'm going to assume a bunch of legacy things depend on
it. Once you have built whichever the appropriate binary is, you can just run
that from your `out` directory, so `out/release/components_unittests`, for
example. And then that, if you don't pass any flags, will run every single
components unit test. You probably don't want to do that. They're not that
slow, but they're not that fast. So there is a flag `--gtest_filter`, which
allows you to filter. And then it takes a test name after that. The format of
test names is always test class dot test name. So for example, here
PendingBeaconHostTest dot SendOneOfBeacons.
08:04 SHARON: Mm-hmm. And just a fun aside for that one, if you do have
parameterized tests, it'll have an extra slash and a number at the end. So
normally, whenever I use it, I just put a star before and after. And that
generally does - covers the cases.
08:17 STEPHEN: Yeah, absolutely.
08:23 SHARON: Cool. So with the actual test names, you will often see them
prefixed with either `MAYBE_` or `DISABLED_`, or before the test, there will be
an ifdef with usually a platform and then depending on the cases, it'll prefix
the test name with something. So I think it's mostly clear what these are
doing. `DISABLED_` is pretty clear; `MAYBE_` is a bit less so. But can you tell
us a bit about these prefixes?
08:51 STEPHEN: Yeah, absolutely. So this is our way of trying to deal with that
dreaded thing in testing, flake. So when a test is flaky, when it doesn't
produce a consistent result, sometimes it fails. We have in Chromium a whole
continuous integration waterfall. That is a bunch of bots on different
platforms that are constantly building and running Chrome tests to make sure
that nothing breaks, that bad changes don't come in. And flaky tests make that
very hard. When something fails, was that a real failure? And so when a test is
particularly flaky and is causing the build sheriffs trouble, they
will come in and they will disable that test. Basically say, hey, sorry, but
this test is causing too much pain. Now, as you said, the `DISABLED_` prefix,
that's pretty obvious. If you put that in front of a test, Google test knows
about it and it says, nope, will not run this test. It will be compiled, but it
will not be run. `MAYBE_` doesn't actually mean anything. It has no meaning to
Google test. But that's where you'll see, as you said, you see these ifdefs.
And that's so that we can disable it on just one platform. So maybe your test
is flaky only on Mac OS, and you'll see basically, oh, if Mac OS, change the
name from maybe to disabled. Otherwise, define maybe as the normal test name.
10:14 SHARON: Makes sense. We'll cover flakiness a bit later. But yeah, that's
a huge problem. And we'll talk about that for sure. So these prefixes, the
parameterization and stuff, this applies to both unit and browser tests.
10:27 STEPHEN: Yeah.
10:27 SHARON: Right? OK. So what are browser tests? Chrome's a browser. Browser
test, seems like there's a relation.
10:34 STEPHEN: Yeah. They test the browser. Isn't it obvious? Yeah. Browser
tests are our version - our sort of version of an integration or a functional
test depending on how you look at things. What that really means is they're
testing larger chunks of the browser at once. They are integrating multiple
components. And this is somewhere that I think Chrome's a bit weird because in
many large projects, you can have an integration test that doesn't bring your
entire product up in order to run. Unfortunately, or fortunately, I guess
it depends on your viewpoint, Chrome is so interconnected, it's so
interdependent, that more or less we have to bring up a huge chunk of the
browser in order to connect any components together. And so that's what browser
tests are. When you run one of these, there's a massive amount of machinery in
the background that goes ahead, and basically brings up the browser, and
actually runs it for some definition of what a browser is. And then you can
write a test that pokes at things within that running browser.
11:42 SHARON: Yeah. I think I've heard before multiple times is that browser
tests launch the whole browser. And that's -
11:47 STEPHEN: More or less true. It's - yeah.
11:47 SHARON: Yes. OK. Does that also mean that because you're running all this
stuff that all browser tests have fixtures? Is that the case?
11:59 STEPHEN: Yes, that is the case. Absolutely. So there is only - I think
it's - oh my goodness, probably on the screen here somewhere. But it's
`IN_PROC_BROWSER_TEST_F` and `IN_PROC_BROWSER_TEST_P`. There is no version that
doesn't have a fixture.
12:15 SHARON: And what does the in proc part of that macro mean?
12:15 STEPHEN: So that's, as far as I know - and I might get corrected on this.
I'll be interested to learn. But it refers to the fact that we run these in
the same process. Normally, Chromium is a multi-process architecture.
For the case of testing, we put that aside and just run everything in the same
process so that it doesn't leak, basically.
12:38 SHARON: Yeah. There are flags when you run them, like `--single-process`.
And then there's `--single-process-tests`. And they do slightly different
things. But if you do run into that, probably you will be working with people
who can answer and explain the differences between those more. So something
that I've seen quite a bit in browser and unit tests, and only in these, are
run loops. Can you just briefly touch on what those are and what we use them
for in tests?
13:05 STEPHEN: Oh, yeah. That's a fun one. I think actually previous on an
episode of this very program, you and Dana talked a little bit around the fact
that Chrome is not a completely synchronous program, that we do task
splitting. We have a task scheduler. And so run loops are part of that,
basically. They're part of our stack for handling asynchronous tasks. And so
this comes up in testing because sometimes you might be testing something
that's not synchronous. It takes a callback, for example, rather than returning
a value. And so if you just wrote your test as normal, you call the function,
and you don't - you pass a callback, but then your test function ends. Your
test function ends before that callback ever runs. Run loop gives you the
ability to say, hey, put this callback into some controlled run loop. And then
after that, you can basically say, hey, wait on this run loop. I think it's
often called quit when idle, which basically says keep running until you have
no more tasks to run, including our callback, and then finish. They're
powerful. They're very useful, obviously, with asynchronous code. They're also
a source of a lot of flake and pain. So handle with care.
14:24 SHARON: Yeah. A tip, maybe: use the `--gtest_repeat` flag. That one lets
you run your test however many times you want.
14:30 STEPHEN: Yeah.
14:36 SHARON: And that can help with testing for flakiness or if you're trying
to debug something flaky. In tests, we have a variety of macros that we use. In
the unit test and the browser tests, you see a lot of macros, like `EXPECT_EQ`,
`EXPECT_GT`. These seem like they're part of maybe Google test. Is that true?
14:54 STEPHEN: Yeah. They come from Google test itself. So they're not
technically Chromium-specific. But they basically come in two flavors. There's
the `EXPECT_SOMETHING` macros. And there's the `ASSERT_SOMETHING` macros. And
the biggest thing to know about them is that expect doesn't actually cause - it
causes a test to fail, but it doesn't stop the test from executing. The test
will continue to execute the rest of the code. Assert actually bails out of
the test function and stops the test right there. And so this can be useful, for
example, if you want to line up a bunch of expects. And your code still makes
sense. You're like, OK, I expect to return object, and it's got these fields.
And I'm just going to expect each one of the fields. That's probably fine to
do. And it may be nice to have output that's like, no, actually, both of these
fields are wrong. Assert is used when you're like, OK, if this fails, the rest
of the test makes no sense. Very common thing you'll see. Call an API, get back
some sort of pointer, hopefully a smart pointer, hey. And you're going to be
like, assert that this pointer is non-null because if this pointer is null,
everything else is just going to be useless.
15:57 SHARON: I think we see a lot more expects than asserts in general
anecdotally from looking at the test. Do you think, in your opinion, that
people should be using asserts more generously rather than expects, or do we
maybe want to see what happens - what does go wrong if things continue beyond a
certain point?
16:15 STEPHEN: Yeah. I mean, general guidance would be just keep using expect.
That's fine. It's also not a big deal if your test actually just crashes. It's
a test. It can crash. It's OK. So use expects. Use an assert if, like I said,
that the test doesn't make any sense. So most often if you're like, hey, is
this pointer null or not and I'm going to go do something with this pointer,
assert it there. That's probably the main time you'd use it.
16:45 SHARON: A lot of the browser test classes, like the fixture classes
themselves, are subclass from other base classes.
16:53 STEPHEN: Mm-hmm.
16:53 SHARON: Can you tell us about that?
16:53 STEPHEN: Yeah. So basically, we have one base class for browser tests. I
think it's literally called `BrowserTestBase`, which sits at the
bottom and does a lot of the very low level setup of bringing up a browser. But
as folks know, there's more than one browser in the Chromium project. There is
Chrome, the Chrome browser that is the more full-fledged version. But there's
also content shell, which people might have seen. It's built out of content.
It's a very simple browser. And then there are other things. We have a headless
mode. There is a headless Chrome you can build which doesn't show any UI. You
can run it entirely from the command line.
17:32 SHARON: What's the difference between headless and content shell?
17:39 STEPHEN: So content shell does have a UI. If you run content shell, you
will actually see a little UI pop up. What content shell doesn't have is all of
those features from Chrome that make Chrome Chrome, if you will. So I mean,
everything from bookmarks, to integration with having an account profile, that
sort of stuff is not there. I don't think content shell even supports tabs. I
think it's just one page you get. It's almost entirely used for testing. But
then, headless, sorry, as I was saying, it's just literally there is no UI
rendered. It's just headless.
18:13 SHARON: That sounds like it would make -
18:13 STEPHEN: And so, yeah. And so - sorry.
18:13 SHARON: testing faster and easier. Go on.
18:18 STEPHEN: Yeah. That's a large part of the point, as well as when you want
to deploy a browser in an environment where you don't see the UI. So for
example, if you're running on a server or something like that. But yeah. So for
each of these, we then subclass that `BrowserTestBase` in order to provide
specific types. So there's content browser test. There's headless browser test.
And then of course, Chrome has to be special, and they called their version in
process browser test because it wasn't confusing enough. But again, it's sort
of straightforward. If you're in Chrome, `/chrome`, use
`in_process_browser_test`. If you're in `/content`, use `content_browsertest`.
It's pretty straightforward most of the time.
18:58 SHARON: That makes sense. Common functions you see overridden from those
base classes are these setup functions. So there's `SetUp`, `SetUpOnMainThread` -
there seem to be a lot of different setup options. Is there anything we
should know about any of those?
19:13 STEPHEN: I don't think that - I mean, most of it's fairly
straightforward. I believe you should mostly be using setup on main thread. I
can't say that for sure. But generally speaking, setup on main thread, teardown
on main thread - or is it shutdown main thread? I can't remember - whichever
the one is for afterwards, are what you should usually be using in a browser
test. You can also usually do most of your work in a constructor. That's
something that people often don't know about testing. I think it's something
that's changed over time. Even with unit tests, people use the setup function a
lot. You can just do it in the constructor a lot of the time. Most of the
background initialization has already happened.
19:45 SHARON: I've definitely wondered that, especially when you have things in
the constructor as well as in a setup method. It's one of those things where
you just kind of think, I'm not going to touch this because eh, but -
19:57 STEPHEN: Yeah. There are some rough edges, I believe. Set up on main
thread, some things have been initialized that aren't around when your class is
being constructed. So it is fair. I'm not sure I have any great advice unless -
other than you may need to dig in if it happens.
20:19 SHARON: One last thing there. Which one gets run first, the setup
functions or the constructor?
20:19 STEPHEN: The constructor always happens first. You have to construct the
object before you can use it.
20:25 SHARON: Makes sense. This doesn't specifically relate to a browser test
or unit test, but it does seem like it's worth mentioning, which is the content
public test API. So if you want to learn more about content and content public,
check out episode three with John. But today we're talking about testing. So
we're talking about content public test. What is in that directory? And how
does that - how can people use what's in there?
20:48 STEPHEN: Yeah. It's basically just a bunch of useful helper functions and
classes for when you are doing mostly browser tests. So for example, there are
methods in there that will automatically handle navigating the browser to a URL
and actually waiting till it's finished loading. There are other methods for
essentially accessing the tab strip of a browser. So if you have multiple tabs
and you're testing some cross tab thing, methods in there to do that. I think
that's probably where the content browser test base class lives as well. So
take a look at it. It's the equivalent of `base` in many ways, but for testing.
If you're thinking, someone should have written a library function for this,
possibly someone already has. And you should take a look. And if they haven't,
you should write one.
21:43 SHARON: Yeah. I've definitely heard people, code reviewers, say when you
want to add something that seems a bit test only to content public, put that in
content public test because that doesn't get compiled into the actual release
binaries. So if things are a bit less than ideal there, it's a bit more
forgiving for a place for that.
22:02 STEPHEN: Yeah, absolutely. I mean, one of the big things about all of our
test code is that you can actually make it so that it's in many cases not
compiled into the binary. And that is both useful for binary size as well as
you said in case it's concerning. One thing you can do actually in test, by the
way, for code that you cannot avoid putting into the binary - so let's say
you've got a class, and for the reasons of testing it because you've not
written your class properly to do a dependency injection, you need to access a
member. You need to set a member. But you only want that to happen from test
code. No real code should ever do this. You can actually name methods blah,
blah, blah `ForTest` or `ForTesting`. And this doesn't have any code impact.
But we have presubmits that actually go ahead and check, hey, are you calling
this from code that's not marked as test code? And the presubmit upload will
fail if that happens. So it can be useful.
23:03 SHARON: And another thing that relates to that would be the friend test
or friend something macro that you see in classes. Is that a gtest thing also?
23:15 STEPHEN: It's not a gtest thing. It's just a C++ thing. So C++ has the
concept of friending another class. It's very cute. It basically just says,
this other class and I, we can access each other's internal states. Don't
worry, we're friends. Generally speaking, that's a bad idea. We write classes
for a reason to have encapsulation. The entire goal of a class is to
encapsulate behavior and to hide the implementation details that you don't want
to be exposed. But obviously, again, when you're writing tests, sometimes it is
the correct thing to do to poke a hole in the class and get at something. Very
much in the schools of thought here, some people would be like, you should be
doing dependency injection. Some people are like, no, just friend your class.
It's OK. If folks want to look up more, go look up the difference between open
box and closed box testing.
24:00 SHARON: For those of you who are like, oh, this sounds really cool, I
will learn more.
24:00 STEPHEN: Yeah, for my test nerds out there.
24:06 SHARON: [LAUGHS] Yeah, Stephen's got a club. Feel free to join.
24:06 STEPHEN: Yeah. [LAUGHTER]
24:11 SHARON: You get a card. Moving on to our next type of test, which is your
wheelhouse, which is web tests. This is something I don't know much about. So
tell us all about it.
24:22 STEPHEN: [LAUGHS] Yeah. This is my - this is where hopefully I'll shine.
It's the area I should know most about. But web tests are - they're an
interesting one. So I would describe them as our version of an end-to-end test
in that a web test really is just an HTML file or a JavaScript file. When you
run it, you literally bring up - you'll remember I said that browser tests
are most of a whole browser. Web tests bring up a whole browser. It's just the
same browser as content shell or Chrome. And it runs that whole browser. And
the test does something, either in HTML or JavaScript, that then is asserted
and checked. And the reason I say that I would call them this, I have heard
people argue that they're technically unit tests, where the unit is the
JavaScript file and the entire browser is just, like, an abstraction that you
don't care about. I guess it's how you view them really. I view the browser as
something that is big and flaky, and therefore these are end-to-end tests. Some
people disagree.
25:22 SHARON: In our last episode, John touched on these tests and how the
scope that each test covers is very small, but how you run them is not. And I
guess you can pick the side that you like more
and go with that. So what are examples of things we test with these kind of
tests?
25:49 STEPHEN: Yeah. So the two big categories of things that we test with web
tests are basically web APIs, so JavaScript APIs, provided by the browser to do
something. There are so many of those, everything from the fetch API for
fetching stuff to the web serial API for talking to devices over serial ports.
The web is huge. But anything you can talk to via JavaScript API, we call those
JavaScript tests. It's nice and straightforward. The other thing that web tests
usually encompass are what are called rendering tests or sometimes referred to
as ref tests for reference tests. And these are checking the actual, as the
first name implies, the rendering of some HTML, some CSS by the browser. The
reason they're called reference tests is that usually the way you check
whether a rendering is correct is you set up your test, and then you
compare it to some image or some other reference rendering that you're like,
OK, this should look like that. If it does look like that, great. If it
doesn't, I failed.
26:54 SHARON: Ah-ha. And are these the same as - so there's a few other test
names that are all kind of similar. And as someone who doesn't work in them,
they all kind of blur together. So I've also heard web platform tests. I've
heard layout tests. I've heard Blink tests, all of which do - all of which are
JavaScript HTML-like and have some level of images in them. So are these all
the same thing? And if not, what's different?
27:19 STEPHEN: Yeah. So yes and no, I guess, is my answer. So a long time ago,
there were layout tests basically. And that was something we inherited from the
WebKit project when we forked Chromium from WebKit all
those years ago. And they're exactly what I've described. They were both
JavaScript-based tests and they were also HTML-based tests for just doing
reference renderings. However, web platform test came up as an external project
actually. Web platform test is not a Chromium project. It is external upstream.
You can find them on GitHub. And their goal was to create a set of - a test
suite shared between all browsers so that all browsers could test - run the
same tests and we could actually tell, hey, is the web interoperable? Does it
work the same way no matter what browser you're on? The answer is, no. But
we're trying. And so inside of Chromium we said, that's great. We love this
idea. And so what we did was we actually import web platform test into our
layout tests. So web platform test now becomes a subdirectory of layout tests.
OK?
28:30 SHARON: OK. [LAUGHS]
28:30 STEPHEN: To make things more confusing, we don't just import them, but we
also export them. We run a continuous two-way sync. And this means that
Chromium developers don't have to worry about that upstream web platform test
project most of the time. They just land their code in Chromium, and a magic
process happens, and it goes up into the GitHub project. So that's where we
were for many years - layout tests, which are a whole bunch of legacy tests,
and then also web platform tests. But fairly recently - and I say that knowing
that COVID means that might be anything within the last three years because who
knows where time went - we decided to rename layout tests. And the name we
chose was web tests. So now you have web tests, of which web platform tests
are a subset. Easy.
29:20 SHARON: Cool.
29:20 STEPHEN: [LAUGHS]
29:20 SHARON: Cool. And what about Blink tests? Are those separate, or are
those these altogether?
29:27 STEPHEN: I mean, if they're talking about the JavaScript and HTML, that's
going to just be another name for the web tests. I find that term confusing
because there is also the Blink tests target, which builds the infrastructure
that is used to run web tests. So that's probably what you're referring to, like
`blink_test`. It is the target that you build to run these tests.
29:50 SHARON: I see. So `blink_test` is a target. These other ones, web test
and web platform tests, are actual test suites.
29:57 STEPHEN: Correct. Yes. That's exactly right.
30:02 SHARON: OK. All right.
30:02 STEPHEN: Simple.
30:02 SHARON: Yeah. So easy. So you mentioned that the web platform tests are
cross-browser. But a lot of browsers are based on Chromium. Is it one of the
things where it's open source and stuff but majority of people contributing to
these and maintaining it are Chrome engineers?
30:23 STEPHEN: I must admit, I don't know what that stat is nowadays. Back when
I was working on interoperability, we did measure this. And it was certainly
the case that Chromium is a large project. There were a lot of tests being
contributed by Chromium developers. But we also saw historically - I would like
to recognize Mozilla, most of all, who were a huge contributor to the web
platform test project over the years and are probably the reason that it
succeeded. And we also - web platform test also has a fairly healthy community
of completely outside developers. So people that just want to come along. And
maybe they're not able to or willing to go into a browser, and actually build a
browser, and muck with code. But they could write a test for something. They
can find a broken behavior and be like, hey, there's a test here, Chrome and
Firefox do different things.
31:08 SHARON: What are examples of the interoperability things that you're
testing for in these cross-browser tests?
31:17 STEPHEN: Oh, wow, that's a big question. I mean, really everything and
anything. So on the ref test side, the rendering test, it actually does matter
that a web page renders the same in different browsers. And that is very hard
to achieve. It's hard to make two completely different engines render some HTML
and CSS exactly the same way. But it also matters. We often see bugs where you
have a lovely - you've got a lovely website. It's got this beautiful header at
the top and some content. And then on one browser, there's a two-pixel gap
here, and you can see the background, and it's not a great experience for your
users. So ref tests, for example, are used to try and track those down. And
then, on the JavaScript side, I mean really, web platform APIs are complicated.
They're very powerful. There's a reason they are in the browser and you cannot
do them in JavaScript. And that is because they are so powerful. So for
example, web USB to talk to USB devices, you can't just do that from
JavaScript. But because they're so powerful, because they're so complicated,
it's also fairly easy for two browsers to have slightly different behavior. And
again, it comes down to what is the web developer's experience. When I try and
use the web USB API, for example, am I going to have to write code that's like,
if Chrome, call it this way, if Fire - we don't want that. That is what we do
not want for the web. And so that's the goal.
32:46 SHARON: Yeah. What a team effort, making the whole web work is. All
right. That's cool. So in your time working on these web platform tests, do you
have any fun stories you'd like to share or any fun things that might be
interesting to know?
33:02 STEPHEN: Oh, wow. [LAUGHS] One thing I like to bring up - I'm afraid it's
not that fun, but I like to repeat it a lot of times because it's weird and
people get tripped up by it - is that inside of Chromium, we don't run web
platform tests using the Chrome browser. We run them using content shell. And
this is partially historical. That's how layout tests run. We always ran them
under content shell. And it's partially for I guess what I will call
feasibility. As I talked about earlier, content shell is much simpler than
Chrome. And that means that if you want to just run one test, it is faster, it
is more stable, it is more reliable I guess I would say, than trying to bring
up the behemoth that is Chrome and making sure everything goes correctly. And
this often trips people up because in the upstream world of this web platform
test project, they run the test using the proper Chrome binary. And so they're
different. And different things do happen. Sometimes it's rendering
differences. Sometimes it's because web APIs are not always implemented in both
Chrome and content shell. So yeah, fun fact.
34:19 SHARON: Oh, boy. [LAUGHTER]
34:19 STEPHEN: Oh, yeah.
34:19 SHARON: And we wonder why flakiness is a problem. Ah. [LAUGHS]
34:19 STEPHEN: Yeah. It's a really sort of fun but also scary fact that even if
we put aside web platform tests and just look at layout tests, we don't test
what we ship. Layout tests run in content shell, and then we turn around and
we're like, here's a Chrome binary. Those are different. But, hey, we
do the best we can.
34:43 SHARON: Yeah. We're out here trying our best. So that all sounds very
cool. Let's move on to our next type of test, which is performance. You might
have heard the term telemetry thrown around. Can you tell us what telemetry is
and what these performance tests are?
34:54 STEPHEN: I mean, I can try. We've certainly gone straight from the thing
I know a lot about into the thing I know very little about. But -
35:05 SHARON: I mean, to Stephen's credit, this is a very hard episode to find
one single guest for. People who are working extensively usually in content
aren't working a ton in performance or web platform stuff. And there's no one
who is - just does testing and does every kind of testing. So we're trying our
best. [INAUDIBLE]
35:24 STEPHEN: Yeah, absolutely. You just need to find someone arrogant enough
that he's like, yeah, I'll talk about all of those. I don't need to know the
details. It's fine. But yeah, performance test, I mean, the name is self
explanatory. These are tests that are trying to ensure the performance of
Chromium. And this goes back to the four S's when we first started Chrome as a
project - speed, simplicity, security, and I've forgotten the fourth S now.
Speed, simplicity, security - OK, let's not reference the four S's then.
[LAUGHTER] You have the comic. You tell me.
36:01 SHARON: Ah. Oh, I mean, I don't read it every day. Stability. Stability.
36:08 STEPHEN: Stability. God damn it. That's literally what the rest of this
is about. OK, where were we?
36:13 SHARON: We're leaving this in, don't worry. [LAUGHTER]
36:19 STEPHEN: Yeah. So the basic idea of performance test is to test
performance because as much as you can view behavior as a correctness thing, in
Chromium we also consider performance a correctness thing. It is not a good
thing if a change lands and performance regresses. So obviously, testing
performance is also hard to do absolutely. There's a lot of noise in any sort
of performance testing. And so, we do it essentially heuristically,
probabilistically. We run whatever the tests are, which I'll talk about in a
second. And then we look at the results and we try and say, hey, OK, is there a
statistically significant difference here? And there's actually a whole
performance sheriffing rotation to try and track these down. But in terms of,
yeah, you mentioned telemetry. That weird word. You're like, what is a
telemetry test? Well, telemetry is the name of the framework that Chromium
uses. It's part of the wider catapult project, which is all about different
performance tools. And none of the names, as far as I know, mean anything.
They're just like, hey, catapult, that's a cool name. I'm sure someone will
explain to me now the entire history behind the name catapult and why it's
absolutely vital. But anyway, so telemetry basically is a framework that when
you give it some input, which I'll talk about in a second, it launches a
browser, performs some actions on a web page, and records metrics about those
actions. So the input, the test essentially, is basically a collection of go to
this web page, do these actions, record these metrics. And I believe in
telemetry that's called a story, the story of someone visiting a page, I guess,
is the idea. One important thing to know is that because it's sort of insane
to actually visit real websites - they keep doing things like changing - we
actually cache the websites. We download a version of the websites once and
actually check that in. And when you go run a telemetry test, it's not running
against literally the real Reddit.com or something. It's running against a
version we saved at some point.
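The "statistically significant difference" check Stephen describes earlier in this answer can be sketched as a toy two-sample comparison. This is not Chromium's real performance tooling - the method, the threshold, and all function names below are invented purely for illustration:

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical sketch: decide whether a benchmark regressed by comparing
// two samples of a timing metric (say, page-load times in milliseconds)
// with a simple two-sample z-score. Real perf sheriffing tooling is far
// more sophisticated than this.
double Mean(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

double Variance(const std::vector<double>& xs) {
  double m = Mean(xs);
  double sum = 0.0;
  for (double x : xs) sum += (x - m) * (x - m);
  return sum / (xs.size() - 1);  // Sample variance; needs >= 2 samples.
}

// Returns true if |after| looks significantly slower than |before|.
bool LooksLikeRegression(const std::vector<double>& before,
                         const std::vector<double>& after) {
  // Standard error of the difference between the two sample means.
  double se = std::sqrt(Variance(before) / before.size() +
                        Variance(after) / after.size());
  double z = (Mean(after) - Mean(before)) / se;
  return z > 2.0;  // Arbitrary one-sided cutoff, roughly 95% confidence.
}
```

The point is only that regression detection is a statistical judgment over noisy repeated runs, not an exact comparison of two numbers.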
38:31 SHARON: And how often - so I haven't really heard of anyone who actually
works on this, and of course you don't interact with everyone. But how -
as new web features get added and things in the browser change, how often are
these tests specifically getting updated to reflect that?
38:44 STEPHEN: I would have to plead some ignorance there. It's certainly also
been my experience as a browser engineer who has worked on many web APIs that
I've never written a telemetry test myself. I've never seen one added. My
understanding is that they are - a lot of the use cases are fairly general with
the hope that if you land some performance problematic feature, it will regress
on some general test. And then we can be like, oh, you've regressed. Let's
figure out why. Let's dig in and debug. But it certainly might be the case if
you are working on some feature and you think that it might have performance
implications that aren't captured by those tests, there is an entire team that
works on the speed of Chromium. I cannot remember their email address right
now. But hopefully we will get that and put that somewhere below. But you can
certainly reach out to them and be like, hey, I think we should test the
performance of this. How do I go about and do that?
39:41 SHARON: Yeah. That sounds useful. I've definitely gotten bugs filed
against me for performance stuff. [LAUGHS] Cool. So that makes sense. Sounds
like good stuff. And in talking to some people in preparation for this episode,
I had a few people mention Android testing specifically. Not any of the other
platforms, just Android. So do you want to tell us why that might be? What are
they doing over there that warrants additional mention?
40:15 STEPHEN: Yeah. I mean, I think probably the answer would just be that
Android is such a huge part of our code base. Chrome is a browser, a
multi-platform browser, runs on multiple desktop platforms, but it also runs on
Android. And it runs on iOS. And so I assume that iOS has its own testing
framework. I must admit, I don't know much about that at all. But certainly on
Android, we have a significant amount of testing framework built up around it.
And so there's the option, the ability for you to test your Java code as well
as your C++ code.
40:44 SHARON: That makes sense. And yeah, with iOS, because they don't use
Blink, I guess that reduces the amount of tests that they might need to add,
whereas on Android they're still using Blink. But there's a lot of differences
because it is mobile, so they're just, OK, we actually can test those things.
So let's go more general now. At almost every stage, you've
mentioned flakiness. So let's briefly run down, what is flakiness in a test?
41:14 STEPHEN: Yes. So flakiness for a test is just - the definition is just
that the test does not consistently produce the same output. When you're
talking about flakiness, you actually don't care what the output is. A test
that always fails, that's fine. It always fails. But a test that passes 90% of
the time and fails 10%, that's not good. That test is not consistent. And it
will cause problems.
41:46 SHARON: What are common causes of this?
41:46 STEPHEN: I mean, part of the cause is, as I've said, we write a lot of
integration tests in Chromium. Whether those are browser tests, or whether
those are web tests, we write these massive tests that span huge stacks. And
what comes implicitly with that is timing. Timing is almost always the
problem - timing and asynchronicity. Whether that is in the same thread or
multiple threads, you write your test, you run it on your developer machine,
and it works. And you're like, cool, my test works. But what you don't realize
is that you're assuming that in some part of the browser, this function ran,
then this function ran. And that always happens on your developer machine
because you have this CPU, and this much memory, and et cetera, et cetera. Then
you commit your code, you land your code, and somewhere a bot runs. And that
bot is slower than your machine. And on that bot, those two functions run in
the opposite order, and something goes horribly wrong.
42:50 SHARON: What can the typical Chrome engineer writing these tests do in
the face of this? What are some practices that you generally should avoid or
generally should try to do more often that will keep this from happening in
your test?
43:02 STEPHEN: Yeah. So first of all, write more unit tests and fewer browser
tests, please. Unit tests are - as I've talked about, they're small. They're
compact. They focus just on the class that you're testing. And too often, in my
opinion - again, I'm sure we'll get some nice emails stating I'm wrong - but
too often, in my opinion people go straight to a browser test. And they bring
up a whole browser just to test functionality in their class. This sometimes
requires writing your class differently so that it can be tested by a unit
test. That's worth doing. Beyond that, though, when you are writing a browser
test or a web test, something that is more integration, more end to end, be
aware of where timing might be creeping in. So to give an example, in a browser
test, you often do things like start by loading some web contents. And then you
will try and poke at those web contents. Well, so one thing that people often
don't realize is that loading web contents, that's not a synchronous process.
Actually knowing when a page is finished loading is slightly difficult. It's
quite interesting. And so there are helper functions to try and let you wait
for this to happen, sort of event waiters. And you should - unfortunately, the
first part is you have to be aware of this, which is just hard to be. But the
second part is, once you are aware of where these can creep in, make sure
you're waiting for the right events. And make sure that once those events have
happened, you are in a state where the next call makes sense.
44:28 SHARON: That makes sense. You mentioned rewriting your classes so they're
more easily testable by a unit test. So what are common things you can do in
terms of how you write or structure your classes that make them more testable?
And just that seems like a general good software engineering practice to do.
44:50 STEPHEN: Yeah, absolutely. So one of the biggest ones I think we see in
Chromium is to not use singleton accessors to get at state. And what I mean by
that is, you'll see a lot of code in Chromium that just goes ahead, through
some mechanism, and says, hey, get the current web contents. And as you, I
think, you've talked about on this program before, web contents is this massive
class with all these methods. And so if you just go ahead and get the current
web contents and then go do stuff on that web contents, whatever, when it comes
to running a test, well, it's like, hold on. That's trying to fetch a real web
contents. But we're writing a unit test. What does that even look like? And so
the way around this is to do what we call dependency injection. And I'm sure as
I've said that word, a bunch of listeners or viewers have just recoiled in
fear. But we don't lean heavily into dependency injection in Chromium. But it
is useful for things like this. Instead of saying, go get the web contents,
pass a web contents into your class. Make a web contents available as an input.
And that means when you create the test, you can use a fake or a mock web
contents. We can talk about difference between fakes and mocks as well. And
then, instead of having it go do real things in real code, you can just be
like, no, no, no. I'm testing my class. When the web contents' do-a-thing
method is called, just return this value. I don't care about web contents. Someone else is
going to test that.
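That dependency-injection idea can be sketched in a few lines of C++. All names here are invented for illustration; the real content::WebContents interface is far larger:

```cpp
#include <string>

// Invented stand-in for the dependency: the class under test depends on
// a small interface rather than a concrete singleton.
class WebContentsInterface {
 public:
  virtual ~WebContentsInterface() = default;
  virtual std::string GetTitle() = 0;
};

// The class under test receives its dependency as a constructor argument
// instead of fetching "the current web contents" from a global accessor.
class TitleFormatter {
 public:
  explicit TitleFormatter(WebContentsInterface* contents)
      : contents_(contents) {}
  std::string FormattedTitle() { return "[" + contents_->GetTitle() + "]"; }

 private:
  WebContentsInterface* contents_;  // Not owned.
};

// In the unit test, a fake stands in for the real thing.
class FakeWebContents : public WebContentsInterface {
 public:
  std::string GetTitle() override { return "Example"; }
};
```

Because `TitleFormatter` only knows about the interface, the unit test never has to bring up a browser to exercise it.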
46:19 SHARON: Something else I've either seen or been told in code review is to
add delegates and whatnot.
46:25 STEPHEN: Mm-hmm.
46:25 SHARON: Is that a good general strategy for making things more testable?
46:25 STEPHEN: Yeah. It's similar to the idea of doing dependency injection by
passing in your web contents. Instead of passing in your web contents, pass in
a class that can provide things. And it's sort of a balance. It's a way to
balance, if you have a lot of dependencies, do you really want to add 25
different inputs to your class? Probably not. But you define a delegate
interface, and then you can mock out that delegate. You pass in that one
delegate, and then when delegate dot get web content is called, you can mock
that out. So very much the same goal, another way to do it.
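The delegate variant looks similar: one delegate interface bundles what would otherwise be many separate constructor arguments. Again, every name below is invented for illustration:

```cpp
#include <string>

// One invented delegate interface provides several dependencies on demand,
// instead of the class taking each one as its own constructor parameter.
class Delegate {
 public:
  virtual ~Delegate() = default;
  virtual std::string GetUserName() = 0;
  virtual bool IsOffline() = 0;
};

class Greeter {
 public:
  explicit Greeter(Delegate* delegate) : delegate_(delegate) {}
  std::string Greeting() {
    if (delegate_->IsOffline()) return "(offline)";
    return "Hello, " + delegate_->GetUserName();
  }

 private:
  Delegate* delegate_;  // Not owned.
};

// The test supplies a stub delegate with canned answers.
class TestDelegate : public Delegate {
 public:
  std::string GetUserName() override { return "Sharon"; }
  bool IsOffline() override { return false; }
};
```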
47:04 SHARON: That sounds good. Yeah, I think in general, in terms of Chrome
specifically, a lot of these testing best practices, making things testable,
these aren't Chrome-specific. These are general software engineering-specific,
C++-specific, and those you can look more into separately. Here we're mostly
talking about what are the Chrome things. Right?
47:24 STEPHEN: Yeah.
47:24 SHARON: Things that you can't just find as easily on Stack Overflow and
such. So you mentioned fakes and mocks just now. Do you want to tell us a bit
about the difference there?
47:32 STEPHEN: I certainly can do it. Though I want to caveat that you can also
just go look up those on Stack Overflow. But yeah. So just to go briefly into
it, there is - in testing you'll often see the concept of a fake version of a
class and also a mock version of a class. And the difference is just that a
fake version of the class is a, what I'm going to call a real class that you
write in C++. And you will probably write some code to be like, hey, when it
calls this function, maybe you keep some state internally. But you're not using
the real web contents, for example. You're using a fake. A mock is actually a
thing out of the Google test support library. It's part of a - Google mock is
the name of the sub-library, I guess, the sub-framework that provides this. And
it is basically a bunch of magic that makes that fake stuff happen
automatically. So you can basically say, hey, instead of a web contents, just
mock that web contents out. And the nice part about mock is, you don't have to
define behavior for any method you don't care about. So if there are, as we've
discussed, 100 methods inside web contents, you don't have to implement them
all. You can be like, OK, I only care about the do Foobar method. When that is
called, do this.
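A hand-written fake, in this invented example, is just a small real class that keeps its state in memory. A gmock mock would instead be generated with `MOCK_METHOD` and configured with per-test expectations; this sketch sticks to the standard library:

```cpp
#include <map>
#include <string>

// Invented interface for illustration - imagine the production version
// persists preferences to disk.
class PrefStore {
 public:
  virtual ~PrefStore() = default;
  virtual void Set(const std::string& key, int value) = 0;
  virtual int Get(const std::string& key) = 0;
};

// The fake is a working implementation with just enough behavior for
// tests: it keeps state in a map instead of touching disk.
class FakePrefStore : public PrefStore {
 public:
  void Set(const std::string& key, int value) override {
    prefs_[key] = value;
  }
  int Get(const std::string& key) override {
    return prefs_[key];  // Missing keys default to 0.
  }

 private:
  std::map<std::string, int> prefs_;
};
```

The practical difference: with a fake you write the behavior once and reuse it everywhere; with a mock you declare per-test expectations and only stub the methods that test actually touches.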
48:51 SHARON: Makes sense. One last type of test, which we don't hear about
that often in Chrome but does exist quite a bit in other areas, is manual
testing. So do we actually have manual testing in Chrome? And if so, how does
that work?
49:03 STEPHEN: Yeah, we actually do. We're slightly crossing the boundary here
from the open Chromium into the product that is Google Chrome. But we do have
manual tests. And they are useful. They are a thing. Most often, you will see
this in two cases as a Chrome engineer. You basically work with the test
team - this is all a little bit internal now - to define a set of test cases
for your feature. And these are almost always
end-to-end tests. So go to this website, click on this button, you should see
this flow, this should happen, et cetera. And sometimes we run these just as
part of the launch process. So when you're first launching a new feature, you
can be like, hey, I would love for some people to basically go through this and
smoke test it, make sure that everything is correct. Some things we test every
release. They're so important that we need to have them tested. We need to be
sure they work. But obviously, all of the caveats about manual testing out
there in the real world, they apply equally to Chromium or to Chrome. Manual
testing is slow. It's expensive. We require people - specialized people that we
have to pay and that they have to sit there, and click on things, and that sort
of thing, and file bugs when it doesn't work. So wherever possible, please do
not write manual tests. Please write automated testing. Test your code, please.
But then, yeah, it can be used.
50:33 SHARON: In my limited experience working on Chrome, the only place that
I've seen there actually be any level of dependency on manual test has been in
accessibility stuff -
50:38 STEPHEN: Yeah.
50:38 SHARON: which kind of makes sense. A lot of that stuff is not
necessarily - it is stuff that you would want to have a person check because,
sure, we can think that the speaker is saying this, but we should make sure
that that's the case.
50:57 STEPHEN: Exactly. I mean, that's really where manual test shines, where
we can't integration test accessibility because you can't test the screen
reader device or the speaker device. Whatever you're using, we can't test that
part. So yes, you have to then have a manual test team that checks that things
are actually working.
51:19 SHARON: That's about all of our written down points to cover. Do you have
any general thoughts, things that you think people should know about tests,
things that people maybe ask you about tests quite frequently, anything else
you'd like to share with our lovely listeners?
51:30 STEPHEN: I mean, I think I've covered most of them. Please write tests.
Write tests not just for code you're adding but for code you're modifying, for
code that you wander into a directory and you say, how could this possibly
work? Go write a test for it. Figure out how it could work or how it couldn't
work. Writing tests is good.
51:50 SHARON: All right. And we like to shout-out a Slack channel of interest.
Which one would be the - which one or ones would be a good Slack channel to
post in if you have questions or want to get more into testing?
52:03 STEPHEN: Yeah. It's a great question. I mean, I always like to - I think
it's been called out before, but the #halp channel is very useful for getting
help in general. There is a #wpt channel if you want to go ask about web
platform tests. There's probably a #testing channel, too. But I'm going to
admit, I'm not in it, so I don't know.
52:27 SHARON: Somewhat related, there's a #debugging channel.
52:27 STEPHEN: Oh.
52:27 SHARON: So if you want to learn about how to actually do debugging and
not just do log print debugging.
52:34 STEPHEN: Oh, I was about to say, do you mean by printf'ing everywhere in
your code?
52:41 SHARON: [LAUGHS] So there are a certain few people who like to do things
in an actual debugger or enjoy doing that. And for a test, that can be a useful
thing too - a tool to have. So that also might be something of interest. All
right, yeah. And kind of generally, as you mentioned a lot of things are your
opinion. And it seems like we currently don't have a style guide for tests or
best practices kind of thing. So how can we -
53:13 STEPHEN: [LAUGHS] How can we get there? How do we achieve that?
53:19 SHARON: How do we get one?
53:19 STEPHEN: Yeah.
53:19 SHARON: How do we make that happen?
53:19 STEPHEN: It's a hard question. We do - there is documentation for
testing, but it's everywhere. I think there's `/docs/testing`, which has some
general information. But so often, there's just random READMEs around the code
base that are like, oh, hey, here's the content public test API surface. Here's
a bunch of useful information you might want to know. I hope you knew to look
in this location. Yeah, it's a good question. Should we have some sort of
process for - like you said, like a style guide but for testing? Yeah, I don't
know. Maybe we should enforce that people dependency inject their code.
54:04 SHARON: Yeah. Well, if any aspiring test nerds want to really get into
it, let me know. I have people who are also interested in this and maybe can
give you some tips to get started. But yeah, this is a hard problem and
especially with so many types of tests everywhere. I mean, even just getting
one for each type of test would be useful, let alone all of them together. So
anyway - well, that takes us to the end of our testing episode. Thank you very
much for being here, Stephen. I think this was very useful. I learned some
stuff. So that's cool. So hopefully other people did too. And, yeah, thanks for
sitting and answering all these questions.
54:45 STEPHEN: Yeah, absolutely. I mean, I learned some things too. And
hopefully we don't have too many angry emails in our inbox now.
54:52 SHARON: Well, there is no email list, so people can't email in if they
have issues. [LAUGHTER]
54:58 STEPHEN: If you have opinions, keep them to yourself -
54:58 SHARON: Yeah. [INAUDIBLE]
54:58 STEPHEN: until Sharon invites you on her show.
55:05 SHARON: Yeah, exactly. Yeah. Get on the show, and then you can air your
grievances at that point. [LAUGHS] All right. Thank you.