
# What’s Up With //content

This is a transcript of [What's Up With
That](https://www.youtube.com/playlist?list=PL9ioqAuyl6ULIdZQys3fwRxi3G3ns39Hq)
Episode 3, a 2022 video discussion between [Sharon and
John](https://www.youtube.com/watch?v=SD3cjzZl25I).

The transcript was automatically generated by speech-to-text software. It may
contain minor errors.

---

What lives in the content directory? What is the content layer? How does it fit
into Chrome and the web at large? Here to answer all that and more is today’s
special guest, John, who not only is a Content owner, but actually split the
codebase to create the Content layer.

Notes:

- https://docs.google.com/document/d/1EJnG5gK8rQwHkdZTKl8vIwx9oScP8TaKBgwzBafIh9M/edit

Links:

- [//content/README.md]
- [//content/public/README.md]
- [What's Up With Pointers]

---

00:00 SHARON: Hello, and welcome to "What's Up with That", the series that
demystifies all things Chrome. I'm your host, Sharon, and today, we're talking
about content. What lives in the content directory? What is the content layer?
How does it fit into Chrome and the web at large? Here to answer all of that
and more is today's special guest, John. He's not only a content owner, but
actually split the code base to create the content layer. Since then, a theme
of his work has been Chrome's architecture, and how to make it usable by
others. He's been involved far and wide across Chrome, but today, we're
focusing on content. John, welcome to the program.

00:33 JOHN: Hi, everyone, and thanks for setting this up, Sharon. My name's
John, and I'm happy to try to shed some light and history on this part of the
Chrome codebase. I've had the pleasure of working on a lot of different parts
of Chrome over the years I've worked on it. A theme of my work has been the
architecture of Chrome and making it reusable by other products. And one of
those projects has been splitting up the codebase and helping create this
content layer.

01:02 SHARON: So, can you tell us what the content layer is? Because content is
a very overloaded term, and we're going to say it a lot today. So you mentioned
the content layer. Can you tell us what that is?

01:10 JOHN: Yes. The content layer is the part of the Chrome codebase that's
responsible for the multiprocess, sandboxed implementation of our platform.

01:24 SHARON: And another term that I had heard a lot tossed around before I
really understood what was going on was the content/public API. So is that the
same as the content layer, or is that different?

01:36 JOHN: It's part of it. The content component is very large, so we've
surrounded it with a small public API. That hides the implementation details in
the private directories, and embedders only have access to the small public
layer.
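The following is a minimal C++ sketch of the "small public API around a large private implementation" idea, not Chromium code. The names `Page`, `PageImpl`, and `Page::Create` are hypothetical. In Chromium, the analogous shape is `WebContents`, an abstract class in content/public/browser, implemented by `WebContentsImpl` in content/browser; there the boundary is enforced across directories, which a single file can only imitate with an anonymous namespace.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- Hypothetical "public" part: all an embedder is allowed to see. ---
// A pure abstract interface plus a factory; no data members or
// implementation details leak out.
class Page {
 public:
  virtual ~Page() = default;

  // Factory: embedders get an instance without ever naming the impl class.
  static std::unique_ptr<Page> Create();

  virtual void Navigate(const std::string& url) = 0;
  virtual std::string GetCurrentUrl() const = 0;
};

// --- Hypothetical "private" part: embedders never include or name this. ---
namespace {

class PageImpl : public Page {
 public:
  void Navigate(const std::string& url) override { current_url_ = url; }
  std::string GetCurrentUrl() const override { return current_url_; }

 private:
  std::string current_url_;
};

}  // namespace

std::unique_ptr<Page> Page::Create() {
  return std::make_unique<PageImpl>();
}
```

An embedder would only write `Page::Create()` and call through the interface. Because it never names `PageImpl`, the implementation can grow or change without touching embedder code.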

01:56 SHARON: How did we end up with this content layer? Can you give us a bit
of history of how we came up with it? And also, maybe why it's called content?

02:02 JOHN: Sure. In the beginning, Chrome, like all software projects, was
nice and easy to understand. But over time, as you add a lot more features to
go from zero users to billions of users, it becomes harder to understand. Small
files and small classes become much larger; small functions get numerous hooks
to talk to every feature, because every feature wants to know when something
happens. And so, this idea started: let's separate the product - the things
that make Google Chrome what it is - from the platform, which is what any
minimal browser would need in order to implement the latest HTML specs in a
sandboxed, multiprocess way. Content was the lower part, and that's how it
started.

02:58 SHARON: How did we get the name content?

02:58 JOHN: The name is a pun. When we started Chrome, one of the ideas was
that we'd focus on content and not chrome, so the browser would get out of the
way. Chrome is a term used to refer to all the user interface parts of the
browser. So we said, it's going to be content, not chrome. When you open Chrome,
you see a very small UI; most of what you see is the content. And when we split
the directory, everything was originally in src/chrome, so naming the lower
part content was the pun. That's where it came from.

03:34 SHARON: That's fun. Earlier, you mentioned embedders of content. Can you
tell us what an embedder of content is? This is part of why I was very excited
about this episode, because I was working on a team where we were embedders of
content for a long time, well over a year, and it took me a long time to really
understand what that was. Because, as you mentioned, Chrome's grown a lot. When
you work on a very specific thing, understanding these more general concepts,
like what content is or what a content embedder is, is less important to what
you do day-to-day. But can you tell us what an embedder of content is?

04:13 JOHN: Sure. An embedder of content is simply anybody who chooses to use
that code to build a browser on top of it. In the beginning, right when we did
this, the goal was just to have one embedder. Or not the goal; what we had was
just one embedder, and it was Chrome. But then, right away, we were like, you
know what? It would be nice for people who work on content, and not the feature
part, to build a smaller binary. It builds faster, debugs faster, and runs
faster. So we also built a minimal example for other people, called
`content_shell`. We started running tests against that, and that was the
second embedder of content. And since then, something unexpected happened:
what we started for code health reasons turned out to be very useful for other
projects to start building their browsers from. Things like Android WebView,
which was using its own fork of WebKit, started using content. That was one
first-party example. But then, other projects came along. Things like Electron
and the \[Chromium\] Embedded Framework started building not just products on
top of it, but other frameworks.

05:30 SHARON: That was really surprising to learn about. It seems unsurprising
that you would build another browser based on Chromium, and people heard about
this when Edge switched over to Chromium. But learning that things like
Electron are built around content was really surprising, because that's very
different from what a browser is.

05:52 JOHN: But they have common needs. They have some HTML data, and they want
to render it in a safe, stable, and secure way. Working on that code isn't their
value-add, so it's better for them to use something else.

06:11 SHARON: That makes sense. You also mentioned that Chrome is dependent on
content. When I first started working on Chrome as an intern, I had to be told
so many times, because I couldn't remember, that Chrome can depend on content,
but not the other way around. So can you tell us a bit about this layering, and
why it's there?

06:31 JOHN: I should also start by saying that content is not just content.
When you embed content, you're embedding everything that sits below it in the
layer tree too. That includes things like Blink, our rendering engine; V8, our
JavaScript engine; net, our networking library; and so on. So you can talk to
the content/public APIs, but sometimes you also talk to the Blink API, V8, and
so on.

07:07 SHARON: So you have this multi-layered product? At the bottom, we have
things like net and Blink, and those probably have dependencies of their own that
I don't know about. On top of that, we have content, and then, on top of that,
we have Chrome?

07:23 JOHN: Right. Chrome, as an embedder of content, can include directly from
the content/public API. But since content can have multiple embedders, it can't
include Chrome. If content reached out directly to Chrome, other people wouldn't
be able to use it, because if you tried to bring in this code, it would include
files from a directory that you're not using. So, instead, the content/public
API has APIs going in two directions. One direction goes into content. The other
direction is abstract interfaces that go out from content, which any embedder
has to implement. These usually end up with names containing Client or
Delegate. Chrome implements them, and that's how content is able to call back
into it. And, of course, any other product or embedder can implement these same
interfaces.
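A minimal C++ sketch of the two directions John describes, assuming a single abstract interface stands in for the many in content/public. `BrowserClient`, `Engine`, `FullBrowserClient`, and `ShellClient` are hypothetical names. Chromium's real interfaces in this style include `ContentBrowserClient` and `WebContentsDelegate`, which have far more methods than shown here. The point is that the lower layer compiles without knowing about any concrete embedder.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface the lower layer defines and every embedder
// implements. The lower layer calls *out* through it.
class BrowserClient {
 public:
  virtual ~BrowserClient() = default;
  virtual std::string GetProductName() = 0;
  virtual bool AllowPopup(const std::string& url) = 0;
};

// Hypothetical lower layer ("content"). It only knows the abstract client,
// never a concrete embedder, so it has no dependency on any of them.
class Engine {
 public:
  explicit Engine(BrowserClient* client) : client_(client) {}

  // Calls going *into* the layer; internally they call back out.
  std::string UserAgent() const { return client_->GetProductName() + "/1.0"; }
  bool MaybeOpenPopup(const std::string& url) {
    return client_->AllowPopup(url);
  }

 private:
  BrowserClient* client_;  // Not owned; must outlive the Engine.
};

// Two different embedders plug into the same lower layer.
class FullBrowserClient : public BrowserClient {
 public:
  std::string GetProductName() override { return "FullBrowser"; }
  bool AllowPopup(const std::string& url) override {
    // Toy policy: only allow popups for https:// URLs.
    return url.rfind("https://", 0) == 0;
  }
};

class ShellClient : public BrowserClient {
 public:
  std::string GetProductName() override { return "Shell"; }
  bool AllowPopup(const std::string&) override { return true; }
};
```

Swapping `FullBrowserClient` for `ShellClient` changes behavior without recompiling `Engine`, which is the property that lets content have more than one embedder.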

08:23 SHARON: You mentioned Blink and also some things called delegate and
whatever. So we have a lot of things called something something host in
content. Can you talk a bit about what the relationship between content and
Blink is? Because there's a lot of mirroring in terms of how they might be set
up, and how they relate to each other.

08:37 JOHN: Blink is the rendering engine that originally started as WebKit.
We forked it and named it Blink a number of years ago. It did not have any
concept of processes. It was something that you call in one process, and it
does its job: you give it whatever data it needs, it gives you back the
rendered data, and you can poke at it however you want. But you needed a bunch
of code around it to make it multiprocess. And when it needs something that's
not available in the sandbox that it runs in, something has to figure that out
and provide that data. This is where the content layer comes in. It's the layer
that wraps the rendering engine and uses the networking library and other
things to create a fully working browser.

09:33 SHARON: More about processes. It might be easy to conflate the content
layer with the browser process. So can you talk a bit about how processes work
in content, and what the content API provides in terms of accessing these
processes?

09:54 JOHN: Content code runs in the initial process that starts, which is the
browser process. But content also creates the renderer processes, where Blink
runs. It creates a GPU process that talks to the GPU and where a lot of the
compositing happens. It creates a network process, where we do networking. It
creates other processes, like an audio process on some platforms and a storage
process to isolate storage. And then there are a lot of short-lived processes
for security and stability reasons. So you can have processes that run content
code, but, sometimes, an embedder wants to run its own code in a different
process. It can reuse the same helpers that content has for creating a process.
And I think I didn't fully answer your previous question yet, which was about
the host part. Often, you'll have classes in Blink that are running in the
renderer process, and you need an equivalent class to drive them from the
browser process. That's where we often use the Host suffix. So it'd be like a
class for -

11:11 SHARON: Can you give an example of -

11:11 JOHN: Yes. For example, every renderer process has a class in
content/browser called `RenderProcessHost`. Every tab object in the renderer
has a class called `RenderView`, and in content/browser, it has a class called
`RenderViewHost`.
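To make the Host pairing concrete, here is a hypothetical C++ sketch. `PageRenderer`, `PageRendererHost`, and `FakePipe` are made-up names, and `FakePipe` is a plain in-memory queue standing in for a Mojo connection. Real IPC crosses process boundaries and is asynchronous; this single-process sketch only imitates that by draining queued messages explicitly.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Stand-in for an IPC pipe. Messages are strings queued in one direction;
// a real connection would cross a process boundary.
struct FakePipe {
  std::deque<std::string> messages;
};

// Hypothetical renderer-side object: lives where the rendering engine runs.
class PageRenderer {
 public:
  explicit PageRenderer(FakePipe* to_browser) : to_browser_(to_browser) {}

  // Called when the page's title becomes known; reports it to the browser.
  void OnTitleParsed(const std::string& title) {
    to_browser_->messages.push_back(title);
  }

 private:
  FakePipe* to_browser_;  // Not owned.
};

// Hypothetical browser-side "Host": mirrors PageRenderer and tracks its
// state from the privileged browser process.
class PageRendererHost {
 public:
  explicit PageRendererHost(FakePipe* from_renderer)
      : from_renderer_(from_renderer) {}

  // Drains pending messages, as an IPC dispatcher would.
  void ProcessMessages() {
    while (!from_renderer_->messages.empty()) {
      title_ = from_renderer_->messages.front();
      from_renderer_->messages.pop_front();
    }
  }

  const std::string& title() const { return title_; }

 private:
  FakePipe* from_renderer_;  // Not owned.
  std::string title_;
};
```

The Host only learns the title after it processes the message, which mirrors how browser-side state lags behind the renderer until IPC is delivered.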

11:36 SHARON: Those are classes that, depending on what you work on, you might
see pop up quite a bit. There are a lot of them, they're all called
Render-something-Host, and it's a bit tough to keep them straight. But that
explains why render and host are in their names. So you just listed a bunch of
different process types: the GPU process, the browser process, renderer
processes. Usually, whenever we have different processes, we have some security
boundary between them. Can you talk a bit about how security and the content
layer overlap? Is the content API a security boundary? What happens if someone
calls it maliciously? What could go wrong if they do it successfully?

12:26 JOHN: The security boundaries in any browser built on top of content are
the processes. We don't just have a render process per tab; there can be
multiple render processes per tab, thanks to the amazing work of the Site
Isolation project, which split different iframes into different processes. All
these processes talk through IPC, and our current IPC system is called Mojo.
Any time you use Mojo between processes, you're usually talking between
processes of different privileges. One could be sandboxed and the other not
sandboxed, or one could be sandboxed and the other only partially sandboxed. So
you have to scrutinize these Mojo calls every time you use them to make sure
that they can't inadvertently lead to a security vulnerability. Even if you're
as careful as you can be, people could still misuse code. Also, embedders like
Chrome or other content embedders can add their own IPCs. Content obviously
doesn't know about the IPCs from other layers, so it's possible for an embedder
of content to have a security vulnerability in its own Mojo calls, and content
can't do anything about it. You could write insecure code in content, and you
can also write insecure code in an embedder. Let's say someone finds a
vulnerability in Blink, and maybe they're only running their code in a minimal
content shell. Maybe they can't find any other Mojo calls that they can abuse to
get access to the browser process. But another embedder might be a more
full-featured browser. It has more IPC surface, which means more attack surface
for someone to start with that Blink vulnerability and then hop into the
browser process.
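One practical consequence is that the privileged side must treat every field of an incoming message as attacker-controlled. The sketch below is hypothetical: `SliceReaderHost`, `ReadSlice`, and the `renderer_killed()` flag are invented names, and the flag stands in for Chromium's bad-message handling, which typically terminates the misbehaving renderer. What it illustrates is the order of checks, which avoids the unsigned overflow that a naive `offset + length <= size` test would allow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical browser-side handler for requests from a sandboxed renderer.
// The renderer may be compromised, so every argument is untrusted.
class SliceReaderHost {
 public:
  explicit SliceReaderHost(std::string data) : data_(std::move(data)) {}

  // Returns the requested slice of browser-owned data. A malformed request
  // is treated as a sign of a compromised renderer rather than clamped.
  std::string ReadSlice(std::size_t offset, std::size_t length) {
    // Check offset first, then compare length against the remaining bytes.
    // Writing `offset + length > data_.size()` instead could wrap around
    // for huge values and wrongly pass.
    if (offset > data_.size() || length > data_.size() - offset) {
      renderer_killed_ = true;  // Stand-in for reporting a bad message.
      return std::string();
    }
    return data_.substr(offset, length);
  }

  bool renderer_killed() const { return renderer_killed_; }

 private:
  std::string data_;
  bool renderer_killed_ = false;
};
```

A content embedder that adds its own IPCs would need the same discipline in its own handlers, since content cannot validate messages it doesn't know about.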

14:38 SHARON: And if you gain control of the browser process, that's a very
highly privileged process.

14:44 JOHN: Right, because that has full access to your system. That's the
point where you can leave persistent changes on the user's system, which is
pretty bad.

14:55 SHARON: That sounds not great. So if you're an average Chrome engineer,
which could be anyone, this is probably not too much of a concern, though all
the stuff we mentioned is good to know. How would a Chrome engineer who doesn't
directly work on content or in the content directory interact with the content
layer?

15:20 JOHN: Well, they might need a signal from Blink, for example. That's
often how it happens. They'll be working on a feature in the browser, and
everything works great. But then, they'll be like, I just need something from
Blink, but it's not there. So, sometimes, they'll have to add an IPC between
processes, and that's where they interact with content. They'll be like, how do
I get it? It's in Blink, in the `RenderView` class, so I need an interface that
talks between each `RenderViewHost` and each `RenderView`. That's how they would
interact with the multiprocess part of it. But if someone is only working on
something in the browser process, they might still be trying to get information
about the current tab. That's represented by the `WebContents` class in
content. So they'll look in content/public/browser, and they'll see
`WebContents`, and there will be a lot of interfaces that hang off it. They'll
follow a trail of interfaces and classes to get more information on what's going
on in the current tab.
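A hypothetical C++ sketch of how feature code can hang off a tab object using the observer pattern. `Tab`, `TabObserver`, and `NavigationLogger` are invented names. Chromium's real `WebContentsObserver` also has a `DidFinishNavigation` method, but it receives a `NavigationHandle*` rather than a URL string, and the real class has many more methods than shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical observer interface: features subscribe to events on one tab.
class TabObserver {
 public:
  virtual ~TabObserver() = default;
  // Default no-op so observers only override the events they care about.
  virtual void DidFinishNavigation(const std::string& /*url*/) {}
};

// Hypothetical browser-side object representing one tab's content.
class Tab {
 public:
  void AddObserver(TabObserver* observer) { observers_.push_back(observer); }

  void RemoveObserver(TabObserver* observer) {
    observers_.erase(
        std::remove(observers_.begin(), observers_.end(), observer),
        observers_.end());
  }

  // Records the new URL and notifies every registered observer.
  void CommitNavigation(const std::string& url) {
    url_ = url;
    for (TabObserver* observer : observers_) {
      observer->DidFinishNavigation(url);
    }
  }

  const std::string& url() const { return url_; }

 private:
  std::string url_;
  std::vector<TabObserver*> observers_;  // Not owned.
};

// Example feature that only cares about completed navigations.
class NavigationLogger : public TabObserver {
 public:
  void DidFinishNavigation(const std::string& url) override {
    urls.push_back(url);
  }
  std::vector<std::string> urls;
};
```

The tab class stays unaware of the specific features listening to it, which keeps feature code out of the lower layer.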

16:29 SHARON: Can you give us a quick overview of the `WebContents` class?
Because it is one, massive, and two, called something like `WebContents`. Which
suggests it's important because content plus the web, and it's also something
you see all over the place. So can you just give us a quick overview of what
that class does? What it's for? What it represents?

16:46 JOHN: Yes. Things now are a lot more complicated than before, but if you
go back in a time machine and see how these things started, you can think of it
roughly this way. In initial Chrome, every tab had a class to represent the
content in that tab, and that was called `WebContents`. It was called
`WebContents` because we had other classes: we used to be able to put native
stuff in a tab, and that would be called `TabContents`. But that's gone now, and
we just have `WebContents`. So that's where the name comes from. Similarly,
there was `RenderProcessHost`, which I mentioned earlier, and each tab, each
`WebContents`, roughly translated into one render process. Now, it's a bit more
complicated. There are cases where you can have `WebContents` inside of
`WebContents`, but that's esoteric, and most people don't have to deal with it.
So that's what `WebContents` is for. It does things like take input and feed it
to the page. Permission prompts usually go through it, like when a page wants
access to a microphone, video, and so on. It keeps track of whether there's a
navigation going on. What's the current URL? What's the pending URL? It uses
other classes to drive all that as you send out the network request and get the
response back. That's not inside `WebContents` itself, but it's driven by other
helper classes.

18:28 SHARON: I tend to think of content as being the home of navigation, which
I think is a decent way to think about it, though maybe biased by the stuff I've
been working on. So you have Chrome, and navigation, and content, and all the
stuff here. And then, separately, you have the actual web, the internet, which
has things like actual websites. There are web standards, and there are things
like HTML. These two things somehow have to intersect. But working on the Chrome
side, apart from writing some browser tests, maybe, you never really interact
with the more web-facing things. JavaScript, you don't really touch; that's more
Blink, and HTML only comes up in tests. So how do we actually make sure that
these web standards, like the navigation standards, are implemented in Chrome?
And where does that happen?

19:32 JOHN: That happens all over the code, but there are a few critical
directories. If you look at net, at a low level, a lot of IETF specs and some
web specs are implemented at that layer, either in net or in the network
service, which is code that runs inside the network process. Then you've got V8,
of course, our JavaScript engine, which has to follow the ECMAScript standards.
And then there are a lot of the platform standards. Some of them don't need
multiple processes to implement, so they'll be completely inside Blink. But some
of them require multiple processes, like things that need access to devices, so
that implementation will be split across Blink and content/browser. But then,
how do you ensure not only that you implement this correctly, but also that you
don't regress it? There's a whole slew of tests. There are the Blink web tests,
which used to be called layout tests. Those run simple test cases for many
features to make sure that each one works. There's also this cool thing where we
now share a lot of these tests with other browsers, so you run the same test in
every browser. When you write a test, you don't have to write it n times; you
can just write it once. So that's how we ensure that we meet the specs.

21:10 SHARON: That makes sense. When I was looking into what a class does, I've
been linked to, say, one of the HTML specs or web specs. But the whole time, I'm
just thinking, who's checking that we're actually implementing this, and
implementing it correctly? These tests seem like a good way to do it, and they
also ensure some level of consistency across browsers, assuming, I guess, that
the browser you use chooses to run these tests.

21:41 JOHN: And as an engineer on a project like that, the first time you'll
hit them is when you're breaking them. You'll make a change and think it's fine.
Then you send it to the commit queue, and you break some layout tests. What's
happening to me today? Then you have to drill into it. The nice thing about
layout tests is that because each one is small, it's faster to figure out what
you broke. Hopefully, you only broke a small number of tests.

22:06 SHARON: For sure, and that's a good example of why we have all these
tests: to make sure things don't break. That's pretty much all the questions I
have written down. Is there anything else related to the content layer or the
content/public API that's interesting and that we didn't get a chance to cover?

22:31 JOHN: Yes. The most common question people ask is, does this belong in
content or not? So I'll take this chance to point people towards the README
files. [//content/README.md] describes what's supposed to go in content or not,
and [//content/public/README.md] describes the guidelines we have for the API to
keep it consistent.

22:59 SHARON: I've definitely seen those questions before. You're updating one
of the content/public APIs. Does this belong? While we're here, can you give us
a quick heuristic for what generally belongs in the content/public API, versus
what you put up for review and the reviewer says, no, this does not belong in
content/public?

23:24 JOHN: Sometimes, for example, for convenience, the Chrome layer wants to
call other parts of the Chrome layer, but they don't have a direct connection.
Or maybe the Chrome layer wants to talk to a different component. So they'll
say, let's add something to the content API, so that Chrome can talk to this
other part of Chrome or this other component through content as a shortcut. We
don't allow that. As anybody who's gone through the content/public directory
knows, it's already huge. So we feel that if Chrome wants to talk to Chrome or
to another layer, they should have their own API to each other directly instead
of hopping through content. The content API's already very large, very complex,
and hard to understand, so we don't want to add things to it that aren't
absolutely necessary. Another thing we try to do is not add multiple ways of
doing something. We only add something to the content API when there's no other
way of getting this data from inside content, or no other way of getting this
data from the embedder to content. If there's something similar that can do the
same thing, we push back on that.

24:39 SHARON: And also, test-only things? Are those generally OK, or do you
want to generally avoid those?

24:45 JOHN: Well, yes. Test-only methods we try really hard to avoid, not just
in the public API but inside content too, because we don't want to bloat the
binary. But we do have content/public/test, which gives you a lot more leeway to
poke at things in your browser tests or your unit tests, for example. Another
thing is, we also have guidelines for how the API should look. We don't really
have concrete classes; it's mostly abstract interfaces. There's a bunch of
rules there, and they're all listed in content/public/README, just so people
know the guidelines we have for interfaces there.

25:28 SHARON: On the Chrome binary point, how much is the size of the binary
dependent on the size of the content/public API? Is that a big part of the
binary, or is it small enough where, sure, we want to keep it from being
unnecessarily large but not too much of an issue?

25:48 JOHN: The size is not going to come as much from the content/public API
as from the entire content layer and all its dependencies, and those are in the
tens of megabytes. So if you're bundling the content layer, you're not going to
have a small binary. You'll start off in the 30 or 40 megabyte range once you
put everything together.

26:12 SHARON: And I guess that's something you have to be more conscious of if
you're working in content versus another directory even in Chrome, is that you
have to be wary of your dependencies more so than anywhere else. Not only for
Chrome, but also, any other embedders who might want to use content.

26:31 JOHN: Yes. For example, if someone's trying to add something to content
for Chrome, we also ask, does this have to be in content? Or can this be part of
Chrome, so that not every embedder has to pay that cost if they don't need it?
Maybe we'll have an interface, and the embedder can plug the data in that way,
but it still isn't in content. Another problem with having data inside content,
of course, is that not all embedders update at the same speed. So if you put
data in content, it can quickly go stale for embedders that aren't updating
quickly.

27:08 SHARON: That makes sense. So we've covered a bit of what content is and a
bit of its history. Can you tell us about upcoming changes that might happen in
content? What is the future of the content directory, the layer, the API?

27:28 JOHN: Well, it's always changing. It's not static; it's driven by the
needs of the product. You can look at big changes happening today, like MPArch,
to support various use cases that we didn't have or never thought about
initially. That's where the `WebContents` inside `WebContents` comes in. There
are also big changes like banning raw pointer fields, for example, and replacing
them with `raw_ptr`, so we can try to address some of the security problems we
have with use-after-frees. So when you look at the content code, or the Chrome
code in general, it might look a little different from the average C++ project
you see. You'll be like, I'm getting errors if I try to have a raw pointer, and
that's why.

28:15 SHARON: Check out episode one for more on that. We'll link it below.
Anything else random content-related or otherwise you would like to share with
us?

28:27 JOHN: I think the only other thing I would add is to familiarize yourself
with content/README and content/public/README before making changes. That will
make better use of both the author's and the reviewer's time. And if you're
working on content and below, you can build Content Shell instead of Chrome.
That will be faster to build and debug, and hopefully make you more productive.

28:52 SHARON: Good tips. Hopefully, our viewers follow them. They would never
try to change a content/public API without reading the READMEs first. Well,
thank you so much, John, for sitting down and chatting with me about content.
This was great, and, hopefully, people find it useful.

29:14 JOHN: And thank you for hosting me, Sharon.

29:23 SHARON: Did you start working on Chrome from the very start, or just -
obviously, pre-launch. Because your profile picture, I think, is that comic book
that was released when Chrome launched, which I was lucky enough to get a copy
of when I was an intern. Shout-out Peter. That obviously suggests you were a
major contributor before the public launch of Chrome. So were you working on
Chrome from the very beginning?

29:47 JOHN: I was not. It took about six months. I tried to join from the
beginning, but I couldn't join right at the beginning. So my sneaky way was I
found another project under that same director who was running Chrome, and
then, once that project finished in six months, then I jumped into Chrome.

30:09 SHARON: And do you ever think about how crazy it is that this thing you
worked on from before the public launch is now one of the foundational pieces of
the internet at large? Any time anything runs on the internet, something from
Chrome is probably running, like the net stack, if not the browser itself. Do
you ever think about that, how crazy that is, and your place in that?

30:38 JOHN: Yes. It's amazing how far Chrome has come, and it's really humbling
to see it be the most widely used browser. When we were working on Chrome at the
beginning, we were just trying to guess what market share it would have. People
would say, it'll be 10%, and we'd be like, no way. Even the people working on it
didn't think that was going to be possible. So it's beyond our wildest dreams to
see users really enjoy using it, to see us keep demonstrating value by sticking
to our four principles of security, stability, simplicity, and speed, and to see
people adopt not just Chrome as a product, but Chromium as a platform. And every
time we make a change to Chrome, we have a responsibility to all these users and
developers using it. You were asking earlier how it feels to be here from the
start. There's a sense of feeling super lucky. But there's also this humbling
feeling: we started on Chrome when it was really small, and our knowledge built
up incrementally as it got more complicated. What if I were to jump into Chrome
today? The code is so complicated now compared to before. So we almost have a
responsibility, as people who've been on Chrome for a long time, to share
knowledge and help people pick it up, because we would struggle ourselves if we
were to jump in now.

32:22 SHARON: Yes. As those people, we certainly did struggle. But people are
pretty smart, I think, and they can figure it out. That doesn't mean you can't
make it easier for the people figuring it out in the future, or even for people
who just work on a different part. If I were to do anything in Blink, I'd be
like -

32:44 JOHN: Same. I've been on it for a long time. I don't touch Blink.

32:50 SHARON: Yes. Yes.

[//content/README.md]: https://crsrc.org/c/content/README.md
[//content/public/README.md]: https://crsrc.org/c/content/public/README.md
[What's Up With Pointers]: https://www.youtube.com/watch?v=MpwbWSEDfjM