# Blink-V8 bindings generator (bind_gen package)

[TOC]

## What's bind_gen?

The Python package
[`bind_gen`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/)
is the core part of the Blink-V8 bindings code generator.
[`generate_bindings.py`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/generate_bindings.py)
is the driver script, which takes a Web IDL database (`web_idl_database.pickle`
generated by
[`web_idl_database`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/BUILD.gn?q=content:%5C%22web_idl_database%5C%22&ss=chromium)
GN target) as an input and produces a set of C++ source files of Blink-V8
bindings (v8_\*.h, v8_\*.cc).

## Design and code structure

The bindings code generator is implemented as a builder of `CodeNode` trees,
where `CodeNode` is the fundamental building block. The following subsections
describe what `CodeNode` is and how the code generator builds a tree of
`CodeNode`.

### [`CodeNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ECodeNode$&ss=chromium)

The code generator produces C++ source files (text files), but the content of
each file is represented neither as a single giant string nor as a list of
strings. Instead, the content of each file is represented as a CodeNode tree.

`CodeNode` is a fundamental building block that represents a text fragment in
the tree structure. A text file is represented as a tree of CodeNodes, each of
which represents a corresponding text fragment. The code generator is the
CodeNode tree builder.

Here is a simple example to build a CodeNode tree.
```python
# SequenceNode and TextNode are subclasses of CodeNode.

def make_prologue():
  return SequenceNode([
    TextNode("// Prologue"),
    TextNode("SetUp();"),
  ])

def make_epilogue():
  return SequenceNode([
    TextNode("// Epilogue"),
    TextNode("CleanUp();"),
  ])

def main():
  root_node = SequenceNode([
    make_prologue(),
    TextNode("LOG(INFO) << \"hello, world\";"),
    make_epilogue(),
  ])
```
The `root_node` above represents the following text.

```c++
// Prologue
SetUp();
LOG(INFO) << "hello, world";
// Epilogue
CleanUp();
```

The basic features of CodeNode are implemented in
[code_node.py](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py).
Just for convenience, CodeNode subclasses corresponding to C++ constructs are
provided in
[code_node_cxx.py](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node_cxx.py).

`CodeNode` has an object-oriented design and has internal states (not only the
parent / child nodes but also more states to support advanced features).
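
As a rough illustration of the tree idea, here is a self-contained toy model.
`ToyTextNode` and `ToySequenceNode` are hypothetical stand-ins written for this
document, not the real classes in `code_node.py`, and they implement none of
the advanced features. The sketch only shows how a tree of text fragments could
be rendered into the text of the earlier example.

```python
# Toy model only: these classes are NOT the real bind_gen classes.


class ToyTextNode:
    """Leaf node holding one text fragment."""

    def __init__(self, text):
        self.text = text

    def render(self):
        return self.text


class ToySequenceNode:
    """Inner node whose children are rendered in order, one per line."""

    def __init__(self, children=None):
        self.children = list(children or [])

    def append(self, node):
        self.children.append(node)

    def render(self):
        return "\n".join(child.render() for child in self.children)


def make_prologue():
    return ToySequenceNode([
        ToyTextNode("// Prologue"),
        ToyTextNode("SetUp();"),
    ])


def make_epilogue():
    return ToySequenceNode([
        ToyTextNode("// Epilogue"),
        ToyTextNode("CleanUp();"),
    ])


root_node = ToySequenceNode([
    make_prologue(),
    ToyTextNode('LOG(INFO) << "hello, world";'),
    make_epilogue(),
])
rendered = root_node.render()
print(rendered)
```

Because every node only knows how to render itself and its children, subtrees
such as the prologue can be built by independent functions and composed later.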

### CodeNode tree builders

The bindings code generator consists of multiple sub code generators. For
example, `interface.py` is a sub code generator of Web IDL interface and
`enumeration.py` is a sub code generator of Web IDL enumeration. Each Web IDL
definition has its own sub code generator.

This sub section describes how a sub code generator builds a CodeNode tree and
produces C++ source files by looking at
[`enumeration.py`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py)
as an example. The example code snippet below is simplified for explanation.

```python
def generate_enumerations(task_queue):
    for enumeration in web_idl_database.enumerations:
        task_queue.post_task(generate_enumeration, enumeration.identifier)
```

[`generate_enumerations`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py?q=function:%5Egenerate_enumerations$&ss=chromium)
is the entry point to this sub code generator. `task_queue` is used so that
the tasks can be processed in parallel. `generate_enumeration` (singular form)
actually produces a pair of C++ source files (\*.h and \*.cc).
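
The post-then-run pattern can be sketched with a toy queue. `ToyTaskQueue`,
its `run` method, `toy_generate_enumeration`, and the file names below are all
hypothetical and only illustrate the idea. The sketch uses threads from the
standard library for simplicity, which is an assumption and may differ from
what the real `task_queue` does.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTaskQueue:
    """Collects tasks first, then runs them all on a worker pool."""

    def __init__(self):
        self._tasks = []

    def post_task(self, func, *args):
        # Nothing runs yet; the task is only recorded.
        self._tasks.append((func, args))

    def run(self, max_workers=4):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(func, *args) for func, args in self._tasks]
            # Collect results in posting order; result() re-raises errors.
            return [future.result() for future in futures]


def toy_generate_enumeration(identifier):
    # Stand-in for building the two CodeNode trees and writing the files.
    name = identifier.lower()
    return ("v8_%s.h" % name, "v8_%s.cc" % name)


task_queue = ToyTaskQueue()
for identifier in ["Foo", "Bar"]:  # hypothetical enumeration identifiers
    task_queue.post_task(toy_generate_enumeration, identifier)
outputs = task_queue.run()
```

Each task receives only an identifier, so tasks stay independent of one another
and can run in any order.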

```python
def generate_enumeration(enumeration_identifier):
    # Filepaths
    header_path = path_manager.api_path(ext="h")
    source_path = path_manager.api_path(ext="cc")

    # Root nodes
    header_node = ListNode(tail="\n")
    source_node = ListNode(tail="\n")

    # ... fill the contents of `header_node` and `source_node` ...

    # Write down to the files.
    write_code_node_to_file(header_node, path_manager.gen_path_to(header_path))
    write_code_node_to_file(source_node, path_manager.gen_path_to(source_path))
```

The main task of
[`generate_enumeration`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py?q=function:%5Egenerate_enumeration$&ss=chromium)
is to build CodeNode trees and write them to files. A key point here is that
the two trees, one for \*.h and the other for \*.cc, are built side by side.
We can add a function declaration to the header file while adding the
corresponding function definition to the source file. The following code
snippet shows how constructors are added to the header file and the source
file.

```python
    # Namespaces
    header_blink_ns = CxxNamespaceNode(name_style.namespace("blink"))
    source_blink_ns = CxxNamespaceNode(name_style.namespace("blink"))
    # {header,source}_blink_ns are added to {header,source}_node (the root
    # nodes) respectively.

    # Class definition
    class_def = CxxClassDefNode(cg_context.class_name,
                                base_class_names=["bindings::EnumerationBase"],
                                final=True,
                                export=component_export(
                                    api_component, for_testing))

    ctor_decls, ctor_defs = make_constructors(cg_context)

    # Define the class in 'blink' namespace.
    header_blink_ns.body.append(class_def)

    # Add constructors to public: section of the class.
    class_def.public_section.append(ctor_decls)
    # Add constructors (function definitions) into 'blink' namespace in the
    # source file.
    source_blink_ns.body.append(ctor_defs)
```

In the above code snippet,
[`make_constructors`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/enumeration.py?q=function:%5Emake_constructors$&ss=chromium)
creates and returns a CodeNode tree for the header file and another CodeNode
tree for the source file. In most cases, functions named `make_xxx` create
and return a pair of CodeNode trees. These functions are subtree builders
of the CodeNode trees.

These subtree builders are implemented in a functional programming style
(unlike CodeNodes themselves, which are implemented in an object-oriented
style). These subtree builders create a pair of new CodeNode trees at
every function call (the returned code node instances are different per call,
so their internal states are separate), but the contents are determined
solely by the input arguments. This property is very important when we use
closures in advanced use cases.
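
The property described above can be demonstrated with a toy subtree builder.
`ToyTextNode` and `make_toy_constructor` are hypothetical names used only for
this sketch, and the generated C++ text is illustrative rather than what
`make_constructors` really emits.

```python
class ToyTextNode:
    """Minimal stand-in for a code node with its own internal state."""

    def __init__(self, text):
        self.text = text

    def render(self):
        return self.text


def make_toy_constructor(class_name):
    """Returns a fresh (declaration, definition) pair on every call.

    The output depends only on `class_name`; no global state is read or
    modified.
    """
    decl = ToyTextNode("%s();" % class_name)
    defn = ToyTextNode("%s::%s() = default;" % (class_name, class_name))
    return decl, defn


decl1, def1 = make_toy_constructor("V8Foo")
decl2, def2 = make_toy_constructor("V8Foo")

# Different instances, so mutating one tree never affects the other...
print(decl1 is decl2)
# ...but identical contents, because the inputs were identical.
print(decl1.render() == decl2.render())
```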

So far, the typical code structure of the sub code generators is covered.
`enumeration.py` consists of several `make_xxx` functions (subtree builders) +
`generate_enumeration` (the top-level tree builder + file writer).

### Advanced: Two-step code generation and declarative style

#### Typical problems of (simple) code generation

Bindings code generation has the following typical problems. Suppose we have
the following simple code generator.
```python
# Example of simple code generation

def make_foo():
  return SequenceNode([
    TextNode("HeavyResource* res = HeavyFunc();"),
    TextNode("Foo(res);"),
  ])

def make_bar():
  return SequenceNode([
    TextNode("HeavyResource* res = HeavyFunc();"),
    TextNode("Bar(res);"),
  ])

def main():
  root_node = SequenceNode([
    make_foo(),
    make_bar(),
  ])
```
This produces the following C++ code, where we have two major problems. The
first problem is a symbol conflict: `res` is defined twice. Even if we gave
different names like `res1` and `res2`, we have the second problem: the
produced code calls `HeavyFunc` twice, which is not efficient.
```c++
// Output of simple code generation example
HeavyResource* res = HeavyFunc();
Foo(res);
HeavyResource* res = HeavyFunc();
Bar(res);
```
Ideally we'd like to have the following code, without introducing tight coupling
between `make_foo` and `make_bar`.
```c++
// Ideal generated code
HeavyResource* res = HeavyFunc();
Foo(res);
Bar(res);
```

#### Two-step code generation as a solution

In order to resolve the above problems, the bindings code generator supports
two-step code generation. This way may look like declarative programming.
```python
# Example of two-step code generation

def bind_vars(code_node):
  local_vars = [
    SymbolNode("heavy_resource",
               "HeavyResource* ${heavy_resource} = HeavyFunc(${address}, ${phone_number});"),
    SymbolNode("address",
               "String ${address} = GetAddress();"),
    SymbolNode("phone_number",
               "String ${phone_number} = GetPhoneNumber();"),
  ]
  for symbol_node in local_vars:
    code_node.register_code_symbol(symbol_node)

def make_foo():
  return SequenceNode([
    TextNode("Foo(${heavy_resource});"),
  ])

def make_bar():
  return SequenceNode([
    TextNode("Bar(${heavy_resource});"),
  ])

def main():
  root_node = SymbolScopeNode()
  bind_vars(root_node)
  root_node.extend([
    make_foo(),
    make_bar(),
  ])
```
The above code generator has two kinds of code generation. One kind is
`make_foo` and `make_bar`, which are almost the same as before except that they
use a template variable (`${heavy_resource}`). The other kind is `bind_vars`,
which provides a catalogue of symbol definitions. By using the catalogue of
symbol definitions, we can keep the definitions of `make_foo` and `make_bar`
simple. This code generator produces the following C++ code without producing
duplicated function calls.
```c++
// Output of two-step code generation example
String address = GetAddress();
String phone_number = GetPhoneNumber();
HeavyResource* heavy_resource = HeavyFunc(address, phone_number);
Foo(heavy_resource);
Bar(heavy_resource);
```
The mechanism of two-step code generation is simple.
`SymbolNode(name, definition)` consists of a symbol name and a code fragment
that defines the symbol. When a symbol name is referenced as `${symbol_name}`,
the reference is simply replaced with `symbol_name`, and it also triggers
insertion of the symbol definition into a surrounding `SequenceNode`. This step
happens recursively, so not only `heavy_resource`'s definition but also the
definitions of `address` and `phone_number` are inserted.
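
The mechanism can be approximated by the following self-contained sketch. It
works on flat lists of strings instead of CodeNode trees, and
`render_with_symbols` is a hypothetical helper, so it is far simpler than the
real implementation (for example, it always inserts a definition right before
the first statement that needs it and does no heuristic placement).

```python
import re

# Matches references of the form ${symbol_name}.
_REF = re.compile(r"\$\{(\w+)\}")


def render_with_symbols(statements, symbols):
    """Renders `statements`, inserting each referenced symbol's definition
    once, before the first statement that needs it.

    `symbols` maps a symbol name to its definition template.
    """
    output = []
    defined = set()

    def define(name):
        if name in defined or name not in symbols:
            return
        defined.add(name)
        template = symbols[name]
        # Recursive step: first define whatever the definition references.
        for dependency in _REF.findall(template):
            define(dependency)
        output.append(_REF.sub(r"\1", template))

    for statement in statements:
        for name in _REF.findall(statement):
            define(name)
        # A reference is replaced with the bare symbol name.
        output.append(_REF.sub(r"\1", statement))
    return output


symbols = {
    "heavy_resource":
    "HeavyResource* ${heavy_resource} = "
    "HeavyFunc(${address}, ${phone_number});",
    "address": "String ${address} = GetAddress();",
    "phone_number": "String ${phone_number} = GetPhoneNumber();",
    "unused": "int ${unused} = 0;",  # never referenced, so never emitted
}
lines = render_with_symbols(
    ["Foo(${heavy_resource});", "Bar(${heavy_resource});"], symbols)
print("\n".join(lines))
```

The second reference to `${heavy_resource}` inserts nothing because the symbol
is already defined, which is how the duplicated `HeavyFunc` call is avoided.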

With the two-step code generation, it's possible (and expected) to write code
generators in the declarative programming style, which works better in general
than the imperative programming style.

#### Important subclasses of CodeNode for two-step code generation

- [`SymbolNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESymbolNode$&ss=chromium)

SymbolNode consists of a symbol name and its definition. You can reference a
symbol as `${symbol_name}` in TextNode and FormatNode. It's fine to register a
symbol and never reference it: the symbol definition is inserted automatically
only when the symbol is referenced.

For simple use cases, a SymbolNode can be constructed from a pair of a symbol
name and a plain text (which can contain references in the form of `${...}`) as
the definition.
```python
# Example of simple use cases
addr_symbol = SymbolNode("address",
                         "void* ${address} = ${base} + ${offset};")
```
For more complicated use cases, SymbolNode's definition can be a callable that
returns a SymbolDefinitionNode instead. This is useful when the definition has
a complex structure of code node tree, since a plain text definition cannot
represent a code node tree structure.

```python
# Example of complicated use cases
def create_address(symbol_node):
  node = SymbolDefinitionNode(symbol_node)
  node.extend([
    TextNode("void* ${address} = ${base} + ${offset};"),
    CxxUnlikelyIfNode(
      cond="!${address}",
      attribute=None,
      body=[
        TextNode("${exception_state}.ThrowRangeError(\"...\");"),
        TextNode("return;"),
      ]),
  ])
  return node

addr_symbol = SymbolNode("address",
                         definition_constructor=create_address)
```
where CxxUnlikelyIfNode represents a C++ if statement with an unlikely condition
(defined in code_node_cxx.py). This definition is better than a plain text
definition because it inserts the definition of ${exception_state} at the best
position depending on how likely ${exception_state} is to be actually used.
```c++
// Output of the example of complicated use cases
void* base = ...;  // ${base}'s definition is automatically inserted.
void* offset = ...;  // ${offset}'s definition is automatically inserted.
// ${exception_state}'s definition may be inserted here if it's used often or
// outside of the following if statement.
// ExceptionState exception_state(...);
void* address = base + offset;
if (!address) {
  // ${exception_state}'s definition may be inserted here if it's not used often
  // and not used outside of this if statement.
  ExceptionState exception_state(...);
  exception_state.ThrowRangeError("...");
  return;
}
```

- [`SymbolDefinitionNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESymbolDefinitionNode$&ss=chromium)

SymbolDefinitionNode represents the code fragment that defines a symbol.
The code generator automatically inserts symbol definitions at the best
positions heuristically.
However, it's hard to determine the best position in a single pass, so
the code generator iterates symbol definition insertions/relocations until it
finds the heuristically best positions.
SymbolDefinitionNode is used to identify a subtree
of code nodes that defines its symbol (i.e. used to distinguish automatically
inserted code nodes from the original code node tree).

- [`SequenceNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESequenceNode$&ss=chromium)

SequenceNode represents not only a list of CodeNodes but also insertion points
of SymbolDefinitionNode. SymbolDefinitionNodes will be inserted between
elements within a SequenceNode.

- [`ListNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5EListNode$&ss=chromium)

Unlike SequenceNode, ListNode represents just a list of CodeNodes and does
not support automatic insertion of symbol definitions, i.e. ListNode is
indivisible. Use SequenceNode when your code nodes represent a series of C++
statements; otherwise prefer ListNode so that nothing will be inserted in
between. See the following example.
```python
# Example of SequenceNode vs ListNode
int_array = ListNode([
  TextNode("int int_array[] = {"),
  ListNode([
    TextNode("${foo}"),
    TextNode("${bar}"),
  ], separator=","),
  TextNode("};"),
])

node = SequenceNode([
  int_array,
  TextNode("PrintIntArray(int_array);"),
])
```
This example produces the following C++ code. Since symbol definitions are
inserted only between elements of SequenceNode, ${foo} and ${bar}'s definitions
won't be inserted within `int_array`'s definition.
```c++
// Output of SequenceNode vs ListNode example
int foo = ...;  // ${foo}'s definition is automatically inserted here.
int bar = ...;  // ${bar}'s definition is automatically inserted here.
int int_array[] = {
  // ${foo}'s definition is _not_ inserted here.
  foo,
  // ${bar}'s definition is _not_ inserted here.
  bar
};
PrintIntArray(int_array);
```

- [`SymbolScopeNode`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/code_node.py?q=class:%5ESymbolScopeNode$&ss=chromium)

You can register SymbolNodes only into a SymbolScopeNode. Registered symbols
are effective only inside the SymbolScopeNode. This behavior reflects that
C++ variables are effective only inside the closest containing C++ block
(`{...}`).
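
The scoping rule can be pictured with a toy model of nested scopes. `ToyScope`
and its `lookup` method are hypothetical and exist only in this sketch, as do
the symbol definitions. Only the method name `register_code_symbol` is borrowed
from the example earlier in this document, and its signature here is
simplified.

```python
class ToyScope:
    """A scope that knows its registered symbols and its enclosing scope."""

    def __init__(self, parent=None):
        self._parent = parent
        self._symbols = {}

    def register_code_symbol(self, name, definition):
        self._symbols[name] = definition

    def lookup(self, name):
        # Search this scope first, then the enclosing scopes, like C++
        # name lookup through nested blocks.
        scope = self
        while scope is not None:
            if name in scope._symbols:
                return scope._symbols[name]
            scope = scope._parent
        return None


outer = ToyScope()
outer.register_code_symbol("address", "String ${address} = GetAddress();")

inner = ToyScope(parent=outer)
inner.register_code_symbol("index", "int ${index} = 0;")

# The inner scope sees symbols registered in the outer scope...
print(inner.lookup("address"))
# ...but the outer scope does not see symbols registered in the inner one.
print(outer.lookup("index"))
```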

## Tips for debugging and code reading

The driver script
[`generate_bindings.py`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/generate_bindings.py)
supports two useful command line flags:
`--format_generated_files` and `--enable_code_generation_tracing`.

`--format_generated_files` runs clang-format for the generated files so that
they are easy for developers to read.

`--enable_code_generation_tracing` outputs code comments (e.g.
`/* make_wrapper_type_info:6304 */`) in addition to the regular output in order
to clarify which line of the code generator produced which line of the
generated code.
This is useful for understanding the correspondence between the code generator
and the generated code.

When the tracing comments show functions which are too common and uninteresting
to you (e.g. `make_blink_to_v8_value`), you can exclude such functions on a
module-by-module basis by using
[`CodeGenTracing.add_modules_to_be_ignored`](https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/bindings/scripts/bind_gen/package_initializer.py?q=CodeGenTracing%5C.add_modules_to_be_ignored&ss=chromium).
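
To give a feel for how such tracing comments could be produced, here is a
minimal sketch based on Python's standard `inspect` module. It is an assumption
made for illustration, not the actual `CodeGenTracing` implementation.
`trace_comment`, the module-level `add_modules_to_be_ignored` function, and the
generated C++ line are all hypothetical.

```python
import inspect

_ignored_modules = set()


def add_modules_to_be_ignored(module_names):
    """Callers located in these modules are skipped when tracing."""
    _ignored_modules.update(module_names)


def trace_comment():
    """Returns a comment naming the nearest caller outside ignored modules."""
    frame = inspect.currentframe().f_back
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        if module_name not in _ignored_modules:
            return "/* %s:%d */" % (frame.f_code.co_name, frame.f_lineno)
        frame = frame.f_back
    return ""


def make_wrapper_type_info():
    # The comment records this function's name and the current line number.
    return trace_comment() + " const WrapperTypeInfo info;"


line = make_wrapper_type_info()
print(line)
```

Walking up the call stack until a non-ignored module is found is what lets
common helper modules be skipped, so the comment points at a more interesting
caller.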

Here is an example command line to run the script with the options
(confirmed to work as of May 2024).
```shell
# Run generate_bindings.py with --format_generated_files and
# --enable_code_generation_tracing.
#
# web_idl_database.pickle must have already been generated and updated.
# Or, run 'autoninja -C out/Default web_idl_database' in advance.

$ cd out/Default
$ python3 ../../third_party/blink/renderer/bindings/scripts/generate_bindings.py \
async_iterator callback_function callback_interface dictionary enumeration interface namespace observable_array sync_iterator typedef union \
--web_idl_database gen/third_party/blink/renderer/bindings/web_idl_database.pickle \
--root_src_dir=../.. \
--root_gen_dir=gen \
--output_reldir=core=third_party/blink/renderer/bindings/core/v8/ \
--output_reldir=modules=third_party/blink/renderer/bindings/modules/v8/ \
--output_reldir=extensions_chromeos=third_party/blink/renderer/bindings/extensions_chromeos/v8/ \
--format_generated_files \
--enable_code_generation_tracing
```