# Data Model

The SuperSize data model is a sorted flat list of symbols. A flat list is
simple, and allows arbitrary queries to be made on symbols.
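
As a rough illustration of why a flat list makes queries easy, the sketch below
filters and sums a small list of symbols. The `Sym` class and its sample values
are hypothetical stand-ins and are not the real classes from `models.py`.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    """Hypothetical stand-in for a symbol; not the real models.py class."""
    section: str
    full_name: str
    size: int
    source_path: str = ""


# Made-up sample data, purely for illustration.
symbols = [
    Sym("t", "base::Base64Encode", 120, "base/base64.cc"),
    Sym("r", "kTable", 64, "base/base64.cc"),
    Sym("t", "net::Fetch", 300, "net/fetch.cc"),
]


def total_size(syms, predicate):
    """Sums the size of every symbol for which predicate(symbol) is true."""
    return sum(s.size for s in syms if predicate(s))


# Any field can be used as a filter, since nothing is nested.
text_size = total_size(symbols, lambda s: s.section == "t")
base_size = total_size(symbols, lambda s: s.source_path.startswith("base/"))
```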

[//tools/binary_size/libsupersize/models.py] contains the definition of all data
classes.

[//tools/binary_size/libsupersize/models.py]: /tools/binary_size/libsupersize/models.py

[TOC]

## Python API Reference

### SizeInfo

Represents the data within a `.size` file. Contains:

 * `build_config`: JSON metadata applicable to all symbols.
 * `containers`: List of Container instances used by symbols in this SizeInfo.
 * `raw_symbols`: List of Symbols.

### Symbol

Each symbol contains the following fields:

 * `container`: A (shared) Container instance.
 * `section_name`: E.g. ".text", ".rodata", ".data.rel.local"
 * `section`: The single character abbreviation of `section_name`.
    E.g. "t", "r", "d".
 * `size`: The number of bytes this symbol takes up, including padding that
    comes before `address`.
 * `padding`: The number of bytes of padding before `address`.
 * `address` (optional): The start address of the symbol.
 * `source_path` (optional): Path to the source file that caused this symbol to
    exist (e.g. `base64.cc`, `SomeClass.java`).
 * `object_path` (optional):
    * For native and pak: Path to associated object file. E.g.: `base/base64.o`
    * For dex: Package path. E.g.: `$APK/org/chromium/chrome/SomeClass.class`
 * `aliases`: List of symbols that represent the same bytes. The `aliases` of
    each symbol in this list points to the same list instance.
 * `num_aliases`: The number of symbols with the same address (including self).
 * `pss`: `size` / `num_aliases`.
 * `padding_pss`: `padding` / `num_aliases`.
 * `full_name`: Name for this symbol.
    * Symbols are not required to have unique names, or names at all (empty
      string is valid).
 * `template_name`: Derived from `full_name`. Name with the parameter list
    removed, but template parameters present.
 * `name`: Derived from `full_name`. Name with both template parameters and the
    parameter list removed.
 * `component`: The team that owns this feature (optional, may be empty).
 * `flags`: Bitmask of flags. See `FLAG_*` constants in `models.py`.
 * `disassembly` (optional): The disassembly code for the symbol.
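
The relationship between `full_name`, `template_name`, and `name` can be
sketched as below. This is a simplified illustration, not the logic SuperSize
uses: the helper names are hypothetical, and the sketch makes no attempt to
handle operators such as `operator<` or qualifiers that trail the parameter
list.

```python
def strip_parameter_list(full_name):
    """Drops a trailing '(...)' parameter list, if there is one."""
    if not full_name.endswith(")"):
        return full_name
    depth = 0
    # Walk backwards to find the '(' that matches the final ')'.
    for i in range(len(full_name) - 1, -1, -1):
        ch = full_name[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                return full_name[:i]
    return full_name  # Unbalanced parentheses: leave the name untouched.


def strip_templates(name):
    """Removes every balanced '<...>' group, including nested ones."""
    out = []
    depth = 0
    for ch in name:
        if ch == "<":
            depth += 1
        elif ch == ">" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


# Made-up example name.
full_name = "foo::Bar<int>::Baz(int, char)"
template_name = strip_parameter_list(full_name)
name = strip_templates(template_name)
```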

### Diffs

Diffs are represented in Python using `DeltaSizeInfo`, which contains a list of
`DeltaSymbol` instances. Each `DeltaSymbol` maintains the full fidelity of the
symbols in the diff by storing references to the before and after symbols that
it represents.
See [diffs.md](diffs.md) for more details.

## Concepts

### Symbol Aliases

Aliases occur when multiple symbols refer to the same bytes (have the same
`address`, `size`, and `padding`).
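
To make the alias fields concrete, here is a minimal sketch of how `aliases`,
`num_aliases`, `pss`, and `padding_pss` relate to one another, following the
field descriptions in the Symbol reference. The `Sym` class and `link_aliases`
helper are hypothetical and are not the real `models.py` API.

```python
from dataclasses import dataclass, field


@dataclass
class Sym:
    """Hypothetical stand-in for a symbol; not the real models.py class."""
    full_name: str
    size: int
    padding: int = 0
    # Shared list of all symbols covering the same bytes (None if no aliases).
    aliases: list = field(default=None, repr=False, compare=False)

    @property
    def num_aliases(self):
        return len(self.aliases) if self.aliases else 1

    @property
    def pss(self):
        return self.size / self.num_aliases

    @property
    def padding_pss(self):
        return self.padding / self.num_aliases


def link_aliases(symbols):
    """Points every symbol's `aliases` at one shared list instance."""
    shared = list(symbols)
    for s in symbols:
        s.aliases = shared
    return shared


# Three made-up names that refer to the same 12 bytes.
group = [Sym("Foo()", 12, 3), Sym("Bar()", 12, 3), Sym("Baz()", 12, 3)]
link_aliases(group)
total_pss = sum(s.pss for s in group)
```

Because each alias reports only its share, summing `pss` over the group counts
the underlying bytes once rather than three times.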

Examples of where aliases are used:

 * Functions with identical code are de-duped via identical code folding.
 * Functions that appear in multiple translation units (e.g. functions with
   inline linkage). These have the same name, but different paths.
   * Represented as one alias per path, but are collapsed into a single symbol
     with a path of `$COMMON_PREFIX/{shared}/$SYMBOL_COUNT` when the number of
     aliases is large.
     * E.g.: `base/{shared}/3`
 * String literals that are de-duped by identical code folding.
 * Pak entries with identical payloads.

### Path Normalization

 * Prefixes are removed: `out/Release/`, `gen/`, `obj/`
   * This causes generated files to overlay the non-generated source tree,
     which is useful for attribution since the two generally mirror one another.
   * Generated symbols have the `FLAG_GENERATED` bit set.
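
A minimal sketch of this kind of prefix stripping is shown below. The function
name is hypothetical, and tying the generated bit to the presence of a `gen/`
prefix is an assumption made for the example rather than something this
document specifies.

```python
# Prefixes listed in this section; the output directory name may differ.
_PREFIXES = ("out/Release/", "gen/", "obj/")


def normalize_path(path):
    """Strips known prefixes and reports whether the path looked generated.

    Returns a (normalized_path, is_generated) tuple. Treating a stripped
    "gen/" prefix as the signal for FLAG_GENERATED is an assumption.
    """
    generated = False
    changed = True
    while changed:  # Repeat so stacked prefixes such as out/Release/gen/ go.
        changed = False
        for prefix in _PREFIXES:
            if path.startswith(prefix):
                path = path[len(prefix):]
                generated = generated or prefix == "gen/"
                changed = True
    return path, generated


# Made-up paths for illustration.
gen_result = normalize_path("out/Release/gen/base/foo.cc")
obj_result = normalize_path("out/Release/obj/base/base64.o")
```

With this scheme a generated `base/foo.cc` lands in the same directory as
hand-written files under `base/`.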

### Overhead and Star Symbols

**Overhead symbols** are symbols with a name that starts with "Overhead:". They
track bytes that are generally unactionable. They are recorded as padding-only
symbols (e.g.: `size=10`, `padding=10`, `size_without_padding=0`) because
"padding" is a closer match for "overhead" than "size" is.

* `Overhead: ELF file`: `elf_file_size - sum(elf_sections)`.
  * Captures bytes taken up by ELF headers and section alignment.
* `Overhead: APK file`: `apk_file_size - sum(compressed_file_sizes)`.
  * Captures bytes taken up by `.zip` metadata and zipalign padding.
* `Overhead: ${NAME}.pak`: `pak_file_size - sum(pak_entries)`.
* `Overhead: aggregate padding of diff'ed symbols`: Appears in symbol diffs to
  represent the per-section cumulative delta in padding.
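
The file-level formulas above all have the same shape: the size of the
container minus the sum of its attributed parts. A small sketch, using a plain
dict and a hypothetical helper name rather than the real symbol class:

```python
def overhead_symbol(name, file_size, part_sizes):
    """Builds a padding-only record for bytes not covered by any part.

    Hypothetical helper; the dict keys mirror the fields named in this
    section.
    """
    overhead = file_size - sum(part_sizes)
    return {
        "full_name": f"Overhead: {name}",
        "size": overhead,
        "padding": overhead,  # The whole symbol is recorded as padding.
        "size_without_padding": 0,
    }


# Made-up numbers: a 1000-byte ELF file whose sections total 990 bytes.
elf_overhead = overhead_symbol("ELF file", 1000, [600, 390])
```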

**Star symbols** are symbols with a name that starts with "\*\*". They represent
sections of the binary that are unattributed.

Examples:

 * `** Merge Globals`: Taken from the linker map file. A section of data
   containing unnamed constants.
 * `** Symbol gap`: A gap between symbols that is larger than what could
   plausibly be due to alignment.
 * `** ELF Section: .rel.dyn`: A native code ELF section that is not broken down
   into smaller symbols.