# Investigating Out of Memory crashes
A large fraction of process crashes in Chromium are due to Out Of Memory (OOM)
conditions. This page is meant to help Chromium developers understand the
resulting stack traces and investigate these crashes. Note that some of the
documentation here is only applicable to Google Chrome, as it is specific to
the way Google's crash reporting infrastructure aggregates and reports crashes.
Some of the following also assumes that the `malloc()` implementation is
PartitionAlloc, which, as of 2022, is the case on most platforms.
[TOC]
## Identifying OOM crashes
When a process crashes due to an Out Of Memory condition, this is usually
signaled by the presence of `base::internal::OnNoMemoryInternal()` on the stack.
**Google Chrome only:** the crash reporting infrastructure tags these as "[Out of
Memory]" based on this and other function names. The full list is determined in
the (internal) crash server's code.
Since Chromium configures its memory allocators to prefer crashing rather than
returning `nullptr`, an OOM crash can be triggered from anywhere in the code,
and most commonly from within the allocator, or higher-level functions such as
`operator new` in C++.
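
The crash-rather-than-`nullptr` policy can be illustrated with a minimal sketch. This is not Chromium's implementation: `CrashOnNoMemory()`, `AllocOrCrash()` and `UseBuffer()` are hypothetical names, and plain `malloc()` stands in for the real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for an OOM handler; not Chromium's real code.
[[noreturn]] void CrashOnNoMemory(std::size_t size) {
  // Keep the failed size in a volatile local so that the store is emitted
  // and the value is more likely to be visible in a crash dump.
  volatile std::size_t failed_size = size;
  (void)failed_size;
  std::abort();
}

// Allocates `size` bytes, crashing instead of returning nullptr on failure.
void* AllocOrCrash(std::size_t size) {
  // malloc(0) may legitimately return nullptr, so request at least one byte.
  void* ptr = std::malloc(size == 0 ? 1 : size);
  if (ptr == nullptr) {
    CrashOnNoMemory(size);
  }
  return ptr;
}

// Usage sketch: callers do not need a nullptr check.
void UseBuffer() {
  void* buffer = AllocOrCrash(64);
  assert(buffer != nullptr);  // Always true: a failure would have crashed.
  std::free(buffer);
}
```

With this kind of policy, the crashing frame is simply whichever caller happened to request memory when none was available, which is why the top of the stack is not necessarily the culprit.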
## Distinguishing between underlying causes
### Different causes
A process can reach an OOM condition for several reasons:
* **The OS is truly out of memory**, regardless of how much memory the *current*
process is using
* **Some limit inside the OS is reached**. For instance, on Windows, there
exists a global "commit limit", which is the amount of memory that the system
can commit. Note that it is possible to commit more memory than what is
actually in use. This may also happen on Linux systems configured with no or
limited "overcommit", though the majority of systems don't have a limit.
* **Virtual address space exhaustion**. This is most likely to happen for relatively
large allocations, on 32 bit systems, where total addressable space is
typically 2GiB (most Windows systems), 3GiB (e.g. some Windows configurations,
Linux) or 4GiB (e.g. WoW64). However, it may also happen on 64 bit systems,
either due to:
  * Limited virtual addressable space in the CPU/OS. For instance, most Android
    ARM64 systems have only 39 bits of address space as of 2022.
  * "Cage" exhaustion. This is most likely to happen with PartitionAlloc on 64
    bit systems, where all allocations are grouped into a single contiguous
    virtual address space "cage".
* **Sandbox per-process memory limit**. For some process types (e.g. Renderers)
and on most platforms, the sandbox enforces a maximum per-process memory
limit. Given that this limit is typically set at the OS level, it may not be
distinguishable from e.g. commit limit exhaustion.
* **Excessive allocation size**. Some allocators (notably PartitionAlloc)
purposely limit the maximum allocation size.
### Identifying the cause
In the case of PartitionAlloc, it is possible to distinguish some of these cases:
* **Virtual address space exhaustion**. This is identified by the presence of
`PartitionOutOfMemoryMappingFailure()` on the stack. It means that the
allocator was unable to find enough address space, either for its internal
memory allocation unit size, or the requested size. Since memory is *not*
committed at this step, this signals an address space issue.
* **Commit**. This is identified by the presence of
`PartitionOutOfMemoryCommitFailure()` on the stack. This signals that either
the OS or the sandbox limit has been reached.
* **Excessive allocation size**. Shown by `PartitionExcessiveAllocationSize()`
on the stack.
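
The triage rules above can be sketched as a small classifier over symbolized frame names. The function names being searched for come from this page; the `OomCause` enum, `ClassifyOom()` and the substring matching are assumptions made for illustration, and do not describe how the crash server works.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical categories, mirroring the cases listed above.
enum class OomCause {
  kNotOom,
  kAddressSpaceExhaustion,
  kCommitFailure,
  kExcessiveAllocationSize,
  kUnclassifiedOom,
};

// Returns the cause suggested by the first PartitionAlloc marker found in
// `frames`, falling back to a generic OOM if only OnNoMemoryInternal is seen.
OomCause ClassifyOom(const std::vector<std::string>& frames) {
  bool saw_generic_oom_marker = false;
  for (const std::string& frame : frames) {
    if (frame.find("PartitionOutOfMemoryMappingFailure") != std::string::npos) {
      return OomCause::kAddressSpaceExhaustion;
    }
    if (frame.find("PartitionOutOfMemoryCommitFailure") != std::string::npos) {
      return OomCause::kCommitFailure;
    }
    if (frame.find("PartitionExcessiveAllocationSize") != std::string::npos) {
      return OomCause::kExcessiveAllocationSize;
    }
    if (frame.find("OnNoMemoryInternal") != std::string::npos) {
      saw_generic_oom_marker = true;
    }
  }
  return saw_generic_oom_marker ? OomCause::kUnclassifiedOom
                                : OomCause::kNotOom;
}

// Usage sketch with a made-up, abbreviated stack.
void ClassifyExample() {
  const std::vector<std::string> frames = {
      "base::internal::OnNoMemoryInternal()",
      "PartitionOutOfMemoryCommitFailure()",
  };
  assert(ClassifyOom(frames) == OomCause::kCommitFailure);
}
```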
## What to do?
### Commit limit reached
The process is "truly" out of memory, or the system is. Some amount of these
crashes is expected, and the crashing location is not necessarily the
culprit. Indeed, as a rough approximation, the failing allocation is more likely
to be from a component naturally allocating a lot of memory, e.g. V8 or
rendering.
However, if there is a spike, and many stack traces come from an unusual
location (e.g. newly added code), this may signal a memory leak in the component
on the stack, or excessive temporary allocations.
Also, if `PartitionAllocDirectMap()` is on the stack, the memory allocation was
large. It may come from a large buffer, potentially made worse by buffer
resizing. For instance, a `std::vector` typically grows its capacity
geometrically (often doubling it) when it runs out of space, in which case
calling `reserve()` with the right size ahead of time may help.
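
The effect of reserving capacity up front can be observed with a short sketch. `CountReallocations()` is a hypothetical helper written for this page. It counts how often the capacity of a `std::vector` changes while elements are appended, each change implying that a new, larger buffer was allocated while the old one was still alive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while appending
// `count` elements, optionally reserving the final size beforehand.
std::size_t CountReallocations(std::size_t count, bool reserve_first) {
  std::vector<int> values;
  if (reserve_first) {
    values.reserve(count);
  }
  std::size_t reallocations = 0;
  std::size_t last_capacity = values.capacity();
  for (std::size_t i = 0; i < count; ++i) {
    values.push_back(static_cast<int>(i));
    if (values.capacity() != last_capacity) {
      ++reallocations;
      last_capacity = values.capacity();
    }
  }
  return reallocations;
}

// Usage sketch: reserving avoids every intermediate buffer.
void ReserveExample() {
  assert(CountReallocations(100000, /*reserve_first=*/true) == 0);
  assert(CountReallocations(100000, /*reserve_first=*/false) > 0);
}
```

The growth factor is implementation-defined, so the sketch only checks that reallocations happen without `reserve()`, not how many.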
### Excessive allocation size
Is the calling code expected to allocate more than 2GiB? Or is it an underflow
somewhere in the calling code?
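
As an illustration of how an underflow produces an excessive size, consider a length computed from two unsigned offsets. The sketch below uses hypothetical helpers (`UncheckedLength()`, `CheckedLength()`); in Chromium code, the checked-arithmetic helpers in `base/numerics` would likely be the more natural choice.

```cpp
#include <cassert>
#include <cstddef>

// Unchecked: if `end < start`, the unsigned subtraction wraps around and
// yields a huge value, which then becomes an excessive allocation request.
std::size_t UncheckedLength(std::size_t start, std::size_t end) {
  return end - start;
}

// Checked: reports failure instead of wrapping. Returns true and writes the
// result to `length` only when `end >= start`.
bool CheckedLength(std::size_t start, std::size_t end, std::size_t* length) {
  if (end < start) {
    return false;
  }
  *length = end - start;
  return true;
}

// Usage sketch: swapped arguments wrap to the maximum size_t value.
void UnderflowExample() {
  assert(UncheckedLength(3, 2) == static_cast<std::size_t>(-1));
  std::size_t length = 0;
  assert(!CheckedLength(3, 2, &length));
  assert(CheckedLength(2, 5, &length) && length == 3);
}
```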
### Virtual address space
On 32 bit systems, this is most likely to occur when overall memory usage is
high, or when the allocation size request is large. Is the calling code
allocating a very large buffer?
## Debugging
### General
On Windows, the allocation size is added into the exception record. In Google
Chrome's crash dashboard, this is shown in "Parameter[0]" of the exception
info. On other operating systems, the allocation size is put on the stack before
crashing, and is thus visible in minidumps.
### PartitionAlloc and Google specific
1. Starting from a specific report, click on the bug icon to start a cloud lldb
instance
2. Locate the `PartitionRoot<true>::OutOfMemory()` frame on the stack, and move to
it with `f <frame_number>` (`f 5` in the example below)
3. Locate the stack addresses by printing the registers with `re re` (short for
`register read`)
4. Show the stack content with `x <stack_pointer> <frame_pointer>`
Below is an example for a crash on x86_64:
```
( lizeb ) bt
* thread #1, stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x10c45912f)
* frame #0: 0x000000010c45912f Google Chrome Framework`base::internal::OnNoMemoryInternal(unsigned long) at memory.cc:62
frame #1: 0x000000010c459149 Google Chrome Framework`base::TerminateBecauseOutOfMemory(unsigned long) at memory.cc:69
frame #2: 0x000000010c4f39c6 Google Chrome Framework`OnNoMemory(unsigned long) at oom.cc:17
frame #3: 0x000000010d7e5794 Google Chrome Framework`WTF::PartitionsOutOfMemoryUsing2G(unsigned long) at partitions.cc:281
frame #4: 0x000000010d7e4d2c Google Chrome Framework`WTF::Partitions::HandleOutOfMemory(unsigned long) at partitions.cc:415
frame #5: 0x000000010c4f7474 Google Chrome Framework`base::PartitionRoot<true>::OutOfMemory(unsigned long) at partition_root.cc:521
[...]
( lizeb ) f 5
frame #5: 0x000000010c4f7474 Google Chrome Framework`base::PartitionRoot<true>::OutOfMemory(unsigned long) at partition_root.cc:521
( lizeb ) re re
General Purpose Registers:
rbp = 0x00007ffee7012c50
rsp = 0x00007ffee7012bf0
rip = 0x000000010c4f7474 Google Chrome Framework`base::PartitionRoot<true>::OutOfMemory(unsigned long) + 196 at partition_root.cc:522
21 registers were unavailable.
( lizeb ) x 0x00007ffee7012bf0 0x00007ffee7012c50
0x7ffee7012bf0: 76 61 5f 73 69 7a 65 00 00 00 00 07 00 00 00 00 va_size.........
0x7ffee7012c00: 61 6c 6c 6f 63 00 20 20 00 2d 2d 01 00 00 00 00 alloc. .--.....
0x7ffee7012c10: 63 6f 6d 6d 69 74 00 20 00 a0 9d 01 00 00 00 00 commit. ........
0x7ffee7012c20: 73 69 7a 65 00 20 20 20 00 00 20 00 00 00 00 00 size. .. .....
0x7ffee7012c30: aa aa aa aa aa aa aa aa 00 18 b0 12 01 00 00 00 ................
0x7ffee7012c40: 00 00 20 00 00 00 00 00 48 22 b0 12 01 00 00 00 .. .....H"......
```
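The results here can help the PartitionAlloc team to identify issues, as
important metrics from PartitionAlloc are saved on the stack, as shown above. For
instance, the eight bytes following the `va_size` label, read as a little-endian
integer, give a virtual address space usage of 0x07000000 (112 MiB).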
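
Decoding these values by hand is error-prone, so a small helper can be useful. The sketch below is not part of Chromium. `DecodeLittleEndian64()` and the constant names are hypothetical, and the byte arrays are copied from the eight bytes that follow each label in the dump above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decodes 8 bytes stored least-significant byte first into a 64-bit value.
std::uint64_t DecodeLittleEndian64(const std::uint8_t* bytes) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// The 8 bytes following each label in the example dump.
const std::uint8_t kVaSizeBytes[8] = {0x00, 0x00, 0x00, 0x07, 0, 0, 0, 0};
const std::uint8_t kAllocBytes[8] = {0x00, 0x2d, 0x2d, 0x01, 0, 0, 0, 0};
const std::uint8_t kCommitBytes[8] = {0x00, 0xa0, 0x9d, 0x01, 0, 0, 0, 0};
const std::uint8_t kSizeBytes[8] = {0x00, 0x00, 0x20, 0x00, 0, 0, 0, 0};

// Usage sketch: decode every labelled value from the dump.
void DecodeDumpExample() {
  assert(DecodeLittleEndian64(kVaSizeBytes) == 0x07000000u);  // 112 MiB
  assert(DecodeLittleEndian64(kAllocBytes) == 0x012d2d00u);
  assert(DecodeLittleEndian64(kCommitBytes) == 0x019da000u);
  assert(DecodeLittleEndian64(kSizeBytes) == 0x00200000u);    // 2 MiB
}
```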