# IndexedDB Data Path
This document is a quick overview of the Blink implementation of IndexedDB
read/write requests.
[TOC]
## Introduction
Chrome's IndexedDB implementation is logically split into two components.
* The Blink side, also called _the frontend_ in older code, implements the
interfaces in [the IndexedDB specification](https://w3c.github.io/IndexedDB/),
translates requests from Web applications into lower-level requests for the
IndexedDB backing stores, and performs a fair amount of error checking.
* The browser side, also called _the backend_ in older code, implements the
IndexedDB backing store, which executes the low-level requests coming from the
Blink side.
The two components are currently (Q4 2017) hosted in separate processes and
bridged by a couple of glue layers. As part of the OnionSoup 2.0 effort, we hope
to move most of the backing store implementation into Blink, and remove the glue
layers.
The backing store implementation is built on top of two storage systems:
* [Blobs](https://developer.mozilla.org/en-US/docs/Web/API/Blob), managed by
[the Blob system](https://chromium.googlesource.com/chromium/src/+/main/storage/browser/blob/README.md),
are stored as individual files in a per-origin directory. Blobs are
specifically designed for storing large amounts of data.
* [LevelDB](https://github.com/google/leveldb) is a key-value store optimized
for small keys (10s-100s of bytes) and fairly small values (10s-1000s of
bytes). Chrome creates a per-origin LevelDB database that holds the data for
all the origin's IndexedDB databases. The LevelDB database also holds
references to the Blobs stored in the Blob system.
## Value Serialization
Storing a JavaScript value in IndexedDB is specified at a high level in the
[HTML Structured Data
Specification](https://html.spec.whatwg.org/C/#serializable-objects).
Blink's implementation of the specification is responsible for converting
between [V8](https://developers.google.com/v8/) values and the byte sequences in
IndexedDB's backing store. The implementation is in `SerializedScriptValue`
(SSV), which delegates to `v8::ValueSerializer` and `v8::ValueDeserializer`. A
serialized value handled by the backing store is essentially a data buffer that
stores a sequence of bytes, and a list (technically, an ordered set) of Blobs.
While V8 drives the serialization process, Blink implements the serialization of
objects not covered by the JavaScript specification, such as `Blob` and
`ImageData`. This is accomplished by having V8 expose the interfaces
`v8::ValueSerializer::Delegate` and `v8::ValueDeserializer::Delegate`, which are
implemented by Blink. The canonical example methods of these interfaces are
`v8::ValueSerializer::Delegate::WriteHostObject()` and
`v8::ValueDeserializer::Delegate::ReadHostObject()`, which are used to
completely delegate the serialization of a V8 object to Blink.
Changes to the IndexedDB serialization format are delicate because our backing
store does not have any form of data migration. Once written to the backing
store, an IndexedDB value's format will never change. It follows that the
SerializedScriptValue implementation must be able to read serialized values
written by all previous versions of Chrome. To avoid data corruption, the SSV
implementation should also detect (and reject) serialized values written by
future Chrome versions, which can happen when a user downgrades the browser
(e.g., by switching channels from beta to stable) and [when serialization
changes are reverted](https://crbug.com/700603). For the reasons above,
technical debt introduced by unnecessary complexity in the serialization format
is much more difficult to pay than in most of the Chrome codebase.
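The version-gating requirement above can be sketched as follows. This is a minimal illustration, not Blink's actual wire format: `kHypotheticalCurrentVersion`, `VersionCheck`, and `CheckSerializedVersion` are made-up names, and storing the version in the first byte is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical version number; the real SSV format versions are not
// reproduced here.
constexpr uint32_t kHypotheticalCurrentVersion = 3;

enum class VersionCheck { kReadable, kFromFutureVersion, kMalformed };

// Assumes, for illustration only, that the first byte of the serialized
// data holds the format version.
VersionCheck CheckSerializedVersion(const std::vector<uint8_t>& wire_data) {
  if (wire_data.empty())
    return VersionCheck::kMalformed;
  const uint32_t version = wire_data[0];
  // Values written by a newer build (e.g., before a downgrade) are rejected
  // instead of being misread.
  if (version > kHypotheticalCurrentVersion)
    return VersionCheck::kFromFutureVersion;
  // Every older version must stay readable, because stored values are never
  // migrated.
  return VersionCheck::kReadable;
}
```

A reader built along these lines keeps accepting every older version it knows about, and fails cleanly on data written by a newer build.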
*** aside
IndexedDB is not the sole user of the on-disk SSV format. In Chrome, SSV is also
currently (Q4 2017) used by the implementations for the
[Push API](https://developer.mozilla.org/en-US/docs/Web/API/Push_API) and the
[History API](https://developer.mozilla.org/en-US/docs/Web/API/History_API).
***
IndexedDB serialization changes must take the following subtleties into account:
* The SerializedScriptValue code is tightly coupled with
v8::ValueSerializer. For this reason, SSV should not host logic that might
later be moved to the browser process (e.g., to the IndexedDB backing store).
Such moves are bound to be difficult, because operating on V8 values (in the
manner required by the serialization specification) requires a V8 execution
context, which can only be hosted in a renderer process.
* The SerializedScriptValue API, which is synchronous, is incompatible with
reading Blobs (or any sort of files), which must be done asynchronously. All
the information needed by SSV deserialization must be fetched before the
deserialization is invoked.
## Small Values
Small IndexedDB values (whose serialized size is below 64KB) are stored directly
in the backing store.
### Write Path
All the IndexedDB write operations
([put](https://developer.mozilla.org/en-US/docs/Web/API/IDBObjectStore/put),
[add](https://developer.mozilla.org/en-US/docs/Web/API/IDBObjectStore/add), and
[update](https://developer.mozilla.org/en-US/docs/Web/API/IDBCursor/update)) are
currently (Q4 2017) routed through an `IDBObjectStore::put` overload.
All IndexedDB requests, including read/write operations, are translated by the
Blink side into lower-level requests, then sent via
[Mojo](https://chromium.googlesource.com/chromium/src/+/main/mojo/README.md)
IPC to the browser process, where they are executed by the backing store. Most
of the data associated with an IndexedDB write operation is transferred from the
renderer to the browser using one Mojo call, and is therefore subject to the
Mojo message limit. Blobs are an exception, as they are transferred to the
browser process by the [Blob
subsystem](https://chromium.googlesource.com/chromium/src/+/main/storage/browser/blob/README.md).
![IDB Write Path](./idb_data_flow_write.svg)
*** aside
Images in this document embed the data needed for editing using
[draw.io](https://github.com/jgraph/drawio).
***
### Read Path
The Web platform has a simple, [synchronous API for creating a
Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob), which can be
used in one line of code. Conversely, [reading a Blob's
content](https://developer.mozilla.org/en-US/docs/Web/API/FileReader) is an
asynchronous process that requires creating an intermediate FileReader instance,
and setting up a handler for its [loadend
event](https://developer.mozilla.org/en-US/docs/Web/Events/loadend). This is not
an accident. When a Blob is constructed, all the information needed to build its
content is available in the renderer calling the constructor. Once constructed,
a Blob instance only stores a handle to the content -- for example, most Blobs
in Chrome point to on-disk files. This is the core reason behind the significant
complexity gap between IndexedDB value wrapping (write-side changes) and
unwrapping (read-side changes).
An IndexedDB read operation, like
[IDBObjectStore.get](https://developer.mozilla.org/en-US/docs/Web/API/IDBObjectStore/get),
creates an
[IDBRequest](https://developer.mozilla.org/en-US/docs/Web/API/IDBRequest) that
tracks the status of the operation. Blink's `IDBRequest` implementation creates
a `WebIDBCallbacks` instance, and passes the request and the WebIDBCallbacks to
[the browser-side IndexedDB
API](https://cs.chromium.org/chromium/src/content/browser/indexed_db/indexed_db_database.h).
The browser-side IndexedDB implementation executes requests from the Blink side
in a single-threaded loop, and relies on Mojo to queue incoming requests. The
[IndexedDB backing
store](https://cs.chromium.org/chromium/src/content/browser/indexed_db/indexed_db_backing_store.h)
retrieves the desired value(s). Each
[IndexedDBValue](https://cs.chromium.org/chromium/src/content/browser/indexed_db/indexed_db_value.h)
contains the SSV data (treated as an opaque sequence of bits, on the
browser-side) and a vector of Blob handles.
The result of each read operation is sent from the browser process to the
renderer process via a callback (a Mojo call to an interface associated with the
database receiving the request). In the renderer process, the result is
converted to a `WebIDBValue` and passed to the `WebIDBCallbacks` instance, which
further passes it on to the corresponding IDBRequest. The IDBRequest updates
the Blink-side IndexedDB state, attaches the `IDBValue` result to the
IDBRequest, creates a DOM event representing the result and queues the event.
![IDB Read Path](./idb_data_flow_read.svg)
*** aside
This description glosses over a couple of layers that will hopefully be
eliminated or merged in the not-too-distant future.
***
The Blink-side result processing has a few subtleties that are relevant to this
design.
* [The IndexedDB specification](https://w3c.github.io/IndexedDB/) states that
IDBRequests within the same transaction must be executed in the order in which
they are created, and the events indicating their success / failure must be
delivered according to the same order. Chrome's implementation relies on the
following to meet the ordering demands:
- IDBRequests are turned into Mojo calls to the browser process
synchronously, when they are created. All the calls for a transaction are
made to the same database interface, so Mojo guarantees that they're
ordered.
- On the browser side, all requests are processed on the same thread, and
hop through threads in exactly the same way, so request ordering is
preserved.
- Results (IndexedDBValue → Value → WebIDBValue) are passed to the
renderer process via Callbacks interfaces associated with the database
interface, so Mojo guarantees that the calls go over the same Mojo pipe,
and therefore are ordered.
- Each result is processed and turned into a DOM event synchronously, so DOM
events for a transaction are queued up in the same order as the results
received from the browser.
* The IDBValue attached to an IDBRequest is lazily deserialized when the Web
application reads the IDBRequest's [result
property](https://developer.mozilla.org/en-US/docs/Web/API/IDBRequest/result)
for the first time, which (for most applications) happens in the IDBRequest
success event handler. The SSV deserialization logic is invoked at that point,
so SSVs must be deserialized synchronously.
* The `ExecutionContext` used to dispatch DOM events may be suspended, which
happens when the user creates a JavaScript breakpoint in
[DevTools](https://developers.google.com/web/tools/chrome-devtools/), and the
breakpoint is hit. At the time of this writing (Q4 2017), each Blink feature
deals with suspended execution contexts individually. In most cases (think
input events), the simple strategy of dropping the events on the floor is
acceptable. Unfortunately, this is not acceptable for IndexedDB (the
specification demands that each request gets a result or an error), so
IDBRequest events must be queued and dispatched in-order when the
ExecutionContext is resumed.
*** aside
At this time (Q4 2017), [IndexedDB events are not queued up correctly when the
context is suspended](https://crbug.com/732524).
***
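The queue-and-replay behavior required for suspended contexts can be sketched as follows. `DeferringEventDispatcher` and its methods are hypothetical names, events are modeled as plain strings, and this is not Blink's event queue; the sketch only illustrates delivering every event, in order, once the context resumes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dispatcher that defers events while "suspended" instead of
// dropping them.
class DeferringEventDispatcher {
 public:
  explicit DeferringEventDispatcher(
      std::function<void(const std::string&)> sink)
      : sink_(std::move(sink)) {}

  void Suspend() { suspended_ = true; }

  // Replays deferred events in arrival order. Stops early if a handler
  // suspends the dispatcher again.
  void Resume() {
    suspended_ = false;
    if (draining_)
      return;  // An outer Resume() call is already replaying events.
    draining_ = true;
    while (!suspended_ && !pending_.empty()) {
      std::string event = std::move(pending_.front());
      pending_.pop_front();
      sink_(event);
    }
    draining_ = false;
  }

  // Delivers immediately when possible; otherwise queues behind older events.
  void Dispatch(std::string event) {
    if (suspended_ || draining_) {
      pending_.push_back(std::move(event));
      return;
    }
    sink_(event);
  }

 private:
  std::function<void(const std::string&)> sink_;
  std::deque<std::string> pending_;
  bool suspended_ = false;
  bool draining_ = false;
};
```

Events dispatched during a replay are appended to the same queue, so they are delivered after the older deferred events rather than jumping ahead of them.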
## Large Values
Blink wraps large IndexedDB values in Blobs before sending them to the browser's
LevelDB-based backing store. The large value threshold (serialized value size at
or above 64KB, as of Q4 2017) takes the following factors into account:
* Storing large values in LevelDB would result in large internal data structures
(SSTable blocks), which can impact the efficiency and memory consumption of
database operations (especially of compaction). For example, large SSTable
blocks led to browser OOMs in [this P0 issue](https://crbug.com/702787).
When small values are stored in LevelDB, the default SSTable block size is
4KB.
* The Mojo message limit is currently (Q4 2017) Web-exposed as an IndexedDB
limit, because each write request is sent as a single Mojo call.
* Value wrapping is currently (Q4 2017) implemented entirely inside Blink. While
this approach reduces the amount of code running in the browser, it also adds
a full IPC round-trip of latency to reads. The extra latency is less
significant (as a proportion) when reading large values. Furthermore, the
system was designed to make it easy to push value-wrapping into the browser
process, if this becomes desirable in the future.
Blobs that contain SSV data use the MIME type
[application/vnd.blink-idb-value-wrapper](https://www.iana.org/assignments/media-types/application/vnd.blink-idb-value-wrapper).
In order to be as user-friendly as possible (for the unlikely event that a
developer is exposed to a Blob wrapping an SSV data buffer), the MIME type was
chosen to be easily searchable and fairly self-explanatory, and was registered
with IANA.
### Write Path
`IDBValueWrapper` contains all the logic for serializing an IndexedDB value via
`SerializedScriptValue`. `IDBObjectStore::put` passes the V8 value into
IDBValueWrapper, and gets back the SSV data that is passed to the browser-side
IndexedDB implementation. When given a large IndexedDB value, `IDBValueWrapper`
creates a Blob that holds the serialized value, and stores a reference to that
Blob in the IndexedDB backing store.
![IDB Write Path for Large Values](./idb_data_flow_write_wrapping.svg)
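The wrapping decision can be sketched as follows. This is a simplified model rather than the `IDBValueWrapper` code: `SketchBlob`, `SketchWireValue`, and `WrapIfNeeded` are hypothetical names, the one-byte `kHypotheticalWrapMarker` stands in for the real wrapped data marker (whose encoding is not reproduced here), and placing the wrapper Blob last in the Blob list is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Threshold from the text above: values at or above 64KB are wrapped.
constexpr std::size_t kWrapThresholdBytes = 64 * 1024;
constexpr char kWrapperMimeType[] = "application/vnd.blink-idb-value-wrapper";
// Hypothetical marker byte; not the real marker encoding.
constexpr uint8_t kHypotheticalWrapMarker = 0xFF;

// Stand-in for a Blob: a MIME type plus in-memory content.
struct SketchBlob {
  std::string mime_type;
  std::vector<uint8_t> bytes;
};

// Stand-in for what is sent to the backing store: SSV bytes plus Blobs.
struct SketchWireValue {
  std::vector<uint8_t> data;
  std::vector<SketchBlob> blobs;
};

SketchWireValue WrapIfNeeded(std::vector<uint8_t> serialized,
                             std::vector<SketchBlob> blobs) {
  SketchWireValue out;
  out.blobs = std::move(blobs);
  if (serialized.size() < kWrapThresholdBytes) {
    // Small value: the serialized bytes go to the backing store as-is.
    out.data = std::move(serialized);
    return out;
  }
  // Large value: move the bytes into a wrapper Blob (appended last, by
  // assumption) and keep only a marker in the data sent to the backing store.
  out.blobs.push_back(SketchBlob{kWrapperMimeType, std::move(serialized)});
  out.data = {kHypotheticalWrapMarker};
  return out;
}
```

With this shape, the backing store never sees the large payload; it only stores the short marker and a reference to the wrapper Blob.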
### Read Path for Blobs in Small Values
Large IndexedDB values are unwrapped in Blink using a fairly close emulation of
the process used by a Web application to read the contents of a Blob stored
inside an IndexedDB value, so it is instructive to understand what happens in
that case.
1. The Web application's JavaScript (most likely, the IDBRequest success event
handler) extracts a
[Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob) from the
request's
[result](https://developer.mozilla.org/en-US/docs/Web/API/IDBRequest/result).
The Blob instance only stores metadata about the Blob's content, represented
as a `blink::BlobDataHandle`.
2. The Web application creates a
[FileReader](https://developer.mozilla.org/en-US/docs/Web/API/FileReader),
and calls one of its read methods, most likely
[readAsArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsArrayBuffer).
Blink's `FileReader` implementation uses a `FileReaderLoader` to retrieve the
Blob's content from
[the Blob system](https://chromium.googlesource.com/chromium/src/+/main/storage/browser/blob/README.md)
in the browser process.
3. When the Blob's contents are completely transferred to the renderer process,
FileReaderLoader's `DidFinishLoading` is called, which eventually causes the
FileReader to queue an
[onload event](https://developer.mozilla.org/en-US/docs/Web/API/FileReader/onload).
4. The Web application's onload event handler retrieves the Blob data from the
FileReader's result property.
![IDB Read Path with App-Read Blobs](./idb_data_flow_read_webapp_blobs.svg)
### Read Path for Large Values
The IndexedDB read path uses the classes below to detect and unwrap Blob-wrapped
IDBValues. Reading Blob contents must be asynchronous, because Blobs can be
disk-backed. In fact, all Blobs coming from IndexedDB are currently (Q4 2017)
disk-backed.
* `IDBValueUnwrapper` knows how to decode the serialization format used by
wrapped data markers. It can tell whether an IDBValue contains a wrapped
data marker and, if so, it can extract a `BlobDataHandle` pointing to the
Blob that contains the wrapped SSV data.
* `IDBRequestLoader` coordinates a FileReaderLoader and an IDBValueUnwrapper to
map an array of IDBValues that may contain wrapped SSV data into IDBValues
that are guaranteed to be unwrapped. IDBRequestLoader operates on an array of
values because some requests, like
[IDBObjectStore::getAll](https://developer.mozilla.org/en-US/docs/Web/API/IDBObjectStore/getAll),
return an array of results. Single-result requests, like
[IDBObjectStore::get](https://developer.mozilla.org/en-US/docs/Web/API/IDBObjectStore/get),
are handled by wrapping the result in a one-element array.
* `IDBRequestQueueItem` holds on to an IDBRequest for which Blink has received
an IDBValue from the browser process, but hasn't queued up a corresponding
event in the DOMWindow event queue.
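The division of labor between detection and loading can be sketched as follows. This is a deliberately synchronous model with hypothetical names (`SketchValue`, `FakeBlobStore`, `WrapperBlobId`, `UnwrapAll`); the real loading step is asynchronous, and the one-byte marker with the wrapper Blob stored last is an assumption of this sketch, not the actual marker format.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical marker byte; not the real marker encoding.
constexpr uint8_t kHypotheticalWrapMarker = 0xFF;

// Stand-in for an IDBValue: SSV bytes plus Blob handles (modeled as ids).
struct SketchValue {
  std::vector<uint8_t> data;
  std::vector<std::string> blob_ids;
};

// Stand-in for the Blob system: maps a Blob id to its content.
using FakeBlobStore = std::map<std::string, std::vector<uint8_t>>;

// Detection step: returns the wrapper Blob's id, or nullptr if the value is
// not wrapped.
const std::string* WrapperBlobId(const SketchValue& value) {
  if (value.data.size() != 1 || value.data[0] != kHypotheticalWrapMarker ||
      value.blob_ids.empty()) {
    return nullptr;
  }
  return &value.blob_ids.back();
}

// Loading step: replaces each wrapped value's data with the Blob's content.
// Returns false if a wrapper Blob cannot be found.
bool UnwrapAll(std::vector<SketchValue>& values, const FakeBlobStore& store) {
  for (SketchValue& value : values) {
    const std::string* blob_id = WrapperBlobId(value);
    if (!blob_id)
      continue;  // Already unwrapped; leave untouched.
    auto it = store.find(*blob_id);
    if (it == store.end())
      return false;
    value.data = it->second;
    value.blob_ids.pop_back();  // The wrapper Blob is not part of the value.
  }
  return true;
}
```

A single-result request fits the same function by passing a one-element vector.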
IDBValue unwrapping relies on the following data in existing IndexedDB objects.
* Each `IDBTransaction` owns a queue of IDBRequestQueueItems, where the queue
ordering reflects the order in which the requests were issued by the Web
application.
* `IDBRequest` exposes `HandleResponse` methods (overloaded to account for
different response types), in addition to `EnqueueResponse` methods.
WebIDBCallbacks calls into a HandleResponse method, which handles SSV
unwrapping and queueing. EnqueueResponse is responsible for updating the
IDBRequest's status (e.g., its result property) and enqueueing a DOM event in
the appropriate queue.
Reading large values follows a slightly more complex process than reading small
values. For simplicity, we describe the single-IDBValue case. Extending the
logic to an IDBValue array is fairly straightforward.
1. When a WebIDBCallbacks instance receives the result of an IndexedDB
operation from the browser-side implementation, it passes the result's
IDBValue to a HandleResponse overload on its associated IDBRequest.
2. HandleResponse asks IDBValueUnwrapper if the IDBValue's SSV data is wrapped
in a Blob.
* Fast path: If the IDBValue's SSV data is not wrapped, and the
IDBTransaction associated with the request doesn't have any queued result,
an EnqueueResponse overload is called.
* Slow path: An IDBRequestQueueItem is created for the IDBRequest and added
to the IDBTransaction's result queue.
3. If the IDBValue's SSV data is wrapped in a Blob, an IDBRequestLoader instance
is created and associated with the newly created IDBRequestQueueItem. The
IDBRequestLoader is given the IDBValue that needs to be unwrapped.
4. If an IDBRequestLoader was created above, the loading process is started.
The IDBRequestLoader uses IDBValueUnwrapper to obtain a reference to the Blob
that contains the SSV data, and then uses an embedded FileReaderLoader instance to
fetch the Blob's contents from the browser process.
5. When an IDBRequestLoader finishes retrieving the Blob's contents, it marks
the IDBRequestQueueItem as ready, and notifies the IDBTransaction that an
item in the result queue has become ready.
6. When the head item in an IDBTransaction's result queue is ready, it is
removed from the queue, and an EnqueueResponse overload is called on the
IDBRequest associated with the IDBRequestQueueItem.
![IDB Read Path with Large Values](./idb_data_flow_read_unwrapping.svg)
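The fast path, slow path, and head-of-queue rule from the steps above can be sketched as follows. `SketchQueueItem` and `SketchResultQueue` are hypothetical stand-ins for `IDBRequestQueueItem` and the queue owned by an `IDBTransaction`; requests are modeled as names, and "calling EnqueueResponse" is modeled as appending the name to a vector so that the delivery order can be inspected.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for IDBRequestQueueItem.
struct SketchQueueItem {
  std::string request_name;
  bool ready = false;
};

// Hypothetical stand-in for an IDBTransaction's result queue.
class SketchResultQueue {
 public:
  // Models HandleResponse. Returns the queue item on the slow path, or
  // nullptr when the fast path delivered the response immediately.
  std::shared_ptr<SketchQueueItem> HandleResponse(
      const std::string& request_name, bool is_wrapped) {
    if (!is_wrapped && queue_.empty()) {
      // Fast path: nothing is pending, so ordering cannot be violated.
      enqueued_responses_.push_back(request_name);
      return nullptr;
    }
    // Slow path: queue the item. Unwrapped values are ready right away;
    // wrapped values become ready when their load completes.
    auto item = std::make_shared<SketchQueueItem>();
    item->request_name = request_name;
    item->ready = !is_wrapped;
    queue_.push_back(item);
    return item;
  }

  // Models a loader finishing: marks the item ready, then delivers every
  // ready item at the head of the queue, in request order.
  void OnItemReady(SketchQueueItem& item) {
    item.ready = true;
    while (!queue_.empty() && queue_.front()->ready) {
      enqueued_responses_.push_back(queue_.front()->request_name);
      queue_.pop_front();
    }
  }

  const std::vector<std::string>& enqueued_responses() const {
    return enqueued_responses_;
  }

 private:
  std::deque<std::shared_ptr<SketchQueueItem>> queue_;
  std::vector<std::string> enqueued_responses_;
};
```

In this model, a small value that arrives behind a still-loading large value waits in the queue, which is what keeps success events in request order even when Blob loads finish out of order.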