# Security Guidelines for LLMs and other large models in Chrome
Large language models (LLMs), generative artificial intelligence (GenAI) models,
and other large machine learning (ML) models will find uses in Chromium and the
web. We will refer to all of these as _models_. This document outlines some
guidelines to help safely implement features using large models.
Our main security goals are to prevent arbitrary code execution and to prevent
user information disclosure between origins. Preventing people using Chrome
from seeing model weights or predictions is not a goal, as this is not feasible
on the client devices where Chrome runs.
# Memory Safety
Models are, abstractly, layers of mathematical operations that mix inputs from
trustworthy and untrustworthy sources and produce output that will be used
elsewhere in Chrome. In practice these models are implemented in memory-unsafe
languages and may include convenience functions to parse complex data formats as
part of their pipelines. They should be treated the same way as other
memory-unsafe code implementing a feature in Chrome to comply with the
[rule-of-2](rule-of-2.md). Models processing untrustworthy complex data must be
sandboxed, and data should be provided using safe types.
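As a minimal sketch of what this can look like, assuming a hypothetical
`example.mojom.TextModel` interface hosted in a sandboxed service (this is not
an existing Chrome API): only simple mojo types cross the process boundary, and
all inference code stays behind it.

```cpp
// Minimal sketch, assuming a hypothetical sandboxed service implementing a
// mojom interface along the lines of:
//   interface TextModel {
//     Classify(string input) => (array<float> scores);
//   };
// Only safe mojo types (string, array<float>) cross the boundary; the model
// and any memory-unsafe inference code stay inside the sandbox.
#include <string>
#include <utility>
#include <vector>

#include "base/functional/bind.h"
#include "base/memory/weak_ptr.h"
#include "example/mojom/text_model.mojom.h"  // Hypothetical generated header.
#include "mojo/public/cpp/bindings/remote.h"

class ModelClient {
 public:
  explicit ModelClient(mojo::Remote<example::mojom::TextModel> model)
      : model_(std::move(model)) {}

  void Classify(const std::string& untrustworthy_input) {
    model_->Classify(
        untrustworthy_input,
        base::BindOnce(&ModelClient::OnScores, weak_factory_.GetWeakPtr()));
  }

 private:
  void OnScores(const std::vector<float>& scores) {
    // `scores` was produced from untrustworthy input in a sandboxed process,
    // so treat it as untrustworthy too (see below).
  }

  mojo::Remote<example::mojom::TextModel> model_;
  base::WeakPtrFactory<ModelClient> weak_factory_{this};
};
```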
## Complex formats
Models processing complex data -- such as images, audio or video -- may include
format-parsing helpers in their pipelines. To ensure memory safety, any parsing
of complex formats should happen in a sandboxed, site-isolated process: either
sandbox the model itself, or parse complex formats into accepted safe formats
before sending them to the process hosting the model.
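One way to structure the second option is sketched below; `DecodeImageInSandbox`
and `RunModelOnPixels` are hypothetical stand-ins for a sandboxed decode step
and a call into the process hosting the model. The encoded bytes are the
untrustworthy complex format; only raw pixels and dimensions are forwarded.

```cpp
// Sketch of a "decode first, then infer" split, using hypothetical helpers.
#include <cstddef>
#include <cstdint>
#include <vector>

struct DecodedImage {
  uint32_t width = 0;
  uint32_t height = 0;
  std::vector<uint8_t> rgba;  // width * height * 4 bytes.
};

// Performed in a sandboxed process: the encoded bytes are untrustworthy and
// are never parsed in a privileged process.
DecodedImage DecodeImageInSandbox(const std::vector<uint8_t>& encoded_bytes);

// Performed in the sandboxed process hosting the model: it only ever sees raw
// pixels, which need no further format parsing.
std::vector<float> RunModelOnPixels(const DecodedImage& image);

std::vector<float> ClassifyUntrustedImage(
    const std::vector<uint8_t>& encoded_bytes) {
  DecodedImage decoded = DecodeImageInSandbox(encoded_bytes);
  // Validate the decoded data before passing it along.
  const size_t expected_size =
      static_cast<size_t>(decoded.width) * decoded.height * 4;
  if (decoded.rgba.size() != expected_size)
    return {};
  return RunModelOnPixels(decoded);
}
```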
### Exception - Tokenization
Where the model's only processing of its input is tokenizing a string of text
before performing inference to produce an output, this is not considered
complex processing.
## Untrustworthy input -> untrustworthy output
If an attacker can control any input to a model, it must be assumed that they
can control all of its output. Models cannot be used to sanitize data, and their
output must be treated as untrustworthy content with an untrustworthy format.
Model output must either be parsed in a sandboxed process or be limited to safe
types (e.g. an array of floats).
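For example, a feature that only needs the model to pick a category can keep
the output as an array of floats and map it onto a fixed set of values in the
trusted process, rather than consuming any string or structured format from the
model. A minimal sketch (the categories are made up):

```cpp
// The sandboxed model returns only an array of floats (a safe type); the
// trusted caller validates the shape and maps the top score onto a fixed
// enum instead of trusting anything richer from the model.
#include <cstddef>
#include <vector>

enum class Category { kUnknown, kShopping, kNews, kSports };

Category CategoryFromScores(const std::vector<float>& scores) {
  // The output shape is attacker-influenced if the input was, so check it
  // before indexing.
  constexpr size_t kNumCategories = 3;
  if (scores.size() != kNumCategories)
    return Category::kUnknown;

  size_t best = 0;
  for (size_t i = 1; i < scores.size(); ++i) {
    if (scores[i] > scores[best])
      best = i;
  }
  static constexpr Category kLabels[] = {Category::kShopping, Category::kNews,
                                         Category::kSports};
  return kLabels[best];
}
```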
## Mitigations
Features that expose models to untrustworthy input can reduce the risk of
exposing memory safety flaws by applying the following mitigations:
* Use a tight sandbox
* Provide model inputs over safe mojo types
* Validate the size and format of input
* Use a pipeline that only tokenizes then performs inference
* Ensure input is in the same format as training data
* Disable custom ops that might parse complex formatted data
* Limit the size of the model output
* Fuzz exposed APIs (see the sketch after this list)
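A minimal libFuzzer-style sketch of the last point; `RunModelPipeline` is a
hypothetical stand-in for the exposed entry point that tokenizes and then runs
inference:

```cpp
// Minimal fuzz target for a model entry point that accepts arbitrary text.
// RunModelPipeline is a hypothetical stand-in for the real exposed API; a
// real target would live alongside Chrome's other fuzzers.
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::vector<float> RunModelPipeline(const std::string& input);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string input(reinterpret_cast<const char*>(data), size);
  RunModelPipeline(input);
  return 0;
}
```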
# Side-Channels
Large models will necessarily be reused for several purposes and in several
contexts. Where this happens, it is important that appropriate sessionization is
used: side channels are likely to exist that could leak some information about
previous inputs, for example through inference state or caches that are reused
across requests.
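A minimal sketch of one way to sessionize, keeping reusable inference state
keyed by the origin it serves; `ModelSession` and the map below are assumptions
for illustration, not an existing Chrome API:

```cpp
// Per-origin model sessions: state built from one origin's inputs (e.g. a
// cache reused between inference calls) is never consulted when serving
// another origin.
#include <map>
#include <memory>

#include "url/origin.h"

// Hypothetical: owns whatever state the model reuses between calls.
struct ModelSession {};

class ModelSessionMap {
 public:
  ModelSession* GetSessionFor(const url::Origin& origin) {
    std::unique_ptr<ModelSession>& session = sessions_[origin];
    if (!session)
      session = std::make_unique<ModelSession>();
    return session.get();
  }

  // Drops all state derived from this origin's inputs.
  void ClearSessionFor(const url::Origin& origin) { sessions_.erase(origin); }

 private:
  std::map<url::Origin, std::unique_ptr<ModelSession>> sessions_;
};
```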
# Model APIs
Models are themselves complex formats that represent graphs of computation.
APIs that allow websites to specify and run models should be designed so that
these graphs and model inputs can be provided safely. Model hosting should be
managed by a trusted process to ensure that only the right set of operations
can be reached by an untrustworthy model.
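For example, the trusted hosting code can walk every operation in an untrusted
graph and reject anything outside a fixed allowlist before building it. A
sketch, where the op representation and the allowlist contents are assumptions:

```cpp
// Before instantiating an untrusted model graph, check its operations against
// a fixed allowlist so a web-supplied model cannot reach custom or otherwise
// unexpected ops.
#include <string>
#include <string_view>
#include <vector>

#include "base/containers/fixed_flat_set.h"

// Hypothetical minimal view of a parsed (but not yet executed) graph.
struct GraphOp {
  std::string name;  // e.g. "CONV_2D".
};

bool IsGraphAllowed(const std::vector<GraphOp>& ops) {
  static constexpr auto kAllowedOps = base::MakeFixedFlatSet<std::string_view>(
      {"ADD", "CONV_2D", "FULLY_CONNECTED", "RESHAPE", "SOFTMAX"});
  for (const GraphOp& op : ops) {
    if (!kAllowedOps.contains(op.name)) {
      // Reject the whole model rather than trying to repair it.
      return false;
    }
  }
  return true;
}
```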
If a model's provenance can be verified (such as when it is delivered by
Chrome's Component Updater) then we can assume it is as safe as other Chrome
code. In that case, where it runs is determined by what the model does and by
the safety of the data it consumes. Googlers should refer to internal
guidelines for approved delivery
mechanisms in Chrome (go/tf-security-in-chrome,
go/chrome-genai-security-prompts).
# Other safety considerations
Models can output very convincing text. They may be used to summarize or
translate important information (e.g. a legal form), or to produce writing for people
using Chrome (e.g. a letter to a bank). Models can produce incorrect output even
if they are not being deliberately steered to do so. People using Chrome should
have obvious indications that model output is being used, information about the
source of its inputs, and an opportunity to review any text generated on their
behalf before it is submitted to a third party.
Models may output inappropriate material. Where possible, their output should
be filtered using reasonable safety filters, and people should have mechanisms
to report and improve model outputs.
Model weights trained from on-device data may embody information about a person
using Chrome and should be treated like other sensitive data.