# Locale uses in Fonts

This document summarizes how locales are used in fonts.

[TOC]

## Parsing the Language Tag

The [lang attribute] spec specifies [BCP 47] as the format of language tags.
Since the ICU Locale class is not fully compatible with [BCP 47],
Blink uses its own simple parser
in [platform/text/LocaleToScriptMapping.cpp](../text/LocaleToScriptMapping.cpp).

`LayoutLocale::get()` parses [BCP 47] language tags
and provides methods related to fonts and layout.

`ComputedStyle::getFontDescription().locale()`
computes the [language of a node] as defined in the spec.
Note that this includes not only
the value of the `lang` attribute of the element and its ancestors,
but also the language of the document from the [content-language] header.
Refer to the [language of a node] spec for more details.

The [language of a node] could still be unknown.
`localeOrDefault()` gives you the default language in that case.
The default language is what Chrome uses for its UI,
which is passed to the renderer through the `--lang` command line argument.
This may or may not be the same as the language of the platform.
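
The fallback described above can be sketched as follows.
This is a minimal illustration and not Blink code:
the free function `LocaleOrDefault` and its plain-string parameters are hypothetical,
and an empty string stands in for an unknown [language of a node].

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the fallback described above; not Blink code.
// `node_language` is empty when the language of the node is unknown.
std::string LocaleOrDefault(const std::string& node_language,
                            const std::string& default_language) {
  // The default language (Chrome's UI language) is expected to be known.
  assert(!default_language.empty());
  if (!node_language.empty())
    return node_language;
  return default_language;
}
```

For example, a node inside `<html lang="ja">` would resolve to "ja",
while a node in a document with no language information
would resolve to the `--lang` value.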

Note that `ComputedStyle::locale()` is an `AtomicString`
for the style system to work without special casing,
while `FontDescription::locale()` is a pointer to `LayoutLocale`.

[lang attribute]: https://html.spec.whatwg.org/C/#the-lang-and-xml:lang-attributes
[BCP 47]: https://tools.ietf.org/html/bcp47
[language of a node]: https://html.spec.whatwg.org/C/#language
[content-language]: https://html.spec.whatwg.org/C/#pragma-set-default-language

## Generic Family

Users can configure their preferred fonts for [generic-family]
using the [Advanced Font Settings].
Blink stores these settings in `GenericFontFamilySettings`.
In this class, each [generic-family] has a `ScriptFontFamilyMap`,
which is a map to fonts with `UScriptCode` as the key.

To look up the font to use for a [generic-family],
Blink uses the following prioritized list to determine the script.

1. The [language of a node] as defined in HTML, if known.
2. The default language.

This result is available at `ComputedStyle::getFontDescription().localeOrDefault().script()`.
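
As a rough sketch of the per-script lookup,
the following uses a small hypothetical `Script` enum in place of ICU's `UScriptCode`
and a `std::map` in place of Blink's `ScriptFontFamilyMap`.
The fallback to a `kCommon` entry and the font names are assumptions of this sketch,
not statements about Blink's implementation.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins; Blink keys its map by ICU's UScriptCode.
enum class Script { kCommon, kLatin, kJapanese, kSimplifiedHan };
using ScriptFontFamilyMap = std::map<Script, std::string>;

// Returns the font configured for `script`. If there is no entry, this
// sketch falls back to the kCommon entry, and finally to an empty string.
std::string FontForScript(const ScriptFontFamilyMap& map, Script script) {
  auto it = map.find(script);
  if (it != map.end())
    return it->second;
  it = map.find(Script::kCommon);
  return it != map.end() ? it->second : std::string();
}
```

Each [generic-family] would own one such map,
and the script passed in would be the one determined by the prioritized list above.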

[generic-family]: https://drafts.csswg.org/css-fonts/#generic-family-value
[Advanced Font Settings]: https://chrome.google.com/webstore/detail/advanced-font-settings/caclkomlalccbpcdllchkeecicepbmbm

## Installed Font Fallback

[CSS Fonts] defines a concept of [installed font fallback],
though its behavior is UA dependent.

As Blink tries to match the font fallback behavior of the platform,
the complete logic varies by platform,
but we try to share parts of the logic where possible.

[CSS Fonts]: https://drafts.csswg.org/css-fonts/
[installed font fallback]: https://drafts.csswg.org/css-fonts/#installed-font-fallback

### Emojis

If we've determined that a character is [emoji-default], also known as "emoji
in emoji" representation, we treat the character a bit differently. The goal is
to not only find a font that supports emojis, but also to prioritize color
emoji fonts over traditional monochrome fonts that happen to have the glyph.

On Android/Skia, Linux, and Windows, Blink will pass the special locale
`und-Zsye` to the operating system when looking for an emoji font. The [Zsye]
script tag is defined by UTS #51 as "prefer emoji style for characters that
have both text and emoji styles available", which is precisely what we need.

On Linux, Blink will additionally always use U+1F46A FAMILY (👪) when matching
potential candidates to increase the odds of finding the right emoji font, in
case the installed emoji font doesn't support the actual emoji in question.
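
The following sketch illustrates the two adjustments described above:
the `und-Zsye` locale hint and the Linux-only probe character.
`Platform`, `FallbackQuery`, and `EmojiFallbackQuery` are hypothetical names
made up for this illustration; they do not correspond to Blink classes.

```cpp
#include <string>

// Hypothetical names for illustration; only the platforms named above.
enum class Platform { kAndroid, kLinux, kWindows };

struct FallbackQuery {
  std::string locale;  // Locale hint handed to the platform font matcher.
  char32_t character;  // Character used to match candidate fonts.
};

FallbackQuery EmojiFallbackQuery(Platform platform, char32_t emoji) {
  // "und-Zsye": undetermined language, emoji presentation preferred.
  FallbackQuery query{"und-Zsye", emoji};
  // On Linux, match against U+1F46A FAMILY instead of the actual emoji.
  if (platform == Platform::kLinux)
    query.character = U'\U0001F46A';
  return query;
}
```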

[emoji-default]: https://unicode.org/reports/tr51/#Presentation_Style
[Zsye]: https://unicode.org/reports/tr51/#Emoji_Script

### Unified Han Ideographs

As seen in [CJK Unified Ideographs code charts] in Unicode,
glyphs of Han Ideographs vary by locale.

To render correct glyphs,
the installed font fallback uses the following prioritized list of locales.

1. The [language of a node] as defined in HTML, if known.
2. The list of languages the browser sends in the [Accept-Language] header.
3. The default language.
4. The system locale (Windows only).

The prioritized list alone may not be enough for the Unified Han Ideographs.
For instance, when the top of the list is "en-US",
it gives no clue for choosing the correct font for the Unified Han Ideographs.
For this purpose,
`LayoutLocale::hasScriptForHan()` determines whether
the locale can be used to choose the correct font for the Unified Han Ideographs.

When the installed font fallback needs to determine the font
for a Unified Han Ideograph,
it uses `scriptForHan()` of the first locale in the prioritized list
for which `hasScriptForHan()` is true.

`scriptForHan()` may be different from `script()`,
in cases such as "en-JP", which indicates an English user in Japan.
Such locales are not common, but they are not rare either.
Some organizations are known to require English versions of the OS
even though their region is not the US.

The `script()` of "en-JP" is Latin for the [generic-family] to work correctly,
but its `scriptForHan()` can indicate that
the user prefers Japanese variants of glyphs for the Unified Han Ideographs.
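
The selection over the prioritized list can be sketched as follows.
This is a simplified illustration and not Blink's parser:
the `HanScript` enum, the free functions, and the small table of script and region subtags
are assumptions of this sketch,
as is treating a bare "zh" as Simplified Chinese.
Empty strings stand in for an unknown node language
and for the system locale on platforms other than Windows.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the script codes Blink uses; not Blink code.
enum class HanScript { kUnknown, kJapanese, kKorean, kSimplified, kTraditional };

// Splits a tag such as "zh-Hant-TW" into lower-cased subtags.
std::vector<std::string> Subtags(const std::string& tag) {
  std::vector<std::string> subtags;
  std::istringstream stream(tag);
  std::string part;
  while (std::getline(stream, part, '-')) {
    std::transform(part.begin(), part.end(), part.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    subtags.push_back(part);
  }
  return subtags;
}

// Simplified guess at the Han variant a tag implies. Script and region
// subtags are checked first, so "en-JP" maps to Japanese.
HanScript ScriptForHan(const std::string& tag) {
  const std::vector<std::string> subtags = Subtags(tag);
  for (std::size_t i = 1; i < subtags.size(); ++i) {
    const std::string& s = subtags[i];
    if (s == "jpan" || s == "jp") return HanScript::kJapanese;
    if (s == "kore" || s == "kr") return HanScript::kKorean;
    if (s == "hans" || s == "cn") return HanScript::kSimplified;
    if (s == "hant" || s == "tw" || s == "hk") return HanScript::kTraditional;
  }
  if (subtags.empty()) return HanScript::kUnknown;
  if (subtags[0] == "ja") return HanScript::kJapanese;
  if (subtags[0] == "ko") return HanScript::kKorean;
  if (subtags[0] == "zh") return HanScript::kSimplified;  // Assumed default.
  return HanScript::kUnknown;
}

bool HasScriptForHan(const std::string& tag) {
  return ScriptForHan(tag) != HanScript::kUnknown;
}

// Builds the prioritized list: node language (if known), Accept-Language
// entries, default language, then system locale (if provided).
std::vector<std::string> HanLocalePriority(
    const std::string& node_language,
    const std::vector<std::string>& accept_languages,
    const std::string& default_language,
    const std::string& system_locale) {
  std::vector<std::string> list;
  if (!node_language.empty()) list.push_back(node_language);
  list.insert(list.end(), accept_languages.begin(), accept_languages.end());
  list.push_back(default_language);
  if (!system_locale.empty()) list.push_back(system_locale);
  return list;
}

// Returns the Han script of the first locale in the list that has one.
HanScript PickHanScript(const std::vector<std::string>& prioritized) {
  for (const std::string& tag : prioritized) {
    if (HasScriptForHan(tag)) return ScriptForHan(tag);
  }
  return HanScript::kUnknown;
}
```

With this sketch, a node language of "en-JP" selects the Japanese variants
even when the Accept-Language list contains "zh-TW",
because the node language comes first in the list and has a script for Han.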

There are multiple cases where such a locale can appear in the list:

* A site can use such a language tag in HTML or in the [content-language] header
when its UI is in English,
but it knows that the user is in Japan,
either by IP address, the preferences of the logged-in user,
or any other method.
* The system (e.g., Windows) can produce such a language tag
when its language and region are set differently.

This algorithm is currently used on Windows and Linux.
Android before N does not have the language settings and is thus
unable to provide the [Accept-Language] list for this algorithm to consume,
but Android N Preview supports multi-locale,
and the work to feed the list from the OS to the [Accept-Language] list is ongoing.
Mac relies on Core Graphics to do the job.

The prioritized list is not consistent across platforms today
and this is being addressed.

[CJK Unified Ideographs code charts]: http://unicode.org/charts/PDF/U4E00.pdf
[Accept-Language]: https://tools.ietf.org/html/rfc7231#section-5.3.5