# Image Capture API

This folder contains the implementation of the [W3C Image Capture API].
Image Capture was shipped in Chrome M59; please consult the
[Implementation Status] if you think a feature should be available and isn't.

[W3C Image Capture API]: https://w3c.github.io/mediacapture-image/
[Implementation Status]: https://github.com/w3c/mediacapture-image/blob/main/implementation-status.md

This API is structured around the [ImageCapture class] _and_ a number of
[extensions] to the `MediaStreamTrack` feeding it (let's call them
`theImageCapturer` and `theTrack`, respectively).

[ImageCapture class]: https://w3c.github.io/mediacapture-image/#imagecaptureapi
[extensions]: https://w3c.github.io/mediacapture-image/#extensions


## API Mechanics

### `takePhoto()` and `grabFrame()`

*   `takePhoto()` returns the result of a single photographic exposure as a
    `Blob` which can be downloaded, stored by the browser or displayed in an
    `img` element. This method uses the highest available photographic camera
    resolution.

*   `grabFrame()` returns a snapshot of the live video in `theTrack` as an
    `ImageBitmap` object which could, for example, be drawn on a `canvas` and
    then post-processed to selectively change color values. Note that the
    `ImageBitmap` only has the resolution of the video track, which is
    generally lower than the camera's still-image resolution.

(_Adapted from the [blog post](https://developer.chrome.com/blog/imagecapture/)_)
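
The two calls can be sketched as follows. This is a minimal illustration, not code from this folder: `captureBoth`, `StillBlobLike`, `FrameBitmapLike` and `StillCapturer` are hypothetical names introduced so the snippet type-checks without DOM typings; a real `ImageCapture` object, `Blob` and `ImageBitmap` are assumed to satisfy these shapes.

```typescript
// Minimal structural types (assumptions for this sketch only).
interface StillBlobLike { size: number; type: string; }
interface FrameBitmapLike { width: number; height: number; }
interface StillCapturer {
  takePhoto(): Promise<StillBlobLike>;
  grabFrame(): Promise<FrameBitmapLike>;
}

// Hypothetical helper: take one still photo and one video snapshot,
// then summarize what each call returned.
async function captureBoth(theImageCapturer: StillCapturer): Promise<string> {
  const photo = await theImageCapturer.takePhoto(); // encoded still (Blob)
  const frame = await theImageCapturer.grabFrame(); // unencoded frame (ImageBitmap)
  return `photo: ${photo.size} bytes (${photo.type}); ` +
         `frame: ${frame.width}x${frame.height}`;
}

// In a page, assuming camera permission has been granted:
//   const stream = await navigator.mediaDevices.getUserMedia({ video: true });
//   const theTrack = stream.getVideoTracks()[0];
//   const theImageCapturer = new ImageCapture(theTrack);
//   console.log(await captureBoth(theImageCapturer));
```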


### Photo settings and capabilities

The photo-specific options and settings are associated with either
`theImageCapturer` or `theTrack`, depending on whether a given
capability/setting has an immediately recognisable effect on `theTrack`, in
other words whether it is "live" or not. For example, changing the zoom level is
instantly reflected on `theTrack`, while enabling red eye reduction, if
available, is not.

| Object                   |Type                 | Example                                 |
|:------------------------ |:------------------- | ---------------------------------------:|
|[`PhotoCapabilities`]     |non-live capabilities|`theImageCapturer.getPhotoCapabilities()`|
|[`MediaTrackCapabilities`]|live capabilities    |`theTrack.getCapabilities()`             |
|                          |                     |                                         |
|[`PhotoSettings`]         |non-live settings    |`theImageCapturer.takePhoto(photoSettings)`|
|[`MediaTrackSettings`]    |live settings        |`theTrack.getSettings()`                 |

[`PhotoCapabilities`]: https://w3c.github.io/mediacapture-image/#photocapabilities-section
[`MediaTrackCapabilities`]: https://w3c.github.io/mediacapture-image/#mediatrackcapabilities-section
[`PhotoSettings`]: https://w3c.github.io/mediacapture-image/#photosettings-section
[`MediaTrackSettings`]: https://w3c.github.io/mediacapture-image/#mediatracksettings-section
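
A hedged sketch of the live/non-live split, using zoom and red eye reduction as in the example above. `clampToRange`, `zoomThenShoot` and the `*Like`/`Zoom*`/`Settings*` interfaces are hypothetical names for illustration; the sketch assumes the track reports `zoom` as a `{min, max, step}` range and that `PhotoCapabilities.redEyeReduction` is `"controllable"` when the setting can be toggled.

```typescript
// Structural types assumed for this sketch only.
interface RangeLike { min: number; max: number; step: number; }
interface ZoomTrack {
  getCapabilities(): { zoom?: RangeLike };
  applyConstraints(c: { advanced: Array<{ zoom: number }> }): Promise<void>;
}
interface SettingsCapturer {
  getPhotoCapabilities(): Promise<{ redEyeReduction?: string }>;
  takePhoto(photoSettings?: { redEyeReduction?: boolean }): Promise<unknown>;
}

// Snap a requested value into [min, max], aligned to the advertised step.
function clampToRange(value: number, range: RangeLike): number {
  const clamped = Math.min(range.max, Math.max(range.min, value));
  if (range.step <= 0) return clamped;
  const steps = Math.round((clamped - range.min) / range.step);
  return Math.min(range.max, range.min + steps * range.step);
}

async function zoomThenShoot(
  theTrack: ZoomTrack,
  theImageCapturer: SettingsCapturer,
  requestedZoom: number,
): Promise<unknown> {
  // Live: zoom is a track capability, so the change shows up in the video.
  const zoomRange = theTrack.getCapabilities().zoom;
  if (zoomRange) {
    const zoom = clampToRange(requestedZoom, zoomRange);
    await theTrack.applyConstraints({ advanced: [{ zoom }] });
  }
  // Non-live: red eye reduction only affects the photo takePhoto() returns.
  const photoCapabilities = await theImageCapturer.getPhotoCapabilities();
  const photoSettings: { redEyeReduction?: boolean } =
    photoCapabilities.redEyeReduction === "controllable"
      ? { redEyeReduction: true }
      : {};
  return theImageCapturer.takePhoto(photoSettings);
}
```

Clamping before calling `applyConstraints()` is a design choice of the sketch, meant to keep the request inside what the device advertises.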

## Other topics

### Are `takePhoto()` and `grabFrame()` the same?

These methods do not produce the same results, as explained in
[this issue comment](
https://bugs.chromium.org/p/chromium/issues/detail?id=655107#c8):


>  Let me reconstruct the conversion steps each image goes through in CrOs/Linux;
>  [...]
>
>  a) Live video capture produces frames via `V4L2CaptureDelegate::DoCapture()` [1].
>  The original data (from the WebCam) comes in either YUY2 (a 4:2:2 format) or
>  MJPEG, depending if the capture is smaller than 1280x720p or not, respectively.

>  b) This `V4L2CaptureDelegate` sends the capture frame to a conversion stage to
>  I420 [2].  I420 is a 4:2:0 format, so it has lost some information
>  irretrievably.  This I420 format is the one used for transporting video frames
>  to the renderer.

>  c) This I420 is the input to `grabFrame()`, which produces a JS ImageBitmap,
>  unencoded, after converting the I420 into RGBA [3] of the appropriate endian-ness.

> What happens to `takePhoto()`? It takes the data from the Webcam in a) and
> either returns a JPEG Blob [4] or converts the YUY2 [5] and encodes it to PNG
>  using the default compression value (6 in a 0-10 scale IIRC) [6].

>  IOW:

```
  - for smaller video resolutions:

  OS -> YUY2 ---> I420 --> RGBA --> ImageBitmap     grabFrame()
             |
             +--> RGBA --> PNG ---> Blob            takePhoto()

  - and for larger video resolutions:

  OS -> MJPEG ---> I420 --> RGBA --> ImageBitmap    grabFrame()
              |
              +--> JPG ------------> Blob           takePhoto()
```


> Where every conversion to-I420 loses information and so does the encoding to
> PNG.  Even a conversion `RGBA --> I420 --> RGBA` would not produce the original
> image.  (Plus, when you show `ImageBitmap` and/or Blob on an `<img>` or `<canvas>`
> there are more stages of decoding and even colour correction involved!)

> With all that, I'm not surprised at all that the images are not pixel
> accurate!  :-)
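
The information loss from 4:2:0 chroma subsampling mentioned in the quote can be illustrated with a small, self-contained sketch. This is not the conversion code Chromium uses; `subsampleChroma420` and `upsampleChroma420` are hypothetical helpers that average each 2x2 block of a full-resolution chroma plane and then repeat the averaged sample, assuming even dimensions.

```typescript
// Average each 2x2 block of a chroma plane, which is what 4:2:0 keeps.
// Assumes width and height are even.
function subsampleChroma420(plane: number[], width: number, height: number): number[] {
  const out: number[] = [];
  for (let y = 0; y < height; y += 2) {
    for (let x = 0; x < width; x += 2) {
      const sum = plane[y * width + x] + plane[y * width + x + 1] +
                  plane[(y + 1) * width + x] + plane[(y + 1) * width + x + 1];
      out.push(Math.round(sum / 4));
    }
  }
  return out;
}

// Expand back to full resolution by repeating each sample over its 2x2 block.
function upsampleChroma420(sub: number[], width: number, height: number): number[] {
  const out: number[] = new Array(width * height);
  const subWidth = width / 2;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      out[y * width + x] = sub[Math.floor(y / 2) * subWidth + Math.floor(x / 2)];
    }
  }
  return out;
}

// A 4x2 chroma plane with eight distinct values.
const originalChroma = [10, 20, 200, 210,
                        30, 40, 220, 230];
const roundTripped = upsampleChroma420(subsampleChroma420(originalChroma, 4, 2), 4, 2);
console.log(roundTripped); // [25, 25, 215, 215, 25, 25, 215, 215]
```

Only two of the eight original values' worth of information survives the round trip, which is one reason the `grabFrame()` and `takePhoto()` outputs cannot be expected to match pixel for pixel.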


### Why are `PhotoCapabilities.fillLightMode` and `MediaTrackCapabilities.torch` separated?

Because they are different things: `torch` means flash constantly on/off whereas
`fillLightMode` means flash always-on/always-off/auto _when taking a
photographic exposure_.

`torch` lives in `theTrack` because the effect can be seen "live" on it,
whereas `fillLightMode` lives in `theImageCapturer` because the effect of
modifying it can only be seen after taking a picture.
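
A minimal sketch of the difference, under the assumption that the device supports both features. `setTorch`, `takePhotoWithFlash`, `TorchTrack` and `FlashCapturer` are hypothetical names introduced here; `theTrack` and `theImageCapturer` are assumed to accept the calls shown.

```typescript
// Structural types assumed for this sketch only.
interface TorchTrack {
  applyConstraints(c: { advanced: Array<{ torch: boolean }> }): Promise<void>;
}
interface FlashCapturer {
  takePhoto(photoSettings?: { fillLightMode?: "auto" | "off" | "flash" }): Promise<unknown>;
}

// Live: the light stays on (or off) and the change is visible in the video.
function setTorch(theTrack: TorchTrack, on: boolean): Promise<void> {
  return theTrack.applyConstraints({ advanced: [{ torch: on }] });
}

// Non-live: the flash mode applies only to the exposure taken by this call.
function takePhotoWithFlash(theImageCapturer: FlashCapturer): Promise<unknown> {
  return theImageCapturer.takePhoto({ fillLightMode: "flash" });
}
```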



## Testing

Image Capture web tests are located in
[web_tests/external/wpt/mediacapture-image].

[web_tests/external/wpt/mediacapture-image]: https://chromium.googlesource.com/chromium/src/+/main/third_party/blink/web_tests/external/wpt/mediacapture-image/