linux/Documentation/block/writeback_cache_control.rst

==========================================
Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches.  That means the devices signal I/O completion to the
operating system before data actually has hit the non-volatile storage.  This
behavior obviously speeds up various workloads, but it means the operating
system needs to force data out to the non-volatile storage when it performs
a data integrity operation like fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device.  These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.


Explicit cache flushes
----------------------

The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
set on an otherwise empty bio structure, which causes only an explicit cache
flush without any dependent I/O.  It is recommend to use
the blkdev_issue_flush() helper for a pure cache flush.


Forced Unit Access
------------------

The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.


Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.

Feature settings for block drivers
----------------------------------

For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.

For devices with volatile write caches the driver needs to tell the block layer
that it supports flushing caches by setting the

   BLK_FEAT_WRITE_CACHE

flag in the queue_limits feature field.  For devices that also support the FUA
bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
the

   BLK_FEAT_FUA

flag in the features field of the queue_limits structure.

Implementation details for bio based block drivers
--------------------------------------------------

For bio based drivers the REQ_PREFLUSH and REQ_FUA bit are simply passed on to
the driver if the driver sets the BLK_FEAT_WRITE_CACHE flag and the driver
needs to handle them.

*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flags is
_not_ set.  Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
handle REQ_FUA.

For remapping drivers the REQ_FUA bits need to be propagated to underlying
devices, and a global flush needs to be implemented for bios with the
REQ_PREFLUSH bit set.

Implementation details for blk-mq drivers
-----------------------------------------

When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
request followed by the actual write by the block layer.

When the BLK_FEAT_FUA flags is set, the REQ_FUA bit is simply passed on for the
REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
after the completion of the write request for bio submissions with the REQ_FUA
bit set.