dedupe.c | Explore in Territory

// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright 2023 Red Hat
 */

/**
 * DOC:
 *
 * Hash Locks:
 *
 * A hash_lock controls and coordinates writing, index access, and dedupe among groups of data_vios
 * concurrently writing identical blocks, allowing them to deduplicate not only against advice but
 * also against each other. This saves on index queries and allows those data_vios to concurrently
 * deduplicate against a single block instead of being serialized through a PBN read lock. Only one
 * index query is needed for each hash_lock, instead of one for every data_vio.
 *
 * Hash_locks are assigned to hash_zones by computing a modulus on the hash itself. Each hash_zone
 * has a single dedicated queue and thread for performing all operations on the hash_locks assigned
 * to that zone. The concurrency guarantees of this single-threaded model allow the code to omit
 * more fine-grained locking for the hash_lock structures.
 *
 * A hash_lock acts like a state machine perhaps more than as a lock. Other than the starting and
 * ending states INITIALIZING and BYPASSING, every state represents and is held for the duration of
 * an asynchronous operation. All state transitions are performed on the thread of the hash_zone
 * containing the lock. An asynchronous operation is almost always performed upon entering a state,
 * and the callback from that operation triggers exiting the state and entering a new state.
 *
 * In all states except DEDUPING, there is a single data_vio, called the lock agent, performing the
 * asynchronous operations on behalf of the lock. The agent will change during the lifetime of the
 * lock if the lock is shared by more than one data_vio. data_vios waiting to deduplicate are kept
 * on a wait queue. Viewed a different way, the agent holds the lock exclusively until the lock
 * enters the DEDUPING state, at which point it becomes a shared lock that all the waiters (and any
 * new data_vios that arrive) use to share a PBN lock. In state DEDUPING, there is no agent. When
 * the last data_vio in the lock calls back in DEDUPING, it becomes the agent and the lock becomes
 * exclusive again. New data_vios that arrive in the lock will also go on the wait queue.
 *
 * The existence of lock waiters is a key factor controlling which state the lock transitions to
 * next. When the lock is new or has waiters, it will always try to reach DEDUPING, and when it
 * doesn't, it will try to clean up and exit.
 *
 * Deduping requires holding a PBN lock on a block that is known to contain data identical to the
 * data_vios in the lock, so the lock will send the agent to the duplicate zone to acquire the PBN
 * lock (LOCKING), to the kernel I/O threads to read and verify the data (VERIFYING), or to write a
 * new copy of the data to a full data block or a slot in a compressed block (WRITING).
 *
 * Cleaning up consists of updating the index when the data location is different from the initial
 * index query (UPDATING, triggered by stale advice, compression, and rollover), releasing the PBN
 * lock on the duplicate block (UNLOCKING), and if the agent is the last data_vio referencing the
 * lock, releasing the hash_lock itself back to the hash zone (BYPASSING).
 *
 * The shortest sequence of states is for non-concurrent writes of new data:
 *   INITIALIZING -> QUERYING -> WRITING -> BYPASSING
 * This sequence is short because no PBN read lock or index update is needed.
 *
 * Non-concurrent, finding valid advice looks like this (endpoints elided):
 *   -> QUERYING -> LOCKING -> VERIFYING -> DEDUPING -> UNLOCKING ->
 * Or with stale advice (endpoints elided):
 *   -> QUERYING -> LOCKING -> VERIFYING -> UNLOCKING -> WRITING -> UPDATING ->
 *
 * When there are not enough available reference count increments available on a PBN for a data_vio
 * to deduplicate, a new lock is forked and the excess waiters roll over to the new lock (which
 * goes directly to WRITING). The new lock takes the place of the old lock in the lock map so new
 * data_vios will be directed to it. The two locks will proceed independently, but only the new
 * lock will have the right to update the index (unless it also forks).
 *
 * Since rollover happens in a lock instance, once a valid data location has been selected, it will
 * not change. QUERYING and WRITING are only performed once per lock lifetime. All other
 * non-endpoint states can be re-entered.
 *
 * The function names in this module follow a convention referencing the states and transitions in
 * the state machine. For example, for the LOCKING state, there are start_locking() and
 * finish_locking() functions.  start_locking() is invoked by the finish function of the state (or
 * states) that transition to LOCKING. It performs the actual lock state change and must be invoked
 * on the hash zone thread.  finish_locking() is called by (or continued via callback from) the
 * code actually obtaining the lock. It does any bookkeeping or decision-making required and
 * invokes the appropriate start function of the state being transitioned to after LOCKING.
 *
 * ----------------------------------------------------------------------
 *
 * Index Queries:
 *
 * A query to the UDS index is handled asynchronously by the index's threads. When the query is
 * complete, a callback supplied with the query will be called from one of the those threads. Under
 * heavy system load, the index may be slower to respond than is desirable for reasonable I/O
 * throughput. Since deduplication of writes is not necessary for correct operation of a VDO
 * device, it is acceptable to timeout out slow index queries and proceed to fulfill a write
 * request without deduplicating. However, because the uds_request struct itself is supplied by the
 * caller, we can not simply reuse a uds_request object which we have chosen to timeout. Hence,
 * each hash_zone maintains a pool of dedupe_contexts which each contain a uds_request along with a
 * reference to the data_vio on behalf of which they are performing a query.
 *
 * When a hash_lock needs to query the index, it attempts to acquire an unused dedupe_context from
 * its hash_zone's pool. If one is available, that context is prepared, associated with the
 * hash_lock's agent, added to the list of pending contexts, and then sent to the index. The
 * context's state will be transitioned from DEDUPE_CONTEXT_IDLE to DEDUPE_CONTEXT_PENDING. If all
 * goes well, the dedupe callback will be called by the index which will change the context's state
 * to DEDUPE_CONTEXT_COMPLETE, and the associated data_vio will be enqueued to run back in the hash
 * zone where the query results will be processed and the context will be put back in the idle
 * state and returned to the hash_zone's available list.
 *
 * The first time an index query is launched from a given hash_zone, a timer is started. When the
 * timer fires, the hash_zone's completion is enqueued to run in the hash_zone where the zone's
 * pending list will be searched for any contexts in the pending state which have been running for
 * too long. Those contexts are transitioned to the DEDUPE_CONTEXT_TIMED_OUT state and moved to the
 * zone's timed_out list where they won't be examined again if there is a subsequent time out). The
 * data_vios associated with timed out contexts are sent to continue processing their write
 * operation without deduplicating. The timer is also restarted.
 *
 * When the dedupe callback is run for a context which is in the timed out state, that context is
 * moved to the DEDUPE_CONTEXT_TIMED_OUT_COMPLETE state. No other action need be taken as the
 * associated data_vios have already been dispatched.
 *
 * If a hash_lock needs a dedupe context, and the available list is empty, the timed_out list will
 * be searched for any contexts which are timed out and complete. One of these will be used
 * immediately, and the rest will be returned to the available list and marked idle.
 */

#include "dedupe.h"

#include <linux/atomic.h>
#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/ratelimit.h>
#include <linux/spinlock.h>
#include <linux/timer.h>

#include "logger.h"
#include "memory-alloc.h"
#include "numeric.h"
#include "permassert.h"
#include "string-utils.h"

#include "indexer.h"

#include "action-manager.h"
#include "admin-state.h"
#include "completion.h"
#include "constants.h"
#include "data-vio.h"
#include "int-map.h"
#include "io-submitter.h"
#include "packer.h"
#include "physical-zone.h"
#include "slab-depot.h"
#include "statistics.h"
#include "types.h"
#include "vdo.h"
#include "wait-queue.h"

#define DEDUPE_QUERY_TIMER_IDLE …
#define DEDUPE_QUERY_TIMER_RUNNING …
#define DEDUPE_QUERY_TIMER_FIRED …

enum dedupe_context_state { … };

/* Possible index states: closed, opened, or transitioning between those two. */
enum index_state { … };

static const char *CLOSED = …;
static const char *CLOSING = …;
static const char *ERROR = …;
static const char *OFFLINE = …;
static const char *ONLINE = …;
static const char *OPENING = …;
static const char *SUSPENDED = …;
static const char *UNKNOWN = …;

/* Version 2 uses the kernel space UDS index and is limited to 16 bytes */
#define UDS_ADVICE_VERSION …
/* version byte + state byte + 64-bit little-endian PBN */
#define UDS_ADVICE_SIZE …

enum hash_lock_state { … };

static const char * const LOCK_STATE_NAMES[] = …;

struct hash_lock { … };

#define LOCK_POOL_CAPACITY …

struct hash_zones { … };

/* These are in milliseconds. */
unsigned int vdo_dedupe_index_timeout_interval = …;
unsigned int vdo_dedupe_index_min_timer_interval = …;
/* Same two variables, in jiffies for easier consumption. */
static u64 vdo_dedupe_index_timeout_jiffies;
static u64 vdo_dedupe_index_min_timer_jiffies;

static inline struct hash_zone *as_hash_zone(struct vdo_completion *completion)
{ … }

static inline struct hash_zones *as_hash_zones(struct vdo_completion *completion)
{ … }

static inline void assert_in_hash_zone(struct hash_zone *zone, const char *name)
{ … }

static inline bool change_context_state(struct dedupe_context *context, int old, int new)
{ … }

static inline bool change_timer_state(struct hash_zone *zone, int old, int new)
{ … }

/**
 * return_hash_lock_to_pool() - (Re)initialize a hash lock and return it to its pool.
 * @zone: The zone from which the lock was borrowed.
 * @lock: The lock that is no longer in use.
 */
static void return_hash_lock_to_pool(struct hash_zone *zone, struct hash_lock *lock)
{ … }

/**
 * vdo_get_duplicate_lock() - Get the PBN lock on the duplicate data location for a data_vio from
 *                            the hash_lock the data_vio holds (if there is one).
 * @data_vio: The data_vio to query.
 *
 * Return: The PBN lock on the data_vio's duplicate location.
 */
struct pbn_lock *vdo_get_duplicate_lock(struct data_vio *data_vio)
{ … }

/**
 * hash_lock_key() - Return hash_lock's record name as a hash code.
 * @lock: The hash lock.
 *
 * Return: The key to use for the int map.
 */
static inline u64 hash_lock_key(struct hash_lock *lock)
{ … }

/**
 * get_hash_lock_state_name() - Get the string representation of a hash lock state.
 * @state: The hash lock state.
 *
 * Return: The short string representing the state
 */
static const char *get_hash_lock_state_name(enum hash_lock_state state)
{ … }

/**
 * assert_hash_lock_agent() - Assert that a data_vio is the agent of its hash lock, and that this
 *                            is being called in the hash zone.
 * @data_vio: The data_vio expected to be the lock agent.
 * @where: A string describing the function making the assertion.
 */
static void assert_hash_lock_agent(struct data_vio *data_vio, const char *where)
{ … }

/**
 * set_duplicate_lock() - Set the duplicate lock held by a hash lock. May only be called in the
 *                        physical zone of the PBN lock.
 * @hash_lock: The hash lock to update.
 * @pbn_lock: The PBN read lock to use as the duplicate lock.
 */
static void set_duplicate_lock(struct hash_lock *hash_lock, struct pbn_lock *pbn_lock)
{ … }

/**
 * dequeue_lock_waiter() - Remove the first data_vio from the lock's waitq and return it.
 * @lock: The lock containing the wait queue.
 *
 * Return: The first (oldest) waiter in the queue, or NULL if the queue is empty.
 */
static inline struct data_vio *dequeue_lock_waiter(struct hash_lock *lock)
{ … }

/**
 * set_hash_lock() - Set, change, or clear the hash lock a data_vio is using.
 * @data_vio: The data_vio to update.
 * @new_lock: The hash lock the data_vio is joining.
 *
 * Updates the hash lock (or locks) to reflect the change in membership.
 */
static void set_hash_lock(struct data_vio *data_vio, struct hash_lock *new_lock)
{ … }

/* There are loops in the state diagram, so some forward decl's are needed. */
static void start_deduping(struct hash_lock *lock, struct data_vio *agent,
			   bool agent_is_done);
static void start_locking(struct hash_lock *lock, struct data_vio *agent);
static void start_writing(struct hash_lock *lock, struct data_vio *agent);
static void unlock_duplicate_pbn(struct vdo_completion *completion);
static void transfer_allocation_lock(struct data_vio *data_vio);

/**
 * exit_hash_lock() - Bottleneck for data_vios that have written or deduplicated and that are no
 *                    longer needed to be an agent for the hash lock.
 * @data_vio: The data_vio to complete and send to be cleaned up.
 */
static void exit_hash_lock(struct data_vio *data_vio)
{ … }

/**
 * set_duplicate_location() - Set the location of the duplicate block for data_vio, updating the
 *                            is_duplicate and duplicate fields from a zoned_pbn.
 * @data_vio: The data_vio to modify.
 * @source: The location of the duplicate.
 */
static void set_duplicate_location(struct data_vio *data_vio,
				   const struct zoned_pbn source)
{ … }

/**
 * retire_lock_agent() - Retire the active lock agent, replacing it with the first lock waiter, and
 *                       make the retired agent exit the hash lock.
 * @lock: The hash lock to update.
 *
 * Return: The new lock agent (which will be NULL if there was no waiter)
 */
static struct data_vio *retire_lock_agent(struct hash_lock *lock)
{ … }

/**
 * wait_on_hash_lock() - Add a data_vio to the lock's queue of waiters.
 * @lock: The hash lock on which to wait.
 * @data_vio: The data_vio to add to the queue.
 */
static void wait_on_hash_lock(struct hash_lock *lock, struct data_vio *data_vio)
{ … }

/**
 * abort_waiter() - waiter_callback_fn function that shunts waiters to write their blocks without
 *                  optimization.
 * @waiter: The data_vio's waiter link.
 * @context: Not used.
 */
static void abort_waiter(struct vdo_waiter *waiter, void *context __always_unused)
{ … }

/**
 * start_bypassing() - Stop using the hash lock.
 * @lock: The hash lock.
 * @agent: The data_vio acting as the agent for the lock.
 *
 * Stops using the hash lock. This is the final transition for hash locks which did not get an
 * error.
 */
static void start_bypassing(struct hash_lock *lock, struct data_vio *agent)
{ … }

void vdo_clean_failed_hash_lock(struct data_vio *data_vio)
{ … }

/**
 * finish_unlocking() - Handle the result of the agent for the lock releasing a read lock on
 *                      duplicate candidate.
 * @completion: The completion of the data_vio acting as the lock's agent.
 *
 * This continuation is registered in unlock_duplicate_pbn().
 */
static void finish_unlocking(struct vdo_completion *completion)
{ … }

/**
 * unlock_duplicate_pbn() - Release a read lock on the PBN of the block that may or may not have
 *                          contained duplicate data.
 * @completion: The completion of the data_vio acting as the lock's agent.
 *
 * This continuation is launched by start_unlocking(), and calls back to finish_unlocking() on the
 * hash zone thread.
 */
static void unlock_duplicate_pbn(struct vdo_completion *completion)
{ … }

/**
 * start_unlocking() - Release a read lock on the PBN of the block that may or may not have
 *                     contained duplicate data.
 * @lock: The hash lock.
 * @agent: The data_vio currently acting as the agent for the lock.
 */
static void start_unlocking(struct hash_lock *lock, struct data_vio *agent)
{ … }

static void release_context(struct dedupe_context *context)
{ … }

static void process_update_result(struct data_vio *agent)
{ … }

/**
 * finish_updating() - Process the result of a UDS update performed by the agent for the lock.
 * @completion: The completion of the data_vio that performed the update
 *
 * This continuation is registered in start_querying().
 */
static void finish_updating(struct vdo_completion *completion)
{ … }

static void query_index(struct data_vio *data_vio, enum uds_request_type operation);

/**
 * start_updating() - Continue deduplication with the last step, updating UDS with the location of
 *                    the duplicate that should be returned as advice in the future.
 * @lock: The hash lock.
 * @agent: The data_vio currently acting as the agent for the lock.
 */
static void start_updating(struct hash_lock *lock, struct data_vio *agent)
{ … }

/**
 * finish_deduping() - Handle a data_vio that has finished deduplicating against the block locked
 *                     by the hash lock.
 * @lock: The hash lock.
 * @data_vio: The lock holder that has finished deduplicating.
 *
 * If there are other data_vios still sharing the lock, this will just release the data_vio's share
 * of the lock and finish processing the data_vio. If this is the last data_vio holding the lock,
 * this makes the data_vio the lock agent and uses it to advance the state of the lock so it can
 * eventually be released.
 */
static void finish_deduping(struct hash_lock *lock, struct data_vio *data_vio)
{ … }

/**
 * acquire_lock() - Get the lock for a record name.
 * @zone: The zone responsible for the hash.
 * @hash: The hash to lock.
 * @replace_lock: If non-NULL, the lock already registered for the hash which should be replaced by
 *                the new lock.
 * @lock_ptr: A pointer to receive the hash lock.
 *
 * Gets the lock for the hash (record name) of the data in a data_vio, or if one does not exist (or
 * if we are explicitly rolling over), initialize a new lock for the hash and register it in the
 * zone. This must only be called in the correct thread for the zone.
 *
 * Return: VDO_SUCCESS or an error code.
 */
static int __must_check acquire_lock(struct hash_zone *zone,
				     const struct uds_record_name *hash,
				     struct hash_lock *replace_lock,
				     struct hash_lock **lock_ptr)
{ … }

/**
 * enter_forked_lock() - Bind the data_vio to a new hash lock.
 *
 * Implements waiter_callback_fn. Binds the data_vio that was waiting to a new hash lock and waits
 * on that lock.
 */
static void enter_forked_lock(struct vdo_waiter *waiter, void *context)
{ … }

/**
 * fork_hash_lock() - Fork a hash lock because it has run out of increments on the duplicate PBN.
 * @old_lock: The hash lock to fork.
 * @new_agent: The data_vio that will be the agent for the new lock.
 *
 * Transfers the new agent and any lock waiters to a new hash lock instance which takes the place
 * of the old lock in the lock map. The old lock remains active, but will not update advice.
 */
static void fork_hash_lock(struct hash_lock *old_lock, struct data_vio *new_agent)
{ … }

/**
 * launch_dedupe() - Reserve a reference count increment for a data_vio and launch it on the dedupe
 *                   path.
 * @lock: The hash lock.
 * @data_vio: The data_vio to deduplicate using the hash lock.
 * @has_claim: true if the data_vio already has claimed an increment from the duplicate lock.
 *
 * If no increments are available, this will roll over to a new hash lock and launch the data_vio
 * as the writing agent for that lock.
 */
static void launch_dedupe(struct hash_lock *lock, struct data_vio *data_vio,
			  bool has_claim)
{ … }

/**
 * start_deduping() - Enter the hash lock state where data_vios deduplicate in parallel against a
 *                    true copy of their data on disk.
 * @lock: The hash lock.
 * @agent: The data_vio acting as the agent for the lock.
 * @agent_is_done: true only if the agent has already written or deduplicated against its data.
 *
 * If the agent itself needs to deduplicate, an increment for it must already have been claimed
 * from the duplicate lock, ensuring the hash lock will still have a data_vio holding it.
 */
static void start_deduping(struct hash_lock *lock, struct data_vio *agent,
			   bool agent_is_done)
{ … }

/**
 * increment_stat() - Increment a statistic counter in a non-atomic yet thread-safe manner.
 * @stat: The statistic field to increment.
 */
static inline void increment_stat(u64 *stat)
{ … }

/**
 * finish_verifying() - Handle the result of the agent for the lock comparing its data to the
 *                      duplicate candidate.
 * @completion: The completion of the data_vio used to verify dedupe
 *
 * This continuation is registered in start_verifying().
 */
static void finish_verifying(struct vdo_completion *completion)
{ … }

static bool blocks_equal(char *block1, char *block2)
{ … }

static void verify_callback(struct vdo_completion *completion)
{ … }

static void uncompress_and_verify(struct vdo_completion *completion)
{ … }

static void verify_endio(struct bio *bio)
{ … }

/**
 * start_verifying() - Begin the data verification phase.
 * @lock: The hash lock (must be LOCKING).
 * @agent: The data_vio to use to read and compare candidate data.
 *
 * Continue the deduplication path for a hash lock by using the agent to read (and possibly
 * decompress) the data at the candidate duplicate location, comparing it to the data in the agent
 * to verify that the candidate is identical to all the data_vios sharing the hash. If so, it can
 * be deduplicated against, otherwise a data_vio allocation will have to be written to and used for
 * dedupe.
 */
static void start_verifying(struct hash_lock *lock, struct data_vio *agent)
{ … }

/**
 * finish_locking() - Handle the result of the agent for the lock attempting to obtain a PBN read
 *                    lock on the candidate duplicate block.
 * @completion: The completion of the data_vio that attempted to get the read lock.
 *
 * This continuation is registered in lock_duplicate_pbn().
 */
static void finish_locking(struct vdo_completion *completion)
{ … }

static bool acquire_provisional_reference(struct data_vio *agent, struct pbn_lock *lock,
					  struct slab_depot *depot)
{ … }

/**
 * lock_duplicate_pbn() - Acquire a read lock on the PBN of the block containing candidate
 *                        duplicate data (compressed or uncompressed).
 * @completion: The completion of the data_vio attempting to acquire the physical block lock on
 *              behalf of its hash lock.
 *
 * If the PBN is already locked for writing, the lock attempt is abandoned and is_duplicate will be
 * cleared before calling back. This continuation is launched from start_locking(), and calls back
 * to finish_locking() on the hash zone thread.
 */
static void lock_duplicate_pbn(struct vdo_completion *completion)
{ … }

/**
 * start_locking() - Continue deduplication for a hash lock that has obtained valid advice of a
 *                   potential duplicate through its agent.
 * @lock: The hash lock (currently must be QUERYING).
 * @agent: The data_vio bearing the dedupe advice.
 */
static void start_locking(struct hash_lock *lock, struct data_vio *agent)
{ … }

/**
 * finish_writing() - Re-entry point for the lock agent after it has finished writing or
 *                    compressing its copy of the data block.
 * @lock: The hash lock, which must be in state WRITING.
 * @agent: The data_vio that wrote its data for the lock.
 *
 * The agent will never need to dedupe against anything, so it's done with the lock, but the lock
 * may not be finished with it, as a UDS update might still be needed.
 *
 * If there are other lock holders, the agent will hand the job to one of them and exit, leaving
 * the lock to deduplicate against the just-written block. If there are no other lock holders, the
 * agent either exits (and later tears down the hash lock), or it remains the agent and updates
 * UDS.
 */
static void finish_writing(struct hash_lock *lock, struct data_vio *agent)
{ … }

/**
 * select_writing_agent() - Search through the lock waiters for a data_vio that has an allocation.
 * @lock: The hash lock to modify.
 *
 * If an allocation is found, swap agents, put the old agent at the head of the wait queue, then
 * return the new agent. Otherwise, just return the current agent.
 */
static struct data_vio *select_writing_agent(struct hash_lock *lock)
{ … }

/**
 * start_writing() - Begin the non-duplicate write path.
 * @lock: The hash lock (currently must be QUERYING).
 * @agent: The data_vio currently acting as the agent for the lock.
 *
 * Begins the non-duplicate write path for a hash lock that had no advice, selecting a data_vio
 * with an allocation as a new agent, if necessary, then resuming the agent on the data_vio write
 * path.
 */
static void start_writing(struct hash_lock *lock, struct data_vio *agent)
{ … }

/*
 * Decode VDO duplicate advice from the old_metadata field of a UDS request.
 * Returns true if valid advice was found and decoded
 */
static bool decode_uds_advice(struct dedupe_context *context)
{ … }

static void process_query_result(struct data_vio *agent)
{ … }

/**
 * finish_querying() - Process the result of a UDS query performed by the agent for the lock.
 * @completion: The completion of the data_vio that performed the query.
 *
 * This continuation is registered in start_querying().
 */
static void finish_querying(struct vdo_completion *completion)
{ … }

/**
 * start_querying() - Start deduplication for a hash lock.
 * @lock: The initialized hash lock.
 * @data_vio: The data_vio that has just obtained the new lock.
 *
 * Starts deduplication for a hash lock that has finished initializing by making the data_vio that
 * requested it the agent, entering the QUERYING state, and using the agent to perform the UDS
 * query on behalf of the lock.
 */
static void start_querying(struct hash_lock *lock, struct data_vio *data_vio)
{ … }

/**
 * report_bogus_lock_state() - Complain that a data_vio has entered a hash_lock that is in an
 *                             unimplemented or unusable state and continue the data_vio with an
 *                             error.
 * @lock: The hash lock.
 * @data_vio: The data_vio attempting to enter the lock.
 */
static void report_bogus_lock_state(struct hash_lock *lock, struct data_vio *data_vio)
{ … }

/**
 * vdo_continue_hash_lock() - Continue the processing state after writing, compressing, or
 *                            deduplicating.
 * @data_vio: The data_vio to continue processing in its hash lock.
 *
 * Asynchronously continue processing a data_vio in its hash lock after it has finished writing,
 * compressing, or deduplicating, so it can share the result with any data_vios waiting in the hash
 * lock, or update the UDS index, or simply release its share of the lock.
 *
 * Context: This must only be called in the correct thread for the hash zone.
 */
void vdo_continue_hash_lock(struct vdo_completion *completion)
{ … }

/**
 * is_hash_collision() - Check to see if a hash collision has occurred.
 * @lock: The lock to check.
 * @candidate: The data_vio seeking to share the lock.
 *
 * Check whether the data in data_vios sharing a lock is different than in a data_vio seeking to
 * share the lock, which should only be possible in the extremely unlikely case of a hash
 * collision.
 *
 * Return: true if the given data_vio must not share the lock because it doesn't have the same data
 *         as the lock holders.
 */
static bool is_hash_collision(struct hash_lock *lock, struct data_vio *candidate)
{ … }

static inline int assert_hash_lock_preconditions(const struct data_vio *data_vio)
{ … }

/**
 * vdo_acquire_hash_lock() - Acquire or share a lock on a record name.
 * @data_vio: The data_vio acquiring a lock on its record name.
 *
 * Acquire or share a lock on the hash (record name) of the data in a data_vio, updating the
 * data_vio to reference the lock. This must only be called in the correct thread for the zone. In
 * the unlikely case of a hash collision, this function will succeed, but the data_vio will not get
 * a lock reference.
 */
void vdo_acquire_hash_lock(struct vdo_completion *completion)
{ … }

/**
 * vdo_release_hash_lock() - Release a data_vio's share of a hash lock, if held, and null out the
 *                           data_vio's reference to it.
 * @data_vio: The data_vio releasing its hash lock.
 *
 * If the data_vio is the only one holding the lock, this also releases any resources or locks used
 * by the hash lock (such as a PBN read lock on a block containing data with the same hash) and
 * returns the lock to the hash zone's lock pool.
 *
 * Context: This must only be called in the correct thread for the hash zone.
 */
void vdo_release_hash_lock(struct data_vio *data_vio)
{ … }

/**
 * transfer_allocation_lock() - Transfer a data_vio's downgraded allocation PBN lock to the
 *                              data_vio's hash lock, converting it to a duplicate PBN lock.
 * @data_vio: The data_vio holding the allocation lock to transfer.
 */
static void transfer_allocation_lock(struct data_vio *data_vio)
{ … }

/**
 * vdo_share_compressed_write_lock() - Make a data_vio's hash lock a shared holder of the PBN lock
 *                                     on the compressed block to which its data was just written.
 * @data_vio: The data_vio which was just compressed.
 * @pbn_lock: The PBN lock on the compressed block.
 *
 * If the lock is still a write lock (as it will be for the first share), it will be converted to a
 * read lock. This also reserves a reference count increment for the data_vio.
 */
void vdo_share_compressed_write_lock(struct data_vio *data_vio,
				     struct pbn_lock *pbn_lock)
{ … }

static void start_uds_queue(void *ptr)
{ … }

static void finish_uds_queue(void *ptr __always_unused)
{ … }

static void close_index(struct hash_zones *zones)
	__must_hold(&zones->lock)
{ … }

static void open_index(struct hash_zones *zones)
	__must_hold(&zones->lock)
{ … }

static void change_dedupe_state(struct vdo_completion *completion)
{ … }

static void start_expiration_timer(struct dedupe_context *context)
{ … }

/**
 * report_dedupe_timeouts() - Record and eventually report that some dedupe requests reached their
 *                            expiration time without getting answers, so we timed them out.
 * @zones: the hash zones.
 * @timeouts: the number of newly timed out requests.
 */
static void report_dedupe_timeouts(struct hash_zones *zones, unsigned int timeouts)
{ … }

static int initialize_index(struct vdo *vdo, struct hash_zones *zones)
{ … }

/**
 * finish_index_operation() - This is the UDS callback for index queries.
 * @request: The uds request which has just completed.
 */
static void finish_index_operation(struct uds_request *request)
{ … }

/**
 * check_for_drain_complete() - Check whether this zone has drained.
 * @zone: The zone to check.
 */
static void check_for_drain_complete(struct hash_zone *zone)
{ … }

static void timeout_index_operations_callback(struct vdo_completion *completion)
{ … }

static void timeout_index_operations(struct timer_list *t)
{ … }

static int __must_check initialize_zone(struct vdo *vdo, struct hash_zones *zones,
					zone_count_t zone_number)
{ … }

/** get_thread_id_for_zone() - Implements vdo_zone_thread_getter_fn. */
static thread_id_t get_thread_id_for_zone(void *context, zone_count_t zone_number)
{ … }

/**
 * vdo_make_hash_zones() - Create the hash zones.
 *
 * @vdo: The vdo to which the zone will belong.
 * @zones_ptr: A pointer to hold the zones.
 *
 * Return: VDO_SUCCESS or an error code.
 */
int vdo_make_hash_zones(struct vdo *vdo, struct hash_zones **zones_ptr)
{ … }

void vdo_finish_dedupe_index(struct hash_zones *zones)
{ … }

/**
 * vdo_free_hash_zones() - Free the hash zones.
 * @zones: The zone to free.
 */
void vdo_free_hash_zones(struct hash_zones *zones)
{ … }

static void initiate_suspend_index(struct admin_state *state)
{ … }

/**
 * suspend_index() - Suspend the UDS index prior to draining hash zones.
 *
 * Implements vdo_action_preamble_fn
 */
static void suspend_index(void *context, struct vdo_completion *completion)
{ … }

/**
 * initiate_drain() - Initiate a drain.
 *
 * Implements vdo_admin_initiator_fn.
 */
static void initiate_drain(struct admin_state *state)
{ … }

/**
 * drain_hash_zone() - Drain a hash zone.
 *
 * Implements vdo_zone_action_fn.
 */
static void drain_hash_zone(void *context, zone_count_t zone_number,
			    struct vdo_completion *parent)
{ … }

/** vdo_drain_hash_zones() - Drain all hash zones. */
void vdo_drain_hash_zones(struct hash_zones *zones, struct vdo_completion *parent)
{ … }

static void launch_dedupe_state_change(struct hash_zones *zones)
	__must_hold(&zones->lock)
{ … }

/**
 * resume_index() - Resume the UDS index prior to resuming hash zones.
 *
 * Implements vdo_action_preamble_fn
 */
static void resume_index(void *context, struct vdo_completion *parent)
{ … }

/**
 * resume_hash_zone() - Resume a hash zone.
 *
 * Implements vdo_zone_action_fn.
 */
static void resume_hash_zone(void *context, zone_count_t zone_number,
			     struct vdo_completion *parent)
{ … }

/**
 * vdo_resume_hash_zones() - Resume a set of hash zones.
 * @zones: The hash zones to resume.
 * @parent: The object to notify when the zones have resumed.
 */
void vdo_resume_hash_zones(struct hash_zones *zones, struct vdo_completion *parent)
{ … }

/**
 * get_hash_zone_statistics() - Add the statistics for this hash zone to the tally for all zones.
 * @zone: The hash zone to query.
 * @tally: The tally
 */
static void get_hash_zone_statistics(const struct hash_zone *zone,
				     struct hash_lock_statistics *tally)
{ … }

static void get_index_statistics(struct hash_zones *zones,
				 struct index_statistics *stats)
{ … }

/**
 * vdo_get_dedupe_statistics() - Tally the statistics from all the hash zones and the UDS index.
 * @hash_zones: The hash zones to query
 *
 * Return: The sum of the hash lock statistics from all hash zones plus the statistics from the UDS
 *         index
 */
void vdo_get_dedupe_statistics(struct hash_zones *zones, struct vdo_statistics *stats)

{ … }

/**
 * vdo_select_hash_zone() - Select the hash zone responsible for locking a given record name.
 * @zones: The hash_zones from which to select.
 * @name: The record name.
 *
 * Return: The hash zone responsible for the record name.
 */
struct hash_zone *vdo_select_hash_zone(struct hash_zones *zones,
				       const struct uds_record_name *name)
{ … }

/**
 * dump_hash_lock() - Dump a compact description of hash_lock to the log if the lock is not on the
 *                    free list.
 * @lock: The hash lock to dump.
 */
static void dump_hash_lock(const struct hash_lock *lock)
{ … }

static const char *index_state_to_string(struct hash_zones *zones,
					 enum index_state state)
{ … }

/**
 * dump_hash_zone() - Dump information about a hash zone to the log for debugging.
 * @zone: The zone to dump.
 */
static void dump_hash_zone(const struct hash_zone *zone)
{ … }

/**
 * vdo_dump_hash_zones() - Dump information about the hash zones to the log for debugging.
 * @zones: The zones to dump.
 */
void vdo_dump_hash_zones(struct hash_zones *zones)
{ … }

void vdo_set_dedupe_index_timeout_interval(unsigned int value)
{ … }

void vdo_set_dedupe_index_min_timer_interval(unsigned int value)
{ … }

/**
 * acquire_context() - Acquire a dedupe context from a hash_zone if any are available.
 * @zone: the hash zone
 *
 * Return: A dedupe_context or NULL if none are available
 */
static struct dedupe_context * __must_check acquire_context(struct hash_zone *zone)
{ … }

static void prepare_uds_request(struct uds_request *request, struct data_vio *data_vio,
				enum uds_request_type operation)
{ … }

/*
 * The index operation will inquire about data_vio.record_name, providing (if the operation is
 * appropriate) advice from the data_vio's new_mapped fields. The advice found in the index (or
 * NULL if none) will be returned via receive_data_vio_dedupe_advice(). dedupe_context.status is
 * set to the return status code of any asynchronous index processing.
 */
static void query_index(struct data_vio *data_vio, enum uds_request_type operation)
{ … }

static void set_target_state(struct hash_zones *zones, enum index_state target,
			     bool change_dedupe, bool dedupe, bool set_create)
{ … }

const char *vdo_get_dedupe_index_state_name(struct hash_zones *zones)
{ … }

/* Handle a dmsetup message relevant to the index. */
int vdo_message_dedupe_index(struct hash_zones *zones, const char *name)
{ … }

void vdo_set_dedupe_state_normal(struct hash_zones *zones)
{ … }

/* If create_flag, create a new index without first attempting to load an existing index. */
void vdo_start_dedupe_index(struct hash_zones *zones, bool create_flag)
{ … }
linux/drivers/md/dm-vdo/dedupe.c