MemorySanitizer.cpp | Explore in Territory

//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// This file is a part of MemorySanitizer, a detector of uninitialized
/// reads.
///
/// The algorithm of the tool is similar to Memcheck
/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
/// We associate a few shadow bits with every byte of the application memory,
/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow,
/// bits on every memory read, propagate the shadow bits through some of the
/// arithmetic instruction (including MOV), store the shadow bits on every memory
/// write, report a bug on some other instructions (e.g. JMP) if the
/// associated shadow is poisoned.
///
/// But there are differences too. The first and the major one:
/// compiler instrumentation instead of binary instrumentation. This
/// gives us much better register allocation, possible compiler
/// optimizations and a fast start-up. But this brings the major issue
/// as well: msan needs to see all program events, including system
/// calls and reads/writes in system libraries, so we either need to
/// compile *everything* with msan or use a binary translation
/// component (e.g. DynamoRIO) to instrument pre-built libraries.
/// Another difference from Memcheck is that we use 8 shadow bits per
/// byte of application memory and use a direct shadow mapping. This
/// greatly simplifies the instrumentation code and avoids races on
/// shadow updates (Memcheck is single-threaded so races are not a
/// concern there. Memcheck uses 2 shadow bits per byte with a slow
/// path storage that uses 8 bits per byte).
///
/// The default value of shadow is 0, which means "clean" (not poisoned).
///
/// Every module initializer should call __msan_init to ensure that the
/// shadow memory is ready. On error, __msan_warning is called. Since
/// parameters and return values may be passed via registers, we have a
/// specialized thread-local shadow for return values
/// (__msan_retval_tls) and parameters (__msan_param_tls).
///
///                           Origin tracking.
///
/// MemorySanitizer can track origins (allocation points) of all uninitialized
/// values. This behavior is controlled with a flag (msan-track-origins) and is
/// disabled by default.
///
/// Origins are 4-byte values created and interpreted by the runtime library.
/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
/// of application memory. Propagation of origins is basically a bunch of
/// "select" instructions that pick the origin of a dirty argument, if an
/// instruction has one.
///
/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.
///
/// Origins are meaningless for fully initialized values, so MemorySanitizer
/// avoids storing origin to memory when a fully initialized value is stored.
/// This way it avoids needless overwriting origin of the 4-byte region on
/// a short (i.e. 1 byte) clean store, and it is also good for performance.
///
///                            Atomic handling.
///
/// Ideally, every atomic store of application value should update the
/// corresponding shadow location in an atomic way. Unfortunately, atomic store
/// of two disjoint locations can not be done without severe slowdown.
///
/// Therefore, we implement an approximation that may err on the safe side.
/// In this implementation, every atomically accessed location in the program
/// may only change from (partially) uninitialized to fully initialized, but
/// not the other way around. We load the shadow _after_ the application load,
/// and we store the shadow _before_ the app store. Also, we always store clean
/// shadow (if the application store is atomic). This way, if the store-load
/// pair constitutes a happens-before arc, shadow store and load are correctly
/// ordered such that the load will get either the value that was stored, or
/// some later value (which is always clean).
///
/// This does not work very well with Compare-And-Swap (CAS) and
/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
/// must store the new shadow before the app operation, and load the shadow
/// after the app operation. Computers don't work this way. Current
/// implementation ignores the load aspect of CAS/RMW, always returning a clean
/// value. It implements the store part as a simple atomic store by storing a
/// clean shadow.
///
///                      Instrumenting inline assembly.
///
/// For inline assembly code LLVM has little idea about which memory locations
/// become initialized depending on the arguments. It can be possible to figure
/// out which arguments are meant to point to inputs and outputs, but the
/// actual semantics can be only visible at runtime. In the Linux kernel it's
/// also possible that the arguments only indicate the offset for a base taken
/// from a segment register, so it's dangerous to treat any asm() arguments as
/// pointers. We take a conservative approach generating calls to
///   __msan_instrument_asm_store(ptr, size)
/// , which defer the memory unpoisoning to the runtime library.
/// The latter can perform more complex address checks to figure out whether
/// it's safe to touch the shadow memory.
/// Like with atomic operations, we call __msan_instrument_asm_store() before
/// the assembly call, so that changes to the shadow memory will be seen by
/// other threads together with main memory initialization.
///
///                  KernelMemorySanitizer (KMSAN) implementation.
///
/// The major differences between KMSAN and MSan instrumentation are:
///  - KMSAN always tracks the origins and implies msan-keep-going=true;
///  - KMSAN allocates shadow and origin memory for each page separately, so
///    there are no explicit accesses to shadow and origin in the
///    instrumentation.
///    Shadow and origin values for a particular X-byte memory location
///    (X=1,2,4,8) are accessed through pointers obtained via the
///      __msan_metadata_ptr_for_load_X(ptr)
///      __msan_metadata_ptr_for_store_X(ptr)
///    functions. The corresponding functions check that the X-byte accesses
///    are possible and returns the pointers to shadow and origin memory.
///    Arbitrary sized accesses are handled with:
///      __msan_metadata_ptr_for_load_n(ptr, size)
///      __msan_metadata_ptr_for_store_n(ptr, size);
///    Note that the sanitizer code has to deal with how shadow/origin pairs
///    returned by the these functions are represented in different ABIs. In
///    the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
///    returned in r3 and r4, and in the SystemZ ABI they are written to memory
///    pointed to by a hidden parameter.
///  - TLS variables are stored in a single per-task struct. A call to a
///    function __msan_get_context_state() returning a pointer to that struct
///    is inserted into every instrumented function before the entry block;
///  - __msan_warning() takes a 32-bit origin parameter;
///  - local variables are poisoned with __msan_poison_alloca() upon function
///    entry and unpoisoned with __msan_unpoison_alloca() before leaving the
///    function;
///  - the pass doesn't declare any global variables or add global constructors
///    to the translation unit.
///
/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
/// calls, making sure we're on the safe side wrt. possible false positives.
///
///  KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
///  moment.
///
//
// FIXME: This sanitizer does not yet handle scalable vectors
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/AttributeMask.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/Alignment.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Instrumentation.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <tuple>

usingnamespacellvm;

#define DEBUG_TYPE …

DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
              "Controls which checks to insert");

DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
              "Controls which instruction to instrument");

static const unsigned kOriginSize = …;
static const Align kMinOriginAlignment = …;
static const Align kShadowTLSAlignment = …;

// These constants must be kept in sync with the ones in msan.h.
static const unsigned kParamTLSSize = …;
static const unsigned kRetvalTLSSize = …;

// Accesses sizes are powers of two: 1, 2, 4, 8.
static const size_t kNumberOfAccessSizes = …;

/// Track origins of uninitialized values.
///
/// Adds a section to MemorySanitizer report that points to the allocation
/// (stack or heap) the uninitialized bits came from originally.
static cl::opt<int> ClTrackOrigins(
    "msan-track-origins",
    cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
    cl::init(0));

static cl::opt<bool> ClKeepGoing("msan-keep-going",
                                 cl::desc("keep going after reporting a UMR"),
                                 cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClPoisonStack("msan-poison-stack",
                  cl::desc("poison uninitialized stack variables"), cl::Hidden,
                  cl::init(true));

static cl::opt<bool> ClPoisonStackWithCall(
    "msan-poison-stack-with-call",
    cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
    cl::init(false));

static cl::opt<int> ClPoisonStackPattern(
    "msan-poison-stack-pattern",
    cl::desc("poison uninitialized stack variables with the given pattern"),
    cl::Hidden, cl::init(0xff));

static cl::opt<bool>
    ClPrintStackNames("msan-print-stack-names",
                      cl::desc("Print name of local stack variable"),
                      cl::Hidden, cl::init(true));

static cl::opt<bool> ClPoisonUndef("msan-poison-undef",
                                   cl::desc("poison undef temps"), cl::Hidden,
                                   cl::init(true));

static cl::opt<bool>
    ClHandleICmp("msan-handle-icmp",
                 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
                 cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClHandleICmpExact("msan-handle-icmp-exact",
                      cl::desc("exact handling of relational integer ICmp"),
                      cl::Hidden, cl::init(false));

static cl::opt<bool> ClHandleLifetimeIntrinsics(
    "msan-handle-lifetime-intrinsics",
    cl::desc(
        "when possible, poison scoped variables at the beginning of the scope "
        "(slower, but more precise)"),
    cl::Hidden, cl::init(true));

// When compiling the Linux kernel, we sometimes see false positives related to
// MSan being unable to understand that inline assembly calls may initialize
// local variables.
// This flag makes the compiler conservatively unpoison every memory location
// passed into an assembly call. Note that this may cause false positives.
// Because it's impossible to figure out the array sizes, we can only unpoison
// the first sizeof(type) bytes for each type* pointer.
static cl::opt<bool> ClHandleAsmConservative(
    "msan-handle-asm-conservative",
    cl::desc("conservative handling of inline assembly"), cl::Hidden,
    cl::init(true));

// This flag controls whether we check the shadow of the address
// operand of load or store. Such bugs are very rare, since load from
// a garbage address typically results in SEGV, but still happen
// (e.g. only lower bits of address are garbage, or the access happens
// early at program startup where malloc-ed memory is more likely to
// be zeroed. As of 2012-08-28 this flag adds 20% slowdown.
static cl::opt<bool> ClCheckAccessAddress(
    "msan-check-access-address",
    cl::desc("report accesses through a pointer which has poisoned shadow"),
    cl::Hidden, cl::init(true));

static cl::opt<bool> ClEagerChecks(
    "msan-eager-checks",
    cl::desc("check arguments and return values at function call boundaries"),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClDumpStrictInstructions(
    "msan-dump-strict-instructions",
    cl::desc("print out instructions with default strict semantics"),
    cl::Hidden, cl::init(false));

static cl::opt<int> ClInstrumentationWithCallThreshold(
    "msan-instrumentation-with-call-threshold",
    cl::desc(
        "If the function being instrumented requires more than "
        "this number of checks and origin stores, use callbacks instead of "
        "inline checks (-1 means never use callbacks)."),
    cl::Hidden, cl::init(3500));

static cl::opt<bool>
    ClEnableKmsan("msan-kernel",
                  cl::desc("Enable KernelMemorySanitizer instrumentation"),
                  cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClDisableChecks("msan-disable-checks",
                    cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
                    cl::init(false));

static cl::opt<bool>
    ClCheckConstantShadow("msan-check-constant-shadow",
                          cl::desc("Insert checks for constant shadow values"),
                          cl::Hidden, cl::init(true));

// This is off by default because of a bug in gold:
// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
static cl::opt<bool>
    ClWithComdat("msan-with-comdat",
                 cl::desc("Place MSan constructors in comdat sections"),
                 cl::Hidden, cl::init(false));

// These options allow to specify custom memory map parameters
// See MemoryMapParams for details.
static cl::opt<uint64_t> ClAndMask("msan-and-mask",
                                   cl::desc("Define custom MSan AndMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
                                   cl::desc("Define custom MSan XorMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
                                      cl::desc("Define custom MSan ShadowBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
                                      cl::desc("Define custom MSan OriginBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<int>
    ClDisambiguateWarning("msan-disambiguate-warning-threshold",
                          cl::desc("Define threshold for number of checks per "
                                   "debug location to force origin update."),
                          cl::Hidden, cl::init(3));

const char kMsanModuleCtorName[] = …;
const char kMsanInitName[] = …;

namespace {

// Memory map parameters used in application-to-shadow address calculation.
// Offset = (Addr & ~AndMask) ^ XorMask
// Shadow = ShadowBase + Offset
// Origin = OriginBase + Offset
struct MemoryMapParams { … };

struct PlatformMemoryMapParams { … };

} // end anonymous namespace

// i386 Linux
static const MemoryMapParams Linux_I386_MemoryMapParams = …;

// x86_64 Linux
static const MemoryMapParams Linux_X86_64_MemoryMapParams = …;

// mips64 Linux
static const MemoryMapParams Linux_MIPS64_MemoryMapParams = …;

// ppc64 Linux
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = …;

// s390x Linux
static const MemoryMapParams Linux_S390X_MemoryMapParams = …;

// aarch64 Linux
static const MemoryMapParams Linux_AArch64_MemoryMapParams = …;

// loongarch64 Linux
static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = …;

// aarch64 FreeBSD
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = …;

// i386 FreeBSD
static const MemoryMapParams FreeBSD_I386_MemoryMapParams = …;

// x86_64 FreeBSD
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = …;

// x86_64 NetBSD
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = …;

static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = …;

static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = …;

static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = …;

namespace {

/// Instrument functions of a module to detect uninitialized reads.
///
/// Instantiating MemorySanitizer inserts the msan runtime library API function
/// declarations into the module if they don't exist already. Instantiating
/// ensures the __msan_init function is in the list of global constructors for
/// the module.
class MemorySanitizer { … };

void insertModuleCtor(Module &M) { … }

template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) { … }

} // end anonymous namespace

MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
                                               bool EagerChecks)
    : … { … }

PreservedAnalyses MemorySanitizerPass::run(Module &M,
                                           ModuleAnalysisManager &AM) { … }

void MemorySanitizerPass::printPipeline(
    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) { … }

/// Create a non-const global initialized with the given string.
///
/// Creates a writable global for Str so that we can pass it to the
/// run-time lib. Runtime uses first 4 bytes of the string to store the
/// frame ID, so the string needs to be mutable.
static GlobalVariable *createPrivateConstGlobalForString(Module &M,
                                                         StringRef Str) { … }

template <typename... ArgsTy>
FunctionCallee
MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args) { … }

/// Create KMSAN API callbacks.
void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) { … }

static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) { … }

/// Insert declarations for userspace-specific functions and globals.
void MemorySanitizer::createUserspaceApi(Module &M, const TargetLibraryInfo &TLI) { … }

/// Insert extern declaration of runtime-provided functions and globals.
void MemorySanitizer::initializeCallbacks(Module &M, const TargetLibraryInfo &TLI) { … }

FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
                                                             int size) { … }

/// Module-level initialization.
///
/// inserts a call to __msan_init to the module's constructor list.
void MemorySanitizer::initializeModule(Module &M) { … }

namespace {

/// A helper class that handles instrumentation of VarArg
/// functions on a particular platform.
///
/// Implementations are expected to insert the instrumentation
/// necessary to propagate argument shadow through VarArg function
/// calls. Visit* methods are called during an InstVisitor pass over
/// the function, and should avoid creating new basic blocks. A new
/// instance of this class is created for each instrumented function.
struct VarArgHelper { … };

struct MemorySanitizerVisitor;

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor);

static unsigned TypeSizeToSizeIndex(TypeSize TS) { … }

namespace {

/// Helper class to attach debug information of the given instruction onto new
/// instructions inserted after.
class NextNodeIRBuilder : public IRBuilder<> { … };

/// This class does all the work for a given function. Store and Load
/// instructions store and load corresponding shadow and origin
/// values. Most instructions propagate shadow from arguments to their
/// return values. Certain instructions (most importantly, BranchInst)
/// test their argument shadow and print reports (with a runtime call) if it's
/// non-zero.
struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> { … };

struct VarArgHelperBase : public VarArgHelper { … };

/// AMD64-specific implementation of VarArgHelper.
struct VarArgAMD64Helper : public VarArgHelperBase { … };

/// MIPS64-specific implementation of VarArgHelper.
/// NOTE: This is also used for LoongArch64.
struct VarArgMIPS64Helper : public VarArgHelperBase { … };

/// AArch64-specific implementation of VarArgHelper.
struct VarArgAArch64Helper : public VarArgHelperBase { … };

/// PowerPC64-specific implementation of VarArgHelper.
struct VarArgPowerPC64Helper : public VarArgHelperBase { … };

/// SystemZ-specific implementation of VarArgHelper.
struct VarArgSystemZHelper : public VarArgHelperBase { … };

// Loongarch64 is not a MIPS, but the current vargs calling convention matches
// the MIPS.
VarArgLoongArch64Helper;

/// A no-op implementation of VarArgHelper.
struct VarArgNoOpHelper : public VarArgHelper { … };

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor) { … }

bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) { … }
llvm/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp