//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// This file is a part of MemorySanitizer, a detector of uninitialized
/// reads.
///
/// The algorithm of the tool is similar to Memcheck
/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html).
/// We associate a few shadow bits with every byte of the application memory,
/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
/// bits on every memory read, propagate the shadow bits through some of the
/// arithmetic instructions (including MOV), store the shadow bits on every
/// memory write, and report a bug on some other instructions (e.g. JMP) if
/// the associated shadow is poisoned.
///
/// But there are differences too. The first and the major one:
/// compiler instrumentation instead of binary instrumentation. This
/// gives us much better register allocation, possible compiler
/// optimizations and a fast start-up. But this brings the major issue
/// as well: msan needs to see all program events, including system
/// calls and reads/writes in system libraries, so we either need to
/// compile *everything* with msan or use a binary translation
/// component (e.g. DynamoRIO) to instrument pre-built libraries.
/// Another difference from Memcheck is that we use 8 shadow bits per
/// byte of application memory and use a direct shadow mapping. This
/// greatly simplifies the instrumentation code and avoids races on
/// shadow updates (Memcheck is single-threaded so races are not a
/// concern there. Memcheck uses 2 shadow bits per byte with a slow
/// path storage that uses 8 bits per byte).
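///
/// As a rough illustration of the scheme above (a simplified model, not the
/// code this pass emits), the sketch below pairs one application byte with 8
/// shadow bits, poisons freshly allocated memory, ORs operand shadows through
/// an arithmetic operation, and flags a branch whose condition has non-zero
/// shadow. The names ShadowedByte, freshAlloc, addApprox and branchWouldReport
/// are hypothetical, and the OR rule is assumed here as one conservative
/// choice of propagation:
/// \code
///   #include <cassert>
///   #include <cstdint>
///
///   // One application byte and its 8 shadow bits (0 = clean, 1 = poisoned).
///   struct ShadowedByte {
///     uint8_t Value;
///     uint8_t Shadow;
///   };
///
///   // Freshly malloc-ed or alloca-ed memory starts fully poisoned.
///   ShadowedByte freshAlloc() { return {0, 0xff}; }
///
///   // Assumed propagation rule: a result bit is treated as poisoned if the
///   // same bit is poisoned in either operand.
///   ShadowedByte addApprox(ShadowedByte A, ShadowedByte B) {
///     return {static_cast<uint8_t>(A.Value + B.Value),
///             static_cast<uint8_t>(A.Shadow | B.Shadow)};
///   }
///
///   // A branch on a value with any poisoned bit would trigger a report.
///   bool branchWouldReport(ShadowedByte Cond) { return Cond.Shadow != 0; }
///
///   int main() {
///     ShadowedByte Init{1, 0x00};
///     ShadowedByte Uninit = freshAlloc();
///     assert(!branchWouldReport(addApprox(Init, Init)));
///     assert(branchWouldReport(addApprox(Init, Uninit)));
///     return 0;
///   }
/// \endcode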
///
/// The default value of shadow is 0, which means "clean" (not poisoned).
///
/// Every module initializer should call __msan_init to ensure that the
/// shadow memory is ready. On error, __msan_warning is called. Since
/// parameters and return values may be passed via registers, we have a
/// specialized thread-local shadow for return values
/// (__msan_retval_tls) and parameters (__msan_param_tls).
///
/// Origin tracking.
///
/// MemorySanitizer can track origins (allocation points) of all uninitialized
/// values. This behavior is controlled with a flag (msan-track-origins) and is
/// disabled by default.
///
/// Origins are 4-byte values created and interpreted by the runtime library.
/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
/// of application memory. Propagation of origins is basically a bunch of
/// "select" instructions that pick the origin of a dirty argument, if an
/// instruction has one.
///
/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.
///
/// Origins are meaningless for fully initialized values, so MemorySanitizer
/// avoids storing origin to memory when a fully initialized value is stored.
/// This way it avoids needlessly overwriting the origin of the 4-byte region
/// on a short (e.g. 1 byte) clean store, and it is also good for performance.
///
/// Atomic handling.
///
/// Ideally, every atomic store of an application value should update the
/// corresponding shadow location in an atomic way. Unfortunately, an atomic
/// store of two disjoint locations cannot be done without severe slowdown.
///
/// Therefore, we implement an approximation that may err on the safe side.
/// In this implementation, every atomically accessed location in the program
/// may only change from (partially) uninitialized to fully initialized, but
/// not the other way around. We load the shadow _after_ the application load,
/// and we store the shadow _before_ the app store. Also, we always store clean
/// shadow (if the application store is atomic). This way, if the store-load
/// pair constitutes a happens-before arc, shadow store and load are correctly
/// ordered such that the load will get either the value that was stored, or
/// some later value (which is always clean).
///
/// This does not work very well with Compare-And-Swap (CAS) and
/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
/// must store the new shadow before the app operation, and load the shadow
/// after the app operation. Computers don't work this way. The current
/// implementation ignores the load aspect of CAS/RMW, always returning a clean
/// value. It implements the store part as a simple atomic store by storing a
/// clean shadow.
///
/// Instrumenting inline assembly.
///
/// For inline assembly code LLVM has little idea about which memory locations
/// become initialized depending on the arguments. It can be possible to figure
/// out which arguments are meant to point to inputs and outputs, but the
/// actual semantics may be visible only at runtime. In the Linux kernel it's
/// also possible that the arguments only indicate the offset for a base taken
/// from a segment register, so it's dangerous to treat any asm() arguments as
/// pointers. We take a conservative approach generating calls to
///   __msan_instrument_asm_store(ptr, size)
/// , which defer the memory unpoisoning to the runtime library.
/// The latter can perform more complex address checks to figure out whether
/// it's safe to touch the shadow memory.
/// Like with atomic operations, we call __msan_instrument_asm_store() before
/// the assembly call, so that changes to the shadow memory will be seen by
/// other threads together with main memory initialization.
///
/// KernelMemorySanitizer (KMSAN) implementation.
///
/// The major differences between KMSAN and MSan instrumentation are:
///  - KMSAN always tracks the origins and implies msan-keep-going=true;
///  - KMSAN allocates shadow and origin memory for each page separately, so
///    there are no explicit accesses to shadow and origin in the
///    instrumentation.
///    Shadow and origin values for a particular X-byte memory location
///    (X=1,2,4,8) are accessed through pointers obtained via the
///      __msan_metadata_ptr_for_load_X(ptr)
///      __msan_metadata_ptr_for_store_X(ptr)
///    functions. The corresponding functions check that the X-byte accesses
///    are possible and return the pointers to shadow and origin memory.
///    Arbitrary sized accesses are handled with:
///      __msan_metadata_ptr_for_load_n(ptr, size)
///      __msan_metadata_ptr_for_store_n(ptr, size);
///    Note that the sanitizer code has to deal with how shadow/origin pairs
///    returned by these functions are represented in different ABIs. In
///    the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
///    returned in r3 and r4, and in the SystemZ ABI they are written to memory
///    pointed to by a hidden parameter.
///  - TLS variables are stored in a single per-task struct. A call to a
///    function __msan_get_context_state() returning a pointer to that struct
///    is inserted into every instrumented function before the entry block;
///  - __msan_warning() takes a 32-bit origin parameter;
///  - local variables are poisoned with __msan_poison_alloca() upon function
///    entry and unpoisoned with __msan_unpoison_alloca() before leaving the
///    function;
///  - the pass doesn't declare any global variables or add global constructors
///    to the translation unit.
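///
/// The metadata lookup described in the second item above could be modeled in
/// user space roughly as follows. This is only a sketch under stated
/// assumptions: MockShadowOrigin, mockMetadataPtrForStore and the flat App,
/// Shadow and Origin arrays are hypothetical stand-ins and not the KMSAN
/// runtime interface; the real functions and their return conventions are
/// defined by the runtime and the target ABI.
/// \code
///   #include <cassert>
///   #include <cstddef>
///   #include <cstdint>
///
///   // Hypothetical pair of metadata pointers for one memory access.
///   struct MockShadowOrigin {
///     uint8_t *Shadow;
///     uint32_t *Origin;
///   };
///
///   static uint8_t App[16];         // pretend application memory
///   static uint8_t Shadow[16];      // one shadow byte per application byte
///   static uint32_t Origin[16 / 4]; // one origin per 4 aligned bytes
///
///   // Checks that the access fits inside App, then returns the pointers to
///   // the shadow and origin slots that describe it.
///   MockShadowOrigin mockMetadataPtrForStore(uint8_t *Ptr, size_t Size) {
///     size_t Off = static_cast<size_t>(Ptr - App);
///     assert(Off + Size <= sizeof(App) && "access must be possible");
///     return {Shadow + Off, Origin + Off / 4};
///   }
///
///   int main() {
///     Shadow[5] = 0xff; // App[5] starts out poisoned
///     Origin[1] = 42;   // arbitrary origin id for bytes 4..7
///
///     // A 1-byte store of a fully initialized value: the shadow is cleared
///     // through the returned pointer and the origin is left untouched.
///     MockShadowOrigin M = mockMetadataPtrForStore(&App[5], 1);
///     *M.Shadow = 0;
///     App[5] = 7;
///
///     assert(Shadow[5] == 0 && Origin[1] == 42);
///     return 0;
///   }
/// \endcode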
///
/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
/// calls, making sure we're on the safe side wrt. possible false positives.
///
/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
/// moment.
///
//
// FIXME: This sanitizer does not yet handle scalable vectors
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/AttributeMask.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/Alignment.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/TargetParser/Triple.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Instrumentation.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <tuple>

using namespace llvm;

#define DEBUG_TYPE …

DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
              "Controls which checks to insert");

DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
              "Controls which instruction to instrument");

static const unsigned kOriginSize = …;
static const Align kMinOriginAlignment = …;
static const Align kShadowTLSAlignment = …;

// These constants must be kept in sync with the ones in msan.h.
static const unsigned kParamTLSSize = …;
static const unsigned kRetvalTLSSize = …;

// Access sizes are powers of two: 1, 2, 4, 8.
static const size_t kNumberOfAccessSizes = …;

/// Track origins of uninitialized values.
///
/// Adds a section to MemorySanitizer report that points to the allocation
/// (stack or heap) the uninitialized bits came from originally.
static cl::opt<int> ClTrackOrigins(
    "msan-track-origins",
    cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
    cl::init(0));

static cl::opt<bool> ClKeepGoing("msan-keep-going",
                                 cl::desc("keep going after reporting a UMR"),
                                 cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClPoisonStack("msan-poison-stack",
                  cl::desc("poison uninitialized stack variables"), cl::Hidden,
                  cl::init(true));

static cl::opt<bool> ClPoisonStackWithCall(
    "msan-poison-stack-with-call",
    cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
    cl::init(false));

static cl::opt<int> ClPoisonStackPattern(
    "msan-poison-stack-pattern",
    cl::desc("poison uninitialized stack variables with the given pattern"),
    cl::Hidden, cl::init(0xff));

static cl::opt<bool>
    ClPrintStackNames("msan-print-stack-names",
                      cl::desc("Print name of local stack variable"),
                      cl::Hidden, cl::init(true));

static cl::opt<bool> ClPoisonUndef("msan-poison-undef",
                                   cl::desc("poison undef temps"), cl::Hidden,
                                   cl::init(true));

static cl::opt<bool>
    ClHandleICmp("msan-handle-icmp",
                 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
                 cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClHandleICmpExact("msan-handle-icmp-exact",
                      cl::desc("exact handling of relational integer ICmp"),
                      cl::Hidden, cl::init(false));

static cl::opt<bool> ClHandleLifetimeIntrinsics(
    "msan-handle-lifetime-intrinsics",
    cl::desc(
        "when possible, poison scoped variables at the beginning of the scope "
        "(slower, but more precise)"),
    cl::Hidden, cl::init(true));

// When compiling the Linux kernel, we sometimes see false positives related to
// MSan being unable to understand that inline assembly calls may initialize
// local variables.
// This flag makes the compiler conservatively unpoison every memory location
// passed into an assembly call. Note that this may cause false positives.
// Because it's impossible to figure out the array sizes, we can only unpoison
// the first sizeof(type) bytes for each type* pointer.
static cl::opt<bool> ClHandleAsmConservative(
    "msan-handle-asm-conservative",
    cl::desc("conservative handling of inline assembly"), cl::Hidden,
    cl::init(true));

// This flag controls whether we check the shadow of the address
// operand of load or store. Such bugs are very rare, since load from
// a garbage address typically results in SEGV, but still happen
// (e.g. only lower bits of address are garbage, or the access happens
// early at program startup where malloc-ed memory is more likely to
// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
static cl::opt<bool> ClCheckAccessAddress(
    "msan-check-access-address",
    cl::desc("report accesses through a pointer which has poisoned shadow"),
    cl::Hidden, cl::init(true));

static cl::opt<bool> ClEagerChecks(
    "msan-eager-checks",
    cl::desc("check arguments and return values at function call boundaries"),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClDumpStrictInstructions(
    "msan-dump-strict-instructions",
    cl::desc("print out instructions with default strict semantics"),
    cl::Hidden, cl::init(false));

static cl::opt<int> ClInstrumentationWithCallThreshold(
    "msan-instrumentation-with-call-threshold",
    cl::desc(
        "If the function being instrumented requires more than "
        "this number of checks and origin stores, use callbacks instead of "
        "inline checks (-1 means never use callbacks)."),
    cl::Hidden, cl::init(3500));

static cl::opt<bool>
    ClEnableKmsan("msan-kernel",
                  cl::desc("Enable KernelMemorySanitizer instrumentation"),
                  cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClDisableChecks("msan-disable-checks",
                    cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
                    cl::init(false));

static cl::opt<bool>
    ClCheckConstantShadow("msan-check-constant-shadow",
                          cl::desc("Insert checks for constant shadow values"),
                          cl::Hidden, cl::init(true));

// This is off by default because of a bug in gold:
// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
static cl::opt<bool>
    ClWithComdat("msan-with-comdat",
                 cl::desc("Place MSan constructors in comdat sections"),
                 cl::Hidden, cl::init(false));

// These options allow specifying custom memory map parameters.
// See MemoryMapParams for details.
static cl::opt<uint64_t> ClAndMask("msan-and-mask",
                                   cl::desc("Define custom MSan AndMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
                                   cl::desc("Define custom MSan XorMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
                                      cl::desc("Define custom MSan ShadowBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
                                      cl::desc("Define custom MSan OriginBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<int>
    ClDisambiguateWarning("msan-disambiguate-warning-threshold",
                          cl::desc("Define threshold for number of checks per "
                                   "debug location to force origin update."),
                          cl::Hidden, cl::init(3));

const char kMsanModuleCtorName[] = …;
const char kMsanInitName[] = …;

namespace {

// Memory map parameters used in application-to-shadow address calculation.
// Offset = (Addr & ~AndMask) ^ XorMask
// Shadow = ShadowBase + Offset
// Origin = OriginBase + Offset
struct MemoryMapParams { … };

struct PlatformMemoryMapParams { … };

} // end anonymous namespace

// i386 Linux
static const MemoryMapParams Linux_I386_MemoryMapParams = …;

// x86_64 Linux
static const MemoryMapParams Linux_X86_64_MemoryMapParams = …;

// mips64 Linux
static const MemoryMapParams Linux_MIPS64_MemoryMapParams = …;

// ppc64 Linux
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = …;

// s390x Linux
static const MemoryMapParams Linux_S390X_MemoryMapParams = …;

// aarch64 Linux
static const MemoryMapParams Linux_AArch64_MemoryMapParams = …;

// loongarch64 Linux
static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = …;

// aarch64 FreeBSD
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = …;

// i386 FreeBSD
static const MemoryMapParams FreeBSD_I386_MemoryMapParams = …;

// x86_64 FreeBSD
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = …;

// x86_64 NetBSD
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = …;

static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = …;
static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = …;
static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = …;
static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = …;
static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = …;
static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = …;
static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = …;
static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = …;
static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = …;

namespace {

/// Instrument functions of a module to detect uninitialized reads.
///
/// Instantiating MemorySanitizer inserts the msan runtime library API function
/// declarations into the module if they don't exist already. Instantiating
/// also ensures the __msan_init function is in the list of global constructors
/// for the module.
class MemorySanitizer { … };

void insertModuleCtor(Module &M) { … }

template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) { … }

} // end anonymous namespace

MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
                                               bool EagerChecks)
    : … { … }

PreservedAnalyses MemorySanitizerPass::run(Module &M,
                                           ModuleAnalysisManager &AM) { … }

void MemorySanitizerPass::printPipeline(
    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) { … }

/// Create a non-const global initialized with the given string.
///
/// Creates a writable global for Str so that we can pass it to the
/// run-time lib. The runtime uses the first 4 bytes of the string to store
/// the frame ID, so the string needs to be mutable.
static GlobalVariable *createPrivateConstGlobalForString(Module &M,
                                                         StringRef Str) { … }

template <typename... ArgsTy>
FunctionCallee
MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args) { … }

/// Create KMSAN API callbacks.
void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) { … }

static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) { … }

/// Insert declarations for userspace-specific functions and globals.
void MemorySanitizer::createUserspaceApi(Module &M,
                                         const TargetLibraryInfo &TLI) { … }

/// Insert extern declaration of runtime-provided functions and globals.
void MemorySanitizer::initializeCallbacks(Module &M,
                                          const TargetLibraryInfo &TLI) { … }

FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
                                                             int size) { … }

/// Module-level initialization.
///
/// Inserts a call to __msan_init into the module's constructor list.
void MemorySanitizer::initializeModule(Module &M) { … }

namespace {

/// A helper class that handles instrumentation of VarArg
/// functions on a particular platform.
///
/// Implementations are expected to insert the instrumentation
/// necessary to propagate argument shadow through VarArg function
/// calls. Visit* methods are called during an InstVisitor pass over
/// the function, and should avoid creating new basic blocks. A new
/// instance of this class is created for each instrumented function.
struct VarArgHelper { … };

struct MemorySanitizerVisitor;

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor);

static unsigned TypeSizeToSizeIndex(TypeSize TS) { … }

namespace {

/// Helper class to attach debug information of the given instruction onto new
/// instructions inserted after.
class NextNodeIRBuilder : public IRBuilder<> { … };

/// This class does all the work for a given function. Store and Load
/// instructions store and load corresponding shadow and origin
/// values. Most instructions propagate shadow from arguments to their
/// return values. Certain instructions (most importantly, BranchInst)
/// test their argument shadow and print reports (with a runtime call) if it's
/// non-zero.
struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> { … };

struct VarArgHelperBase : public VarArgHelper { … };

/// AMD64-specific implementation of VarArgHelper.
struct VarArgAMD64Helper : public VarArgHelperBase { … };

/// MIPS64-specific implementation of VarArgHelper.
/// NOTE: This is also used for LoongArch64.
struct VarArgMIPS64Helper : public VarArgHelperBase { … };

/// AArch64-specific implementation of VarArgHelper.
struct VarArgAArch64Helper : public VarArgHelperBase { … };

/// PowerPC64-specific implementation of VarArgHelper.
struct VarArgPowerPC64Helper : public VarArgHelperBase { … };

/// SystemZ-specific implementation of VarArgHelper.
struct VarArgSystemZHelper : public VarArgHelperBase { … };

// LoongArch64 is not MIPS, but its current varargs calling convention matches
// the MIPS one.
VarArgLoongArch64Helper;

/// A no-op implementation of VarArgHelper.
struct VarArgNoOpHelper : public VarArgHelper { … };

} // end anonymous namespace

static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
                                        MemorySanitizerVisitor &Visitor) { … }

bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) { … }