Let’s recall the function that we want to implement:

void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
    if (IsLoad)
        fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
    else
        fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}

We will need to implement this in LLVM IR using llvm::IRBuilder which, as we saw in the previous section, is a powerful tool for building arbitrary LLVM IR sequences.

Let’s write this function into a .c file and compile it with clang -emit-llvm to get a sense of the LLVM IR we should write (the resulting listing is omitted here for brevity).

We can see that our function centers on an icmp instruction, followed by a br - which will jump to either the load or store trace based on the comparison result. Each branch then ends with an unconditional jump to a ret void.
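To make that shape concrete, here is a rough C sketch that spells out the same control flow with one label per basic block. It is an illustration rather than the exact file described above: the label names are made up, _MemoryTraceFP is assumed to be opened elsewhere before the first call, and PRIx64 stands in for %lx so that the uint64_t prints portably.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: some other code opens this before the first trace is written. */
FILE *_MemoryTraceFP = NULL;

void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
    assert(_MemoryTraceFP != NULL && "trace file must be open");

    /* entry: test the flag and branch (the icmp + br pair in the IR). */
    if (IsLoad)
        goto trace_load;
    goto trace_store;

trace_load:
    fprintf(_MemoryTraceFP,
            "[Read] Read value 0x%" PRIx64 " from address %p\n", Value, Addr);
    goto done; /* unconditional branch to the shared exit */

trace_store:
    fprintf(_MemoryTraceFP,
            "[Write] Wrote value 0x%" PRIx64 " to address %p\n", Value, Addr);
    goto done; /* unconditional branch to the shared exit */

done:
    return; /* ret void */
}
```

Each label plays the role of one basic block, which is the structure we will rebuild by hand with IRBuilder.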

Seems simple enough. Let’s start coding this in our pass.

Step One - Creating the Function

We’ve already seen usage of llvm::Module::getOrInsertFunction back when we wanted to make sure that we had access to fprintf. But whilst fprintf is a C standard library function - with the C standard library available to us at link time - this time we’ll be inserting our very own function!

Once again, we’ll be inserting directly into our main module - and as our function will be externally linked, we’ll be able to access it from all compilation modules.

We’ll add a function call to our run method:

llvm::PreservedAnalyses run(llvm::Module &M,
                            llvm::ModuleAnalysisManager &) {
    Function *main = M.getFunction("main");
    if (main) {
        addGlobalMemoryTraceFP(M);
        addMemoryTraceFPInitialization(M, *main);
        // New in this section: emit the body of _TraceMemory.
        addTraceMemoryFunction(M);
        errs() << "Found main in module " << M.getName() << "\n";
        return llvm::PreservedAnalyses::none();
    } else {
        errs() << "Did not find main in " << M.getName() << "\n";
        return llvm::PreservedAnalyses::all();
    }
}

We’ll start by implementing an externally-linked function whose body is a single ret void:

const std::string TraceMemoryFunctionName = "_TraceMemory";

void addTraceMemoryFunction(llvm::Module &M) {
    auto &CTX = M.getContext();

    // Signature: void _TraceMemory(i8* Addr, i64 Value, i32 IsLoad)
    std::vector<llvm::Type*> TraceMemoryArgs{
        PointerType::getUnqual(Type::getInt8Ty(CTX)),
        Type::getInt64Ty(CTX),
        Type::getInt32Ty(CTX)
    };

    FunctionType *TraceMemoryTy = FunctionType::get(Type::getVoidTy(CTX),
                                                    TraceMemoryArgs,
                                                    /*isVarArg=*/false);

    FunctionCallee TraceMemory = M.getOrInsertFunction(TraceMemoryFunctionName, TraceMemoryTy);

    // cast<> asserts (in assertion-enabled builds) on a type mismatch, rather
    // than handing back a null pointer that we would then dereference.
    llvm::Function *TraceMemoryFunction = cast<llvm::Function>(TraceMemory.getCallee());
    TraceMemoryFunction->setLinkage(GlobalValue::ExternalLinkage);

    // A function only becomes a definition once it has a basic block.
    llvm::BasicBlock *BB = llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
    IRBuilder<> Builder(BB);

    Builder.CreateRetVoid();
}

The first part of addTraceMemoryFunction is familiar - it’s similar to how we included fprintf - we just need to define the parameters we want and the function signature we’re defining, and then we can call getOrInsertFunction.

The second part is more interesting - we need to explicitly create a basic block we can add instructions to, because on its own getOrInsertFunction only gives us a declaration with no body:

declare void @_TraceMemory(i8*, i64, i32)

By explicitly creating a basic block and adding our ret void instruction, we get a full definition:

define void @_TraceMemory(i8* %0, i64 %1, i32 %2) {
entry:
  ret void
}

Cool! This has the same shape as the function definition we got when we compiled our C program (the one difference being that our flag is an i32 here rather than a bool). Now to implement the logic:

Step Two - Adding LLVM IR Sequences to traceMemory

Rather than just have a ret void, let’s start coding real logic:

llvm::BasicBlock *BB = llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
IRBuilder<> Builder(BB);
 
llvm::Value *CmpResult = Builder.CreateICmpNE(TraceMemoryFunction->getArg(2), Builder.getInt32(0));
 
llvm::BasicBlock *TraceLoadBB = llvm::BasicBlock::Create(CTX, "traceLoad", TraceMemoryFunction);
 
llvm::BasicBlock *TraceStoreBB = llvm::BasicBlock::Create(CTX, "traceStore", TraceMemoryFunction);
Builder.CreateCondBr(CmpResult, TraceLoadBB, TraceStoreBB);
 
Builder.SetInsertPoint(TraceLoadBB);
Builder.CreateRetVoid();
 
Builder.SetInsertPoint(TraceStoreBB);
Builder.CreateRetVoid();

This is our skeleton CFG. We insert a != comparison against zero and create two basic blocks, one for each outcome of the check. We then create a conditional branch to those blocks and use IRBuilder<>::SetInsertPoint to redirect our Builder object, adding a placeholder ret void instruction to each of them.

We get this LLVM IR:

define void @_TraceMemory(i8* %0, i64 %1, i32 %2) {
entry:
  %3 = icmp ne i32 %2, 0
  br i1 %3, label %traceLoad, label %traceStore
 
traceLoad:                                        ; preds = %entry
  ret void
 
traceStore:                                       ; preds = %entry
  ret void
}

And finally let’s flesh both of these basic blocks out:

Builder.SetInsertPoint(TraceLoadBB);
addMemoryTraceToBB(Builder, *TraceMemoryFunction, M, /*IsLoad=*/true);
Builder.CreateRetVoid();
 
Builder.SetInsertPoint(TraceStoreBB);
addMemoryTraceToBB(Builder, *TraceMemoryFunction, M, /*IsLoad=*/false);
Builder.CreateRetVoid();

And let’s define addMemoryTraceToBB as follows:

void addMemoryTraceToBB(IRBuilder<> &Builder, llvm::Function &F, llvm::Module &M, bool IsLoad) {
    auto &CTX = M.getContext();

    // Signature: i32 fprintf(i8* Stream, i8* Format, ...)
    std::vector<llvm::Type*> FprintfArgs{
        PointerType::getUnqual(Type::getInt8Ty(CTX)),
        PointerType::getUnqual(Type::getInt8Ty(CTX))
    };

    FunctionType *FprintfTy = FunctionType::get(Type::getInt32Ty(CTX),
                                                FprintfArgs,
                                                /*isVarArg=*/true);

    FunctionCallee Fprintf = M.getOrInsertFunction("fprintf", FprintfTy);

    // Both format strings live in the module as global variables.
    Constant *TraceLoadStr = llvm::ConstantDataArray::getString(CTX, "[Read] Read value 0x%lx from address %p\n");
    Constant *TraceLoadStrVar = M.getOrInsertGlobal("TraceLoadStr", TraceLoadStr->getType());
    cast<GlobalVariable>(TraceLoadStrVar)->setInitializer(TraceLoadStr);

    Constant *TraceStoreStr = llvm::ConstantDataArray::getString(CTX, "[Write] Wrote value 0x%lx to address %p\n");
    Constant *TraceStoreStrVar = M.getOrInsertGlobal("TraceStoreStr", TraceStoreStr->getType());
    cast<GlobalVariable>(TraceStoreStrVar)->setInitializer(TraceStoreStr);

    // Pick the format string for this branch and cast it to i8*.
    llvm::Value *StrPtr;
    if (IsLoad)
        StrPtr = Builder.CreatePointerCast(TraceLoadStrVar, FprintfArgs[1], "loadStrPtr");
    else
        StrPtr = Builder.CreatePointerCast(TraceStoreStrVar, FprintfArgs[1], "storeStrPtr");

    // Load the FILE* from its global, then emit fprintf(FP, Format, Value, Addr).
    GlobalVariable *FPGlobal = M.getNamedGlobal(FilePointerVarName);
    llvm::LoadInst *FP = Builder.CreateLoad(PointerType::getUnqual(Type::getInt8Ty(CTX)), FPGlobal);
    Builder.CreateCall(Fprintf, {FP, StrPtr, F.getArg(1), F.getArg(0)});
}

This code does a few things:

  • It defines fprintf's signature using LLVM primitives, and then retrieves it using llvm::Module::getOrInsertFunction
  • It adds our tracing strings as global variables in the module
  • It retrieves our global file pointer, loads it, and adds a call to fprintf, passing in the file pointer, the appropriate tracing string, and the arguments passed to our traceMemory function

And we can actually see right away that all of our code works!

> cat main.c
#include <stdint.h>
 
void _TraceMemory(void *Addr, uint64_t Value, int IsLoad);
 
int main() {
        _TraceMemory((void*)0x1234, 0x400, 1);
        return 0;
}
 
> clang -S -emit-llvm main.c
 
> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace main.ll -S -o modified_main.ll
Found main in module main.ll
 
> clang modified_main.ll -o modified_main
 
> ./modified_main
 
> cat memory-traces.log
[Read] Read value 0x400 from address 0x1234

It works!!! All we had to do was declare the function in our main.c - and our LLVM pass did the rest of the work!
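The run above only takes the load branch. As a rough way to cross-check both branches without the pass, one could link main.c against a plain-C stand-in with the same signature, along the lines of the sketch below. RefTraceFP and exerciseTraceMemory are hypothetical names introduced for this sketch, and the stand-in is an assumption about how _TraceMemory should behave, not the code the pass emits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: destination for the stand-in's output, set by the caller. */
FILE *RefTraceFP = NULL;

/* Stand-in with the same signature that main.c declares. Like the generated
 * `icmp ne i32 %2, 0`, it treats any nonzero IsLoad as a load. */
void _TraceMemory(void *Addr, uint64_t Value, int IsLoad) {
    assert(RefTraceFP != NULL && "set RefTraceFP before tracing");
    const char *Fmt = (IsLoad != 0)
        ? "[Read] Read value 0x%" PRIx64 " from address %p\n"
        : "[Write] Wrote value 0x%" PRIx64 " to address %p\n";
    fprintf(RefTraceFP, Fmt, Value, Addr);
}

/* Hypothetical driver: one call per branch, plus a nonzero flag other than 1. */
void exerciseTraceMemory(void) {
    _TraceMemory((void *)0x1234, 0x400, 1); /* load  */
    _TraceMemory((void *)0x1234, 0x401, 0); /* store */
    _TraceMemory((void *)0x1234, 0x402, 2); /* nonzero, so also a load */
}
```

Running the same three calls through the instrumented binary and comparing memory-traces.log against the stand-in's output should show matching lines, apart from how %p formats addresses on a given platform.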

The next step is an exciting one - patching all memory access instructions to call our function.