Let’s remember the function that we want to implement:
void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
    if (IsLoad)
        fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
    else
        fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}
We will need to implement this in LLVM IR using llvm::IRBuilder - which, as we saw in the previous section, is an extremely powerful tool for building arbitrary LLVM IR sequences.
Let’s write this function into a .c file and compile it with clang -emit-llvm to get a sense of the LLVM IR we should write (the output is collapsed here for brevity).
Note
The decision to trace both memory stores and memory loads from the same function is not a trivial one.
Since our LLVM Pass gives us full control, we could easily define two separate functions - traceMemoryLoad and traceMemoryStore - and call the appropriate function for load and store instructions. This would save us from having to branch in our memory trace function - we’d be “offloading” the branching to compile time, which is pretty cool.
I chose to implement a shared function because I thought it would be a bit more interesting to implement a branched function in our pass - and saving on a branch seems like an unnecessary optimization given that we’re calling the very heavy fprintf from our function anyways.
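To make that trade-off concrete, here’s a rough sketch of what the two-function variant could look like in plain C++. This is not what we’ll build in the pass. The explicit Out parameter is an assumption I’m making so the sketch stands on its own (our real function writes to the global file pointer), and I’ve used PRIx64 instead of %lx so the format stays correct wherever uint64_t isn’t unsigned long.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical split variant: the pass would choose the callee at compile
// time, so neither function needs a runtime IsLoad check.
// The Out parameter is an assumption made to keep this sketch self-contained.
void traceMemoryLoad(std::FILE *Out, void *Addr, std::uint64_t Value) {
  assert(Out && "the trace file must be open");
  std::fprintf(Out, "[Read] Read value 0x%" PRIx64 " from address %p\n",
               Value, Addr);
}

void traceMemoryStore(std::FILE *Out, void *Addr, std::uint64_t Value) {
  assert(Out && "the trace file must be open");
  std::fprintf(Out, "[Write] Wrote value 0x%" PRIx64 " to address %p\n",
               Value, Addr);
}
```

With this layout the pass would emit a call to traceMemoryLoad after every load instruction and a call to traceMemoryStore before every store, and the IsLoad argument disappears entirely.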
We can see that our function centers on an icmp instruction, followed by a br - which will jump to either the load or store trace based on the comparison result. Each branch then ends with an unconditional jump to a ret void.
Seems simple enough. Let’s start coding this in our pass:
Step One - Creating the Function
We’ve already seen usage of llvm::Module::getOrInsertFunction back when we wanted to make sure that we had access to fprintf. But whilst fprintf is a C standard library function - with the C standard library available to us at link time - this time we’ll be inserting our very own function!
Once again, we’ll be inserting directly into the module that contains main - and as our function will be externally linked, we’ll be able to access it from all compilation modules.
We’ll add a function call to our run method:
llvm::PreservedAnalyses run(llvm::Module &M,
                            llvm::ModuleAnalysisManager &) {
    Function *main = M.getFunction("main");
    if (main) {
        addGlobalMemoryTraceFP(M);
        addMemoryTraceFPInitialization(M, *main);
        addTraceMemoryFunction(M);
        errs() << "Found main in module " << M.getName() << "\n";
        return llvm::PreservedAnalyses::none();
    } else {
        errs() << "Did not find main in " << M.getName() << "\n";
        return llvm::PreservedAnalyses::all();
    }
}
We’ll start by implementing an empty externally-linked function whose body is a single ret void:
const std::string TraceMemoryFunctionName = "_TraceMemory";
void addTraceMemoryFunction(llvm::Module &M) {
    auto &CTX = M.getContext();
    // Signature: void _TraceMemory(i8* Addr, i64 Value, i32 IsLoad)
    std::vector<llvm::Type*> TraceMemoryArgs{
        PointerType::getUnqual(Type::getInt8Ty(CTX)),
        Type::getInt64Ty(CTX),
        Type::getInt32Ty(CTX)
    };
    FunctionType *TraceMemoryTy = FunctionType::get(Type::getVoidTy(CTX),
                                                    TraceMemoryArgs,
                                                    false);
    FunctionCallee TraceMemory = M.getOrInsertFunction(TraceMemoryFunctionName, TraceMemoryTy);
    // cast<> asserts (in assertion-enabled builds) that the callee really is a
    // Function; dyn_cast<> could return nullptr, which we would then
    // dereference without a check.
    llvm::Function *TraceMemoryFunction = cast<llvm::Function>(TraceMemory.getCallee());
    TraceMemoryFunction->setLinkage(GlobalValue::ExternalLinkage);
    // Without a basic block the function stays a declaration, so give it a body.
    llvm::BasicBlock *BB = llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
    IRBuilder<> Builder(BB);
    Builder.CreateRetVoid();
}
The first part of addTraceMemoryFunction is familiar - it’s similar to how we included fprintf - we just need to define the parameters we want and the function signature we’re defining, and then we can call getOrInsertFunction.
The second part is more interesting - we need to explicitly create a basic block we can add opcodes to, because a function without a body is emitted by LLVM as a bare declaration:
declare void @_TraceMemory(i8*, i64, i32)
By explicitly creating a basic block and adding our ret void opcode, we get:
define void @_TraceMemory(i8* %0, i64 %1, i32 %2) {
entry:
  ret void
}
Cool! This is the exact function definition from when we compiled our C program. Now to implement the logic:
Step Two - Adding LLVM IR Sequences to traceMemory
Rather than just have a ret void, let’s start coding real logic:
llvm::BasicBlock *BB = llvm::BasicBlock::Create(CTX, "entry", TraceMemoryFunction);
IRBuilder<> Builder(BB);
llvm::Value *CmpResult = Builder.CreateICmpNE(TraceMemoryFunction->getArg(2), Builder.getInt32(0));
llvm::BasicBlock *TraceLoadBB = llvm::BasicBlock::Create(CTX, "traceLoad", TraceMemoryFunction);
llvm::BasicBlock *TraceStoreBB = llvm::BasicBlock::Create(CTX, "traceStore", TraceMemoryFunction);
Builder.CreateCondBr(CmpResult, TraceLoadBB, TraceStoreBB);
Builder.SetInsertPoint(TraceLoadBB);
Builder.CreateRetVoid();
Builder.SetInsertPoint(TraceStoreBB);
Builder.CreateRetVoid();
This is our skeleton CFG - we insert a != conditional check, and then we create two basic blocks, one for each branch of the check. We create a conditional branch to those basic blocks, and then we use IRBuilder<>::SetInsertPoint to redirect our Builder object to add dummy ret void instructions to each branch.
We get this LLVM IR:
define void @_TraceMemory(i8* %0, i64 %1, i32 %2) {
entry:
  %3 = icmp ne i32 %2, 0
  br i1 %3, label %traceLoad, label %traceStore

traceLoad:                                        ; preds = %entry
  ret void

traceStore:                                       ; preds = %entry
  ret void
}
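One detail worth noticing in this IR: because IsLoad is an i32 that gets compared against zero with icmp ne, any non-zero value takes the traceLoad branch, not just 1. Here’s a tiny C++ model of that decision. TraceBlock and selectTraceBlock are hypothetical names that exist only for this illustration; they are not part of the pass.

```cpp
#include <cstdint>

// Hypothetical labels standing in for the two basic blocks.
enum class TraceBlock { Load, Store };

// Models `%3 = icmp ne i32 %2, 0` followed by
// `br i1 %3, label %traceLoad, label %traceStore`.
constexpr TraceBlock selectTraceBlock(std::int32_t IsLoad) {
  return IsLoad != 0 ? TraceBlock::Load : TraceBlock::Store;
}

// Zero is the only value that reaches traceStore.
static_assert(selectTraceBlock(0) == TraceBlock::Store,
              "zero selects traceStore");
```

This matches how C treats an int used as a condition, which is handy since callers will be passing a plain integer flag.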
And finally let’s flesh both of these basic blocks out:
Builder.SetInsertPoint(TraceLoadBB);
addMemoryTraceToBB(Builder, *TraceMemoryFunction, M, /*IsLoad=*/true);
Builder.CreateRetVoid();
Builder.SetInsertPoint(TraceStoreBB);
addMemoryTraceToBB(Builder, *TraceMemoryFunction, M, /*IsLoad=*/false);
Builder.CreateRetVoid();
And let’s define addMemoryTraceToBB as follows:
void addMemoryTraceToBB(IRBuilder<> &Builder, llvm::Function &Function, llvm::Module &M, bool IsLoad) {
    auto &CTX = M.getContext();
    // Signature: i32 fprintf(i8* Stream, i8* Format, ...)
    std::vector<llvm::Type*> FprintfArgs{
        PointerType::getUnqual(Type::getInt8Ty(CTX)),
        PointerType::getUnqual(Type::getInt8Ty(CTX))
    };
    FunctionType *FprintfTy = FunctionType::get(Type::getInt32Ty(CTX),
                                                FprintfArgs,
                                                true);
    FunctionCallee Fprintf = M.getOrInsertFunction("fprintf", FprintfTy);
    // Add both format strings as module-level globals. We use cast<> rather
    // than dyn_cast<> because the result is dereferenced without a null check.
    Constant *TraceLoadStr = llvm::ConstantDataArray::getString(CTX, "[Read] Read value 0x%lx from address %p\n");
    Constant *TraceLoadStrVar = M.getOrInsertGlobal("TraceLoadStr", TraceLoadStr->getType());
    cast<GlobalVariable>(TraceLoadStrVar)->setInitializer(TraceLoadStr);
    Constant *TraceStoreStr = llvm::ConstantDataArray::getString(CTX, "[Write] Wrote value 0x%lx to address %p\n");
    Constant *TraceStoreStrVar = M.getOrInsertGlobal("TraceStoreStr", TraceStoreStr->getType());
    cast<GlobalVariable>(TraceStoreStrVar)->setInitializer(TraceStoreStr);
    // Pick the format string that matches the branch we are filling in.
    llvm::Value *StrPtr;
    if (IsLoad)
        StrPtr = Builder.CreatePointerCast(TraceLoadStrVar, FprintfArgs[1], "loadStrPtr");
    else
        StrPtr = Builder.CreatePointerCast(TraceStoreStrVar, FprintfArgs[1], "storeStrPtr");
    // Load the global file pointer, then call fprintf(FP, Format, Value, Addr).
    GlobalVariable *FPGlobal = M.getNamedGlobal(FilePointerVarName);
    llvm::LoadInst *FP = Builder.CreateLoad(PointerType::getUnqual(Type::getInt8Ty(CTX)), FPGlobal);
    Builder.CreateCall(Fprintf, {FP, StrPtr, Function.getArg(1), Function.getArg(0)});
}
This code does a few things:
- It defines fprintf's signature using LLVM primitives, and then retrieves it using llvm::Module::getOrInsertFunction
- It adds our tracing strings as global variables in the module
- It retrieves our global file pointer - and adds a call to fprintf, passing in the file pointer, the appropriate tracing string, and the arguments passed to our traceMemory function
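If it helps to see the finished function from the other direction, here’s a plain C++ model of roughly what the generated IR does at runtime. It’s only a sketch: MemoryTraceFP and traceMemoryModel are hypothetical stand-ins for the global file pointer and for _TraceMemory, and I’ve used PRIx64 instead of %lx so the model is portable.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the global file pointer that the pass creates.
static std::FILE *MemoryTraceFP = nullptr;

// Plain C++ model of the IR built for _TraceMemory (the name is hypothetical).
void traceMemoryModel(void *Addr, std::uint64_t Value, std::int32_t IsLoad) {
  assert(MemoryTraceFP && "the trace file must be opened first");
  if (IsLoad != 0) {
    // traceLoad block: fprintf(FP, TraceLoadStr, Value, Addr)
    std::fprintf(MemoryTraceFP,
                 "[Read] Read value 0x%" PRIx64 " from address %p\n",
                 Value, Addr);
  } else {
    // traceStore block: fprintf(FP, TraceStoreStr, Value, Addr)
    std::fprintf(MemoryTraceFP,
                 "[Write] Wrote value 0x%" PRIx64 " to address %p\n",
                 Value, Addr);
  }
}
```

Note the argument order in both calls: the value comes before the address, which is why the CreateCall passes getArg(1) ahead of getArg(0).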
And we can actually see right away that all of our code works!
> cat main.c
#include <stdint.h>
void _TraceMemory(void *Addr, uint64_t Value, int IsLoad);
int main() {
    _TraceMemory((void*)0x1234, 0x400, 1);
    return 0;
}
> clang -S -emit-llvm main.c
> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace main.ll -S -o modified_main.ll
Found main in module main.ll
> clang modified_main.ll -o modified_main
> ./modified_main
> cat memory-traces.log
[Read] Read value 0x400 from address 0x1234
It works!!! All we had to do was declare the function signature in our main.c - and our LLVM Pass did the rest of the work!
The next step is an exciting one - patching all memory access opcodes to call our function.