We want our pass to add a function to the binary that performs this logic:
void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
  if (IsLoad)
    fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
  else
    fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}
Notice how the calls to fprintf write to a FILE *_MemoryTraceFP. Where should this file pointer come from? We could insert fopen/fclose logic directly into traceMemory, but then we would be opening and closing our log on each memory access, which is wasteful.
The preferable solution is to define a global FILE *_MemoryTraceFP and initialize it just once.
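To make the cost concrete, here is a minimal C sketch of the rejected variant, in which every traced access pays for its own fopen and fclose. The name traceMemoryNaive and its Path parameter are hypothetical and exist only for this illustration; they are not part of the pass. The sketch also uses the PRIx64 format macro instead of %lx, which is an assumption made for portability.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical naive tracer: opens and closes the log on every call.
   Returns false if the log could not be opened. */
bool traceMemoryNaive(const char *Path, void *Addr, uint64_t Value, bool IsLoad) {
  assert(Path != NULL);
  FILE *FP = fopen(Path, "a"); /* append, so earlier entries survive */
  if (FP == NULL)
    return false;
  if (IsLoad)
    fprintf(FP, "[Read] Read value 0x%" PRIx64 " from address %p\n", Value, Addr);
  else
    fprintf(FP, "[Write] Wrote value 0x%" PRIx64 " to address %p\n", Value, Addr);
  fclose(FP); /* one open/close pair per memory access */
  return true;
}
```

Every load and store in the traced program would go through this open/write/close sequence, which is exactly the overhead a single shared file pointer avoids.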
Much like C, LLVM IR does not allow a global variable to be initialized with the result of a function call: a global's initializer must be a constant. In other words, we cannot define
FILE *_MemoryTraceFP = fopen(...);
at global scope. So where should we initialize the file pointer?
The appropriate place for this initialization is right at the beginning of main. Since we’re implementing a module pass, we’ll need to identify the module that defines main and add the initialization instructions to main's entry block.
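As a rough picture of the runtime behavior we are aiming for, here is a C sketch of what the instrumented program should effectively do. initMemoryTrace is a hypothetical stand-in for the instructions the pass inserts at the top of main; the pass emits them inline rather than as a separate function. The NULL check in traceMemory and the PRIx64 format macro are additions made for this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* File-scope definition: only a constant initializer is allowed here,
   so the pointer starts out as NULL. */
FILE *_MemoryTraceFP = NULL;

/* Hypothetical stand-in for the code inserted at the beginning of main.
   Returns false if the log could not be opened. */
bool initMemoryTrace(const char *Path) {
  assert(Path != NULL);
  _MemoryTraceFP = fopen(Path, "w+");
  return _MemoryTraceFP != NULL;
}

void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
  if (_MemoryTraceFP == NULL) /* log was never opened; skip tracing */
    return;
  if (IsLoad)
    fprintf(_MemoryTraceFP, "[Read] Read value 0x%" PRIx64 " from address %p\n", Value, Addr);
  else
    fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%" PRIx64 " to address %p\n", Value, Addr);
}
```

In this model, main would call initMemoryTrace("memory-traces.log") as its first statement and then run as usual, with every traced access going through traceMemory.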
Step One - Add A Global File Pointer to Our main Module
All we want to do is add a file pointer to the global scope of the module that contains main, so that we can use it in our traceMemory function.
Let’s start fleshing out our pass’s run function:
llvm::PreservedAnalyses run(llvm::Module &M,
                            llvm::ModuleAnalysisManager &) {
  Function *main = M.getFunction("main");
  if (main) {
    addGlobalMemoryTraceFP(M);
    errs() << "Found main in module " << M.getName() << "\n";
    // The module was modified, so no analyses are preserved.
    return llvm::PreservedAnalyses::none();
  } else {
    errs() << "Did not find main in " << M.getName() << "\n";
    return llvm::PreservedAnalyses::all();
  }
}
And let’s define addGlobalMemoryTraceFP as follows:
const std::string FilePointerVarName = "_MemoryTraceFP";

void addGlobalMemoryTraceFP(llvm::Module &M) {
  auto &CTX = M.getContext();
  // Declare the global as an i8* (our stand-in for FILE *).
  M.getOrInsertGlobal(FilePointerVarName, PointerType::getUnqual(Type::getInt8Ty(CTX)));
  GlobalVariable *namedGlobal = M.getNamedGlobal(FilePointerVarName);
  namedGlobal->setLinkage(GlobalValue::ExternalLinkage);
}
In essence, all we do is declare an externally-linked i8* global named _MemoryTraceFP and add it to our module using llvm::Module::getOrInsertGlobal.
Note
What linkage should our global file pointer have? We’ll only need it inside traceMemory, which we can add to the same compilation module as main. traceMemory will need external linkage, because we’ll want to call it from all modules, but the file pointer itself should have internal linkage.
Why then do we use GlobalValue::ExternalLinkage and not GlobalValue::InternalLinkage? Because if we use InternalLinkage (I used the LLVM 13 toolchain) we get this error:
"Global is external, but doesn't have external or weak linkage!"
This looks like weird behavior, but it is probably not a bug: the message appears to come from LLVM's IR verifier, which treats a global without an initializer as a declaration and requires declarations to have external or weak linkage. Giving the global an initializer (for example, a null pointer) should make it a definition that can take internal linkage.
We can see the effects of this pass:
> cat main.c
int main() { return 0; }
> clang -S -emit-llvm main.c
> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace main.ll -S
...
@_MemoryTraceFP = external global i8*
...
Cool! Now let’s add an initialization of this global variable to our main function.
Step Two - Initializing the Global File Pointer in main
Let’s add another function call to our pass’s run function:
llvm::PreservedAnalyses run(llvm::Module &M,
                            llvm::ModuleAnalysisManager &) {
  Function *main = M.getFunction("main");
  if (main) {
    addGlobalMemoryTraceFP(M);
    addMemoryTraceFPInitialization(M, *main);
    errs() << "Found main in module " << M.getName() << "\n";
    return llvm::PreservedAnalyses::none();
  } else {
    errs() << "Did not find main in " << M.getName() << "\n";
    return llvm::PreservedAnalyses::all();
  }
}
Now, let’s think about what needs to happen in our addMemoryTraceFPInitialization function:
- We need to make sure we can use fopen by using llvm::Module::getOrInsertFunction
- fopen has this signature: FILE *fopen(const char *filename, const char *mode). And so we need to define the filename and mode we will pass into the call to fopen
- We need to introduce an actual call to fopen to the beginning of main
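The three steps can be read as the following C sketch. The global names FopenFileNameStr and FopenModeStr mirror the ones the pass creates, with the log name memory-traces.log and the mode w+; initTraceFile is a hypothetical name for code that the pass places inline at the start of main, not a function the pass actually emits.

```c
#include <assert.h>
#include <stdio.h>

/* Step 1: make fopen visible. <stdio.h> declares it as
   FILE *fopen(const char *filename, const char *mode); */

/* Step 2: both arguments live in global arrays, terminating NUL included. */
char FopenFileNameStr[] = "memory-traces.log";
char FopenModeStr[] = "w+";

FILE *_MemoryTraceFP = NULL;

/* Step 3: call fopen and store the result in the global file pointer.
   Hypothetical helper; the pass inserts the equivalent at the top of main. */
void initTraceFile(void) {
  assert(sizeof FopenFileNameStr == 18); /* 17 characters + NUL */
  /* Arrays decay to char *, the C analogue of casting the globals to i8*. */
  _MemoryTraceFP = fopen(FopenFileNameStr, FopenModeStr);
}
```

The array sizes matter later: because the terminating NUL is part of each string constant, the corresponding IR globals should have types [18 x i8] and [3 x i8].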
Putting it all together, we get:
void addMemoryTraceFPInitialization(llvm::Module &M, llvm::Function &MainFunc) {
  auto &CTX = M.getContext();

  // Step 1: declare fopen as i8* fopen(i8*, i8*).
  std::vector<llvm::Type*> FopenArgs{
    PointerType::getUnqual(Type::getInt8Ty(CTX)),
    PointerType::getUnqual(Type::getInt8Ty(CTX))
  };
  FunctionType *FopenTy = FunctionType::get(PointerType::getUnqual(Type::getInt8Ty(CTX)),
                                            FopenArgs,
                                            false);
  FunctionCallee Fopen = M.getOrInsertFunction("fopen", FopenTy);

  // Step 2: define globals holding the file name and the mode.
  // cast<> asserts on a type mismatch instead of dereferencing a null pointer.
  Constant *FopenFileNameStr = llvm::ConstantDataArray::getString(CTX, "memory-traces.log");
  Constant *FopenFilenameStrVar = M.getOrInsertGlobal("FopenFileNameStr", FopenFileNameStr->getType());
  cast<GlobalVariable>(FopenFilenameStrVar)->setInitializer(FopenFileNameStr);
  Constant *FopenModeStr = llvm::ConstantDataArray::getString(CTX, "w+");
  Constant *FopenModeStrVar = M.getOrInsertGlobal("FopenModeStr", FopenModeStr->getType());
  cast<GlobalVariable>(FopenModeStrVar)->setInitializer(FopenModeStr);

  // Step 3: at the start of main, call fopen and store the result
  // in the global file pointer.
  IRBuilder<> Builder(&*MainFunc.getEntryBlock().getFirstInsertionPt());
  llvm::Value *FopenFilenameStrPtr = Builder.CreatePointerCast(FopenFilenameStrVar, FopenArgs[0],
                                                               "fileNameStr");
  llvm::Value *FopenModeStrPtr = Builder.CreatePointerCast(FopenModeStrVar, FopenArgs[1],
                                                           "modeStr");
  llvm::Value *FopenReturn = Builder.CreateCall(Fopen, {FopenFilenameStrPtr, FopenModeStrPtr});
  GlobalVariable *FPGlobal = M.getNamedGlobal(FilePointerVarName);
  Builder.CreateStore(FopenReturn, FPGlobal);
}
If we run our pass and then run the instrumented binary, we can see that memory-traces.log is created!
> ls
# Does not contain memory-traces.log!
> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace main.ll -S -o modified_main.ll
Found main in module main.ll
> clang modified_main.ll -o modified_main
> ./modified_main
> ls
# Contains memory-traces.log!!!
This is really cool!! We initiated the creation of a file from our executable using nothing but LLVM IR instructions introduced by our LLVM Pass. Magic!
The generated LLVM IR looks cool too:
@_MemoryTraceFP = external global i8*
@FopenFileNameStr = global [18 x i8] c"memory-traces.log\00"
@FopenModeStr = global [3 x i8] c"w+\00"
define dso_local i32 @main() #0 {
%1 = call i8* @fopen(i8* getelementptr inbounds ([18 x i8], [18 x i8]* @FopenFileNameStr, i32 0, i32 0), i8* getelementptr inbounds ([3 x i8], [3 x i8]* @FopenModeStr, i32 0, i32 0))
%2 = load i8*, i8** @_MemoryTraceFP
store i8* %1, i8** @_MemoryTraceFP
Up next - implementing our traceMemory function using this global file pointer.