We want our pass to add a function to the binary that performs this logic:

void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
    if (IsLoad)
        fprintf(_MemoryTraceFP, "[Read] Read value 0x%lx from address %p\n", Value, Addr);
    else
        fprintf(_MemoryTraceFP, "[Write] Wrote value 0x%lx to address %p\n", Value, Addr);
}

Notice how the calls to fprintf write to a FILE *_MemoryTraceFP. Where should this file pointer come from? We could insert fopen/fclose logic directly into traceMemory, but then we would open and close our log on every memory access, which is wasteful.
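
To make that cost concrete, here is a minimal sketch of the rejected approach. The name traceMemoryReopening, the LogPath parameter and the append mode are assumptions made for illustration; none of them appear in the pass. Every traced access pays for one fopen/fclose pair.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical variant of traceMemory that opens and closes the log on
 * every call. "a" (append) is an assumption: with "w", each call would
 * truncate the entries written by the previous one. */
void traceMemoryReopening(const char *LogPath, void *Addr, uint64_t Value,
                          bool IsLoad) {
    FILE *FP = fopen(LogPath, "a"); /* one open per memory access */
    assert(FP != NULL && "could not open the trace log");
    if (IsLoad)
        fprintf(FP, "[Read] Read value 0x%" PRIx64 " from address %p\n",
                Value, Addr);
    else
        fprintf(FP, "[Write] Wrote value 0x%" PRIx64 " to address %p\n",
                Value, Addr);
    fclose(FP); /* one close per memory access */
}
```

A program that performs millions of loads and stores would make millions of open/close system calls with this variant, which is why a single long-lived file pointer is preferable.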

The preferable solution is to define a global FILE *_MemoryTraceFP and initialize it just once. Much like C, LLVM IR only allows a global variable to be initialized with a constant, not with the result of a function call, i.e. we cannot define

FILE *_MemoryTraceFP = fopen(...);

on a global scope. So where should we initialize the file pointer?

The appropriate place for this initialization is right at the beginning of main. Since we’re implementing a module pass, we’ll need to identify the module that defines main and add the initialization instructions to main's entry block.
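
At the source level, the instrumented program should end up behaving roughly like the C sketch below. initMemoryTraceFP is a hypothetical helper that stands in for the instructions the pass will place at the top of main; the pass itself emits LLVM IR, not C. The sketch also uses PRIx64 instead of %lx so that the format matches uint64_t on every platform.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Global file pointer. It has no initializer, so it starts out as NULL. */
FILE *_MemoryTraceFP;

/* Hypothetical stand-in for the code inserted at the beginning of main:
 * open the log exactly once and remember the handle. */
void initMemoryTraceFP(void) {
    _MemoryTraceFP = fopen("memory-traces.log", "w+");
}

void traceMemory(void *Addr, uint64_t Value, bool IsLoad) {
    assert(_MemoryTraceFP != NULL && "initMemoryTraceFP must run first");
    if (IsLoad)
        fprintf(_MemoryTraceFP,
                "[Read] Read value 0x%" PRIx64 " from address %p\n",
                Value, Addr);
    else
        fprintf(_MemoryTraceFP,
                "[Write] Wrote value 0x%" PRIx64 " to address %p\n",
                Value, Addr);
}
```

Calling initMemoryTraceFP() once and then traceMemory any number of times reuses the same open file, so no per-access open or close is needed.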

Step One - Add A Global File Pointer to Our main Module

All we want to do in this step is add a file pointer to the global scope of the module that contains main, so that we can use it in our traceMemory function.

Let’s start fleshing out our pass’s run function:

llvm::PreservedAnalyses run(llvm::Module &M,
                        llvm::ModuleAnalysisManager &) {
    // Look up main by name; a null result means this module has no main.
    Function *main = M.getFunction("main");
    if (main) {
        addGlobalMemoryTraceFP(M);
        errs() << "Found main in module " << M.getName() << "\n";
        return llvm::PreservedAnalyses::none();
    } else {
        errs() << "Did not find main in " << M.getName() << "\n";
        return llvm::PreservedAnalyses::all();
    }
}

And let’s define addGlobalMemoryTraceFP as follows:

const std::string FilePointerVarName = "_MemoryTraceFP";
 
void addGlobalMemoryTraceFP(llvm::Module &M) {
    auto &CTX = M.getContext();
 
    M.getOrInsertGlobal(FilePointerVarName, PointerType::getUnqual(Type::getInt8Ty(CTX)));
 
    GlobalVariable *namedGlobal = M.getNamedGlobal(FilePointerVarName);
    namedGlobal->setLinkage(GlobalValue::ExternalLinkage);
}

In essence, all we do is declare an externally linked i8* global named _MemoryTraceFP (the FILE * is modeled as a plain byte pointer) and add it to our module using llvm::Module::getOrInsertGlobal.

We can see the effects of this pass:

> cat main.c
int main() { return 0; }
 
> clang -S -emit-llvm main.c
 
> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace main.ll -S
...
 
@_MemoryTraceFP = external global i8*
 
...

Cool! Now let’s add an initialization of this global variable to our main function:

Step Two - Initializing the Global File Pointer in main

Let’s add another function call to our pass’s run function:

llvm::PreservedAnalyses run(llvm::Module &M,
                        llvm::ModuleAnalysisManager &) {
    Function *main = M.getFunction("main");
    if (main) {
        addGlobalMemoryTraceFP(M);
        addMemoryTraceFPInitialization(M, *main);
        errs() << "Found main in module " << M.getName() << "\n";
        return llvm::PreservedAnalyses::none();
    } else {
        errs() << "Did not find main in " << M.getName() << "\n";
        return llvm::PreservedAnalyses::all();
    }
}

Now, let’s think about what needs to happen in our addMemoryTraceFPInitialization function:

  1. We need to make sure fopen is declared in our module, using llvm::Module::getOrInsertFunction

  2. fopen has this signature:

    FILE *fopen(const char *filename, const char *mode)

    And so we need to define the filename and mode we will pass into the call to fopen

  3. We need to introduce an actual call to fopen to the beginning of main
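
The mode string in step 2 deserves a moment of thought. With "w+", the mode used in the implementation that follows, fopen creates the file if it does not exist and truncates it if it does, so every run of the instrumented binary starts with an empty log. The sketch below demonstrates this; fileSizeAfterReopen is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes 18 bytes to Path, reopens the file with
 * Mode, and returns the file size observed right after the reopen. */
long fileSizeAfterReopen(const char *Path, const char *Mode) {
    FILE *FP = fopen(Path, "w");
    assert(FP != NULL);
    fputs("stale trace entry\n", FP); /* 17 characters plus a newline */
    fclose(FP);

    FP = fopen(Path, Mode);
    assert(FP != NULL);
    fseek(FP, 0, SEEK_END); /* jump to the end to measure the size */
    long Size = ftell(FP);
    fclose(FP);
    return Size;
}
```

On a POSIX system, reopening with "w+" reports a size of 0 because the old contents are discarded, while "a+" keeps them. If traces from several runs should accumulate in one file, an append mode would be the one to pass to fopen instead.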

Putting it all together, we get:

void addMemoryTraceFPInitialization(llvm::Module& M, llvm::Function &MainFunc) {
    auto &CTX = M.getContext();
 
    std::vector<llvm::Type*> FopenArgs{
                                    PointerType::getUnqual(Type::getInt8Ty(CTX)),
                                    PointerType::getUnqual(Type::getInt8Ty(CTX))
                                    };
 
    FunctionType *FopenTy = FunctionType::get(PointerType::getUnqual(Type::getInt8Ty(CTX)),
                                    FopenArgs,
                                    false);
 
    FunctionCallee Fopen = M.getOrInsertFunction("fopen", FopenTy);
 
    // Filename argument: a NUL-terminated string stored in a module-level global.
    Constant *FopenFileNameStr = llvm::ConstantDataArray::getString(CTX, "memory-traces.log");
    Constant *FopenFilenameStrVar = M.getOrInsertGlobal("FopenFileNameStr", FopenFileNameStr->getType());
    // cast<> asserts on a type mismatch instead of silently returning null like dyn_cast<>.
    cast<GlobalVariable>(FopenFilenameStrVar)->setInitializer(FopenFileNameStr);
 
    // Mode argument, stored the same way.
    Constant *FopenModeStr = llvm::ConstantDataArray::getString(CTX, "w+");
    Constant *FopenModeStrVar = M.getOrInsertGlobal("FopenModeStr", FopenModeStr->getType());
    cast<GlobalVariable>(FopenModeStrVar)->setInitializer(FopenModeStr);
 
    IRBuilder<> Builder(&*MainFunc.getEntryBlock().getFirstInsertionPt());
    llvm::Value *FopenFilenameStrPtr = Builder.CreatePointerCast(FopenFilenameStrVar, FopenArgs[0],
                                                                    "fileNameStr");
    llvm::Value *FopenModeStrPtr = Builder.CreatePointerCast(FopenModeStrVar, FopenArgs[0],
                                                                    "modeStr");
    llvm::Value *FopenReturn = Builder.CreateCall(Fopen, {FopenFilenameStrPtr, FopenModeStrPtr});
 
    GlobalVariable *FPGlobal = M.getNamedGlobal(FilePointerVarName);
    Builder.CreateStore(FopenReturn, FPGlobal);
}

If we run our pass and then run the instrumented binary, we can see that memory-traces.log is created!

> ls
# Does not contain memory-traces.log!
 
> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace main.ll -S -o modified_main.ll
Found main in module main.ll
 
> clang modified_main.ll -o modified_main
 
> ./modified_main
 
> ls
# Contains memory-traces.log!!!

This is really cool!! We initiated the creation of a file from our executable using nothing but LLVM IR instructions introduced by our LLVM Pass. Magic!

The generated LLVM IR looks cool too:

@_MemoryTraceFP = external global i8*
@FopenFileNameStr = global [18 x i8] c"memory-traces.log\00"
@FopenModeStr = global [3 x i8] c"w+\00"
 
define dso_local i32 @main() #0 {
  %1 = call i8* @fopen(i8* getelementptr inbounds ([18 x i8], [18 x i8]* @FopenFileNameStr, i32 0, i32 0), i8* getelementptr inbounds ([3 x i8], [3 x i8]* @FopenModeStr, i32 0, i32 0))
  %2 = load i8*, i8** @_MemoryTraceFP
  store i8* %1, i8** @_MemoryTraceFP
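
Read as C, the two string globals and the call/store pair above correspond roughly to the sketch below. The array sizes 18 and 3 include the terminating NUL byte, and each getelementptr with indices 0, 0 is the IR spelling of taking the address of an array's first element. openTraceLog is a hypothetical name; in the real output these instructions sit directly in main.

```c
#include <assert.h>
#include <stdio.h>

/* Mirror @FopenFileNameStr and @FopenModeStr: writable global char arrays. */
char FopenFileNameStr[18] = "memory-traces.log";
char FopenModeStr[3] = "w+";

/* The sizes match the [18 x i8] and [3 x i8] types in the IR. */
static_assert(sizeof FopenFileNameStr == 18, "17 characters plus NUL");
static_assert(sizeof FopenModeStr == 3, "2 characters plus NUL");

/* Declared as char * to mirror the i8* type the pass gives the global. */
char *_MemoryTraceFP;

/* Hypothetical wrapper around the call and store that the pass adds. */
void openTraceLog(void) {
    /* &Array[0] is what the getelementptr with indices (0, 0) computes. */
    FILE *Result = fopen(&FopenFileNameStr[0], &FopenModeStr[0]);
    _MemoryTraceFP = (char *)Result; /* the store into the global */
}
```

Because the pass types the global as i8*, later code that wants to use it as a FILE * has to treat it as an opaque pointer, just as the cast in the sketch does.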

Up next - implementing our traceMemory function using this global file pointer.