Step 1 - What do we want to do?

LLVM passes run inside a pass manager, which the LLVM documentation describes as something that “Manages a sequence of passes over a particular unit of IR.” There are four types of pass managers:

  1. FunctionPassManager
  2. ModulePassManager
  3. LoopPassManager
  4. CGSCCPassManager - which is used to analyze and manipulate the call graph’s SCCs (strongly connected components)

Since we want to integrate our pass into one of these pass managers, we need to decide what sort of pass we’re implementing.

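For implementing memory traces, we really just want to go over instructions - so perhaps the most appropriate pass is a function pass, which will allow us to go over the instructions of each function like so:

llvm::PreservedAnalyses OurPass::run(llvm::Function &Func,
                                     llvm::FunctionAnalysisManager &) {
    for (auto &BB : Func) {        // each basic block in the function
        for (auto &Inst : BB) {    // each instruction in the block
            // Run custom logic on the instruction
        }
    }
    // Nothing was modified, so every analysis result stays valid
    return llvm::PreservedAnalyses::all();
}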

This is almost what we want, but not quite. We also want to do the tracing itself, which involves an external call to a function like fprintf. To use a utility function like this, which might not even be included or declared in the code we’re compiling, we need LLVM’s Module::getOrInsertFunction, which either retrieves the function from the module being compiled or adds the prototype we specify to that module.

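Adding a prototype changes the module itself, which reaches beyond the single function a function pass is meant to touch. Thus, to support the usage of C library functions, we need to implement a module pass.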

Our pass will essentially do three things:

  1. Make sure that fprintf is declared in the module
  2. Go over every function in the module
    1. Go over every instruction in the function
      1. If the instruction is a memory access - generate a call to fprintf that traces this access
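To make the goal concrete, here is a hand-written approximation of how a function might behave once it has been instrumented. Everything in it is illustrative: traceAccess, the "Read" and "Write" labels, and the output format are assumptions, and the real pass would emit the fprintf calls at the IR level rather than in source code.

```cpp
#include <cstdio>

// Hypothetical stand-in for the fprintf call the pass would insert
// next to each memory access. Returns whatever fprintf returns.
int traceAccess(std::FILE *Out, const char *Kind, const void *Addr) {
  return std::fprintf(Out, "[%s] %p\n", Kind, Addr);
}

// Hand-instrumented version of `*Dst = *Src`, returning the copied value.
int copyValue(int *Dst, const int *Src) {
  traceAccess(stderr, "Read", Src);   // traces the load from *Src
  int Tmp = *Src;
  traceAccess(stderr, "Write", Dst);  // traces the store to *Dst
  *Dst = Tmp;
  return Tmp;
}
```

The function still computes the same result; the only visible difference is the extra trace lines written to stderr.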

Step 2 - Boilerplate

LLVM passes involve a lot of boilerplate code, both in the surrounding compiling-and-running infrastructure and in the code itself. The ugly cmake details are left out here.

The LLVM Pass Implementation

Here is the boilerplate skeleton code for the LLVM Pass, based on code from llvm-tutor:

#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
#include "llvm/Support/raw_ostream.h"
 
using namespace llvm;
 
namespace {
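struct MemoryTrace : public llvm::PassInfoMixin<MemoryTrace> {
	// Entry point invoked by the ModulePassManager
	llvm::PreservedAnalyses run(llvm::Module &M,
				llvm::ModuleAnalysisManager &) {
		errs() << "In here\n\n";
		// The module is left untouched, so all analyses stay valid
		return llvm::PreservedAnalyses::all();
	}
 
	// Placeholder for the core logic; nothing calls it yet
	bool runOnModule(llvm::Module &M) {
		return true;
	}
};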
} // namespace
 
llvm::PassPluginLibraryInfo getMemoryTracePluginInfo() {
	return {LLVM_PLUGIN_API_VERSION, "MemoryTrace", LLVM_VERSION_STRING,
		[](PassBuilder &PB) {
			PB.registerPipelineParsingCallback(
				[](StringRef Name, ModulePassManager &MPM,
				ArrayRef<PassBuilder::PipelineElement>) {
					if (Name == "memory-trace") {
						MPM.addPass(MemoryTrace());
						return true;
					}
 
					return false;
				}
			);
		}};
}
 
extern "C" LLVM_ATTRIBUTE_WEAK ::llvm::PassPluginLibraryInfo
llvmGetPassPluginInfo() {
  return getMemoryTracePluginInfo();
}

This code does a few things:

  • Defines our MemoryTrace struct, whose run function will call our core logic. This function is called by the ModulePassManager.
  • Registers a pipeline-parsing callback that adds our pass to the ModulePassManager whenever the name "memory-trace" appears in the requested pipeline
  • Lets LLVM know of our plugin’s existence through the llvmGetPassPluginInfo function, which is the “public entry point for a pass plugin”

We can see that for now, we’re not doing anything interesting inside the pass - just printing out In here.

Using our LLVM Pass

I used these commands to build the pass and generate a test input:

mkdir build
cd build
 
cmake -DLT_LLVM_INSTALL_DIR=/usr/lib/llvm-10 ..
make
 
clang -S -emit-llvm ../empty.c -o empty.ll

The make step compiles our pass into a shared library, ./lib/libMemoryTrace.so.

clang then compiles a minimal C file:

> cat empty.c
int main() { return 0; }

into an LLVM IR file.

We can then run our pass on this IR:

> opt -load-pass-plugin ./lib/libMemoryTrace.so -passes=memory-trace -disable-output empty.ll
In here

Boom! Boilerplate done. The next page starts to flesh out our proper memory access tracing logic.