Building and evolving your own VM

Introduction to VMs

A few days ago an article about the Quake3 source code by Fabien Sanglard surfaced on the web, and I couldn’t help wondering about id’s use of their own VM named QVM. The article itself is really well written, with lots of pointers to additional information on the game design (and source code) itself, but for anyone wondering what it actually means to write something like that on a conceptual level, this example might not be the first choice. Please don’t take this post as a full article. Instead, I’m going to provide links to places where you can find the basics. You can always come back later for the improvements suggested here.

As always, it’s good to start with the Wikipedia entry on Virtual Machines. But don’t get too excited there either, because virtualizing numerous operating systems on a single host may be over the top for our purposes. Instead, we’ll focus on how byte code is executed by the VM in a register-based implementation. For completeness’ sake, here you can find a stack-based implementation in Ruby – it’s a good opportunity to learn Ruby as well.

I found a solid piece of initial work that you can read and try. It’s what I’ll use as a base for the following code snippets. The full source can be found on github. Ok, let’s see what we can improve…

Putting things into context

A good way to make your VM code cleaner is by getting rid of global variables. They may not be dreaded by the compiler, but they’ll cost you in the long run, and they make multithreading a lot trickier, because you need to watch out for concurrent access. Grouping your data into structs is a good way to avoid these problems:

struct cvm_instance {
	/* pointer to initial program */
	const unsigned int *code;
	/* vm can issue halt command */
	unsigned int running;
	/* program counter */
	unsigned int pc;
	/* main registers */
	unsigned short r[4];
	/* additional memory */
	unsigned short m[256];
};

struct cvm_instruction {
	/* these hold the decoded instruction */
	unsigned int r1, r2, r3, num, imm;
};

Be sure to group variables by their use and/or meaning. In this code snippet there’s a strict differentiation between a VM’s instance and its instruction. An instance can be thought of as a snapshot of the current state of execution. The instruction just holds the decoded values, which are used to modify the instance. You could save the instance as a snapshot in a file and resume it later. You can also multiplex between different instances to guarantee fair use of the CPU, or you can go for multithreading to run two VMs in parallel on different CPUs. In the case of cvm, these structs are kept on the stack, nicely wrapped inside the entry point function:

static void run(const unsigned int *program)
{
	struct cvm_instruction vi;
	struct cvm_instance vm;

	memset(&vm, 0, sizeof(vm));
	vm.code = program;
	vm.running = 1;

	while (vm.running) {
		fetch_and_decode(&vm, &vi);
		evaluate(&vm, &vi);
	}
}


Adding memory to your VM

Registers may be plentiful (well, not in our case), but there’s nothing like access to a vast memory pool. To make use of such a feature, we’ll need two new instructions for loading and storing values from this magical place. (The memory already exists in the instance structure.)

case 4:
	printf("load r%d <- @0x%02x\n", vi->r1, vi->imm);
	vm->r[vi->r1] = vm->m[vi->imm];
	break;
case 5:
	printf("store @0x%02x <- r%d\n", vi->imm, vi->r1);
	vm->m[vi->imm] = vm->r[vi->r1];
	break;

Actually, that’s exactly what happens when more complex computations in a program exhaust the available registers – some values need to go away for a while to make room in the CPU itself. This issue is mitigated by several levels of CPU cache, but eventually the values need to be flushed to main memory. This can be a very slow process, stalling execution for quite a few CPU cycles.

Having fun with Syscalls

When building a special-purpose VM you may want to keep the instruction set minimal, but from time to time you need to interact with the rest of the system. In the case of QVM, the cgame instance prepared the rendering of the scene, but the rendering itself is invoked via syscall. To do the same thing in cvm, let’s do a simple hello world example. First of all, we’ll need a print syscall:

case 6:
	printf("syscall %d\n", vi->imm);
	switch (vi->imm) {
	case 0:
		printf("%s\n", (const char *) &vm->m[vm->r[0]]);
		break;
	}
	break;

Since there could be more than one syscall, we’ll use the immediate portion of the instruction to select which syscall to perform. Now there are two problems: we need to construct a string to output, and we need a way to reference it in the syscall. In a stack-based implementation we’d put the pointer on the stack before the syscall is invoked, but (since we don’t have a stack) we’ll use register 0 to provide the pointer to the string. That was the easy part. Now we need to create the string in order to pass it along:

const unsigned int hello[] = {
	/* byte code elided – see the full source on github */
};

Basically, this moves a character to register 0 (0x65 == ‘e’), shifts it left 8 times to make room for a second character, which is put into register 1 (0x48 == ‘H’). Those registers are then added together and the result stored in main memory. This is repeated for all characters (and don’t forget the ‘\0’ at the end). We need to do this because memory access is aligned to two bytes. Also, we put ‘e’ before ‘H’ because the example runs on a little-endian machine. Anyway, the last few instructions adjust register 0 to point to the memory and then the syscall is executed.

Instead of ‘misusing’ a register, there could also be a dedicated memory pointer, but that traditionally means you always set the memory pointer first and then run a command referencing it – a direct load or store would grow from one instruction to two. Generally, RISC architectures offer more general-purpose registers, while CISC architectures offer more special-purpose registers. Neither approach is right or wrong. RISC is easier to understand in any case, but CISC has become more and more prevalent in the PC industry.

Copy the program into memory before execution

This will bring maximum fun for two reasons: you’ll be able to reference static data like constants or strings compiled into your byte code, and you’ll be able to modify the byte code on the fly. All you need to do is get rid of the “code” pointer in struct cvm_instance, copy the program into memory on init, and fetch instructions directly from memory. The hello world example shrinks drastically:

const unsigned short hello[] = {
	/* byte code elided – see the full source on github */
};

What this now does is append the whole null-terminated string after the halt instruction. It can never be executed as code, since there is no way to modify the program counter (yet), but we can reference it via a memory pointer. We load the memory address @3 into register 0 and do the print syscall. The output works just like before, but we don’t have to do a dynamic string-builder type of program anymore.

Add a stack pointer and put it to use

We’ve avoided the subject of a stack so far, but now is the time to implement one. It’s mostly useful for saving register state when you need to do something else, like another calculation or a call to a subroutine. In the latter case it can also be used to pass return values.

	case 9:
		printf("push r%d\n", vi->r1);
		vm->m[MEM_SIZE - (++vm->sp)] = vm->r[vi->r1];
		break;
	case 10:
		printf("pop r%d\n", vi->r1);
		vm->r[vi->r1] = vm->m[MEM_SIZE - (vm->sp--)];
		break;

The stack implementation uses a single stack pointer, much like the program counter, but the stack grows downwards from the top of the memory (the pointer value itself counts upwards). Using this rudimentary stack (with no safeguards whatsoever – try popping an empty stack a few times!) we can write a short demonstration program like so:

const unsigned short pushpop[] = {
	/* byte code elided – see the full source on github */
};

This program shows that even though it overwrites register 0 with zero, the original value of 10 is restored by the pop instruction. Likewise, it pushes 1, 2, and 3 onto the stack, only to pop 3, 2, and 1 back into registers 1, 2, and 3, respectively. Because of this particular behavior, a stack is also called Last In, First Out (LIFO).

You might as well jump

Adding jumps is not hard at all. The program counter increment moves up into the fetch step, and we append a new instruction as follows.

case 11:
	printf("jump @%d\n", vi->imm);
	vm->pc = vi->imm;
	break;

A very simple test program goes zig-zag through the byte code, picking up a single load instruction that fills register 0 with 1:

const unsigned short jumper[] = {
	/* byte code elided – see the full source on github */
};

What’s next?

Here are a few things you can do to further improve the VM:

  1. Write another example program to get a feel for the VM.
  2. Add more instructions. A good reference would be the MIPS instruction set.
  3. Remove the memory alignment restrictions – don’t forget the program counter!

Okay, that’s it for now. If you have questions or corrections, please leave a comment. Don’t forget that the source code is available via github.

2 thoughts on “Building and evolving your own VM”

  1. WOW! Excellent and very informative article. I am a big fan of video games (including Quake and those based on it, like the Jedi Knight games) and I’ve always wondered about the QVM. Now that RavenSoft has released the source code it’s easy to see how it’s implemented in code but it’s always better to start with the theory in my opinion.

    I must admit I am still somewhat unclear as to why anyone would want to do this though. If it’s a cross-platform game it makes sense, since the real game code could be implemented in VM and only the platform-specific stuff be actual executables. But at least in JK2’s case (not sure about Quake), they were originally intended for Windows-only. (Maybe plans for other OSes in the future that were never finished though?)

    Still, what are the advantages of using a custom VM as opposed to a scripting language? Obviously there’s the speed factor (which is very important in games), but I can’t really think of anything else.

    Thank you for providing such an easy-to-understand reference for a seemingly complicated concept. I was somewhat unsure of how to go about doing this but your article certainly helped a lot!

  2. Hi Michael,

    that’s a good question – one that I probably couldn’t have answered back when the original article was written. Nowadays, however, the approach boils down to three basic things:

    (a) The author really likes VM designs. ;)

    (b) It’s quite like a network designed to transport slices of information: the VM view helps slice time into visible frames, which inherently helps keep the code clean and stops it growing out of proportion.

    (c) It also separates data and control flow. The impact on performance is minimal; the gains are a clean, extensible (and also replaceable) architecture.

