Building a VM Instruction Set in Rust

In this tutorial, we’ll build a basic Virtual Machine (VM) in Rust. The goal isn’t just to write code; it’s to understand instruction sets and how a VM executes them by implementing one hands-on.

By the end of this tutorial, you will have a deeper understanding of VMs and a working Rust application that simulates a simple VM.

What is a Virtual Machine?

A Virtual Machine is a software emulation of a physical computer. It’s an abstraction layer that runs between the hardware and the operating system or applications, allowing multiple operating systems to coexist on the same physical hardware or enabling software to run in a consistent environment regardless of the underlying hardware.

VMs are widely used for various purposes, from running different operating systems on a single physical machine (like Windows on a Mac) to providing isolated environments for software development and testing.

What are Instruction Sets?

An instruction set is a group of commands that a VM or a processor can execute. These instructions can range from simple arithmetic operations (like addition and subtraction) to complex operations involving memory management and I/O handling. The richness and efficiency of an instruction set play a crucial role in determining the performance and capabilities of a VM or a CPU.

Examples of Virtual Machines

A well-known example of a VM is the Java Virtual Machine (JVM), which allows Java applications to run on any device with a JVM installed, irrespective of the underlying hardware and operating system. This “write once, run anywhere” capability is a significant advantage of using VMs.

Implementing the VM in Rust

Step 1: Setting Up the Rust Environment

Ensure Rust is installed on your system. Create a new Rust project using Cargo:

cargo new my_virtual_machine
cd my_virtual_machine

Step 2: Defining the Instruction Set

Start by defining the instructions your VM will support:

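// An operand is either a literal number or the name of a variable
#[derive(Clone)]
enum Operand {
    Value(i32),
    Var(String),
}

// Every operation the VM understands
#[derive(Clone)]
enum Instruction {
    Push(i32),             // Push a literal onto the stack
    Add(Operand, Operand), // Push op1 + op2
    Sub(Operand, Operand), // Push op1 - op2
    Mul(Operand, Operand), // Push op1 * op2
    Div(Operand, Operand), // Push op1 / op2 (integer division)
    Print,                 // Print the top of the stack without popping it
    Set(String, i32),      // Store a literal in a variable
    Get(String),           // Push a variable's value onto the stack
    Input(String),         // Read an integer from stdin into a variable
    // Run the first block if the top of the stack is non-zero, otherwise the second
    If(Vec<Instruction>, Vec<Instruction>),
    Else(Vec<Instruction>), // Run a block unconditionally
}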
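
To make the shape of these types concrete, here is a minimal sketch that builds a small program by hand instead of parsing it from text. The types are trimmed copies of the ones above (with extra derives for printing and comparison), and sample_program is a hypothetical helper that is not part of the tutorial's code. Note how the two branches of a conditional are simply nested vectors of instructions.

```rust
// Trimmed copies of the tutorial's types so this sketch compiles on its own.
#[derive(Clone, Debug, PartialEq)]
enum Operand {
    Value(i32),
    Var(String),
}

#[derive(Clone, Debug, PartialEq)]
enum Instruction {
    Push(i32),
    Add(Operand, Operand),
    Print,
    Set(String, i32),
    If(Vec<Instruction>, Vec<Instruction>),
}

// Hypothetical helper (not part of the tutorial): builds, by hand, a program
// that sets x to 2, pushes x + 3, and then branches on the top of the stack.
fn sample_program() -> Vec<Instruction> {
    vec![
        Instruction::Set("x".to_string(), 2),
        Instruction::Add(Operand::Var("x".to_string()), Operand::Value(3)),
        Instruction::If(
            // Taken when the top of the stack is non-zero
            vec![Instruction::Print],
            // Taken when the top of the stack is zero
            vec![Instruction::Push(0), Instruction::Print],
        ),
    ]
}
```

Building programs this way is also handy for unit tests, since it exercises the VM without going through the file parser.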

Step 3: Building the VM Structure

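Create a struct to represent the VM, which includes a stack for operands and a hashmap for variables. The use lines cover the standard-library items needed by this and the later steps:

use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead};
use std::{env, process};

struct VM {
    stack: Vec<i32>,            // Operand stack
    vars: HashMap<String, i32>, // Named variables
}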

Step 4: Implementing the Instruction Logic

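Implement the logic to execute each instruction. The methods below (and load_program in Step 5) are indented because they belong inside an impl VM { ... } block: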

    fn new() -> VM {
        VM {
            stack: Vec::new(),
            vars: HashMap::new(),
        }
    }

    fn get_operand_value(&self, operand: &Operand) -> i32 {
        match operand {
            Operand::Value(val) => *val,
            Operand::Var(var_name) => *self.vars.get(var_name)
                .expect("Variable not found"),
        }
    }

    fn run(&mut self, program: Vec<Instruction>, path: &str) {
        let mut pc = 0; // Program counter
        while pc < program.len() {
            match &program[pc] {
                //PUSH
                Instruction::Push(val) => self.stack.push(*val),

                //ADDITION
                Instruction::Add(op1, op2) => {
                    let val1 = self.get_operand_value(op1);
                    let val2 = self.get_operand_value(op2);
                    self.stack.push(val1 + val2);
                },

                //SUBTRACTION
                Instruction::Sub(op1, op2) => {
                    let val1 = self.get_operand_value(op1);
                    let val2 = self.get_operand_value(op2);
                    self.stack.push(val1 - val2);
                },

                //MULTIPLICATION
                Instruction::Mul(op1, op2) => {
                    let val1 = self.get_operand_value(op1);
                    let val2 = self.get_operand_value(op2);
                    self.stack.push(val1 * val2);
                },

                //DIVISION
                Instruction::Div(op1, op2) => {
                    let val1 = self.get_operand_value(op1);
                    let val2 = self.get_operand_value(op2);
                    if val2 == 0 {
                        panic!("Division by zero");
                    }
                    self.stack.push(val1 / val2);
                },

                //PRINT
                Instruction::Print => {
                    if let Some(top) = self.stack.last() {
                        println!("{}", top);
                    } else {
                        println!("Stack is empty");
                    }
                },

                //SET VARIABLE
                Instruction::Set(var_name, value) => {
                    self.vars.insert(var_name.clone(), *value);
                },

                //GET VARIABLE
                Instruction::Get(var_name) => {
                    if let Some(&value) = self.vars.get(var_name) {
                        self.stack.push(value);
                    } else {
                        panic!("Undefined variable: {}", var_name);
                    }
                },

                //GET USER INPUT from the command line
                Instruction::Input(var_name) => {
                    let mut input = String::new();
                    io::stdin().read_line(&mut input).expect("Failed to read line");
                    let value = input.trim().parse::<i32>().expect("Invalid input");
                    self.vars.insert(var_name.clone(), value);
                },

                //PROCESS IF instructions
                Instruction::If(if_block, else_block) => {
                    // Inspect (without popping) the value on top of the stack
                    if let Some(&top) = self.stack.last() {
                        if top != 0 {
                            // Non-zero condition: execute the IF block
                            self.run(if_block.clone(), path);
                        } else if !else_block.is_empty() {
                            // Zero condition: execute the ELSE block, if there is one
                            self.run(else_block.clone(), path);
                        }
                    } else {
                        panic!("Stack is empty");
                    }
                },

                //Process the ELSE block
                Instruction::Else(else_block) => {
                    // This is only executed if the 'if' condition was not met,
                    // so we don't need to check the stack again.
                    self.run(else_block.to_vec(), path); // Pass path as an argument
                },
            }
            pc += 1;
        }
    }
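
The run method above panics on bad input (division by zero, an undefined variable, an empty stack). As an alternative, here is a minimal sketch of the same dispatch loop that reports those cases through Result instead. Everything in it is hypothetical and deliberately reduced: MiniVm, Instr, and VmError are illustration-only names, and Div takes two literals rather than the tutorial's Operand values.

```rust
use std::collections::HashMap;

// Hypothetical error type; the tutorial's VM panics in these situations.
#[derive(Debug, PartialEq)]
enum VmError {
    DivisionByZero,
    UndefinedVariable(String),
    EmptyStack,
}

// Reduced, illustration-only instruction set.
enum Instr {
    Push(i32),
    Get(String),
    Div(i32, i32),
    If(Vec<Instr>, Vec<Instr>),
}

struct MiniVm {
    stack: Vec<i32>,
    vars: HashMap<String, i32>,
}

impl MiniVm {
    // Takes the program as a slice, so nested blocks run without being cloned.
    fn run(&mut self, program: &[Instr]) -> Result<(), VmError> {
        for instr in program {
            match instr {
                Instr::Push(value) => self.stack.push(*value),
                Instr::Get(name) => {
                    let value = *self
                        .vars
                        .get(name)
                        .ok_or_else(|| VmError::UndefinedVariable(name.clone()))?;
                    self.stack.push(value);
                }
                Instr::Div(a, b) => {
                    if *b == 0 {
                        return Err(VmError::DivisionByZero);
                    }
                    // wrapping_div avoids a panic on i32::MIN / -1
                    self.stack.push(a.wrapping_div(*b));
                }
                Instr::If(then_block, else_block) => {
                    // Peek at the top of the stack; it is not popped
                    let top = *self.stack.last().ok_or(VmError::EmptyStack)?;
                    let branch = if top != 0 { then_block } else { else_block };
                    self.run(branch)?;
                }
            }
        }
        Ok(())
    }
}
```

With this shape, the caller decides whether an error should abort the process or just be printed, and the error cases become easy to cover in tests.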

Step 5: Parsing Instructions from a File

Implement functionality to load and parse instructions from a file. This requires reading a file line by line and converting each line into an Instruction. load_program is called as VM::load_program, so it also goes inside the impl VM block:

    fn load_program(reader: &mut io::BufReader<File>) -> io::Result<Vec<Instruction>> {
        let mut program = Vec::new();

        // Read all lines into a vector
        let lines: Vec<String> = reader.lines().collect::<Result<_, _>>()?;

        // Temporary storage for the current IF/ELSE block (nesting is not supported)
        let mut if_block = Vec::new();
        let mut else_block = Vec::new();
        let mut in_if_block = false;
        let mut in_else_block = false;

        for line in lines.iter() {
            // The first token decides how the line is handled
            match line.split_whitespace().next() {
                // Start of an IF block
                Some("IF") => {
                    in_if_block = true;
                    in_else_block = false;
                }
                // Switch from the IF block to the ELSE block
                Some("ELSE") => {
                    in_else_block = true;
                    in_if_block = false;
                }
                // End of the conditional: emit one If instruction holding both blocks
                Some("ENDIF") => {
                    program.push(Instruction::If(
                        std::mem::take(&mut if_block),
                        std::mem::take(&mut else_block),
                    ));
                    in_if_block = false;
                    in_else_block = false;
                }
                // Any other line is an ordinary instruction for the current block
                _ => {
                    let target = if in_if_block {
                        &mut if_block
                    } else if in_else_block {
                        &mut else_block
                    } else {
                        &mut program
                    };
                    target.extend(parse_instruction(line));
                }
            }
        }

        Ok(program)
    }
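
Because load_program takes a BufReader<File>, it can only be exercised with a real file on disk. The sketch below shows one way to group IF / ELSE / ENDIF lines with a function that is generic over BufRead, so the same logic can be fed from an in-memory std::io::Cursor. It is an illustration under simplifying assumptions: Node, load_nodes, SAMPLE, and parse_sample are hypothetical names, ordinary lines are kept as raw strings rather than parsed into Instruction values, and nested conditionals are not handled.

```rust
use std::io::{self, BufRead, Cursor};

// Hypothetical, simplified stand-in for the tutorial's Instruction type.
#[derive(Debug, PartialEq)]
enum Node {
    Line(String),
    If(Vec<Node>, Vec<Node>),
}

// Which list the next ordinary line should be appended to.
#[derive(Clone, Copy)]
enum Section {
    Top,
    Then,
    Else,
}

// Generic over any buffered reader: a file, stdin, or an in-memory cursor.
fn load_nodes<R: BufRead>(reader: R) -> io::Result<Vec<Node>> {
    let mut section = Section::Top;
    let mut program = Vec::new();
    let mut then_block = Vec::new();
    let mut else_block = Vec::new();

    for line in reader.lines() {
        let line = line?;
        match line.trim() {
            "" => {} // skip blank lines
            "IF" => section = Section::Then,
            "ELSE" => section = Section::Else,
            "ENDIF" => {
                // Emit one node carrying both branches, leaving the buffers empty
                program.push(Node::If(
                    std::mem::take(&mut then_block),
                    std::mem::take(&mut else_block),
                ));
                section = Section::Top;
            }
            other => {
                let node = Node::Line(other.to_string());
                match section {
                    Section::Top => program.push(node),
                    Section::Then => then_block.push(node),
                    Section::Else => else_block.push(node),
                }
            }
        }
    }
    Ok(program)
}

// Sample input held in memory instead of a file.
const SAMPLE: &str = "PUSH 1\nIF\n    GET x\nELSE\n    GET y\nENDIF\nPRINT\n";

fn parse_sample() -> io::Result<Vec<Node>> {
    load_nodes(Cursor::new(SAMPLE))
}
```

The same signature accepts io::BufReader::new(file) unchanged, so switching to a generic reader would not affect how the loader is called with a file.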

Add a few free-standing helper functions (outside the impl VM block) to keep the parsing code readable:

fn parse_operand(op_str: &str) -> Operand {
    if let Ok(val) = op_str.parse::<i32>() {
        Operand::Value(val)
    } else {
        Operand::Var(op_str.to_string())
    }
}

fn extract_var_name(operand: &str) -> &str {
    operand.trim_start_matches("Var(\"").trim_end_matches("\")")
}

fn parse_instruction(line: &str) -> Vec<Instruction> {
    let parts: Vec<&str> = line.split_whitespace().collect();
    match parts.as_slice() {
        ["PUSH", num] => vec![Instruction::Push(num.parse::<i32>().expect("Invalid number"))],
        ["ADD", op1, op2] => {
            let operand1 = parse_operand(extract_var_name(op1));
            let operand2 = parse_operand(extract_var_name(op2));
            vec![Instruction::Add(operand1, operand2)]
        },
        ["SUB", op1, op2] => {
            let operand1 = parse_operand(extract_var_name(op1));
            let operand2 = parse_operand(extract_var_name(op2));
            vec![Instruction::Sub(operand1, operand2)]
        },
        ["MUL", op1, op2] => {
            let operand1 = parse_operand(extract_var_name(op1));
            let operand2 = parse_operand(extract_var_name(op2));
            vec![Instruction::Mul(operand1, operand2)]
        },
        ["DIV", op1, op2] => {
            let operand1 = parse_operand(extract_var_name(op1));
            let operand2 = parse_operand(extract_var_name(op2));
            vec![Instruction::Div(operand1, operand2)]
        },
        ["PRINT"] => vec![Instruction::Print],
        ["SET", var_name, value] => {
            let value = value.parse::<i32>().expect("Invalid number");
            vec![Instruction::Set(var_name.to_string(), value)]
        },
        ["GET", var_name] => vec![Instruction::Get(var_name.to_string())],
        ["Input", var_name] => vec![Instruction::Input(var_name.to_string())],
        _ => vec![],
    }
}
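
To see what the two operand helpers do with different tokens, here is a small self-contained sketch. parse_operand and extract_var_name are copied from above; operand_from_token is a hypothetical wrapper added only for this illustration, and Operand gets extra derives so values can be compared.

```rust
#[derive(Debug, PartialEq)]
enum Operand {
    Value(i32),
    Var(String),
}

// Copied from the tutorial: a token that parses as i32 is a literal,
// anything else is treated as a variable name.
fn parse_operand(op_str: &str) -> Operand {
    if let Ok(val) = op_str.parse::<i32>() {
        Operand::Value(val)
    } else {
        Operand::Var(op_str.to_string())
    }
}

// Copied from the tutorial: strips a surrounding Var("...") wrapper if present.
fn extract_var_name(operand: &str) -> &str {
    operand.trim_start_matches("Var(\"").trim_end_matches("\")")
}

// Hypothetical wrapper combining the two steps, as parse_instruction does inline.
fn operand_from_token(token: &str) -> Operand {
    parse_operand(extract_var_name(token))
}
```

With these definitions, the tokens Var("x") and x both become Operand::Var("x"), while 5 and -3 become Operand::Value. One consequence worth knowing: Var("7") is unwrapped to 7 and then parsed as a literal, so a variable cannot have a purely numeric name. Because lines are split on whitespace, the wrapper must also be written without spaces.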

// Function to create a BufReader and call VM::load_program
fn load_program_and_run(file_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = match File::open(file_path) {
        Ok(file) => file,
        Err(e) => {
            eprintln!("Failed to open file: {}", e);
            return Err(Box::new(e)); // Return an error
        }
    };
    let mut reader = io::BufReader::new(file);

    // Create a VM instance
    let mut vm = VM::new();

    // Load and run the program
    match VM::load_program(&mut reader) {
        Ok(program) => {
            vm.run(program, file_path); // Just call run without expecting a Result
            // Handle any other necessary logic here if needed
        }
        Err(e) => {
            eprintln!("Failed to load program: {}", e);
            return Err(Box::new(e)); // Return an error
        }
    }

    Ok(()) // Return Ok to indicate success
}

Step 6: Handling Command Line Arguments

Modify the main function to take the file path as a command line argument:

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: {} <program_file.rm>", args[0]);
        process::exit(1);
    }

    let file_path = &args[1];

    match load_program_and_run(file_path) {
        Ok(_) => {
            println!("Program executed successfully.");
        }
        Err(e) => {
            eprintln!("Error: {}", e);
            process::exit(1);
        }
    }
}

Step 7: Testing the VM

Create a text file with a series of instructions and use it to test your VM. For example:

Input y
GET y
Input x
GET x
ADD Var("x") Var("y")
PRINT
IF
    GET x
    PRINT
ELSE
    GET y
    PRINT
ENDIF

This should:

Ask for user input from the command line and store it in a variable y
Push the value of y onto the stack
Ask for a new user input from the command line and store it in a variable x
Push the value of x onto the stack
Add the two variables’ values and push the sum onto the stack
Print the top of the stack, which will contain the result of the addition
Evaluate the top of the stack: if the value is not 0, execute the IF block, which will print the value of the x variable. If it is 0, execute the ELSE block, which will print the value of the y variable.

Play around with the VM by combining operators, variables, and so on!

This tutorial has guided you through creating a simple VM in Rust, demonstrating core concepts like instruction sets and VM operation.

Keep exploring and experimenting, and you’ll find that the world of VMs offers endless opportunities for learning and innovation.

You can find the complete implementation in my GitHub repository: https://github.com/luishsr/rustvm.
