Strings & Text

Building a VM with Native ZK Proof Generation in Rust

In this article we will build a cryptographic virtual machine (VM) in Rust, inspired by the TinyRAM model, using a reduced instruction set…

By Luis SoaresNovember 17, 202414 min readOriginal on Medium

In this article we will build a cryptographic virtual machine (VM) in Rust, inspired by the TinyRAM model, using a reduced instruction set to keep things straightforward and efficient.

This VM will be able to execute computations while generating zero-knowledge proofs (ZKPs) at the same time, natively. It’s a practical tool for anyone working on cryptographic systems or exploring secure and verifiable processing.

Let’s get started!

Architecture

The project is divided into the following modules:

Core Virtual Machine (vm.rs)

Implements the VM execution, including the instruction set and state capture.
Handles runtime errors gracefully.

Zero-Knowledge Proof Module (zk_proof.rs)

Encapsulates proof generation and verification logic using Groth16.
Abstracts cryptographic operations to ensure modularity.

Utilities (utils.rs)

Provides helper functions, including file management and field conversion utilities.

Program Loader (program_loader.rs)

Reads and parses assembly-like programs for execution on the VM.

Circuit Implementation (execution_circuit.rs)

Models the VM’s state and operations as a constraint system for ZKP purposes.

Workflow

The Provable VM operates through a streamlined yet comprehensive workflow designed to enable program execution, traceability, and zero-knowledge proof generation. This approach ensures transparency, security, and privacy while maintaining computational integrity.

Program Input

The workflow begins with a program written in a simple, assembly-like language specifically designed for the Provable VM. This program comprises a series of instructions, such as PUSH, ADD, and STORE, which describe the computational operations to be performed. Each line of the program corresponds to an Instruction struct that encapsulates the opcode and its optional operand. The program is parsed and loaded into the VM, ready for execution.

Execution and State Capture

Once the program is loaded, the Provable VM begins executing instructions sequentially, starting from the first line. The VM interprets each instruction, updates its internal components — such as the program counter (pc), stack, and heap—and performs the required operation. At every step, the VM captures its current state, recording the pc, stack content, heap data, and flags into a ProvableState object. These snapshots form a detailed trace of the execution, ensuring that every intermediate computation step is recorded and traceable.

Trace Commitment Generation

The captured states during execution collectively form an execution trace, which represents the complete operational history of the program. This trace is then hashed using a cryptographic hash function, such as SHA-256, to produce a trace commitment. The trace commitment serves as a compact representation of the entire execution trace, enabling external entities to verify the validity of the trace without requiring access to the underlying data. This commitment enhances privacy and efficiency in trace validation.

Zero-Knowledge Proof (ZKP) Generation

With the execution trace and its commitment generated, the next step involves the creation of a zero-knowledge succinct non-interactive argument of knowledge (zkSNARK) proof. This proof asserts that the execution trace was generated by faithfully running the given program. Importantly, the proof guarantees that no sensitive information — such as the program’s internal states or specific values in the stack or heap — is revealed. The zkSNARK proof is generated using the cryptographic commitment, program, and captured states as inputs, ensuring that the entire computational process is represented without compromising privacy.

Join @CoinCodeCap on Telegram

Proof Verification

The final step in the workflow is proof verification. A verifier — typically an external party — uses the zkSNARK proof, program code, and trace commitment to validate that the execution trace aligns with the program’s prescribed behavior. If the proof is valid, the verifier can confidently conclude that the program was executed correctly, even without seeing the actual trace or internal states. This verification process ensures the integrity of the computation and establishes trust between parties.

Dependencies and Setup

Project Dependencies

The Provable VM relies on the following crates to provide cryptographic operations, serialization, and logging:

ark-ff
Enables finite field arithmetic, a fundamental component for elliptic curve and zk-SNARK computations.
ark-bls12-381
Implements the BLS12-381 elliptic curve, which is crucial for pairing-based cryptography used in zk-SNARKs.
ark-groth16
Provides the Groth16 zk-SNARK implementation for generating and verifying zero-knowledge proofs.
ark-relations
Defines relation and constraint systems for zk-SNARK circuits, supporting the VM's cryptographic proofs of execution.
ark-snark
Supplies standardized interfaces for zk-SNARK systems, integrating Groth16 with other Arkworks components.
ark-serialize
Handles serialization and deserialization of cryptographic objects like proofs and keys, enabling persistence.
ark-std
Offers utility traits and functions used across Arkworks crates.
bincode
An efficient binary serialization library used for encoding VM state data into cryptographic trace commitments.
serde
Facilitates serialization and deserialization of VM states and instructions in JSON format for debugging and sharing.
sha2
Implements the SHA-256 hash function for secure trace commitments, ensuring trace immutability.
hex
Provides encoding and decoding of hexadecimal values, used for presenting trace commitments in a readable format.
rand_chacha
Supplies the ChaCha20 random number generator, ensuring cryptographically secure randomness for proof generation.
rand_core
Offers core traits and utilities for random number generation, supporting cryptographic operations.
tracing-subscriber
Adds advanced logging and structured debugging capabilities for tracking VM operations and circuit constraints.
tracing
Enables structured and scoped logging for program execution and zk-SNARK constraint generation.

Rust Implementation

While this post highlights key snippets of the implementation, the entire codebase is available on Github. If you’re curious about the finer details or want to dive deeper into the code, be sure to check out the repository.

To make this easier to follow, I’m focusing on some of the most important parts of the implementation that power the VM and its unique capabilities. These snippets will give you a good idea of how the VM works and how the components interact. Let’s explore each major piece of the system, from how instructions are executed to how zero-knowledge proofs are generated. For the full implementation, with all the nuances and helper functions, visit the provided Github link.

Virtual Machine Execution

The execute_instruction function manages the execution of individual instructions in the virtual machine. Each opcode defines a unique behavior. For instance, the PUSH opcode adds a value to the stack, while ADD pops the top two values from the stack, computes their sum, and pushes the result back. The function also updates the program counter after executing an instruction. When encountering the HALT opcode, the function stops execution by returning false.

fn execute_instruction(&mut self, instruction: &Instruction) -> bool {
    match instruction.opcode {
        Opcode::PUSH => {
            if let Some(value) = instruction.operand {
                self.stack.push(value);
            }
        }
        Opcode::ADD => {
            if let (Some(a), Some(b)) = (self.stack.pop(), self.stack.pop()) {
                self.stack.push(a + b);
            } else {
                panic!("ADD requires two stack elements.");
            }
        }
        Opcode::HALT => return false,
        _ => unimplemented!(),
    }
    self.pc += 1;
    true
}

This modular design enables new opcodes to be added with minimal changes, making the virtual machine extensible. As the program executes, the state of the virtual machine evolves dynamically, capturing a faithful simulation of computation.

State Capture and Trace Commitment

To ensure that program execution can be audited and verified, the Provable VM captures its state at each step using the capture_state function. The captured states include the program counter, stack, heap, and flags, forming a complete snapshot of the VM's state. These states are then used to generate a cryptographic trace commitment.

The generate_trace_commitment function hashes the entire execution trace into a single cryptographic commitment. This hash acts as a secure fingerprint of the execution trace, ensuring its integrity and immutability.

fn capture_state(&self) -> ProvableState {
    ProvableState {
        pc: self.pc,
        stack: self.stack.clone(),
        heap: self.heap.clone(),
        flags: self.flags,
    }
}

fn generate_trace_commitment(&self, trace_file: &str) -> Result<Vec<u8>, String> {
    let mut hasher = Sha256::new();
    for state in &self.trace {
        let serialized = bincode::serialize(state).map_err(|e| e.to_string())?;
        hasher.update(serialized);
    }
    let hash = hasher.finalize().to_vec();
    File::create(trace_file)
        .and_then(|mut file| writeln!(file, "{}", hex::encode(&hash)))
        .map_err(|e| e.to_string())?;
    Ok(hash)
}

The commitment is stored in a file (trace_file) for later use, such as zero-knowledge proof verification. The commitment enables users to verify the trace without exposing its details, preserving privacy while maintaining trust.

Practice what you learned

Reinforce this article with hands-on coding exercises and AI-powered feedback.

String Manipulation ToolkitIntermediate Regex-Free Pattern MatcherIntermediate Build a head CommandBeginner

View all exercises

Circuit Constraints for ZKP

The translation of program execution into zkSNARK constraints is handled by the generate_constraints function. This function models stack, heap, and program behavior as arithmetic constraints, ensuring the program's execution trace matches the inputs and expected results.

fn generate_constraints(self, cs: ConstraintSystemRef<Fr>) -> Result<(), SynthesisError> {
    let trace_commitment_field = convert_commitment_to_field(&self.trace_commitment);
    let trace_commitment_var = cs.new_input_variable(|| Ok(trace_commitment_field))?;

    for (i, instruction) in self.program.iter().enumerate() {
        match instruction.opcode {
            Opcode::PUSH => {
                if let Some(value) = instruction.operand {
                    let value_var = cs.new_witness_variable(|| Ok(Fr::from(value)))?;
                    cs.enforce_constraint(lc!() + value_var, lc!() + Variable::One, lc!() + value_var)?;
                }
            }
            Opcode::ADD => {
                let a_var = cs.new_witness_variable(|| Ok(Fr::from(simulated_stack.pop().unwrap())))?;
                let b_var = cs.new_witness_variable(|| Ok(Fr::from(simulated_stack.pop().unwrap())))?;
                let result_var = cs.new_witness_variable(|| Ok(Fr::from(simulated_stack.push(a + b))))?;
                cs.enforce_constraint(lc!() + a_var + b_var, lc!() + Variable::One, lc!() + result_var)?;
            }
            _ => {}
        }
    }
    Ok(())
}

This mechanism ensures that the stack manipulations and arithmetic operations align with the program’s logic. The constraints are crucial for generating a zero-knowledge proof.

Program Loader

The load_program function provides a streamlined way to parse and load a program into the VM. The program is read from a file and converted into a vector of instructions. Each instruction includes an opcode and an optional operand. Comments and empty lines are ignored during parsing.

fn load_program(file_path: &str) -> Result<Vec<Instruction>, String> {
    let file = File::open(file_path).map_err(|e| e.to_string())?;
    BufReader::new(file)
        .lines()
        .filter_map(|line| {
            let line = line.ok()?;
            if line.starts_with('#') || line.trim().is_empty() {
                return None;
            }
            let parts: Vec<&str> = line.split_whitespace().collect();
            let opcode = Opcode::from_str(parts[0]).ok()?;
            let operand = parts.get(1).and_then(|v| v.parse().ok());
            Some(Ok(Instruction { opcode, operand }))
        })
        .collect::<Result<Vec<_>, _>>()
}

This function ensures that only valid instructions are loaded into the VM, enabling consistent execution. Programs are represented as an ordered list of instructions, ready for execution.

Zero-Knowledge Proof Operations

The Provable VM integrates seamlessly with zkSNARK libraries to generate and verify proofs. The generate_proof function creates a zkSNARK proof asserting that the execution trace is correct and consistent with the input program and its cryptographic trace commitment.

pub fn generate_proof(
    &self,
    program: &[Instruction],
    trace_file: &str,
    proof_file: &str,
    pk: &ProvingKey<Bls12_381>,
) -> Result<(), String> {
    let trace_commitment = self.generate_trace_commitment(trace_file)?;
    let circuit = ExecutionCircuit {
        initial_state: self.trace.first().unwrap().clone(),
        final_state: self.trace.last().unwrap().clone(),
        program: program.to_vec(),
        trace_commitment,
    };
    let mut rng = ChaCha20Rng::from_entropy();
    let proof = Groth16::<Bls12_381, LibsnarkReduction>::prove(pk, circuit, &mut rng)
        .map_err(|e| e.to_string())?;
    File::create(proof_file)
        .and_then(|mut file| proof.serialize_compressed(&mut file))
        .map_err(|e| e.to_string())?;
    Ok(())
}

The proof is stored in the specified file (proof_file), while the trace commitment in trace_file serves as a reference. The proof can then be verified against the trace commitment using the verify_proof function. This mechanism enables privacy-preserving and trustless verification of computation.

Supported Opcodes in the Initial Release

The Provable VM in its initial release supports a range of basic opcodes that enable core stack-based computation, memory management, and control flow. These opcodes form the foundation for program execution and traceability. Here’s a detailed overview of each supported opcode:

PUSH is used to push a value onto the stack. It takes a u32 operand, which represents the value to be added. For example, the instruction PUSH 10 places the value 10 at the top of the stack, increasing its size by one.

POP removes the top value from the stack. This opcode requires no operand. For instance, if the stack contains [10, 20] and the instruction POP is executed, the value 20 is removed, leaving [10] on the stack.

ADD performs arithmetic addition. It pops the top two values from the stack, computes their sum, and pushes the result back onto the stack. For example, executing PUSH 10, PUSH 20, and ADD leaves [30] on the stack.

SUB performs arithmetic subtraction. It pops the top two values from the stack, subtracts the first popped value from the second, and pushes the result back. For example, if the stack initially contains [20, 10], executing SUB results in [10].

JMP implements an unconditional jump by updating the program counter (pc) to the specified address. This opcode takes a u32 operand that indicates the target address. For example, the instruction JMP 4 moves execution to the instruction located at position 4.

JZ enables conditional jumps based on the state of the flags. If the flag value is zero (flags == 0), it sets the program counter to the specified address. For instance, JZ 10 jumps to position 10 if the condition is satisfied.

LOAD retrieves a value from the heap at a given address and pushes it onto the stack. It takes a u32 operand representing the address. For example, after executing STORE 0 to store a value, LOAD 0 retrieves the value at address 0 and pushes it onto the stack.

STORE pops a value from the stack and writes it to the heap at the specified address. This opcode requires a u32 operand representing the target address. For example, executing PUSH 30 followed by STORE 0 stores the value 30 at address 0 in the heap.

HALT stops program execution. It requires no operand and serves as a signal for the program to terminate. For example, after a series of instructions, HALT ends the execution cleanly.

These opcodes allow the Provable VM to perform essential operations while maintaining a clear and traceable execution path.

An Example Provable VM program

#provable 1.0
PUSH 10
PUSH 20
ADD
STORE 0
LOAD 0
PUSH 5
ADD
HALT

Why Use a Reduced Instruction Set?

The choice of a reduced instruction set for the Provable VM is no coincidence. It’s a deliberate design decision to align with the unique challenges and opportunities presented by representing computations in zkSNARKs.

Simplicity is Key

When working with zkSNARKs, every operation in your program — whether it’s a simple addition or a more complex memory store — needs to be translated into a set of arithmetic constraints. These constraints are then used to generate the cryptographic proof. A reduced instruction set makes this translation process simpler, as each opcode can be mapped to a small, well-defined set of constraints.

For example, the PUSH opcode in the Provable VM simply places a value on the stack. Its constraints reflect this behavior: the value pushed must match a specified input, and the stack’s new state must match the expected result. Similarly, the ADD opcode translates to constraints ensuring that two stack elements are added, and their sum matches the top stack element in the updated state.

Avoiding Complexity in Circuits

In zkSNARKs, complexity matters. Each additional constraint increases the proving time and the size of the proof. A reduced instruction set avoids bloating the constraint system by focusing only on the core operations necessary to represent general computation. This not only keeps the circuits manageable but also ensures that the VM remains performant and scalable.

For instance, advanced instructions like bitwise operations or floating-point arithmetic can introduce significant overhead in terms of both constraints and computational complexity. By limiting the instruction set to essential operations like arithmetic, memory load/store, and branching, we keep the proof system efficient while still maintaining general-purpose computation capability.

Relating to Real-World Computation

Although the reduced instruction set may seem minimalistic, it’s sufficient to represent any computation. This is akin to how Turing Machines operate: with just a few basic instructions, you can perform any computable function. The Provable VM leverages this same principle, providing just enough to enable the execution of complex logic without unnecessary overhead.

Why This Matters for zkSNARKs

When you’re proving computation in zkSNARKs, the goal is to strike a balance between functionality and efficiency. A reduced instruction set lets you:

Reduce the Size of the Proof: Fewer constraints mean smaller proofs, which are faster to generate and easier to verify.
Optimize for Specific Workloads: By focusing on core operations, the VM can be fine-tuned for specific use cases, such as cryptographic computations or data integrity checks.
Enable Extensibility: With a smaller core, it’s easier to add well-defined custom instructions in the future, keeping the VM adaptable to evolving needs.

In the context of zkSNARKs, every constraint counts, and a reduced instruction set is the key to keeping things lean and efficient while still representing any computation you might want to prove.

For the full implementation and examples of how these concepts come together, be sure to check out the Provable VM GitHub repository.