When you compile source code into object files (such as .o files), the compiler generates machine code along with metadata that indicates how different parts of the code should be adjusted when the program is linked or loaded into memory. These adjustments are known as relocations. They ensure that references to functions and variables point to the correct memory addresses, even though the final placement of the code in memory isn't known at compile time.
A relocation typically specifies:
- Offset: The location in the code or data segment where an address needs to be updated.
- Symbol Reference: The function or variable whose address needs to be inserted.
- Relocation Type: The kind of adjustment required (e.g., absolute address, relative address).
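The three fields above can be modelled directly. The following is a minimal sketch, not the ELF on-disk format: the names `RelocKind`, `Relocation`, and `apply` are hypothetical, the relative case ignores addends, and the image is assumed to be loaded at `base`.

```rust
use std::collections::HashMap;

/// The kind of adjustment requested (illustrative subset, not real ELF type codes).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RelocKind {
    /// Write the symbol's absolute 64-bit address.
    Absolute64,
    /// Write the 32-bit distance from the patched location to the symbol.
    Relative32,
}

/// One relocation: where to patch, which symbol to use, and how.
#[derive(Debug, Clone)]
struct Relocation {
    offset: usize,
    symbol: String,
    kind: RelocKind,
}

/// Borrow `len` bytes at `offset`, or None if the range is out of bounds.
fn slot(image: &mut [u8], offset: usize, len: usize) -> Option<&mut [u8]> {
    let end = offset.checked_add(len)?;
    image.get_mut(offset..end)
}

/// Apply one relocation to `image`, assumed to be loaded at address `base`.
/// Returns false if the symbol is unknown or the patch does not fit.
fn apply(
    image: &mut [u8],
    base: u64,
    reloc: &Relocation,
    symbols: &HashMap<String, u64>,
) -> bool {
    let Some(&target) = symbols.get(&reloc.symbol) else {
        return false;
    };
    match reloc.kind {
        RelocKind::Absolute64 => {
            let Some(dst) = slot(image, reloc.offset, 8) else {
                return false;
            };
            dst.copy_from_slice(&target.to_le_bytes());
        }
        RelocKind::Relative32 => {
            let Some(dst) = slot(image, reloc.offset, 4) else {
                return false;
            };
            // Address of the patched location itself.
            let place = base + reloc.offset as u64;
            let delta = target.wrapping_sub(place) as u32;
            dst.copy_from_slice(&delta.to_le_bytes());
        }
    }
    true
}
```

With a symbol at 0x2010 and an image based at 0x1000, an absolute relocation stores 0x2010, while a relative one at offset 8 stores 0x2010 - 0x1008 = 0x1008.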
In this guide, we’ll focus on parsing ELF object files, extracting relocation entries, resolving symbol addresses across multiple libraries, and applying these relocations to a simulated memory space.
Setting Up the Environment
Before diving into the code, ensure you have Rust installed. You’ll also need the goblin, anyhow, and plain crates, which facilitate parsing ELF files, error handling, and byte-level data manipulation, respectively.
Cargo.toml
Begin by setting up your Cargo.toml with the necessary dependencies:
[package]
name = "toy_linker_demo"
version = "0.1.0"
edition = "2021"
[dependencies]
goblin = "0.7"
anyhow = "1.0"
plain = "0.2"
Writing the Linker in Rust
We’ll construct a Rust program that simulates a simple linker. This linker will:
- Load two ELF object files (a.o and b.o).
- Parse their sections and symbols.
- Resolve symbol references between them.
- Apply relocations to adjust addresses accordingly.
Structuring the Global Symbol Table
To manage symbols across multiple libraries, we introduce a GlobalSymbolTable. This structure maintains a mapping of exported symbols to their memory addresses and keeps track of loaded memory sections.
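use std::collections::HashMap;
use std::fs;

use anyhow::{anyhow, Result};
use goblin::elf::reloc::reloc64::{Rel, Rela};
use goblin::elf::Sym;
use goblin::Object;
use plain::from_bytes;

struct ExportedSymbol {
    file_name: String,
    address: usize, // Memory address where the symbol resides
}

struct GlobalSymbolTable {
    exports: HashMap<String, ExportedSymbol>, // Symbol name -> defining file and address
    mem_map: HashMap<String, Vec<u8>>,        // File name -> relocated memory buffer
}

impl GlobalSymbolTable {
    fn new() -> Self {
        Self {
            exports: HashMap::new(),
            mem_map: HashMap::new(),
        }
    }
}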
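To illustrate how such a table is used, here is a small self-contained sketch. The `SymbolTable` type and its `define` and `resolve` helpers are hypothetical names introduced for this example; the guide's own code inserts into the `exports` map directly.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct ExportedSymbol {
    file_name: String,
    address: usize,
}

#[derive(Default)]
struct SymbolTable {
    exports: HashMap<String, ExportedSymbol>,
}

impl SymbolTable {
    /// Record a definition. Returns the previous definition if the name was
    /// already exported, so the caller can report a duplicate symbol.
    fn define(&mut self, name: &str, file_name: &str, address: usize) -> Option<ExportedSymbol> {
        self.exports.insert(
            name.to_string(),
            ExportedSymbol {
                file_name: file_name.to_string(),
                address,
            },
        )
    }

    /// Look up the address of an exported symbol, if any file defined it.
    fn resolve(&self, name: &str) -> Option<usize> {
        self.exports.get(name).map(|sym| sym.address)
    }
}
```

Because `HashMap::insert` returns the value it replaced, a second definition of the same name is easy to detect, which a fuller linker would likely treat as an error.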
Loading and Relocating Object Files
The core functionality resides in the load_and_relocate_object function. This function performs several critical tasks:
- Reading the Object File: It reads the raw bytes of the ELF object file.
- Parsing the ELF Structure: Using goblin, it parses the ELF headers, sections, and symbols.
- Copying Relevant Sections: It identifies and copies sections like .text, .data, and .rodata into a simulated memory buffer.
- Processing Symbols: It distinguishes between exported symbols (those defined within the object file) and undefined symbols (those referencing external symbols).
- Applying Relocations: It parses relocation entries and adjusts the memory buffer based on symbol addresses.
Here’s how the function is implemented:
fn load_and_relocate_object(
    file_name: &str,
    load_base: usize,
    global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
    println!("Loading file: {} at base 0x{:x}", file_name, load_base);

    // 1) Read the object file
    let bytes = fs::read(file_name)?;

    // 2) Parse the ELF
    let obj = match Object::parse(&bytes)? {
        Object::Elf(elf) => elf,
        _ => {
            println!("Not an ELF file: {}", file_name);
            return Ok(());
        }
    };

    // Create a memory buffer (64 KB for demonstration). Index 0 of the buffer
    // stands for address `load_base`, so the buffer is indexed by offsets
    // from the base, never by absolute addresses.
    let mut memory = vec![0u8; 65536];

    // 3) Copy .text, .data, .rodata, etc. into 'memory'
    for sh in &obj.section_headers {
        if sh.sh_size == 0 {
            continue;
        }
        if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
            if name == ".text" || name == ".data" || name == ".rodata" {
                // Note: relocatable objects usually have sh_addr == 0 for
                // every section, so this simple scheme lets sections overlap.
                let section_start = sh.sh_addr as usize;
                let section_end = section_start + (sh.sh_size as usize);
                let file_offset = sh.sh_offset as usize;
                let file_end = file_offset + (sh.sh_size as usize);
                if section_end > memory.len() || file_end > bytes.len() {
                    return Err(anyhow!("Section {} does not fit in the buffer", name));
                }
                memory[section_start..section_end]
                    .copy_from_slice(&bytes[file_offset..file_end]);
                println!("Copied section {}: 0x{:x}..0x{:x}",
                    name, load_base + section_start, load_base + section_end);
            }
        }
    }

    // 4) Parse the symbol table and note which are exported vs. undefined.
    // Every entry is kept, including unnamed ones, so that a relocation's
    // symbol index can be used directly as an index into `symbols`.
    let mut symbols: Vec<(String, Sym)> = Vec::new();
    for sym in obj.syms.iter() {
        let name = obj.strtab.get_at(sym.st_name).unwrap_or("");
        symbols.push((name.to_string(), sym));
    }

    // 4b) For each named symbol, if st_shndx != 0 => export
    for (sym_name, sym) in &symbols {
        if sym_name.is_empty() {
            continue; // Null symbol, section symbols, and other unnamed entries
        }
        if sym.st_shndx != 0 {
            let sym_addr = load_base + sym.st_value as usize;
            println!("Symbol '{}' exported at 0x{:x} by {}",
                sym_name, sym_addr, file_name);
            global_syms.exports.insert(sym_name.clone(), ExportedSymbol {
                file_name: file_name.to_string(),
                address: sym_addr,
            });
        } else {
            // It's an undefined symbol => we'll patch references
            println!("Symbol '{}' is UNDEF in {}", sym_name, file_name);
        }
    }

    // 5) Apply relocations: .rel.* (Rel) or .rela.* (Rela)
    apply_rel_or_rela(&obj, &bytes, false, load_base, &mut memory, &symbols, global_syms)?;
    apply_rel_or_rela(&obj, &bytes, true, load_base, &mut memory, &symbols, global_syms)?;

    // 6) Store the final memory buffer
    global_syms.mem_map.insert(file_name.to_string(), memory);
    Ok(())
}
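In relocatable object files, section headers normally carry an sh_addr of 0, so a loader that wants .text, .data, and .rodata to coexist in one buffer has to choose an offset for each section itself. The sketch below shows one simple way to do that, assuming each section is described by a (name, size, alignment) tuple; `align_up` and `layout_sections` are hypothetical helpers, not part of goblin.

```rust
/// Round `value` up to the next multiple of `align` (an alignment of 0 is treated as 1).
fn align_up(value: usize, align: usize) -> usize {
    let align = align.max(1);
    (value + align - 1) / align * align
}

/// Place sections one after another, honouring each section's alignment.
/// Input:  (name, size in bytes, required alignment) per section.
/// Output: (name, offset within the buffer) per section, plus the total size used.
fn layout_sections(sections: &[(&str, usize, usize)]) -> (Vec<(String, usize)>, usize) {
    let mut cursor = 0;
    let mut placed = Vec::with_capacity(sections.len());
    for &(name, size, align) in sections {
        cursor = align_up(cursor, align);
        placed.push((name.to_string(), cursor));
        cursor += size;
    }
    (placed, cursor)
}
```

With such a layout, a symbol's address would be the load base plus its section's offset plus st_value, and a relocation's r_offset would be interpreted relative to the section it targets.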
Handling Relocations
Relocations are entries that specify where and how to adjust addresses in the loaded sections. The apply_rel_or_rela function processes both .rel.* and .rela.* relocation sections. It utilizes the plain crate to parse raw bytes into Rel or Rela structures.
fn apply_rel_or_rela(
    obj: &goblin::elf::Elf,
    file_bytes: &[u8],
    is_rela: bool,
    load_base: usize,
    memory: &mut [u8],
    symbols: &[(String, goblin::elf::Sym)],
    global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
    for sh in &obj.section_headers {
        if let Some(name) = obj.shdr_strtab.get_at(sh.sh_name) {
            // ".rela" also starts with ".rel", so exclude it explicitly when
            // looking for Rel sections; otherwise Rela data is parsed twice.
            let is_rela_section = name.starts_with(".rela");
            let is_rel_section = name.starts_with(".rel") && !is_rela_section;
            if (is_rela && is_rela_section) || (!is_rela && is_rel_section) {
                println!("Processing relocation section: {}", name);
                let entry_size = if is_rela {
                    std::mem::size_of::<Rela>()
                } else {
                    std::mem::size_of::<Rel>()
                };
                let count = sh.sh_size as usize / entry_size;
                let mut offset = sh.sh_offset as usize;
                for _ in 0..count {
                    if is_rela {
                        // from_bytes returns a reference into file_bytes and
                        // fails if the slice is not suitably aligned.
                        let rela: &Rela = from_bytes::<Rela>(&file_bytes[offset..offset + entry_size])
                            .map_err(|e| anyhow!("Failed to parse Rela: {:?}", e))?;
                        offset += entry_size;
                        let sym_index = rela.r_info >> 32;
                        let r_type = (rela.r_info & 0xffffffff) as u32;
                        let reloc_offset = rela.r_offset as usize;
                        let addend = rela.r_addend;
                        apply_one_reloc(
                            reloc_offset,
                            sym_index as usize,
                            r_type,
                            addend,
                            load_base,
                            memory,
                            symbols,
                            global_syms,
                        )?;
                    } else {
                        let rel: &Rel = from_bytes::<Rel>(&file_bytes[offset..offset + entry_size])
                            .map_err(|e| anyhow!("Failed to parse Rel: {:?}", e))?;
                        offset += entry_size;
                        let sym_index = rel.r_info >> 32;
                        let r_type = (rel.r_info & 0xffffffff) as u32;
                        let reloc_offset = rel.r_offset as usize;
                        // In .rel sections the addend is stored at the patched
                        // location itself; this demo ignores it and uses 0.
                        apply_one_reloc(
                            reloc_offset,
                            sym_index as usize,
                            r_type,
                            0,
                            load_base,
                            memory,
                            symbols,
                            global_syms,
                        )?;
                    }
                }
            }
        }
    }
    Ok(())
}
This function iterates over all section headers, identifying relocation sections based on their names (.rel.* or .rela.*). For each relocation entry, it parses the raw bytes into a Rel or Rela structure and then delegates the patching process to apply_one_reloc.
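The r_info field packs the symbol index and the relocation type into one integer: in ELF64 the symbol index is the upper 32 bits and the type the lower 32, while ELF32 uses the upper 24 bits and the lower 8. The sketch below decodes both, and also decodes a little-endian Elf64_Rela entry field by field, which sidesteps alignment requirements. The function names are hypothetical, and the little-endian byte order is an assumption.

```rust
use std::convert::TryInto;

/// Split an ELF64 r_info field into (symbol index, relocation type).
fn split_r_info_64(r_info: u64) -> (u32, u32) {
    ((r_info >> 32) as u32, (r_info & 0xffff_ffff) as u32)
}

/// Split an ELF32 r_info field into (symbol index, relocation type).
fn split_r_info_32(r_info: u32) -> (u32, u32) {
    (r_info >> 8, r_info & 0xff)
}

/// Decode one little-endian Elf64_Rela entry (24 bytes) into
/// (r_offset, r_info, r_addend). Returns None if the slice is too short.
fn parse_rela64_le(entry: &[u8]) -> Option<(u64, u64, i64)> {
    if entry.len() < 24 {
        return None;
    }
    let r_offset = u64::from_le_bytes(entry[0..8].try_into().ok()?);
    let r_info = u64::from_le_bytes(entry[8..16].try_into().ok()?);
    let r_addend = i64::from_le_bytes(entry[16..24].try_into().ok()?);
    Some((r_offset, r_info, r_addend))
}
```

Reading each field with `from_le_bytes` copies the bytes out of the file buffer, so it works regardless of where the entry happens to sit in memory.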
Patching Memory with Relocations
The apply_one_reloc function performs the actual memory patching. It calculates the final address for a symbol and updates the memory buffer accordingly.
fn apply_one_reloc(
    reloc_offset: usize,
    sym_index: usize,
    r_type: u32,
    addend: i64,
    load_base: usize,
    memory: &mut [u8],
    symbols: &[(String, goblin::elf::Sym)],
    global_syms: &mut GlobalSymbolTable,
) -> Result<()> {
    // `patch_addr` is the simulated address being patched. The buffer itself
    // is indexed with `reloc_offset`, because memory[0] stands for `load_base`.
    let patch_addr = load_base + reloc_offset;
    println!("Applying reloc @ 0x{:x}, sym_idx {}, type {}, addend={}",
        patch_addr, sym_index, r_type, addend);

    // 1) Find symbol name from sym_index
    let (sym_name, sym) = match symbols.get(sym_index) {
        Some(pair) => pair,
        None => {
            eprintln!("No symbol for index {}", sym_index);
            return Ok(()); // Gracefully skip unresolved symbols
        }
    };

    // 2) Resolve the symbol address
    let final_addr: u64 = if sym.st_shndx == 0 {
        // Imported symbol; look it up in the global symbol table
        if let Some(export) = global_syms.exports.get(sym_name) {
            export.address as u64
        } else {
            eprintln!("Symbol '{}' not found in global exports!", sym_name);
            0
        }
    } else {
        // Local symbol; compute its address based on load_base
        (load_base + sym.st_value as usize) as u64
    };

    // Incorporate the addend into the relocation value
    let reloc_value = final_addr.wrapping_add(addend as u64);

    // 3) Patch the memory buffer with the computed address (little-endian).
    // r_type is ignored: every relocation is treated as an absolute 64-bit one.
    let slot = memory
        .get_mut(reloc_offset..reloc_offset + 8)
        .ok_or_else(|| anyhow!("Relocation at 0x{:x} is outside the memory buffer", patch_addr))?;
    slot.copy_from_slice(&reloc_value.to_le_bytes());
    println!(" -> Patched 0x{:x} with 0x{:x} (symbol={})",
        patch_addr, reloc_value, sym_name);
    Ok(())
}
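This function begins by calculating the absolute address where the relocation needs to be applied. It then retrieves the symbol’s name and determines whether the symbol is local or imported. For imported symbols, it looks up the address in the global symbol table. Finally, it updates the memory buffer at the specified offset with the resolved address, taking into account any addend. Note that the function logs r_type but otherwise ignores it: it always writes a 64-bit absolute address, which matches absolute relocations such as R_X86_64_64. PC-relative relocation types need a different calculation and a narrower write.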
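The patching step can be isolated into two small helpers, sketched here under the assumption that the buffer is indexed by offset from the load base. `reloc_value` and `patch_u64_le` are hypothetical names; the second returns an error instead of panicking when the write would fall outside the buffer.

```rust
/// Compute symbol address plus signed addend, wrapping on overflow.
fn reloc_value(sym_addr: u64, addend: i64) -> u64 {
    sym_addr.wrapping_add(addend as u64)
}

/// Write `value` as 8 little-endian bytes at `offset`, checking bounds first.
fn patch_u64_le(memory: &mut [u8], offset: usize, value: u64) -> Result<(), String> {
    let len = memory.len();
    let end = offset
        .checked_add(8)
        .ok_or_else(|| "patch offset overflows usize".to_string())?;
    let slot = memory.get_mut(offset..end).ok_or_else(|| {
        format!("patch 0x{:x}..0x{:x} is outside the {}-byte buffer", offset, end, len)
    })?;
    slot.copy_from_slice(&value.to_le_bytes());
    Ok(())
}
```

Casting a negative addend to u64 and using `wrapping_add` gives the same result as signed addition, so an addend of -4 applied to 0x1000 yields 0xffc.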
The Main Function
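The main function orchestrates the loading and linking process. It initializes the global symbol table, loads each object file, and displays the resolved symbols. Order matters: relocations are applied as each file is loaded, so a file that defines a symbol must be loaded before any file that references it.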
fn main() -> Result<()> {
    let mut global_symbols = GlobalSymbolTable::new();

    // Load 'b.o' first, then 'a.o'
    load_and_relocate_object("b.o", 0x20000, &mut global_symbols)?;
    load_and_relocate_object("a.o", 0x30000, &mut global_symbols)?;

    println!("\nDone loading both libraries!\n");
    println!("Global symbols known are:");
    for (name, sym) in &global_symbols.exports {
        println!(" - {} => address 0x{:x} (in file {})",
            name, sym.address, sym.file_name);
    }
    Ok(())
}


