diff --git a/README.md b/README.md
index cbc25e5..ef2e73f 100644
--- a/README.md
+++ b/README.md
@@ -137,34 +137,96 @@ The compiler always checks types on the fly, so there is no need for a separate
 
 ### Instructions
 
-Dust's virtual machine is register-based and uses 64-bit instructions, which encode ten pieces of
-information:
+Dust's virtual machine uses 32-bit instructions, which encode seven pieces of information:
 
 Bit | Description
 ----- | -----------
-0-5 | Operation code
-6-8 | Unused, reserved in case more operation codes are needed
-9 | Flag indicating that A is a local
-10 | Flag indicating that B is a constant
-11 | Flag indicating that B is a local
-12 | Flag indicating that C is a constant
-13 | Flag indicating that C is a local
-14 | D Argument (boolean value)
-15-16 | Unused
-17-32 | A argument (unsigned 16-bit integer)
-33-48 | B argument (unsigned 16-bit integer)
-49-63 | C argument (unsigned 16-bit integer)
+0-4 | Operation code
+5 | Flag indicating if the B argument is a constant
+6 | Flag indicating if the C argument is a constant
+7 | D field (boolean)
+8-15 | A field (unsigned 8-bit integer)
+16-23 | B field (unsigned 8-bit integer)
+24-31 | C field (unsigned 8-bit integer)
 
-Because the instructions are 64 bits, the maximum number of registers is 2^16, which is more than
-enough, even for programs that are very large. This also means that chunks can store up to 2^16
-constants and locals.
+#### Operations
+
+Five bits are used for the operation, which allows for up to 32 operations.
+
+##### Stack manipulation
+
+- MOVE: Makes a register's value available in another register by using a pointer. This avoids
+  copying the value or invalidating the original register.
+- CLOSE: Sets a range of registers to the "empty" state.
+
+##### Value loaders
+
+- LOAD_BOOLEAN: Loads a boolean, the value of which is encoded in the instruction, to a register.
+- LOAD_CONSTANT: Loads a constant from the constant list to a register.
+- LOAD_LIST: Creates a list abstraction from a range of registers and loads it to a register.
+- LOAD_MAP: Creates a map abstraction from a range of registers and loads it to a register.
+- LOAD_SELF: Creates an abstraction that represents the current function and loads it to a register.
+
+##### Variable operations
+
+- GET_LOCAL: Loads a variable's value to a register by using a pointer to point to the variable's
+  canonical register (i.e. the register whose index is stored in the locals list).
+- SET_LOCAL: Changes a variable's register to a pointer to another register, effectively changing
+  the variable's value.
+
+##### Arithmetic
+
+Arithmetic instructions use every field except for D. The A field is the destination register, the B
+and C fields are the arguments, and the flags indicate whether the arguments are constants.
+
+- ADD: Adds two values and stores the result in a register. Unlike the other arithmetic operations,
+  the ADD instruction can also be used to concatenate strings and characters.
+- SUBTRACT: Subtracts one argument from another and stores the result in a register.
+- MULTIPLY: Multiplies two arguments and stores the result in a register.
+- DIVIDE: Divides one value by another and stores the result in a register.
+- MODULO: Calculates the division remainder of two values and stores the result in a register.
+- POWER: Raises one value to the power of another and stores the result in a register.
+
+##### Logic
+
+Logic instructions work differently from arithmetic and comparison instructions, but they are still
+essentially binary operations with a left and a right argument. Rather than performing some
+calculation and storing a result, the logic instructions perform a check on the left-hand argument
+and, based on the result, either skip the right-hand argument or allow it to be executed. A `TEST`
+is always followed by a `JUMP`. If the left argument passes the test (a boolean equality check), the
+`JUMP` instruction is skipped and the right argument is executed. If the left argument fails the
+test, the `JUMP` is not skipped and it jumps past the right argument.
+
+- TEST
+- TEST_SET
+
+##### Comparison
+
+- EQUAL
+- LESS
+- LESS_EQUAL
+
+##### Unary operations
+
+- NEGATE
+- NOT
+
+##### Execution
+
+- CALL
+- CALL_NATIVE
+- JUMP
+- RETURN
+
+
+The A, B, and C
+fields are usually used as indexes into the constant list or stack, but they can also hold
+other information, like the number of arguments for a function call.
 
 ### Virtual Machine
 
 The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
-pointers. Pointers can point to values in the constant list, locals list, or the stack itself. If it
-points to a local, the VM must consult its local definitions to find which register hold's the
-value. Those local defintions are stored as a simple list of register indexes.
+pointers. Pointers can point to values in the constant list, locals list, or the stack itself.
 
 While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
 enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with
diff --git a/dust-lang/src/disassembler.rs b/dust-lang/src/chunk/disassembler.rs
similarity index 87%
rename from dust-lang/src/disassembler.rs
rename to dust-lang/src/chunk/disassembler.rs
index 089ab8e..4e5c767 100644
--- a/dust-lang/src/disassembler.rs
+++ b/dust-lang/src/chunk/disassembler.rs
@@ -13,30 +13,30 @@
 //! # Output
 //!
 //! The output of [Disassembler::disassemble] is a string that can be printed to the console or
-//! written to a file. Below is an example of the disassembly for a simple "Hello, world!" program.
+//! written to a file. Below is an example of the disassembly for a simple "Hello world!" program.
 //!
 //! ```text
-//! ┌──────────────────────────────────────────────────────────────────────────────┐
-//! │                                     dust                                     │
-//! │                                                                              │
-//! │                          write_line("hello_world")                           │
-//! │                                                                              │
-//! │             3 instructions, 1 constants, 0 locals, returns none              │
-//! │                                                                              │
-//! │                                 Instructions                                 │
-//! │                                 ------------                                 │
-//! │ i   POSITION   OPERATION     TYPE           INFO                             │
-//! │ --- ---------- ------------- -------------- -------------------------------- │
-//! │ 0   (11, 24)   LOAD_CONSTANT str            R0 = C0                          │
-//! │ 1   (0, 25)    CALL_NATIVE   none           write_line(R0..R1)               │
-//! │ 2   (25, 25)   RETURN        none                                            │
-//! │┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈│
-//! │                                  Constants                                   │
-//! │                                  ---------                                   │
-//! │                    i   TYPE             VALUE                                │
-//! │                    --- ---------------- -----------------                    │
-//! │                    0   str              hello_world                          │
-//! └──────────────────────────────────────────────────────────────────────────────┘
+//! ┌───────────────────────────────────────────────────────────────┐
+//! │                             dust                              │
+//! │                                                               │
+//! │                  write_line("Hello world!")                   │
+//! │                                                               │
+//! │      3 instructions, 1 constants, 0 locals, returns none      │
+//! │                                                               │
+//! │                         Instructions                          │
+//! │                         ------------                          │
+//! │ i   POSITION   OPERATION     INFO                             │
+//! │ --- ---------- ------------- -------------------------------- │
+//! │ 0   (11, 25)   LOAD_CONSTANT R0 = C0                          │
+//! │ 1   (0, 26)    CALL_NATIVE   write_line(R0..R1)               │
+//! │ 2   (26, 26)   RETURN                                         │
+//! │┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈│
+//! │                           Constants                           │
+//! │                           ---------                           │
+//! │            i   TYPE             VALUE                         │
+//! │            --- ---------------- -----------------             │
+//! │            0   str              Hello world!                  │
+//! └───────────────────────────────────────────────────────────────┘
 //! ```
 
 use std::env::current_exe;
diff --git a/dust-lang/src/chunk.rs b/dust-lang/src/chunk/mod.rs
similarity index 96%
rename from dust-lang/src/chunk.rs
rename to dust-lang/src/chunk/mod.rs
index 9107b12..4981e26 100644
--- a/dust-lang/src/chunk.rs
+++ b/dust-lang/src/chunk/mod.rs
@@ -3,6 +3,9 @@
 //! A chunk consists of a sequence of instructions and their positions, a list of constants, and a
 //! list of locals that can be executed by the Dust virtual machine. Chunks have a name when they
 //! belong to a named function.
+mod disassembler;
+
+pub use disassembler::Disassembler;
 
 use std::fmt::{self, Debug, Display, Write};
 
@@ -10,7 +13,7 @@ use serde::{Deserialize, Serialize};
 use smallvec::SmallVec;
 use smartstring::alias::String;
 
-use crate::{ConcreteValue, Disassembler, FunctionType, Instruction, Scope, Span, Type};
+use crate::{ConcreteValue, FunctionType, Instruction, Scope, Span, Type};
 
 /// In-memory representation of a Dust program or function.
 ///
diff --git a/dust-lang/src/compiler/mod.rs b/dust-lang/src/compiler/mod.rs
index 9e25224..2155846 100644
--- a/dust-lang/src/compiler/mod.rs
+++ b/dust-lang/src/compiler/mod.rs
@@ -13,7 +13,10 @@ use std::{
 };
 
 use colored::Colorize;
-use optimize::{optimize_control_flow, optimize_set_local};
+use optimize::{
+    condense_set_local_to_math, optimize_test_with_explicit_booleans,
+    optimize_test_with_loader_arguments,
+};
 use smallvec::{smallvec, SmallVec};
 
 use crate::{
@@ -1003,7 +1006,7 @@ impl<'src> Compiler<'src> {
             });
 
             self.emit_instruction(set_local, Type::None, start_position);
-            optimize_set_local(self)?;
+            condense_set_local_to_math(self)?;
 
             return Ok(());
         }
@@ -1209,9 +1212,8 @@ impl<'src> Compiler<'src> {
         self.instructions
             .insert(if_block_start, (jump, Type::None, if_block_start_position));
 
-        if self.instructions.len() >= 4 {
-            optimize_control_flow(&mut self.instructions);
-        }
+        optimize_test_with_explicit_booleans(self);
+        optimize_test_with_loader_arguments(self);
 
         let else_last_register = self.next_register().saturating_sub(1);
         let r#move = Instruction::from(Move {
@@ -1385,13 +1387,24 @@ impl<'src> Compiler<'src> {
 
             self.emit_instruction(r#return, Type::None, self.current_position);
         } else {
-            let previous_expression_type = self.get_last_instruction_type();
-            let should_return_value = previous_expression_type != Type::None;
+            let previous_expression_type = self
+                .instructions
+                .iter()
+                .rev()
+                .find_map(|(instruction, r#type, _)| {
+                    if instruction.yields_value() {
+                        Some(r#type)
+                    } else {
+                        None
+                    }
+                })
+                .unwrap_or(&Type::None);
+            let should_return_value = previous_expression_type != &Type::None;
             let r#return = Instruction::from(Return {
                 should_return_value,
             });
 
-            self.update_return_type(previous_expression_type)?;
+            self.update_return_type(previous_expression_type.clone())?;
             self.emit_instruction(r#return, Type::None, self.current_position);
         }
 
diff --git a/dust-lang/src/compiler/optimize.rs b/dust-lang/src/compiler/optimize.rs
index bb6b107..ff1baa5 100644
--- a/dust-lang/src/compiler/optimize.rs
+++ b/dust-lang/src/compiler/optimize.rs
@@ -1,29 +1,14 @@
 //! Tools used by the compiler to optimize a chunk's bytecode.
 
-use crate::{instruction::SetLocal, CompileError, Compiler, Instruction, Operation, Span, Type};
+use crate::{instruction::SetLocal, CompileError, Compiler, Operation};
 
-fn get_last_operations<const COUNT: usize>(
-    instructions: &[(Instruction, Type, Span)],
-) -> Option<[Operation; COUNT]> {
-    let mut n_operations = [Operation::Return; COUNT];
-
-    for (nth, operation) in n_operations.iter_mut().rev().zip(
-        instructions
-            .iter()
-            .rev()
-            .map(|(instruction, _, _)| instruction.operation()),
-    ) {
-        *nth = operation;
-    }
-
-    Some(n_operations)
-}
-
-/// Optimizes a short control flow pattern.
+/// Optimizes a control flow pattern by removing redundant instructions.
+///
+/// If a comparison instruction is followed by a test instruction, the test instruction may be
+/// redundant because the comparison instruction already sets the correct value. If the test's
+/// arguments (i.e. the boolean loaders) are `true` and `false` (in that order) then the boolean
+/// loaders, jump and test instructions are removed, leaving a single comparison instruction.
 ///
-/// Comparison and test instructions (which are always followed by a JUMP) can be optimized when
-/// the next instructions are two constant or boolean loaders. The first loader is set to skip
-/// an instruction if it is run while the second loader is modified to use the first's register.
 /// This makes the following two code snippets compile to the same bytecode:
 ///
 /// ```dust
@@ -35,15 +20,55 @@ fn get_last_operations<const COUNT: usize>(
 /// ```
 ///
 /// The instructions must be in the following order:
-/// - `Equal`, `Less`, `LessEqual` or `Test`
+/// - `Equal`, `Less` or `LessEqual`
+/// - `Test`
+/// - `Jump`
+/// - `LoadBoolean`
+/// - `LoadBoolean`
+pub fn optimize_test_with_explicit_booleans(compiler: &mut Compiler) {
+    if matches!(
+        compiler.get_last_operations(),
+        Some([
+            Operation::Equal | Operation::Less | Operation::LessEqual,
+            Operation::Test,
+            Operation::Jump,
+            Operation::LoadBoolean,
+            Operation::LoadBoolean,
+        ])
+    ) {
+        log::debug!("Removing redundant test, jump and boolean loaders after comparison");
+
+        let first_loader = compiler.instructions.iter().nth_back(1).unwrap();
+        let second_loader = compiler.instructions.last().unwrap();
+        let first_boolean = first_loader.0.b != 0;
+        let second_boolean = second_loader.0.b != 0;
+
+        if first_boolean && !second_boolean {
+            compiler.instructions.pop();
+            compiler.instructions.pop();
+            compiler.instructions.pop();
+            compiler.instructions.pop();
+        }
+    }
+}
+
+/// Optimizes a control flow pattern.
+///
+/// Test instructions (which are always followed by a jump) can be optimized when the next
+/// instructions are two constant or boolean loaders. The first loader is set to skip an instruction
+/// if it is run while the second loader is modified to use the first's register. This forgoes the
+/// use of a jump instruction and uses one fewer register.
+///
+/// The instructions must be in the following order:
+/// - `Test`
 /// - `Jump`
 /// - `LoadBoolean` or `LoadConstant`
 /// - `LoadBoolean` or `LoadConstant`
-pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
+pub fn optimize_test_with_loader_arguments(compiler: &mut Compiler) {
     if !matches!(
-        get_last_operations(instructions),
+        compiler.get_last_operations(),
         Some([
-            Operation::Equal | Operation::Less | Operation::LessEqual | Operation::Test,
+            Operation::Test,
             Operation::Jump,
             Operation::LoadBoolean | Operation::LoadConstant,
             Operation::LoadBoolean | Operation::LoadConstant,
@@ -54,17 +79,17 @@ pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
 
     log::debug!("Consolidating registers for control flow optimization");
 
-    let first_loader = &mut instructions.iter_mut().nth_back(1).unwrap().0;
+    let first_loader = &mut compiler.instructions.iter_mut().nth_back(1).unwrap().0;
 
     first_loader.c = true as u8;
 
     let first_loader_destination = first_loader.a;
-    let second_loader = &mut instructions.last_mut().unwrap().0;
+    let second_loader = &mut compiler.instructions.last_mut().unwrap().0;
 
     second_loader.a = first_loader_destination;
 }
 
-/// Optimizes a math instruction followed by a SetLocal instruction.
+/// Optimizes a math assignment pattern.
 ///
 /// The SetLocal instruction is removed and the math instruction is modified to use the local as
 /// its destination. This makes the following two code snippets compile to the same bytecode:
@@ -82,7 +107,7 @@ pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
 /// The instructions must be in the following order:
 /// - `Add`, `Subtract`, `Multiply`, `Divide` or `Modulo`
 /// - `SetLocal`
-pub fn optimize_set_local(compiler: &mut Compiler) -> Result<(), CompileError> {
+pub fn condense_set_local_to_math(compiler: &mut Compiler) -> Result<(), CompileError> {
     if !matches!(
         compiler.get_last_operations(),
         Some([
diff --git a/dust-lang/src/instruction/mod.rs b/dust-lang/src/instruction/mod.rs
index 68514e9..2ad71c1 100644
--- a/dust-lang/src/instruction/mod.rs
+++ b/dust-lang/src/instruction/mod.rs
@@ -365,6 +365,9 @@ impl Instruction {
             | Operation::Multiply
             | Operation::Divide
             | Operation::Modulo
+            | Operation::Equal
+            | Operation::Less
+            | Operation::LessEqual
             | Operation::Negate
             | Operation::Not
             | Operation::Call
diff --git a/dust-lang/src/lib.rs b/dust-lang/src/lib.rs
index 1d0c765..528a78d 100644
--- a/dust-lang/src/lib.rs
+++ b/dust-lang/src/lib.rs
@@ -30,7 +30,6 @@
 
 pub mod chunk;
 pub mod compiler;
-pub mod disassembler;
 pub mod dust_error;
 pub mod instruction;
 pub mod lexer;
@@ -41,9 +40,8 @@ pub mod r#type;
 pub mod value;
 pub mod vm;
 
-pub use crate::chunk::{Chunk, Local};
+pub use crate::chunk::{Chunk, Disassembler, Local};
 pub use crate::compiler::{compile, CompileError, Compiler};
-pub use crate::disassembler::Disassembler;
 pub use crate::dust_error::{AnnotatedError, DustError};
 pub use crate::instruction::{Argument, Instruction, Operation};
 pub use crate::lexer::{lex, LexError, Lexer};
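The new 32-bit bit table in the README amounts to a packing scheme. Below is a minimal Rust sketch of packing and unpacking those fields. It assumes the bit positions are exactly as listed in the table. `Fields`, `encode` and `decode` are hypothetical names used only for illustration; they are not the `Instruction` API in `dust-lang`.

```rust
/// Hypothetical unpacked view of the README's 32-bit instruction layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Fields {
    pub operation: u8,       // bits 0-4 (must be < 32)
    pub b_is_constant: bool, // bit 5
    pub c_is_constant: bool, // bit 6
    pub d: bool,             // bit 7
    pub a: u8,               // bits 8-15
    pub b: u8,               // bits 16-23
    pub c: u8,               // bits 24-31
}

/// Packs the fields into one `u32`; panics if the operation code needs more than 5 bits.
pub fn encode(fields: Fields) -> u32 {
    assert!(fields.operation < 32, "operation code must fit in 5 bits");

    u32::from(fields.operation)
        | u32::from(fields.b_is_constant) << 5
        | u32::from(fields.c_is_constant) << 6
        | u32::from(fields.d) << 7
        | u32::from(fields.a) << 8
        | u32::from(fields.b) << 16
        | u32::from(fields.c) << 24
}

/// Unpacks a `u32` by shifting each field down and masking it off.
pub fn decode(bits: u32) -> Fields {
    Fields {
        operation: (bits & 0b1_1111) as u8,
        b_is_constant: (bits >> 5) & 1 == 1,
        c_is_constant: (bits >> 6) & 1 == 1,
        d: (bits >> 7) & 1 == 1,
        // `as u8` keeps only the low 8 bits after each shift.
        a: (bits >> 8) as u8,
        b: (bits >> 16) as u8,
        c: (bits >> 24) as u8,
    }
}
```

Under this layout, each of A, B and C can address at most 256 registers or constants. The removed 64-bit scheme allowed 2^16.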
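The README's Logic section describes a `TEST`/`JUMP` rule, which can be traced with a toy interpreter. This is a minimal sketch under simplifying assumptions, and it is not Dust's actual VM:

- Registers hold only booleans.
- `Op`, `run` and `short_circuit` are hypothetical names.
- A `JUMP` offset is counted from the instruction after the `JUMP`.

```rust
/// A hypothetical, boolean-only subset of the instruction set.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    LoadBoolean { to: usize, value: bool },
    Test { register: usize, expected: bool },
    Jump { offset: usize },
    Return { register: usize },
}

/// Executes instructions until a `Return`, then yields that register's value.
pub fn run(code: &[Op]) -> bool {
    let mut registers = [false; 4];
    let mut ip = 0;

    loop {
        let op = code[ip];
        ip += 1; // `ip` now points at the next instruction

        match op {
            Op::LoadBoolean { to, value } => registers[to] = value,
            Op::Test { register, expected } => {
                // Passing the test skips the JUMP that always follows a TEST.
                if registers[register] == expected {
                    ip += 1;
                }
            }
            Op::Jump { offset } => ip += offset,
            Op::Return { register } => return registers[register],
        }
    }
}

/// Lays out `left && right` (expected = true) or `left || right` (expected = false).
pub fn short_circuit(left: bool, right: bool, expected: bool) -> Vec<Op> {
    vec![
        Op::LoadBoolean { to: 0, value: left },
        Op::Test { register: 0, expected },
        Op::Jump { offset: 1 },                  // jumps past the right-hand argument
        Op::LoadBoolean { to: 0, value: right }, // the right-hand argument
        Op::Return { register: 0 },
    ]
}
```

When the left argument fails the test, the `JUMP` runs. The right-hand loader is never executed, so register 0 keeps the left value. This is ordinary short-circuit evaluation.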
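The check in `optimize_test_with_explicit_booleans` is a peephole optimization over the tail of the instruction list. Below is a stripped-down sketch of the same check using a slice pattern. It assumes a hypothetical `Op` enum in which each `LoadBoolean` carries its boolean directly; the diff's code reads that value from the instruction's B field instead. `remove_redundant_test` is also a hypothetical name.

```rust
/// Hypothetical, simplified operations: only the ones the pattern needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Equal,
    Less,
    LessEqual,
    Test,
    Jump,
    LoadBoolean(bool),
    LoadConstant(u8),
}

/// Returns `true` if the trailing TEST, JUMP and boolean loaders were removed.
pub fn remove_redundant_test(instructions: &mut Vec<Op>) -> bool {
    // Match only the last five instructions; anything before them is ignored.
    let is_redundant = matches!(
        instructions.as_slice(),
        [
            ..,
            Op::Equal | Op::Less | Op::LessEqual,
            Op::Test,
            Op::Jump,
            Op::LoadBoolean(true),
            Op::LoadBoolean(false),
        ]
    );

    if is_redundant {
        // Keep the comparison; drop TEST, JUMP and both loaders.
        instructions.truncate(instructions.len() - 4);
    }

    is_redundant
}
```

The inverted case (`false` then `true`) is deliberately left alone. Those loaders produce the negation of the comparison, so deleting them would change the result.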