
Continue refactor and rewrite comparison operator compilation

This commit is contained in:
Jeff 2024-12-09 10:30:57 -05:00
parent 98a7b7984a
commit 5d43674000
7 changed files with 188 additions and 84 deletions

102
README.md
View File

@ -137,34 +137,96 @@ The compiler always checks types on the fly, so there is no need for a separate
### Instructions
Dust's virtual machine is register-based and uses 64-bit instructions, which encode ten pieces of
information:
Dust's virtual machine uses 32-bit instructions, which encode seven pieces of information:
Bit | Description
----- | -----------
0-5 | Operation code
6-8 | Unused, reserved in case more operation codes are needed
9 | Flag indicating that A is a local
10 | Flag indicating that B is a constant
11 | Flag indicating that B is a local
12 | Flag indicating that C is a constant
13 | Flag indicating that C is a local
14 | D Argument (boolean value)
15-16 | Unused
17-32 | A argument (unsigned 16-bit integer)
33-48 | B argument (unsigned 16-bit integer)
49-63 | C argument (unsigned 16-bit integer)
0-4 | Operation code
5 | Flag indicating if the B argument is a constant
6 | Flag indicating if the C argument is a constant
7 | D field (boolean)
8-15 | A field (unsigned 8-bit integer)
16-23 | B field (unsigned 8-bit integer)
24-31 | C field (unsigned 8-bit integer)
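
As a rough illustration of the 32-bit layout in the table above, the sketch below packs and unpacks the seven fields with shifts and masks. It assumes that bit 0 is the least significant bit, which the table does not state. The `Fields` struct and the `encode` and `decode` functions are hypothetical names chosen for this example, not Dust's actual API.

```rust
/// Hypothetical decoded view of one 32-bit instruction (names are illustrative).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Fields {
    operation: u8,       // bits 0-4
    b_is_constant: bool, // bit 5
    c_is_constant: bool, // bit 6
    d: bool,             // bit 7
    a: u8,               // bits 8-15
    b: u8,               // bits 16-23
    c: u8,               // bits 24-31
}

/// Packs the fields into a `u32`, assuming bit 0 is the least significant bit.
/// Operation codes above 31 are truncated to their low five bits.
fn encode(fields: Fields) -> u32 {
    (fields.operation as u32 & 0b1_1111)
        | ((fields.b_is_constant as u32) << 5)
        | ((fields.c_is_constant as u32) << 6)
        | ((fields.d as u32) << 7)
        | ((fields.a as u32) << 8)
        | ((fields.b as u32) << 16)
        | ((fields.c as u32) << 24)
}

/// Unpacks a `u32` produced by `encode`.
fn decode(bits: u32) -> Fields {
    Fields {
        operation: (bits & 0b1_1111) as u8,
        b_is_constant: ((bits >> 5) & 1) == 1,
        c_is_constant: ((bits >> 6) & 1) == 1,
        d: ((bits >> 7) & 1) == 1,
        // Casting to `u8` keeps only the low eight bits of the shifted value.
        a: (bits >> 8) as u8,
        b: (bits >> 16) as u8,
        c: (bits >> 24) as u8,
    }
}
```

Under these assumptions, `decode(encode(fields))` returns the original `fields` for any operation code below 32.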
Because the instructions are 64 bits, the maximum number of registers is 2^16, which is more than
enough, even for programs that are very large. This also means that chunks can store up to 2^16
constants and locals.
#### Operations
Five bits are used for the operation, which allows for up to 32 operations.
##### Stack manipulation
- MOVE: Makes a register's value available in another register by using a pointer. This avoids
copying the value or invalidating the original register.
- CLOSE: Sets a range of registers to the "empty" state.
##### Value loaders
- LOAD_BOOLEAN: Loads a boolean, the value of which is encoded in the instruction, to a register.
- LOAD_CONSTANT: Loads a constant from the constant list to a register.
- LOAD_LIST: Creates a list abstraction from a range of registers and loads it to a register.
- LOAD_MAP: Creates a map abstraction from a range of registers and loads it to a register.
- LOAD_SELF: Creates an abstraction that represents the current function and loads it to a register.
##### Variable operations
- GET_LOCAL: Loads a variable's value to a register by using a pointer to point to the variable's
canonical register (i.e. the register whose index is stored in the locals list).
- SET_LOCAL: Changes a variable's register to a pointer to another register, effectively changing
the variable's value.
##### Arithmetic
Arithmetic instructions use every field except for D. The A field is the destination register, the B
and C fields are the arguments, and the flags indicate whether the arguments are constants.
- ADD: Adds two values and stores the result in a register. Unlike the other arithmetic operations,
the ADD instruction can also be used to concatenate strings and characters.
- SUBTRACT: Subtracts one argument from another and stores the result in a register.
- MULTIPLY: Multiplies two arguments and stores the result in a register.
- DIVIDE: Divides one value by another and stores the result in a register.
- MODULO: Calculates the division remainder of two values and stores the result in a register.
- POWER: Raises one value to the power of another and stores the result in a register.
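
The way the flags select between registers and constants can be sketched as follows. This is a simplified model, not the VM's real code: it handles integers only, ignores the string concatenation that ADD also supports, and the `Arithmetic` struct and the `argument` and `add` functions are hypothetical names.

```rust
/// Hypothetical arithmetic instruction: A is the destination, B and C are the arguments.
struct Arithmetic {
    a: u8,
    b: u8,
    c: u8,
    b_is_constant: bool,
    c_is_constant: bool,
}

/// Reads an argument from the constant list or from a register, depending on its flag.
fn argument(index: u8, is_constant: bool, registers: &[i64], constants: &[i64]) -> i64 {
    if is_constant {
        constants[index as usize]
    } else {
        registers[index as usize]
    }
}

/// Executes an ADD: resolves both arguments, then writes the sum to register A.
fn add(instruction: &Arithmetic, registers: &mut [i64], constants: &[i64]) {
    let left = argument(instruction.b, instruction.b_is_constant, registers, constants);
    let right = argument(instruction.c, instruction.c_is_constant, registers, constants);

    registers[instruction.a as usize] = left + right;
}
```

The other arithmetic operations would differ only in the operator applied to `left` and `right`.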
##### Logic
Logic instructions work differently from arithmetic and comparison instructions, but they are still
essentially binary operations with a left and a right argument. Rather than performing some
calculation and storing a result, the logic instructions perform a check on the left-hand argument
and, based on the result, either skip the right-hand argument or allow it to be executed. A `TEST`
is always followed by a `JUMP`. If the left argument passes the test (a boolean equality check), the
`JUMP` instruction is skipped and the right argument is executed. If the left argument fails the
test, the `JUMP` is not skipped and it jumps past the right argument.
- TEST
- TEST_SET
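
The `TEST` and `JUMP` pairing described above can be sketched with a toy interpreter that evaluates `left && right`. Everything here is an assumption made for illustration: the three-variant `Instruction` enum, the relative `offset` on `Jump`, and the `run` and `and` functions do not mirror Dust's real instruction encoding.

```rust
/// Hypothetical, heavily simplified instruction set.
#[derive(Clone, Copy)]
enum Instruction {
    LoadBoolean { destination: usize, value: bool },
    /// Skips the next instruction when the register equals `expected`.
    Test { register: usize, expected: bool },
    /// Skips `offset` instructions after this one.
    Jump { offset: usize },
}

fn run(program: &[Instruction], registers: &mut [bool]) {
    let mut ip = 0;

    while ip < program.len() {
        match program[ip] {
            Instruction::LoadBoolean { destination, value } => registers[destination] = value,
            Instruction::Test { register, expected } => {
                if registers[register] == expected {
                    ip += 1; // The test passed: skip the JUMP that follows.
                }
            }
            Instruction::Jump { offset } => ip += offset,
        }

        ip += 1;
    }
}

/// Evaluates `left && right`; the right argument only runs if the test passes.
fn and(left: bool, right: bool) -> bool {
    let program = [
        Instruction::LoadBoolean { destination: 0, value: left },
        Instruction::Test { register: 0, expected: true },
        Instruction::Jump { offset: 1 }, // Jumps past the right argument.
        Instruction::LoadBoolean { destination: 0, value: right },
    ];
    let mut registers = [false];

    run(&program, &mut registers);

    registers[0]
}
```

When `left` is `false`, the test fails, the `Jump` runs and the final loader is never executed, so the register keeps the left-hand value.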
##### Comparison
- EQUAL
- LESS
- LESS_EQUAL
##### Unary operations
- NEGATE
- NOT
##### Execution
- CALL
- CALL_NATIVE
- JUMP
- RETURN
The A, B, and C fields are usually used as indexes into the constant list or stack, but they can
also hold other information, like the number of arguments for a function call.
### Virtual Machine
The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
pointers. Pointers can point to values in the constant list, locals list, or the stack itself. If a
pointer points to a local, the VM must consult its local definitions to find which register holds
the value. Those local definitions are stored as a simple list of register indexes.
pointers. Pointers can point to values in the constant list, locals list, or the stack itself.
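
A minimal sketch of how such a register stack might resolve pointers is shown below. It assumes that locals are stored as a list of register indexes and that pointer chains contain no cycles. The `Value`, `Pointer`, `Register` and `Vm` types and the `read` and `follow` methods are hypothetical and far simpler than the real VM.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Boolean(bool),
    Integer(i64),
}

/// Hypothetical pointer kinds: into the stack, the constant list or the locals list.
#[derive(Clone, Copy)]
enum Pointer {
    Stack(usize),
    Constant(usize),
    Local(usize),
}

enum Register {
    Empty,
    Value(Value),
    Pointer(Pointer),
}

struct Vm {
    stack: Vec<Register>,
    constants: Vec<Value>,
    /// Assumed layout: each local stores the index of the register that holds it.
    locals: Vec<usize>,
}

impl Vm {
    /// Reads a register, following pointers until a value (or nothing) is found.
    fn read(&self, register: usize) -> Option<&Value> {
        match self.stack.get(register)? {
            Register::Empty => None,
            Register::Value(value) => Some(value),
            Register::Pointer(pointer) => self.follow(*pointer),
        }
    }

    /// Resolves one pointer. A cyclic pointer chain would recurse forever in this sketch.
    fn follow(&self, pointer: Pointer) -> Option<&Value> {
        match pointer {
            Pointer::Stack(index) => self.read(index),
            Pointer::Constant(index) => self.constants.get(index),
            Pointer::Local(index) => self.read(*self.locals.get(index)?),
        }
    }
}
```

Reading through a pointer never copies the value; callers only ever receive a reference to the value's original location.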
While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with

View File

@ -13,30 +13,30 @@
//! # Output
//!
//! The output of [Disassembler::disassemble] is a string that can be printed to the console or
//! written to a file. Below is an example of the disassembly for a simple "Hello, world!" program.
//! written to a file. Below is an example of the disassembly for a simple "Hello world!" program.
//!
//! ```text
//! ┌──────────────────────────────────────────────────────────────────────────────
//! │ dust
//! │
//! │ write_line("hello_world")
//! │
//! │ 3 instructions, 1 constants, 0 locals, returns none
//! │
//! │ Instructions
//! │ ------------
//! │ i POSITION OPERATION TYPE INFO │
//! │ --- ---------- ------------- -------------- -------------------------------- │
//! │ 0 (11, 24) LOAD_CONSTANT str R0 = C0 │
//! │ 1 (0, 25) CALL_NATIVE none write_line(R0..R1) │
//! │ 2 (25, 25) RETURN none
//! │┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
//! │ Constants
//! │ ---------
//! │ i TYPE VALUE
//! │ --- ---------------- -----------------
//! │ 0 str hello_world
//! └──────────────────────────────────────────────────────────────────────────────
//! ┌───────────────────────────────────────────────────────────────
//! │ dust │
//! │
//! │ write_line("Hello world!")
//! │
//! │ 3 instructions, 1 constants, 0 locals, returns none │
//! │
//! │ Instructions │
//! │ ------------ │
//! │ i POSITION OPERATION INFO │
//! │ --- ---------- ------------- -------------------------------- │
//! │ 0 (11, 25) LOAD_CONSTANT R0 = C0 │
//! │ 1 (0, 26) CALL_NATIVE write_line(R0..R1) │
//! │ 2 (26, 26) RETURN
//! │┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
//! │ Constants │
//! │ --------- │
//! │ i TYPE VALUE │
//! │ --- ---------------- ----------------- │
//! │ 0 str Hello world!
//! └───────────────────────────────────────────────────────────────
//! ```
use std::env::current_exe;

View File

@ -3,6 +3,9 @@
//! A chunk consists of a sequence of instructions and their positions, a list of constants, and a
//! list of locals that can be executed by the Dust virtual machine. Chunks have a name when they
//! belong to a named function.
mod disassembler;
pub use disassembler::Disassembler;
use std::fmt::{self, Debug, Display, Write};
@ -10,7 +13,7 @@ use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
use smartstring::alias::String;
use crate::{ConcreteValue, Disassembler, FunctionType, Instruction, Scope, Span, Type};
use crate::{ConcreteValue, FunctionType, Instruction, Scope, Span, Type};
/// In-memory representation of a Dust program or function.
///

View File

@ -13,7 +13,10 @@ use std::{
};
use colored::Colorize;
use optimize::{optimize_control_flow, optimize_set_local};
use optimize::{
condense_set_local_to_math, optimize_test_with_explicit_booleans,
optimize_test_with_loader_arguments,
};
use smallvec::{smallvec, SmallVec};
use crate::{
@ -1003,7 +1006,7 @@ impl<'src> Compiler<'src> {
});
self.emit_instruction(set_local, Type::None, start_position);
optimize_set_local(self)?;
condense_set_local_to_math(self)?;
return Ok(());
}
@ -1209,9 +1212,8 @@ impl<'src> Compiler<'src> {
self.instructions
.insert(if_block_start, (jump, Type::None, if_block_start_position));
if self.instructions.len() >= 4 {
optimize_control_flow(&mut self.instructions);
}
optimize_test_with_explicit_booleans(self);
optimize_test_with_loader_arguments(self);
let else_last_register = self.next_register().saturating_sub(1);
let r#move = Instruction::from(Move {
@ -1385,13 +1387,24 @@ impl<'src> Compiler<'src> {
self.emit_instruction(r#return, Type::None, self.current_position);
} else {
let previous_expression_type = self.get_last_instruction_type();
let should_return_value = previous_expression_type != Type::None;
let previous_expression_type = self
.instructions
.iter()
.rev()
.find_map(|(instruction, r#type, _)| {
if instruction.yields_value() {
Some(r#type)
} else {
None
}
})
.unwrap_or(&Type::None);
let should_return_value = previous_expression_type != &Type::None;
let r#return = Instruction::from(Return {
should_return_value,
});
self.update_return_type(previous_expression_type)?;
self.update_return_type(previous_expression_type.clone())?;
self.emit_instruction(r#return, Type::None, self.current_position);
}

View File

@ -1,29 +1,14 @@
//! Tools used by the compiler to optimize a chunk's bytecode.
use crate::{instruction::SetLocal, CompileError, Compiler, Instruction, Operation, Span, Type};
use crate::{instruction::SetLocal, CompileError, Compiler, Operation};
fn get_last_operations<const COUNT: usize>(
instructions: &[(Instruction, Type, Span)],
) -> Option<[Operation; COUNT]> {
let mut n_operations = [Operation::Return; COUNT];
for (nth, operation) in n_operations.iter_mut().rev().zip(
instructions
.iter()
.rev()
.map(|(instruction, _, _)| instruction.operation()),
) {
*nth = operation;
}
Some(n_operations)
}
/// Optimizes a short control flow pattern.
/// Optimizes a control flow pattern by removing redundant instructions.
///
/// If a comparison instruction is followed by a test instruction, the test instruction may be
/// redundant because the comparison instruction already sets the correct value. If the test's
/// arguments (i.e. the boolean loaders) are `true` and `false` (in that order) then the boolean
/// loaders, jump and test instructions are removed, leaving a single comparison instruction.
///
/// Comparison and test instructions (which are always followed by a JUMP) can be optimized when
/// the next instructions are two constant or boolean loaders. The first loader is set to skip
/// an instruction if it is run while the second loader is modified to use the first's register.
/// This makes the following two code snippets compile to the same bytecode:
///
/// ```dust
@ -35,15 +20,55 @@ fn get_last_operations<const COUNT: usize>(
/// ```
///
/// The instructions must be in the following order:
/// - `Equal`, `Less`, `LessEqual` or `Test`
/// - `Equal`, `Less` or `LessEqual`
/// - `Test`
/// - `Jump`
/// - `LoadBoolean`
/// - `LoadBoolean`
pub fn optimize_test_with_explicit_booleans(compiler: &mut Compiler) {
if matches!(
compiler.get_last_operations(),
Some([
Operation::Equal | Operation::Less | Operation::LessEqual,
Operation::Test,
Operation::Jump,
Operation::LoadBoolean,
Operation::LoadBoolean,
])
) {
let first_loader = compiler.instructions.iter().nth_back(1).unwrap();
let second_loader = compiler.instructions.last().unwrap();
let first_boolean = first_loader.0.b != 0;
let second_boolean = second_loader.0.b != 0;
if first_boolean && !second_boolean {
// Log only when the instructions are actually removed.
log::debug!("Removing redundant test, jump and boolean loaders after comparison");
compiler.instructions.pop();
compiler.instructions.pop();
compiler.instructions.pop();
compiler.instructions.pop();
}
}
}
/// Optimizes a control flow pattern.
///
/// Test instructions (which are always followed by a jump) can be optimized when the next
/// instructions are two constant or boolean loaders. The first loader is set to skip an instruction
/// if it is run while the second loader is modified to use the first's register. This forgoes the
/// use of a jump instruction and uses one fewer register.
///
/// The instructions must be in the following order:
/// - `Test`
/// - `Jump`
/// - `LoadBoolean` or `LoadConstant`
/// - `LoadBoolean` or `LoadConstant`
pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
pub fn optimize_test_with_loader_arguments(compiler: &mut Compiler) {
if !matches!(
get_last_operations(instructions),
compiler.get_last_operations(),
Some([
Operation::Equal | Operation::Less | Operation::LessEqual | Operation::Test,
Operation::Test,
Operation::Jump,
Operation::LoadBoolean | Operation::LoadConstant,
Operation::LoadBoolean | Operation::LoadConstant,
@ -54,17 +79,17 @@ pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
log::debug!("Consolidating registers for control flow optimization");
let first_loader = &mut instructions.iter_mut().nth_back(1).unwrap().0;
let first_loader = &mut compiler.instructions.iter_mut().nth_back(1).unwrap().0;
first_loader.c = true as u8;
let first_loader_destination = first_loader.a;
let second_loader = &mut instructions.last_mut().unwrap().0;
let second_loader = &mut compiler.instructions.last_mut().unwrap().0;
second_loader.a = first_loader_destination;
}
/// Optimizes a math instruction followed by a SetLocal instruction.
/// Optimizes a math assignment pattern.
///
/// The SetLocal instruction is removed and the math instruction is modified to use the local as
/// its destination. This makes the following two code snippets compile to the same bytecode:
@ -82,7 +107,7 @@ pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
/// The instructions must be in the following order:
/// - `Add`, `Subtract`, `Multiply`, `Divide` or `Modulo`
/// - `SetLocal`
pub fn optimize_set_local(compiler: &mut Compiler) -> Result<(), CompileError> {
pub fn condense_set_local_to_math(compiler: &mut Compiler) -> Result<(), CompileError> {
if !matches!(
compiler.get_last_operations(),
Some([

View File

@ -365,6 +365,9 @@ impl Instruction {
| Operation::Multiply
| Operation::Divide
| Operation::Modulo
| Operation::Equal
| Operation::Less
| Operation::LessEqual
| Operation::Negate
| Operation::Not
| Operation::Call

View File

@ -30,7 +30,6 @@
pub mod chunk;
pub mod compiler;
pub mod disassembler;
pub mod dust_error;
pub mod instruction;
pub mod lexer;
@ -41,9 +40,8 @@ pub mod r#type;
pub mod value;
pub mod vm;
pub use crate::chunk::{Chunk, Local};
pub use crate::chunk::{Chunk, Disassembler, Local};
pub use crate::compiler::{compile, CompileError, Compiler};
pub use crate::disassembler::Disassembler;
pub use crate::dust_error::{AnnotatedError, DustError};
pub use crate::instruction::{Argument, Instruction, Operation};
pub use crate::lexer::{lex, LexError, Lexer};