Continue refactor and rewrite comparison operator compilation
parent
98a7b7984a
commit
5d43674000
102
README.md
@ -137,34 +137,96 @@ The compiler always checks types on the fly, so there is no need for a separate
|
||||
|
||||
### Instructions
|
||||
|
||||
Dust's virtual machine is register-based and uses 64-bit instructions, which encode ten pieces of
|
||||
information:
|
||||
Dust's virtual machine uses 32-bit instructions, which encode seven pieces of information:
|
||||
|
||||
Bit | Description
|
||||
----- | -----------
|
||||
0-5 | Operation code
|
||||
6-8 | Unused, reserved in case more operation codes are needed
|
||||
9 | Flag indicating that A is a local
|
||||
10 | Flag indicating that B is a constant
|
||||
11 | Flag indicating that B is a local
|
||||
12 | Flag indicating that C is a constant
|
||||
13 | Flag indicating that C is a local
|
||||
14 | D Argument (boolean value)
|
||||
15-16 | Unused
|
||||
17-32 | A argument (unsigned 16-bit integer)
|
||||
33-48 | B argument (unsigned 16-bit integer)
|
||||
49-63 | C argument (unsigned 16-bit integer)
|
||||
0-4 | Operation code
|
||||
5 | Flag indicating if the B argument is a constant
|
||||
6 | Flag indicating if the C argument is a constant
|
||||
7 | D field (boolean)
|
||||
8-15 | A field (unsigned 8-bit integer)
|
||||
16-23 | B field (unsigned 8-bit integer)
|
||||
24-31 | C field (unsigned 8-bit integer)
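As a rough illustration of the 32-bit layout in the table above, the fields can be packed and unpacked with shifts and masks. This is a minimal sketch, not Dust's actual `Instruction` type: the `Fields` struct and the `encode` and `decode` functions are hypothetical names, and it assumes that bit 0 is the least significant bit.

```rust
/// Hypothetical, unpacked view of one instruction (not Dust's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fields {
    operation: u8, // bits 0-4, so only values 0..=31 fit
    b_is_constant: bool, // bit 5
    c_is_constant: bool, // bit 6
    d: bool, // bit 7
    a: u8, // bits 8-15
    b: u8, // bits 16-23
    c: u8, // bits 24-31
}

/// Packs the fields into one 32-bit word.
/// Assumption: bit 0 is the least significant bit.
fn encode(fields: Fields) -> u32 {
    ((fields.operation as u32) & 0x1F)
        | ((fields.b_is_constant as u32) << 5)
        | ((fields.c_is_constant as u32) << 6)
        | ((fields.d as u32) << 7)
        | ((fields.a as u32) << 8)
        | ((fields.b as u32) << 16)
        | ((fields.c as u32) << 24)
}

/// Unpacks a 32-bit word into its fields.
fn decode(word: u32) -> Fields {
    Fields {
        operation: (word & 0x1F) as u8,
        b_is_constant: ((word >> 5) & 1) == 1,
        c_is_constant: ((word >> 6) & 1) == 1,
        d: ((word >> 7) & 1) == 1,
        // Casting to u8 keeps only the low eight bits of the shifted word.
        a: (word >> 8) as u8,
        b: (word >> 16) as u8,
        c: (word >> 24) as u8,
    }
}
```

Under this layout, `decode(encode(fields))` returns the original fields whenever `operation` is below 32, and the 8-bit A, B and C fields limit each index to 256 values.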
|
||||
|
||||
Because the instructions are 64 bits, the maximum number of registers is 2^16, which is more than
|
||||
enough, even for programs that are very large. This also means that chunks can store up to 2^16
|
||||
constants and locals.
|
||||
#### Operations
|
||||
|
||||
Five bits are used for the operation, which allows for up to 32 operations.
|
||||
|
||||
##### Stack manipulation
|
||||
|
||||
- MOVE: Makes a register's value available in another register by using a pointer. This avoids
|
||||
copying the value or invalidating the original register.
|
||||
- CLOSE: Sets a range of registers to the "empty" state.
|
||||
|
||||
##### Value loaders
|
||||
|
||||
- LOAD_BOOLEAN: Loads a boolean, the value of which is encoded in the instruction, to a register.
|
||||
- LOAD_CONSTANT: Loads a constant from the constant list to a register.
|
||||
- LOAD_LIST: Creates a list abstraction from a range of registers and loads it to a register.
|
||||
- LOAD_MAP: Creates a map abstraction from a range of registers and loads it to a register.
|
||||
- LOAD_SELF: Creates an abstraction that represents the current function and loads it to a register.
|
||||
|
||||
##### Variable operations
|
||||
|
||||
- GET_LOCAL: Loads a variable's value to a register by using a pointer to point to the variable's
|
||||
canonical register (i.e. the register whose index is stored in the locals list).
|
||||
- SET_LOCAL: Changes a variable's register to a pointer to another register, effectively changing
|
||||
the variable's value.
|
||||
|
||||
##### Arithmetic
|
||||
|
||||
Arithmetic instructions use every field except for D. The A field is the destination register, the B
|
||||
and C fields are the arguments, and the flags indicate whether the arguments are constants.
|
||||
|
||||
- ADD: Adds two values and stores the result in a register. Unlike the other arithmetic operations,
|
||||
the ADD instruction can also be used to concatenate strings and characters.
|
||||
- SUBTRACT: Subtracts one argument from another and stores the result in a register.
|
||||
- MULTIPLY: Multiplies two arguments and stores the result in a register.
|
||||
- DIVIDE: Divides one value by another and stores the result in a register.
|
||||
- MODULO: Calculates the division remainder of two values and stores the result in a register.
|
||||
- POWER: Raises one value to the power of another and stores the result in a register.
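The way the A, B and C fields and the constant flags work together can be sketched for `ADD`. This is a simplified illustration only: `Operand`, `resolve` and `add` are hypothetical names, and values are reduced to `i64` rather than Dust's real value types.

```rust
/// Hypothetical operand: a B or C field together with its "is constant" flag.
#[derive(Clone, Copy)]
struct Operand {
    index: u8,
    is_constant: bool,
}

/// Reads an operand from the constant list or from a register,
/// depending on the flag.
fn resolve(operand: Operand, registers: &[i64], constants: &[i64]) -> i64 {
    if operand.is_constant {
        constants[operand.index as usize]
    } else {
        registers[operand.index as usize]
    }
}

/// Executes a simplified ADD: A is the destination register,
/// B and C are the arguments.
fn add(a: u8, b: Operand, c: Operand, registers: &mut [i64], constants: &[i64]) {
    let sum = resolve(b, registers, constants) + resolve(c, registers, constants);
    registers[a as usize] = sum;
}
```

For example, with `R1 = 5` and `C0 = 37`, an `ADD` whose B argument is register 1 and whose C argument is constant 0 stores 42 in the destination register.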
|
||||
|
||||
##### Logic
|
||||
|
||||
Logic instructions work differently from arithmetic and comparison instructions, but they are still
|
||||
essentially binary operations with a left and a right argument. Rather than performing some
|
||||
calculation and storing a result, the logic instructions perform a check on the left-hand argument
|
||||
and, based on the result, either skip the right-hand argument or allow it to be executed. A `TEST`
|
||||
is always followed by a `JUMP`. If the left argument passes the test (a boolean equality check), the
|
||||
`JUMP` instruction is skipped and the right argument is executed. If the left argument fails the
|
||||
test, the `JUMP` is not skipped and it jumps past the right argument.
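The `TEST`/`JUMP` pairing described above can be sketched with a tiny interpreter. This is a simplified model under stated assumptions, not the Dust VM: the `Op` enum, `run` and `evaluate_and` are hypothetical, registers hold only booleans, and both operands of `&&` are written to the same register.

```rust
/// Hypothetical, simplified operations (not Dust's real instruction set).
#[derive(Clone, Copy)]
enum Op {
    LoadBoolean { destination: usize, value: bool },
    /// Passes when the register equals `expected`.
    Test { register: usize, expected: bool },
    /// Skips `offset` instructions after itself.
    Jump { offset: usize },
}

fn run(program: &[Op], registers: &mut [bool]) {
    let mut ip = 0;

    while ip < program.len() {
        let op = program[ip];
        ip += 1;

        match op {
            Op::LoadBoolean { destination, value } => registers[destination] = value,
            Op::Test { register, expected } => {
                if registers[register] == expected {
                    // Test passed: skip the JUMP so the right argument runs.
                    ip += 1;
                }
            }
            Op::Jump { offset } => ip += offset,
        }
    }
}

/// Evaluates `left && right` the way a TEST/JUMP pair short-circuits it.
fn evaluate_and(left: bool, right: bool) -> bool {
    let program = [
        Op::LoadBoolean { destination: 0, value: left },
        Op::Test { register: 0, expected: true },
        Op::Jump { offset: 1 }, // jumps past the right argument
        Op::LoadBoolean { destination: 0, value: right },
    ];
    let mut registers = [false; 1];

    run(&program, &mut registers);

    registers[0]
}
```

When the left argument is `false`, the test fails, the `JUMP` runs and the right-hand loader is never executed, so the register keeps the left value.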
|
||||
|
||||
- TEST
|
||||
- TEST_SET
|
||||
|
||||
##### Comparison
|
||||
|
||||
- EQUAL
|
||||
- LESS
|
||||
- LESS_EQUAL
|
||||
|
||||
##### Unary operations
|
||||
|
||||
- NEGATE
|
||||
- NOT
|
||||
|
||||
##### Execution
|
||||
|
||||
- CALL
|
||||
- CALL_NATIVE
|
||||
- JUMP
|
||||
- RETURN
|
||||
|
||||
|
||||
The A, B, and C
|
||||
fields are usually used as indexes into the constant list or stack, but they can also hold
|
||||
other information, like the number of arguments for a function call.
|
||||
|
||||
### Virtual Machine
|
||||
|
||||
The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
|
||||
pointers. Pointers can point to values in the constant list, locals list, or the stack itself. If it
|
||||
points to a local, the VM must consult its local definitions to find which register holds the
|
||||
value. Those local definitions are stored as a simple list of register indexes.
|
||||
pointers. Pointers can point to values in the constant list, locals list, or the stack itself.
|
||||
|
||||
While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
|
||||
enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with
|
||||
|
@ -13,30 +13,30 @@
|
||||
//! # Output
|
||||
//!
|
||||
//! The output of [Disassembler::disassemble] is a string that can be printed to the console or
|
||||
//! written to a file. Below is an example of the disassembly for a simple "Hello, world!" program.
|
||||
//! written to a file. Below is an example of the disassembly for a simple "Hello world!" program.
|
||||
//!
|
||||
//! ```text
|
||||
//! ┌──────────────────────────────────────────────────────────────────────────────┐
|
||||
//! │ dust │
|
||||
//! │ │
|
||||
//! │ write_line("hello_world") │
|
||||
//! │ │
|
||||
//! │ 3 instructions, 1 constants, 0 locals, returns none │
|
||||
//! │ │
|
||||
//! │ Instructions │
|
||||
//! │ ------------ │
|
||||
//! │ i POSITION OPERATION TYPE INFO │
|
||||
//! │ --- ---------- ------------- -------------- -------------------------------- │
|
||||
//! │ 0 (11, 24) LOAD_CONSTANT str R0 = C0 │
|
||||
//! │ 1 (0, 25) CALL_NATIVE none write_line(R0..R1) │
|
||||
//! │ 2 (25, 25) RETURN none │
|
||||
//! │┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈│
|
||||
//! │ Constants │
|
||||
//! │ --------- │
|
||||
//! │ i TYPE VALUE │
|
||||
//! │ --- ---------------- ----------------- │
|
||||
//! │ 0 str hello_world │
|
||||
//! └──────────────────────────────────────────────────────────────────────────────┘
|
||||
//! ┌───────────────────────────────────────────────────────────────┐
|
||||
//! │ dust │
|
||||
//! │ │
|
||||
//! │ write_line("Hello world!") │
|
||||
//! │ │
|
||||
//! │ 3 instructions, 1 constants, 0 locals, returns none │
|
||||
//! │ │
|
||||
//! │ Instructions │
|
||||
//! │ ------------ │
|
||||
//! │ i POSITION OPERATION INFO │
|
||||
//! │ --- ---------- ------------- -------------------------------- │
|
||||
//! │ 0 (11, 25) LOAD_CONSTANT R0 = C0 │
|
||||
//! │ 1 (0, 26) CALL_NATIVE write_line(R0..R1) │
|
||||
//! │ 2 (26, 26) RETURN │
|
||||
//! │┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈│
|
||||
//! │ Constants │
|
||||
//! │ --------- │
|
||||
//! │ i TYPE VALUE │
|
||||
//! │ --- ---------------- ----------------- │
|
||||
//! │ 0 str Hello world! │
|
||||
//! └───────────────────────────────────────────────────────────────┘
|
||||
//! ```
|
||||
use std::env::current_exe;
|
||||
|
@ -3,6 +3,9 @@
|
||||
//! A chunk consists of a sequence of instructions and their positions, a list of constants, and a
|
||||
//! list of locals that can be executed by the Dust virtual machine. Chunks have a name when they
|
||||
//! belong to a named function.
|
||||
mod disassembler;
|
||||
|
||||
pub use disassembler::Disassembler;
|
||||
|
||||
use std::fmt::{self, Debug, Display, Write};
|
||||
|
||||
@ -10,7 +13,7 @@ use serde::{Deserialize, Serialize};
|
||||
use smallvec::SmallVec;
|
||||
use smartstring::alias::String;
|
||||
|
||||
use crate::{ConcreteValue, Disassembler, FunctionType, Instruction, Scope, Span, Type};
|
||||
use crate::{ConcreteValue, FunctionType, Instruction, Scope, Span, Type};
|
||||
|
||||
/// In-memory representation of a Dust program or function.
|
||||
///
|
@ -13,7 +13,10 @@ use std::{
|
||||
};
|
||||
|
||||
use colored::Colorize;
|
||||
use optimize::{optimize_control_flow, optimize_set_local};
|
||||
use optimize::{
|
||||
condense_set_local_to_math, optimize_test_with_explicit_booleans,
|
||||
optimize_test_with_loader_arguments,
|
||||
};
|
||||
use smallvec::{smallvec, SmallVec};
|
||||
|
||||
use crate::{
|
||||
@ -1003,7 +1006,7 @@ impl<'src> Compiler<'src> {
|
||||
});
|
||||
|
||||
self.emit_instruction(set_local, Type::None, start_position);
|
||||
optimize_set_local(self)?;
|
||||
condense_set_local_to_math(self)?;
|
||||
|
||||
return Ok(());
|
||||
}
|
||||
@ -1209,9 +1212,8 @@ impl<'src> Compiler<'src> {
|
||||
self.instructions
|
||||
.insert(if_block_start, (jump, Type::None, if_block_start_position));
|
||||
|
||||
if self.instructions.len() >= 4 {
|
||||
optimize_control_flow(&mut self.instructions);
|
||||
}
|
||||
optimize_test_with_explicit_booleans(self);
|
||||
optimize_test_with_loader_arguments(self);
|
||||
|
||||
let else_last_register = self.next_register().saturating_sub(1);
|
||||
let r#move = Instruction::from(Move {
|
||||
@ -1385,13 +1387,24 @@ impl<'src> Compiler<'src> {
|
||||
|
||||
self.emit_instruction(r#return, Type::None, self.current_position);
|
||||
} else {
|
||||
let previous_expression_type = self.get_last_instruction_type();
|
||||
let should_return_value = previous_expression_type != Type::None;
|
||||
let previous_expression_type = self
|
||||
.instructions
|
||||
.iter()
|
||||
.rev()
|
||||
.find_map(|(instruction, r#type, _)| {
|
||||
if instruction.yields_value() {
|
||||
Some(r#type)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
})
|
||||
.unwrap_or(&Type::None);
|
||||
let should_return_value = previous_expression_type != &Type::None;
|
||||
let r#return = Instruction::from(Return {
|
||||
should_return_value,
|
||||
});
|
||||
|
||||
self.update_return_type(previous_expression_type)?;
|
||||
self.update_return_type(previous_expression_type.clone())?;
|
||||
self.emit_instruction(r#return, Type::None, self.current_position);
|
||||
}
|
||||
|
||||
|
@ -1,29 +1,14 @@
|
||||
//! Tools used by the compiler to optimize a chunk's bytecode.
|
||||
|
||||
use crate::{instruction::SetLocal, CompileError, Compiler, Instruction, Operation, Span, Type};
|
||||
use crate::{instruction::SetLocal, CompileError, Compiler, Operation};
|
||||
|
||||
fn get_last_operations<const COUNT: usize>(
|
||||
instructions: &[(Instruction, Type, Span)],
|
||||
) -> Option<[Operation; COUNT]> {
|
||||
let mut n_operations = [Operation::Return; COUNT];
|
||||
|
||||
for (nth, operation) in n_operations.iter_mut().rev().zip(
|
||||
instructions
|
||||
.iter()
|
||||
.rev()
|
||||
.map(|(instruction, _, _)| instruction.operation()),
|
||||
) {
|
||||
*nth = operation;
|
||||
}
|
||||
|
||||
Some(n_operations)
|
||||
}
|
||||
|
||||
/// Optimizes a short control flow pattern.
|
||||
/// Optimizes a control flow pattern by removing redundant instructions.
|
||||
///
|
||||
/// If a comparison instruction is followed by a test instruction, the test instruction may be
|
||||
/// redundant because the comparison instruction already sets the correct value. If the test's
|
||||
/// arguments (i.e. the boolean loaders) are `true` and `false` (in that order) then the boolean
|
||||
/// loaders, jump and test instructions are removed, leaving a single comparison instruction.
|
||||
///
|
||||
/// Comparison and test instructions (which are always followed by a JUMP) can be optimized when
|
||||
/// the next instructions are two constant or boolean loaders. The first loader is set to skip
|
||||
/// an instruction if it is run while the second loader is modified to use the first's register.
|
||||
/// This makes the following two code snippets compile to the same bytecode:
|
||||
///
|
||||
/// ```dust
|
||||
@ -35,15 +20,55 @@ fn get_last_operations<const COUNT: usize>(
|
||||
/// ```
|
||||
///
|
||||
/// The instructions must be in the following order:
|
||||
/// - `Equal`, `Less`, `LessEqual` or `Test`
|
||||
/// - `Equal`, `Less` or `LessEqual`
|
||||
/// - `Test`
|
||||
/// - `Jump`
|
||||
/// - `LoadBoolean`
|
||||
/// - `LoadBoolean`
|
||||
pub fn optimize_test_with_explicit_booleans(compiler: &mut Compiler) {
|
||||
if matches!(
|
||||
compiler.get_last_operations(),
|
||||
Some([
|
||||
Operation::Equal | Operation::Less | Operation::LessEqual,
|
||||
Operation::Test,
|
||||
Operation::Jump,
|
||||
Operation::LoadBoolean,
|
||||
Operation::LoadBoolean,
|
||||
])
|
||||
) {
|
||||
log::debug!("Removing redundant test, jump and boolean loaders after comparison");
|
||||
|
||||
let first_loader = compiler.instructions.iter().nth_back(1).unwrap();
|
||||
let second_loader = compiler.instructions.last().unwrap();
|
||||
let first_boolean = first_loader.0.b != 0;
|
||||
let second_boolean = second_loader.0.b != 0;
|
||||
|
||||
if first_boolean && !second_boolean {
|
||||
compiler.instructions.pop();
|
||||
compiler.instructions.pop();
|
||||
compiler.instructions.pop();
|
||||
compiler.instructions.pop();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Optimizes a control flow pattern.
|
||||
///
|
||||
/// Test instructions (which are always followed by a jump) can be optimized when the next
|
||||
/// instructions are two constant or boolean loaders. The first loader is set to skip an instruction
|
||||
/// if it is run while the second loader is modified to use the first's register. This foregoes the
|
||||
/// use of a jump instruction and uses one fewer register.
|
||||
///
|
||||
/// The instructions must be in the following order:
|
||||
/// - `Test`
|
||||
/// - `Jump`
|
||||
/// - `LoadBoolean` or `LoadConstant`
|
||||
/// - `LoadBoolean` or `LoadConstant`
|
||||
pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
|
||||
pub fn optimize_test_with_loader_arguments(compiler: &mut Compiler) {
|
||||
if !matches!(
|
||||
get_last_operations(instructions),
|
||||
compiler.get_last_operations(),
|
||||
Some([
|
||||
Operation::Equal | Operation::Less | Operation::LessEqual | Operation::Test,
|
||||
Operation::Test,
|
||||
Operation::Jump,
|
||||
Operation::LoadBoolean | Operation::LoadConstant,
|
||||
Operation::LoadBoolean | Operation::LoadConstant,
|
||||
@ -54,17 +79,17 @@ pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
|
||||
|
||||
log::debug!("Consolidating registers for control flow optimization");
|
||||
|
||||
let first_loader = &mut instructions.iter_mut().nth_back(1).unwrap().0;
|
||||
let first_loader = &mut compiler.instructions.iter_mut().nth_back(1).unwrap().0;
|
||||
|
||||
first_loader.c = true as u8;
|
||||
|
||||
let first_loader_destination = first_loader.a;
|
||||
let second_loader = &mut instructions.last_mut().unwrap().0;
|
||||
let second_loader = &mut compiler.instructions.last_mut().unwrap().0;
|
||||
|
||||
second_loader.a = first_loader_destination;
|
||||
}
|
||||
|
||||
/// Optimizes a math instruction followed by a SetLocal instruction.
|
||||
/// Optimizes a math assignment pattern.
|
||||
///
|
||||
/// The SetLocal instruction is removed and the math instruction is modified to use the local as
|
||||
/// its destination. This makes the following two code snippets compile to the same bytecode:
|
||||
@ -82,7 +107,7 @@ pub fn optimize_control_flow(instructions: &mut [(Instruction, Type, Span)]) {
|
||||
/// The instructions must be in the following order:
|
||||
/// - `Add`, `Subtract`, `Multiply`, `Divide` or `Modulo`
|
||||
/// - `SetLocal`
|
||||
pub fn optimize_set_local(compiler: &mut Compiler) -> Result<(), CompileError> {
|
||||
pub fn condense_set_local_to_math(compiler: &mut Compiler) -> Result<(), CompileError> {
|
||||
if !matches!(
|
||||
compiler.get_last_operations(),
|
||||
Some([
|
||||
|
@ -365,6 +365,9 @@ impl Instruction {
|
||||
| Operation::Multiply
|
||||
| Operation::Divide
|
||||
| Operation::Modulo
|
||||
| Operation::Equal
|
||||
| Operation::Less
|
||||
| Operation::LessEqual
|
||||
| Operation::Negate
|
||||
| Operation::Not
|
||||
| Operation::Call
|
||||
|
@ -30,7 +30,6 @@
|
||||
|
||||
pub mod chunk;
|
||||
pub mod compiler;
|
||||
pub mod disassembler;
|
||||
pub mod dust_error;
|
||||
pub mod instruction;
|
||||
pub mod lexer;
|
||||
@ -41,9 +40,8 @@ pub mod r#type;
|
||||
pub mod value;
|
||||
pub mod vm;
|
||||
|
||||
pub use crate::chunk::{Chunk, Local};
|
||||
pub use crate::chunk::{Chunk, Disassembler, Local};
|
||||
pub use crate::compiler::{compile, CompileError, Compiler};
|
||||
pub use crate::disassembler::Disassembler;
|
||||
pub use crate::dust_error::{AnnotatedError, DustError};
|
||||
pub use crate::instruction::{Argument, Instruction, Operation};
|
||||
pub use crate::lexer::{lex, LexError, Lexer};
|
||||
|