Write docs; Flesh out the benchmarks; Clean up

Jeff 2024-12-10 08:04:47 -05:00
parent 3aed724649
commit 5aa8579fae
26 changed files with 237 additions and 226 deletions

@@ -1,11 +1,16 @@
# Dust
Dust is a high-level interpreted programming language with static types that focuses on ease of use,
performance and correctness. The syntax, safety features and evaluation model are inspired by Rust.
The instruction set, optimization strategies and virtual machine are inspired by Lua. Unlike Rust
and other compiled languages, Dust has a very low time to execution. Simple programs compile in
under a millisecond on a modern processor. Unlike Lua and most other interpreted languages, Dust is
type-safe, with a simple yet powerful type system that enhances clarity and prevents bugs.
A programming language that is **fast**, **safe** and **easy to use**.
Dust has a simple, expressive syntax that is easy to read and write. This includes a powerful yet
syntactically modest type system with extensive inference capabilities.
The syntax, safety features and evaluation model are inspired by Rust. The instruction set,
optimization strategies and virtual machine are inspired by Lua and academic research (see the
[Inspiration][] section below). Unlike Rust and other compiled languages, Dust has a very low time
to execution. Simple programs compile in milliseconds, even on modest hardware. Unlike Lua and most
other interpreted languages, Dust is type-safe, with a simple yet powerful type system that enhances
clarity and prevents bugs.
```dust
write_line("Enter your name...")
@@ -15,15 +20,28 @@ let name = read_line()
write_line("Hello " + name + "!")
```
## Overview
## Project Status
**Dust is under active development and is not yet ready for general use.** Dust is an ambitious
project that acts as a continuous experiment in language design. Features may be redesigned and
reimplemented at will when they do not meet the project's performance and usability goals. This
approach makes the most of development as a learning opportunity and enforces a high standard of
quality, but it slows down the delivery of features to users.
## Feature Progress
Dust is still in development. This list may change as the language evolves.
This list is a rough outline of the features that are planned to be implemented as soon as possible.
*This is not an exhaustive list of all planned features.* This list is updated and rearranged to
maintain a docket of what is being worked on, what is coming next and what can be revisited later.
- [X] Lexer
- [X] Compiler
- [X] VM
- [ ] Formatter
- [X] Disassembler (for chunk debugging)
- [ ] Formatter
- [ ] REPL
- CLI
- [X] Run source
- [X] Compile to chunk and show disassembly
@@ -32,6 +50,7 @@ Dust is still in development. This list may change as the language evolves.
- [ ] Compile to and run from intermediate formats
- [ ] JSON
- [ ] Postcard
- [ ] Integrated REPL
- Basic Values
- [X] No `null` or `undefined` values
- [X] Booleans
@@ -40,13 +59,13 @@ Dust is still in development. This list may change as the language evolves.
- [X] Floats (64-bit)
- [X] Functions
- [X] Integers (signed 64-bit)
- [ ] Ranges
- [X] Strings (UTF-8)
- Composite Values
- [X] Concrete lists
- [X] Abstract lists (optimization)
- [ ] Concrete maps
- [ ] Abstract maps (optimization)
- [ ] Ranges
- [ ] Tuples (fixed-size constant lists)
- [ ] Structs
- [ ] Enums
@@ -142,8 +161,8 @@ Dust's virtual machine uses 32-bit instructions, which encode seven pieces of in
Bit | Description
----- | -----------
0-4 | Operation code
5 | Flag indicating if the B argument is a constant
6 | Flag indicating if the C argument is a constant
5 | Flag indicating if the B field is a constant
6 | Flag indicating if the C field is a constant
7 | D field (boolean)
8-15 | A field (unsigned 8-bit integer)
16-23 | B field (unsigned 8-bit integer)
@@ -151,7 +170,8 @@ Bit | Description
#### Operations
Five bits are used for the operation, which allows for up to 32 operations.
The 1.0 version of Dust will have more than the current number of operations but cannot exceed 32
because of the 5-bit format.
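As a rough illustration of the layout in the table above, the following Rust sketch packs and unpacks the fields shown there. It assumes bit 0 is the least significant bit of the 32-bit word and leaves out the C field, whose bit range is not shown in this excerpt; `InstructionFields`, `encode` and `decode` are hypothetical names, not part of Dust. Masking the low five bits with `0b1_1111` can only produce the values 0 through 31, which is where the 32-operation ceiling comes from.

```rust
/// Hypothetical unpacked view of the fields in the bit table above.
/// Assumes bit 0 is the least significant bit; the C field is left out.
#[derive(Clone, Copy, Debug, PartialEq)]
struct InstructionFields {
    operation: u8,       // bits 0-4
    b_is_constant: bool, // bit 5
    c_is_constant: bool, // bit 6
    d: bool,             // bit 7
    a: u8,               // bits 8-15
    b: u8,               // bits 16-23
}

/// Five bits for the operation code: values 0..=31, i.e. 32 operations.
const OPERATION_MASK: u32 = 0b1_1111;

fn encode(fields: InstructionFields) -> u32 {
    (fields.operation as u32 & OPERATION_MASK)
        | (fields.b_is_constant as u32) << 5
        | (fields.c_is_constant as u32) << 6
        | (fields.d as u32) << 7
        | (fields.a as u32) << 8
        | (fields.b as u32) << 16
}

fn decode(bits: u32) -> InstructionFields {
    InstructionFields {
        operation: (bits & OPERATION_MASK) as u8,
        b_is_constant: bits & (1 << 5) != 0,
        c_is_constant: bits & (1 << 6) != 0,
        d: bits & (1 << 7) != 0,
        // Casting to u8 keeps only the low eight bits of the shifted word.
        a: (bits >> 8) as u8,
        b: (bits >> 16) as u8,
    }
}
```

Round-tripping `decode(encode(fields))` should return the original fields, which is a handy check when the layout changes.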
##### Stack manipulation
@@ -161,8 +181,10 @@ Five bits are used for the operation, which allows for up to 32 operations.
##### Value loaders
- LOAD_BOOLEAN: Loads a boolean, the value of which is encoded in the instruction, to a register.
- LOAD_CONSTANT: Loads a constant from the constant list to a register.
- LOAD_BOOLEAN: Loads a boolean to a register. Booleans known at compile-time are not stored in the
constant list. Instead, they are encoded in the instruction itself.
- LOAD_CONSTANT: Loads a constant from the constant list to a register. The VM avoids copying the
constant by using a pointer with the constant's index.
- LOAD_LIST: Creates a list abstraction from a range of registers and loads it to a register.
- LOAD_MAP: Creates a map abstraction from a range of registers and loads it to a register.
- LOAD_SELF: Creates an abstraction that represents the current function and loads it to a register.
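A minimal Rust sketch of the difference between the two loaders, assuming registers are modeled as an enum that either owns a value or holds the index of a constant. `Value`, `Register` and `Vm` are hypothetical and much simpler than Dust's real types; the point is only that LOAD_BOOLEAN writes a value carried by the instruction, while LOAD_CONSTANT stores an index and the constant is looked up when the register is read.

```rust
/// Simplified runtime values for this sketch.
#[derive(Debug, PartialEq)]
enum Value {
    Boolean(bool),
    Integer(i64),
}

/// A register either owns a value or points into the constant list by index.
#[derive(Debug, PartialEq)]
enum Register {
    Empty,
    Value(Value),
    ConstantPointer(usize),
}

struct Vm {
    constants: Vec<Value>,
    registers: Vec<Register>,
}

impl Vm {
    fn new(constants: Vec<Value>, register_count: usize) -> Self {
        let registers: Vec<Register> = (0..register_count).map(|_| Register::Empty).collect();
        Vm { constants, registers }
    }

    /// LOAD_BOOLEAN: the boolean travels inside the instruction, so no constant is needed.
    fn load_boolean(&mut self, destination: usize, value: bool) {
        self.registers[destination] = Register::Value(Value::Boolean(value));
    }

    /// LOAD_CONSTANT: store the constant's index rather than a copy of the constant.
    fn load_constant(&mut self, destination: usize, constant_index: usize) {
        self.registers[destination] = Register::ConstantPointer(constant_index);
    }

    /// Reads a register, following a constant pointer if there is one.
    fn read(&self, index: usize) -> Option<&Value> {
        match &self.registers[index] {
            Register::Empty => None,
            Register::Value(value) => Some(value),
            Register::ConstantPointer(constant_index) => self.constants.get(*constant_index),
        }
    }
}
```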
@@ -226,7 +248,7 @@ other information, like the number of arguments for a function call.
### Virtual Machine
The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
pointers. Pointers can point to values in the constant list, locals list, or the stack itself.
pointers. Pointers can point to values in the constant list or the stack itself.
While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with
@@ -249,19 +271,27 @@ reintroduced in the future.
## Inspiration
[Crafting Interpreters] by Bob Nystrom was a great resource for writing the compiler, especially the
Pratt parser. The book is a great introduction to writing interpreters.
[A No-Frills Introduction to Lua 5.1 VM Instructions] by Kein-Hong Man was a great resource for the
design of Dust's instructions and operation codes. The Lua VM is simple and efficient, and Dust's VM
attempts to be the same, though it is not as optimized for different platforms. Dust's instructions
were originally 32-bit like Lua's, but were changed to 64-bit to allow for more complex information
about the instruction's arguments. Dust's compile-time optimizations are inspired by Lua
optimizations covered in this paper.
Pratt parser. The book is a great introduction to writing interpreters. Had it been discovered
sooner, some early implementations of Dust would have been both simpler in design and more ambitious
in scope.
[The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar
Celes was a great resource for understanding register-based virtual machines and their instructions.
It remains a useful reference when designing new features.
This paper was recommended by Bob Nystrom in [Crafting Interpreters].
[A No-Frills Introduction to Lua 5.1 VM Instructions] by Kein-Hong Man has a wealth of detailed
information on how Lua uses terse instructions to create dense chunks that execute quickly. This was
essential in the design of Dust's instructions. Dust uses compile-time optimizations that are based
on Lua optimizations covered in this paper.
[A Performance Survey on Stack-based and Register-based Virtual Machines] by Ruijie Fang and Siqi
Liu was a quick and efficient primer on getting stack-based and register-based virtual machines up
and running. The included code examples show how to implement both types of VMs in C.
The performance comparison between the two types of VMs is worth reading for anyone who is trying to
choose between the two. Some of the benchmarks described in the paper inspired similar benchmarks
used in this project to compare Dust to other languages.
[Crafting Interpreters]: https://craftinginterpreters.com/
[The Implementation of Lua 5.0]: https://www.lua.org/doc/jucs05.pdf
[A No-Frills Introduction to Lua 5.1 VM Instructions]: https://www.mcours.net/cours/pdf/hasclic3/hasssclic818.pdf
[A Performance Survey on Stack-based and Register-based Virtual Machines]: https://arxiv.org/abs/1611.00467

@@ -1,5 +1,5 @@
let mut i = 0
while i < 1_000_000 {
while i < 5_000_000 {
i += 1
}

@@ -0,0 +1,5 @@
var i = 0;
while (i < 5_000_000) {
i++;
}

@@ -1,5 +1,5 @@
local i = 1
while i < 1000000 do
while i < 5000000 do
i = i + 1
end

@@ -0,0 +1,4 @@
i = 1
while i < 5_000_000:
    i += 1

@@ -0,0 +1,11 @@
hyperfine \
--shell none \
--prepare 'sync' \
--warmup 5 \
'../../target/release/dust addictive_addition.ds' \
'node addictive_addition.js' \
'deno addictive_addition.js' \
'bun addictive_addition.js' \
'python addictive_addition.py' \
'lua addictive_addition.lua'

@@ -1,5 +0,0 @@
var i = 0;
while (i < 1_000_000) {
i++;
}

@@ -1,4 +0,0 @@
i = 1
while i < 1_000_000:
    i += 1

@@ -1,9 +0,0 @@
hyperfine \
--shell none \
--prepare 'sync' \
--warmup 5 \
'../target/release/dust assets/count_to_one_million.ds' \
'node assets/count_to_one_million.js' \
'deno assets/count_to_one_million.js' \
'python assets/count_to_one_million.py' \
'lua assets/count_to_one_million.lua'

bench/fibonacci/run.sh Normal file
@@ -0,0 +1,9 @@
hyperfine \
--shell none \
--prepare 'sync' \
--warmup 5 \
'../../target/release/dust ../../examples/fibonacci.ds' \
'node fibonacci.js' \
'deno fibonacci.js' \
'bun fibonacci.js' \
'python fibonacci.py'

@@ -0,0 +1,9 @@
fn decrement(i: int) -> str {
if i == 0 {
return "Done!";
}
decrement(i - 1)
}
decrement(1000)

@@ -0,0 +1,9 @@
function decrement(i) {
if (i == 0) {
return "Done!";
}
return decrement(i - 1);
}
decrement(1000);

@@ -0,0 +1,6 @@
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `../../target/release/dust recursion.ds` | 2.0 ± 0.2 | 1.9 | 2.9 | 1.00 |
| `node recursion.js` | 42.4 ± 0.9 | 41.3 | 45.5 | 21.11 ± 1.68 |
| `deno recursion.js` | 21.2 ± 1.7 | 17.3 | 23.9 | 10.57 ± 1.15 |
| `bun recursion.js` | 8.3 ± 0.3 | 7.3 | 9.7 | 4.13 ± 0.36 |

bench/recursion/run.sh Normal file
@@ -0,0 +1,8 @@
hyperfine \
--shell none \
--prepare 'sync' \
--warmup 5 \
'../../target/release/dust recursion.ds' \
'node recursion.js' \
'deno recursion.js' \
'bun recursion.js'

@@ -59,7 +59,7 @@ const INSTRUCTION_BORDERS: [&str; 3] = [
const LOCAL_COLUMNS: [(&str, usize); 5] = [
("i", 5),
("IDENTIFIER", 16),
("VALUE", 10),
("REGISTER", 10),
("SCOPE", 7),
("MUTABLE", 7),
];
@@ -142,11 +142,11 @@ impl<'a, W: Write> Disassembler<'a, W> {
write!(&mut self.writer, "{}", c)
}
fn write_str(&mut self, text: &str) -> Result<(), io::Error> {
fn write_colored(&mut self, text: &ColoredString) -> Result<(), io::Error> {
write!(&mut self.writer, "{}", text)
}
fn write_colored(&mut self, text: &ColoredString) -> Result<(), io::Error> {
fn write_str(&mut self, text: &str) -> Result<(), io::Error> {
write!(&mut self.writer, "{}", text)
}
@@ -343,8 +343,8 @@ impl<'a, W: Write> Disassembler<'a, W> {
} else {
let mut value_string = value.to_string();
if value_string.len() > 15 {
value_string = format!("{value_string:.12}...");
if value_string.len() > 26 {
value_string = format!("{value_string:.23}...");
}
value_string

@@ -13,10 +13,7 @@ use std::{
};
use colored::Colorize;
use optimize::{
condense_set_local_to_math, optimize_test_with_explicit_booleans,
optimize_test_with_loader_arguments,
};
use optimize::{optimize_test_with_explicit_booleans, optimize_test_with_loader_arguments};
use smallvec::{smallvec, SmallVec};
use crate::{
@@ -617,14 +614,47 @@ impl<'src> Compiler<'src> {
&mut self,
instruction: &Instruction,
) -> Result<(Argument, bool), CompileError> {
let argument =
instruction
.as_argument()
.ok_or_else(|| CompileError::ExpectedExpression {
let (argument, push_back) = match instruction.operation() {
Operation::LoadConstant => (Argument::Constant(instruction.b), false),
Operation::GetLocal => {
let local_index = instruction.b;
let (local, _) = self.get_local(local_index)?;
(Argument::Register(local.register_index), false)
}
Operation::LoadBoolean
| Operation::LoadList
| Operation::LoadSelf
| Operation::Add
| Operation::Subtract
| Operation::Multiply
| Operation::Divide
| Operation::Modulo
| Operation::Equal
| Operation::Less
| Operation::LessEqual
| Operation::Negate
| Operation::Not
| Operation::Call => (Argument::Register(instruction.a), true),
Operation::CallNative => {
let function = NativeFunction::from(instruction.b);
if function.returns_value() {
(Argument::Register(instruction.a), true)
} else {
return Err(CompileError::ExpectedExpression {
found: self.previous_token.to_owned(),
position: self.previous_position,
});
}
}
_ => {
return Err(CompileError::ExpectedExpression {
found: self.previous_token.to_owned(),
position: self.previous_position,
})?;
let push_back = matches!(argument, Argument::Register(_));
})
}
};
Ok((argument, push_back))
}
@@ -648,6 +678,12 @@ impl<'src> Compiler<'src> {
} else {
false
};
if push_back_left {
self.instructions
.push((left_instruction, left_type.clone(), left_position));
}
let operator = self.current_token;
let operator_position = self.current_position;
let rule = ParseRule::from(&operator);
@@ -660,11 +696,6 @@ impl<'src> Compiler<'src> {
| Token::PercentEqual
);
if push_back_left {
self.instructions
.push((left_instruction, left_type.clone(), left_position));
}
if is_assignment && !left_is_mutable_local {
return Err(CompileError::ExpectedMutableVariable {
found: self.previous_token.to_owned(),
@@ -672,33 +703,6 @@ impl<'src> Compiler<'src> {
});
}
match operator {
Token::Plus | Token::PlusEqual => {
Compiler::expect_addable_type(&left_type, &left_position)?
}
Token::Minus | Token::MinusEqual => {
Compiler::expect_subtractable_type(&left_type, &left_position)?
}
Token::Slash | Token::SlashEqual => {
Compiler::expect_dividable_type(&left_type, &left_position)?
}
Token::Star | Token::StarEqual => {
Compiler::expect_multipliable_type(&left_type, &left_position)?
}
Token::Percent | Token::PercentEqual => {
Compiler::expect_modulable_type(&left_type, &left_position)?
}
_ => {}
}
let r#type = if is_assignment {
Type::None
} else if left_type == Type::Character {
Type::String
} else {
left_type.clone()
};
self.advance()?;
self.parse_sub_expression(&rule.precedence)?;
@@ -707,6 +711,7 @@ impl<'src> Compiler<'src> {
match operator {
Token::Plus | Token::PlusEqual => {
Compiler::expect_addable_type(&left_type, &left_position)?;
Compiler::expect_addable_type(&right_type, &right_position)?;
Compiler::expect_addable_types(
&left_type,
@@ -716,6 +721,7 @@ impl<'src> Compiler<'src> {
)?;
}
Token::Minus | Token::MinusEqual => {
Compiler::expect_subtractable_type(&left_type, &left_position)?;
Compiler::expect_subtractable_type(&right_type, &right_position)?;
Compiler::expect_subtractable_types(
&left_type,
@@ -725,6 +731,7 @@ impl<'src> Compiler<'src> {
)?;
}
Token::Slash | Token::SlashEqual => {
Compiler::expect_dividable_type(&left_type, &left_position)?;
Compiler::expect_dividable_type(&right_type, &right_position)?;
Compiler::expect_dividable_types(
&left_type,
@@ -734,6 +741,7 @@ impl<'src> Compiler<'src> {
)?;
}
Token::Star | Token::StarEqual => {
Compiler::expect_multipliable_type(&left_type, &left_position)?;
Compiler::expect_multipliable_type(&right_type, &right_position)?;
Compiler::expect_multipliable_types(
&left_type,
@@ -743,6 +751,7 @@ impl<'src> Compiler<'src> {
)?;
}
Token::Percent | Token::PercentEqual => {
Compiler::expect_modulable_type(&left_type, &left_position)?;
Compiler::expect_modulable_type(&right_type, &right_position)?;
Compiler::expect_modulable_types(
&left_type,
@@ -759,6 +768,13 @@ impl<'src> Compiler<'src> {
.push((right_instruction, right_type, right_position));
}
let r#type = if is_assignment {
Type::None
} else if left_type == Type::Character {
Type::String
} else {
left_type.clone()
};
let destination = if is_assignment {
match left {
Argument::Register(register) => register,
@@ -983,6 +999,7 @@ impl<'src> Compiler<'src> {
.get_local(local_index)
.map(|(local, r#type)| (local, r#type.clone()))?;
let is_mutable = local.is_mutable;
let local_register_index = local.register_index;
if !self.current_scope.contains(&local.scope) {
return Err(CompileError::VariableOutOfScope {
@@ -1003,14 +1020,23 @@ impl<'src> Compiler<'src> {
self.parse_expression()?;
let register = self.next_register() - 1;
let set_local = Instruction::from(SetLocal {
register_index: register,
local_index,
});
if self
.instructions
.last()
.map_or(false, |(instruction, _, _)| instruction.is_math())
{
let (math_instruction, _, _) = self.instructions.last_mut().unwrap();
self.emit_instruction(set_local, Type::None, start_position);
condense_set_local_to_math(self)?;
math_instruction.a = local_register_index;
} else {
let register = self.next_register() - 1;
let set_local = Instruction::from(SetLocal {
register_index: register,
local_index,
});
self.emit_instruction(set_local, Type::None, start_position);
}
return Ok(());
}
@@ -1181,10 +1207,12 @@ impl<'src> Compiler<'src> {
match else_block_distance {
0 => {}
1 => {
if let Some(skippable) =
self.get_last_jumpable_mut_between(1, if_block_distance as usize)
if let Some([Operation::LoadBoolean | Operation::LoadConstant]) =
self.get_last_operations()
{
skippable.c = true as u8;
let (loader, _, _) = self.instructions.last_mut().unwrap();
loader.c = true as u8;
} else {
if_block_distance += 1;
let jump = Instruction::from(Jump {

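The new assignment path in the compiler above folds in what `condense_set_local_to_math` used to do as a separate pass: when the right-hand side of `a = ...` ends in a math instruction, that instruction writes straight into the local's register instead of a scratch register followed by SET_LOCAL. A simplified Rust sketch of that decision follows; the `Instruction` layout (destination in `a`, SET_LOCAL's source register in `a` and local index in `b`) and the `finish_assignment` function are hypothetical stand-ins, not Dust's actual types.

```rust
/// Hypothetical, simplified instruction set for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operation {
    LoadConstant,
    Add,
    Subtract,
    Multiply,
    Divide,
    Modulo,
    SetLocal,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Instruction {
    operation: Operation,
    a: u8, // destination register (source register for SetLocal)
    b: u8, // local index for SetLocal
    c: u8,
}

impl Instruction {
    fn is_math(&self) -> bool {
        matches!(
            self.operation,
            Operation::Add
                | Operation::Subtract
                | Operation::Multiply
                | Operation::Divide
                | Operation::Modulo
        )
    }
}

/// Finishes `local = <expression>` after the expression has been compiled.
/// If the expression ended in a math instruction, retarget it at the local's register;
/// otherwise emit a SetLocal that copies from `expression_register`.
fn finish_assignment(
    instructions: &mut Vec<Instruction>,
    local_register: u8,
    local_index: u8,
    expression_register: u8,
) {
    if matches!(instructions.last(), Some(last) if last.is_math()) {
        // `last_mut` yields `&mut Instruction`, so this edits the stored instruction, not a copy.
        let math = instructions.last_mut().unwrap();
        math.a = local_register;
    } else {
        instructions.push(Instruction {
            operation: Operation::SetLocal,
            a: expression_register,
            b: local_index,
            c: 0,
        });
    }
}
```

With this shape, `a = a + 1` and `a += 1` both end as a single math instruction whose destination is `a`'s register.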
@@ -1,6 +1,6 @@
//! Tools used by the compiler to optimize a chunk's bytecode.
//! Functions used by the compiler to optimize a chunk's bytecode during compilation.
use crate::{instruction::SetLocal, CompileError, Compiler, Operation};
use crate::{Compiler, Operation};
/// Optimizes a control flow pattern by removing redundant instructions.
///
@@ -56,8 +56,10 @@ pub fn optimize_test_with_explicit_booleans(compiler: &mut Compiler) {
///
/// Test instructions (which are always followed by a jump) can be optimized when the next
/// instructions are two constant or boolean loaders. The first loader is set to skip an instruction
/// if it is run while the second loader is modified to use the first's register. This foregoes the
/// use of a jump instruction and uses one fewer register.
/// if it is run while the second loader is modified to use the first's register. Foregoing the use
/// of a jump instruction is an optimization, but consolidating the registers is a necessity. This is
/// because test instructions are essentially control flow, and a subsequent SET_LOCAL instruction
/// would not know at compile time which branch would be executed at runtime.
///
/// The instructions must be in the following order:
/// - `Test`
@@ -88,48 +90,3 @@ pub fn optimize_test_with_loader_arguments(compiler: &mut Compiler) {
second_loader.a = first_loader_destination;
}
/// Optimizes a math assignment pattern.
///
/// The SetLocal instruction is removed and the math instruction is modified to use the local as
/// its destination. This makes the following two code snippets compile to the same bytecode:
///
/// ```dust
/// let a = 0;
/// a = a + 1;
/// ```
///
/// ```dust
/// let a = 0;
/// a += 1;
/// ```
///
/// The instructions must be in the following order:
/// - `Add`, `Subtract`, `Multiply`, `Divide` or `Modulo`
/// - `SetLocal`
pub fn condense_set_local_to_math(compiler: &mut Compiler) -> Result<(), CompileError> {
if !matches!(
compiler.get_last_operations(),
Some([
Operation::Add
| Operation::Subtract
| Operation::Multiply
| Operation::Divide
| Operation::Modulo,
Operation::SetLocal,
])
) {
return Ok(());
}
log::debug!("Condensing math and SetLocal to math instruction");
let set_local = SetLocal::from(&compiler.instructions.pop().unwrap().0);
let (local, _) = compiler.get_local(set_local.local_index)?;
let local_register_index = local.register_index;
let math_instruction = &mut compiler.instructions.last_mut().unwrap().0;
math_instruction.a = local_register_index;
Ok(())
}
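
A simplified Rust sketch of the pattern described in the `optimize_test_with_loader_arguments` doc comment above. The excerpt cuts off the required instruction order after `Test`, so this sketch assumes TEST, JUMP, loader, loader based on the prose, and it treats the C field as the loader's skip flag, as the if/else code in the compiler does with `loader.c = true as u8`. The types and the `optimize_test_with_loaders` name are hypothetical, not Dust's implementation.

```rust
/// Hypothetical, simplified instruction set for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operation {
    Test,
    Jump,
    LoadBoolean,
    LoadConstant,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Instruction {
    operation: Operation,
    a: u8, // destination register for loaders
    b: u8, // boolean value or constant index for loaders
    c: u8, // for loaders in this sketch: 1 = skip the next instruction
}

fn is_loader(instruction: &Instruction) -> bool {
    matches!(
        instruction.operation,
        Operation::LoadBoolean | Operation::LoadConstant
    )
}

/// Applies the pattern to the last four instructions if they are TEST, JUMP, loader, loader.
/// Returns whether anything was changed.
fn optimize_test_with_loaders(instructions: &mut [Instruction]) -> bool {
    let Some(start) = instructions.len().checked_sub(4) else {
        return false;
    };
    let tail = &mut instructions[start..];
    let matches_pattern = tail[0].operation == Operation::Test
        && tail[1].operation == Operation::Jump
        && is_loader(&tail[2])
        && is_loader(&tail[3]);
    if !matches_pattern {
        return false;
    }
    // The first loader skips the second when it runs, so no extra jump is needed.
    tail[2].c = 1;
    // Both branches now write one register, so a later SET_LOCAL reads the same
    // place no matter which branch ran.
    tail[3].a = tail[2].a;
    true
}
```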

@@ -5,8 +5,8 @@
//! Bit | Description
//! ----- | -----------
//! 0-4 | Operation code
//! 5 | Flag indicating if the B argument is a constant
//! 6 | Flag indicating if the C argument is a constant
//! 5 | Flag indicating if the B field is a constant
//! 6 | Flag indicating if the C field is a constant
//! 7 | D field (boolean)
//! 8-15 | A field (unsigned 8-bit integer)
//! 16-23 | B field (unsigned 8-bit integer)
@@ -353,6 +353,24 @@ impl Instruction {
})
}
pub fn is_math(&self) -> bool {
matches!(
self.operation(),
Operation::Add
| Operation::Subtract
| Operation::Multiply
| Operation::Divide
| Operation::Modulo
)
}
pub fn is_comparison(&self) -> bool {
matches!(
self.operation(),
Operation::Equal | Operation::Less | Operation::LessEqual
)
}
pub fn as_argument(&self) -> Option<Argument> {
match self.operation() {
Operation::LoadConstant => Some(Argument::Constant(self.b)),

@@ -198,10 +198,6 @@ impl<'src> Lexer<'src> {
self.next_char();
while let Some(peek_char) = self.peek_char() {
if peek_char == ' ' {
break;
}
if let '0'..='9' = peek_char {
self.next_char();
@@ -224,14 +220,7 @@ impl<'src> Lexer<'src> {
continue;
}
return Err(LexError::ExpectedCharacterMultiple {
expected: &[
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'e', 'E', '+',
'-',
],
actual: peek_char,
position: self.position,
});
break;
}
} else {
break;

@@ -35,7 +35,7 @@ pub enum Type {
String,
Struct(StructType),
Tuple {
fields: Option<Box<SmallVec<[Type; 4]>>>,
fields: Box<SmallVec<[Type; 4]>>,
},
}
@@ -198,21 +198,17 @@ impl Display for Type {
Type::String => write!(f, "str"),
Type::Struct(struct_type) => write!(f, "{struct_type}"),
Type::Tuple { fields } => {
if let Some(fields) = fields {
write!(f, "(")?;
write!(f, "(")?;
for (index, r#type) in fields.iter().enumerate() {
write!(f, "{type}")?;
if index != fields.len() - 1 {
write!(f, ", ")?;
}
for (index, r#type) in fields.iter().enumerate() {
if index > 0 {
write!(f, ", ")?;
}
write!(f, ")")
} else {
write!(f, "tuple")
write!(f, "{type}")?;
}
write!(f, ")")
}
}
}

@@ -1,21 +0,0 @@
count_slowly = fn (
multiplier: int,
) {
i = 0
while i < 10 {
sleep_time = i * multiplier;
thread.sleep(sleep_time)
thread.write_line(i as str)
i += 1
}
}
async {
count_slowly(50)
count_slowly(100)
count_slowly(200)
count_slowly(250)
}

@@ -1,5 +0,0 @@
let mut i = 0;
while i < 10000 {
i += 1;
}

@@ -1,4 +0,0 @@
input = fs.read_file('examples/assets/data.json')
data = json.parse(input)
length(data)

@@ -1,20 +0,0 @@
// This function returns its argument.
foo = fn <T>(x: T) -> T { x }
// Use turbofish to supply type information.
bar = foo::<str>("hi")
// Use type annotation
baz: str = foo("hi")
// The `json.parse` function takes a string and returns the specified type
// Use turbofish
x = json.parse::<int>("1")
// Use type annotation
x: int = json.parse("1")
x: int = {
json.parse("1")
}