# Dust

Dust is a high-level interpreted programming language with static types that focuses on ease of use,
performance and correctness. The syntax, safety features and evaluation model are inspired by Rust.
The instruction set, optimization strategies and virtual machine are inspired by Lua. Unlike Rust
and other compiled languages, Dust has a very low time to execution. Simple programs compile in
under a millisecond on a modern processor. Unlike Lua and most other interpreted languages, Dust is
type-safe, with a simple yet powerful type system that enhances clarity and prevents bugs.

```dust
write_line("Enter your name...")

let name = read_line()

write_line("Hello " + name + "!")
```

## Feature Progress

Dust is still in development. This list may change as the language evolves.

- [X] Lexer
- [X] Compiler
- [X] VM
- [ ] Formatter
- [X] Disassembler (for chunk debugging)
- CLI
  - [X] Run source
  - [X] Compile to chunk and show disassembly
  - [X] Tokenize using the lexer and show token list
  - [ ] Format using the formatter and display the output
  - [ ] Compile to and run from intermediate formats
    - [ ] JSON
    - [ ] Postcard
- Basic Values
  - [X] No `null` or `undefined` values
  - [X] Booleans
  - [X] Bytes (unsigned 8-bit)
  - [X] Characters (Unicode scalar value)
  - [X] Floats (64-bit)
  - [X] Functions
  - [X] Integers (signed 64-bit)
  - [ ] Ranges
  - [X] Strings (UTF-8)
- Composite Values
  - [X] Concrete lists
  - [X] Abstract lists (optimization)
  - [ ] Concrete maps
  - [ ] Abstract maps (optimization)
  - [ ] Tuples (fixed-size constant lists)
  - [ ] Structs
  - [ ] Enums
- Types
  - [X] Basic types for each kind of basic value
  - [X] Generalized types: `num`, `any`, `none`
  - [ ] `struct` types
  - [ ] `enum` types
  - [ ] Type aliases
  - [ ] Type arguments
  - [ ] Compile-time type checking
    - [ ] Function returns
    - [X] If/Else branches
    - [ ] Instruction arguments
- Variables
  - [X] Immutable by default
  - [X] Block scope
  - [X] Statically typed
  - [X] Copy-free identifiers are stored in the chunk as string constants
- Functions
  - [X] First-class value
  - [X] Statically typed arguments and returns
  - [X] Pure (no "closure" of local variables, arguments are the only input)
  - [ ] Type arguments
- Control Flow
  - [X] If/Else
  - [ ] Loops
    - [ ] `for`
    - [ ] `loop`
    - [X] `while`
  - [ ] Match

## Implementation

Dust is implemented in Rust and is divided into several parts, most importantly the lexer, compiler,
and virtual machine. All of Dust's components are designed with performance in mind and the codebase
uses as few dependencies as possible. The code is tested by integration tests that compile source
code and check the compiled chunk, then run the source and check the output of the virtual machine.
It is important to maintain a high level of quality by writing meaningful tests and preferring to
compile and run programs in an optimal way before adding new features.

### Lexer and Tokens

The lexer emits tokens from the source code. Dust makes extensive use of Rust's zero-copy
capabilities to avoid unnecessary allocations when creating tokens. A token, depending on its type,
may contain a reference to some data from the source code. The data is only copied in the case of an
error. In a successfully executed program, no part of the source code is copied unless it is a
string literal or identifier.
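
The zero-copy idea can be sketched in a few lines of Rust. This is a minimal illustration, not
Dust's actual lexer: the `Token` variants and the `lex` function are hypothetical, and the sketch
collects tokens into a `Vec` for brevity instead of handing them out one at a time.

```rust
/// A token that borrows its text from the source instead of copying it.
/// These variants are hypothetical, not Dust's real token set.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token<'src> {
    Identifier(&'src str),
    Integer(&'src str),
    Plus,
}

/// Splits `source` into tokens without copying any of the source text.
pub fn lex(source: &str) -> Vec<Token<'_>> {
    let bytes = source.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < bytes.len() {
        let start = i;

        match bytes[i] {
            b'+' => {
                tokens.push(Token::Plus);
                i += 1;
            }
            b'0'..=b'9' => {
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                // The token holds a slice of `source`, not a new `String`.
                tokens.push(Token::Integer(&source[start..i]));
            }
            b if b.is_ascii_alphabetic() || b == b'_' => {
                while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                    i += 1;
                }
                tokens.push(Token::Identifier(&source[start..i]));
            }
            // Whitespace is skipped; a real lexer would report other bytes as errors.
            _ => i += 1,
        }
    }

    tokens
}
```

Because each `Token<'src>` borrows from the source string, the borrow checker guarantees that the
source outlives every token that refers to it.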

### Compiler

The compiler creates a chunk, which contains all of the data needed by the virtual machine to run a
Dust program. It does so by emitting bytecode instructions, constants and locals while parsing the
tokens, which are generated one at a time by the lexer.

#### Parsing

Dust's compiler uses a custom Pratt parser, a kind of recursive descent parser, to translate a
sequence of tokens into a chunk. Each token is given a precedence and may have a prefix and/or infix
parser. The parsers are just functions that modify the compiler and its output. For example, when
the compiler encounters a boolean token, its prefix parser is the `parse_boolean` function, which
emits a `LoadBoolean` instruction. An integer token's prefix parser is `parse_integer`, which emits
a `LoadConstant` instruction and adds the integer to the constant list. Tokens with infix parsers
include the math operators, which emit `Add`, `Subtract`, `Multiply`, `Divide`, and `Modulo`
instructions.

Functions are compiled into their own chunks, which are stored in the constant list. A function's
arguments are stored in the locals list. The VM must later bind the arguments to runtime values by
assigning each argument a register and associating the register with the local.
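
The following is a minimal sketch of Pratt parsing that emits instructions directly while parsing.
It is not Dust's compiler: the `Token`, `Instruction` and `Compiler` types are hypothetical, the
instructions carry no register operands, and `LoadConstant` holds its value inline instead of an
index into a constant list.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    Integer(i64),
    Plus,
    Star,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instruction {
    LoadConstant(i64),
    Add,
    Multiply,
}

pub struct Compiler<'a> {
    tokens: &'a [Token],
    position: usize,
    pub instructions: Vec<Instruction>,
}

impl<'a> Compiler<'a> {
    pub fn new(tokens: &'a [Token]) -> Self {
        Compiler { tokens, position: 0, instructions: Vec::new() }
    }

    /// Infix precedence of a token; 0 means "not an infix operator".
    fn precedence(token: Token) -> u8 {
        match token {
            Token::Plus => 1,
            Token::Star => 2,
            Token::Integer(_) => 0,
        }
    }

    /// Parses an expression made of operators that bind tighter than `min_precedence`.
    pub fn parse_expression(&mut self, min_precedence: u8) {
        // Prefix parser: the only prefix token in this sketch is an integer.
        match self.tokens.get(self.position).copied() {
            Some(Token::Integer(value)) => {
                self.position += 1;
                self.instructions.push(Instruction::LoadConstant(value));
            }
            _ => return, // A real compiler would report an error here.
        }

        // Infix parsers: continue while the next operator binds tightly enough.
        while let Some(operator) = self.tokens.get(self.position).copied() {
            let precedence = Self::precedence(operator);

            if precedence == 0 || precedence <= min_precedence {
                break;
            }

            self.position += 1;
            self.parse_expression(precedence); // Compile the right-hand operand first.
            self.instructions.push(match operator {
                Token::Plus => Instruction::Add,
                Token::Star => Instruction::Multiply,
                Token::Integer(_) => unreachable!(),
            });
        }
    }
}
```

Calling `parse_expression(0)` on the tokens for `1 + 2 * 3` emits the multiplication before the
addition, because `*` has the higher precedence.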

#### Optimizing

When generating instructions for a register-based virtual machine, there are opportunities to
optimize the generated code by using fewer instructions or fewer registers. While it is best to
output optimal code in the first place, it is not always possible. Dust's compiler modifies the
instruction list during parsing to apply optimizations before the chunk is completed. There is no
separate optimization pass, and the compiler cannot be run in a mode that disables optimizations.

#### Type Checking

Dust's compiler associates each emitted instruction with a type. This allows the compiler to enforce
compatibility when values are used in expressions. For example, the compiler will not allow a string
to be added to an integer, but it will allow either to be added to another of the same type. Aside
from instruction arguments, the compiler also checks the types of function arguments and the blocks
of `if`/`else` statements.

The compiler always checks types on the fly, so there is no need for a separate type-checking pass.
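
As a rough sketch of the addition rule described above, a check like the one below could run at the
moment an `Add` instruction is emitted. The `Type`, `TypeConflict` and `check_add` names are
assumptions made for illustration and are not taken from Dust's source.

```rust
/// A small subset of value types (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    Integer,
    Float,
    String,
}

/// Describes the two operand types that could not be combined.
#[derive(Debug, PartialEq)]
pub struct TypeConflict {
    pub left: Type,
    pub right: Type,
}

/// Returns the result type of `left + right`, or an error if the operand types differ.
pub fn check_add(left: Type, right: Type) -> Result<Type, TypeConflict> {
    if left == right {
        Ok(left)
    } else {
        Err(TypeConflict { left, right })
    }
}
```

Because the check needs only the types of the two operands, it can be made while parsing, without
building a syntax tree first.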

### Instructions

Dust's virtual machine uses 32-bit instructions, which encode seven pieces of information:

Bit   | Description
----- | -----------
0-4   | Operation code
5     | Flag indicating if the B argument is a constant
6     | Flag indicating if the C argument is a constant
7     | D field (boolean)
8-15  | A field (unsigned 8-bit integer)
16-23 | B field (unsigned 8-bit integer)
24-31 | C field (unsigned 8-bit integer)
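
The layout in the table can be packed and unpacked with shifts and masks. The sketch below assumes
that bit 0 is the least significant bit of a `u32`; the `Fields` struct and the `encode` and
`decode` functions are hypothetical names, not Dust's API.

```rust
/// The seven pieces of information held by one instruction (hypothetical struct).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Fields {
    pub operation: u8, // Only the low 5 bits are used (0..=31).
    pub b_is_constant: bool,
    pub c_is_constant: bool,
    pub d: bool,
    pub a: u8,
    pub b: u8,
    pub c: u8,
}

/// Packs the fields into 32 bits, assuming bit 0 is the least significant bit.
pub fn encode(fields: Fields) -> u32 {
    (fields.operation as u32 & 0b1_1111)
        | (fields.b_is_constant as u32) << 5
        | (fields.c_is_constant as u32) << 6
        | (fields.d as u32) << 7
        | (fields.a as u32) << 8
        | (fields.b as u32) << 16
        | (fields.c as u32) << 24
}

/// Unpacks 32 bits back into the individual fields.
pub fn decode(bits: u32) -> Fields {
    Fields {
        operation: (bits & 0b1_1111) as u8,
        b_is_constant: (bits >> 5) & 1 == 1,
        c_is_constant: (bits >> 6) & 1 == 1,
        d: (bits >> 7) & 1 == 1,
        a: (bits >> 8) as u8, // `as u8` keeps only the low 8 bits.
        b: (bits >> 16) as u8,
        c: (bits >> 24) as u8,
    }
}
```

Under this assumption, an instruction with operation code 1 and an A field of 2 encodes to
`0x0000_0201`, and `decode(encode(fields))` returns the original fields for any operation code
below 32.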

#### Operations

Five bits are used for the operation, which allows for up to 32 operations.

##### Stack manipulation

- MOVE: Makes a register's value available in another register by using a pointer. This avoids
  copying the value or invalidating the original register.
- CLOSE: Sets a range of registers to the "empty" state.

##### Value loaders

- LOAD_BOOLEAN: Loads a boolean, the value of which is encoded in the instruction, to a register.
- LOAD_CONSTANT: Loads a constant from the constant list to a register.
- LOAD_LIST: Creates a list abstraction from a range of registers and loads it to a register.
- LOAD_MAP: Creates a map abstraction from a range of registers and loads it to a register.
- LOAD_SELF: Creates an abstraction that represents the current function and loads it to a register.

##### Variable operations

- GET_LOCAL: Loads a variable's value to a register by using a pointer to point to the variable's
  canonical register (i.e. the register whose index is stored in the locals list).
- SET_LOCAL: Changes a variable's register to a pointer to another register, effectively changing
  the variable's value.

##### Arithmetic

Arithmetic instructions use every field except for D. The A field is the destination register, the B
and C fields are the arguments, and the flags indicate whether the arguments are constants.

- ADD: Adds two values and stores the result in a register. Unlike the other arithmetic operations,
  the ADD instruction can also be used to concatenate strings and characters.
- SUBTRACT: Subtracts one argument from another and stores the result in a register.
- MULTIPLY: Multiplies two arguments and stores the result in a register.
- DIVIDE: Divides one value by another and stores the result in a register.
- MODULO: Calculates the division remainder of two values and stores the result in a register.
- POWER: Raises one value to the power of another and stores the result in a register.

##### Logic

Logic instructions work differently from arithmetic and comparison instructions, but they are still
essentially binary operations with a left and a right argument. Rather than performing some
calculation and storing a result, the logic instructions perform a check on the left-hand argument
and, based on the result, either skip the right-hand argument or allow it to be executed. A `TEST`
is always followed by a `JUMP`. If the left argument passes the test (a boolean equality check), the
`JUMP` instruction is skipped and the right argument is executed. If the left argument fails the
test, the `JUMP` is not skipped and it jumps past the right argument.

- TEST
- TEST_SET
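
The `TEST`/`JUMP` pairing can be illustrated with a toy interpreter that evaluates `left && right`.
This is a sketch under stated assumptions: the instruction fields, the `run` function and the
`compile_and` helper are hypothetical and much simpler than Dust's real instructions.

```rust
#[derive(Debug, Clone, Copy)]
pub enum Instruction {
    LoadBoolean { register: usize, value: bool },
    /// Skips the next instruction if the register equals `expected`.
    Test { register: usize, expected: bool },
    /// Skips `offset` instructions.
    Jump { offset: usize },
}

/// Runs the program and returns the boolean left in register 0.
pub fn run(program: &[Instruction]) -> bool {
    let mut registers = [false; 4];
    let mut ip = 0;

    while ip < program.len() {
        let instruction = program[ip];
        ip += 1;

        match instruction {
            Instruction::LoadBoolean { register, value } => registers[register] = value,
            Instruction::Test { register, expected } => {
                if registers[register] == expected {
                    ip += 1; // Test passed: skip the JUMP and run the right-hand side.
                }
            }
            Instruction::Jump { offset } => ip += offset,
        }
    }

    registers[0]
}

/// Emits the instructions for `left && right`, using register 0 for the result.
pub fn compile_and(left: bool, right: bool) -> Vec<Instruction> {
    vec![
        Instruction::LoadBoolean { register: 0, value: left },
        Instruction::Test { register: 0, expected: true },
        Instruction::Jump { offset: 1 }, // Jumps past the right-hand side.
        Instruction::LoadBoolean { register: 0, value: right },
    ]
}
```

When `left` is `false`, the test fails, the jump runs, and the right-hand side is never executed, so
register 0 keeps the value `false`. An `||` expression would use the same shape with `expected` set
to `false`.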

##### Comparison

- EQUAL
- LESS
- LESS_EQUAL

##### Unary operations

- NEGATE
- NOT

##### Execution

- CALL
- CALL_NATIVE
- JUMP
- RETURN

The A, B, and C fields are usually used as indexes into the constant list or stack, but they can
also hold other information, like the number of arguments for a function call.

### Virtual Machine

The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
pointers. Pointers can point to values in the constant list, locals list, or the stack itself.

While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with
a `match` statement to execute instructions. When it reaches a `Return` instruction, it breaks the
loop and optionally returns a value.
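
A loop of that shape might look like the sketch below. It is illustrative only: the `Register`,
`Instruction` and `Vm` types are hypothetical, values are limited to `i64`, pointers into the locals
list are left out, and the loop is exited with `return` where the text above describes a break.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Register {
    Empty,
    Value(i64),
    /// Points at another register on the stack.
    StackPointer(usize),
    /// Points at an entry in the constant list.
    ConstantPointer(usize),
}

#[derive(Debug, Clone, Copy)]
pub enum Instruction {
    LoadConstant { to: usize, constant: usize },
    Move { to: usize, from: usize },
    Add { to: usize, left: usize, right: usize },
    Return { from: Option<usize> },
}

pub struct Vm<'a> {
    constants: &'a [i64],
    stack: Vec<Register>,
}

impl<'a> Vm<'a> {
    pub fn new(constants: &'a [i64], register_count: usize) -> Self {
        Vm { constants, stack: vec![Register::Empty; register_count] }
    }

    /// Follows pointers until a concrete value (or an empty register) is found.
    fn resolve(&self, index: usize) -> Option<i64> {
        match self.stack[index] {
            Register::Empty => None,
            Register::Value(value) => Some(value),
            Register::StackPointer(target) => self.resolve(target),
            Register::ConstantPointer(constant) => self.constants.get(constant).copied(),
        }
    }

    /// Executes instructions until a `Return` is reached or the program ends.
    pub fn run(&mut self, instructions: &[Instruction]) -> Option<i64> {
        let mut ip = 0;

        while ip < instructions.len() {
            let instruction = instructions[ip];
            ip += 1;

            match instruction {
                Instruction::LoadConstant { to, constant } => {
                    self.stack[to] = Register::ConstantPointer(constant);
                }
                // MOVE stores a pointer, so the value itself is not copied.
                Instruction::Move { to, from } => self.stack[to] = Register::StackPointer(from),
                Instruction::Add { to, left, right } => {
                    let sum = self.resolve(left)? + self.resolve(right)?;
                    self.stack[to] = Register::Value(sum);
                }
                Instruction::Return { from } => {
                    return from.and_then(|index| self.resolve(index));
                }
            }
        }

        None
    }
}
```

With the constants `[2, 40]`, a program that loads both constants, moves the first to a third
register, adds it to the second and returns the sum yields `Some(42)`.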

## Previous Implementations

Dust has gone through several iterations, each with its own design choices. It was originally
implemented with a syntax tree generated by an external parser, then a parser generator, and finally
a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual
machine. The current implementation is by far the most performant and the general design is unlikely
to change.

Dust previously had a more complex type system with type arguments (or "generics") and a simple
model for asynchronous execution of statements. Both of these features were removed to simplify the
language when it was rewritten to use bytecode instructions. Both features are planned to be
reintroduced in the future.

## Inspiration

[Crafting Interpreters] by Bob Nystrom was a great resource for writing the compiler, especially the
Pratt parser. The book is a great introduction to writing interpreters.

[A No-Frills Introduction to Lua 5.1 VM Instructions] by Kein-Hong Man was a great resource for the
design of Dust's instructions and operation codes. The Lua VM is simple and efficient, and Dust's VM
attempts to be the same, though it is not as optimized for different platforms. Dust's instructions
were originally 32-bit like Lua's, but were changed to 64-bit to allow for more complex information
about the instruction's arguments. Dust's compile-time optimizations are inspired by Lua
optimizations covered in this paper.

[The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar
Celes was a great resource for understanding register-based virtual machines and their instructions.
This paper is a great resource when designing new features.

[Crafting Interpreters]: https://craftinginterpreters.com/
[The Implementation of Lua 5.0]: https://www.lua.org/doc/jucs05.pdf
[A No-Frills Introduction to Lua 5.1 VM Instructions]: https://www.mcours.net/cours/pdf/hasclic3/hasssclic818.pdf