Add more README.md content

Jeff 2024-11-29 23:58:33 -05:00
parent 3b23136eda
commit 98e9a5ed21
2 changed files with 84 additions and 53 deletions

116
README.md

@@ -2,9 +2,18 @@
Dust is a high-level interpreted programming language with static types that focuses on ease of use, Dust is a high-level interpreted programming language with static types that focuses on ease of use,
performance and correctness. The syntax, safety features and evaluation model are inspired by Rust. performance and correctness. The syntax, safety features and evaluation model are inspired by Rust.
Due to being interpreted, Dust's total time to execution is much lower than Rust's. Unlike other The instruction set, optimization strategies and virtual machine are inspired by Lua. Unlike Rust
interpreted languages, Dust is type-safe, with a simple yet powerful type system that enhances the and other compiled languages, Dust has a very low time to execution. Simple programs compile in
clarity and correctness of a program. under a millisecond on a modern processor. Unlike Lua and most other interpreted languages, Dust is
type-safe, with a simple yet powerful type system that enhances clarity and prevents bugs.
```dust
write_line("Enter your name...")
let name = read_line()
write_line("Hello " + name + "!")
```
## Feature Progress ## Feature Progress
@ -14,6 +23,7 @@ Dust is still in development. This list may change as the language evolves.
- [X] Compiler - [X] Compiler
- [X] VM - [X] VM
- [ ] Formatter - [ ] Formatter
- [X] Disassembler (for chunk debugging)
- CLI - CLI
- [X] Run source - [X] Run source
- [X] Compile to chunk and show disassembly - [X] Compile to chunk and show disassembly
@ -23,19 +33,25 @@ Dust is still in development. This list may change as the language evolves.
- [ ] JSON - [ ] JSON
- [ ] Postcard - [ ] Postcard
- Values - Values
- [X] Basic values: booleans, bytes, characters, integers, floats, UTF-8 strings - [X] No `null` or `undefined` values
- [X] No `null` or `undefined` - [X] Booleans
- [ ] Enums - [X] Bytes (unsigned 8-bit)
- [X] Characters (Unicode scalar value)
- [X] Floats (64-bit)
- [X] Functions - [X] Functions
- [X] Lists - [X] Integer (signed 64-bit)
- [ ] Maps
- [ ] Ranges - [ ] Ranges
- [X] Strings (UTF-8)
- [X] Concrete lists
- [X] Abstract lists (optimization)
- [ ] Concrete maps
- [ ] Abstract maps (optimization)
- [ ] Tuples (fixed-size constant lists)
- [ ] Structs - [ ] Structs
- [ ] Tuples - [ ] Enums
- [ ] Runtime-efficient abstract values for lists and maps
- Types - Types
- [X] Basic types for each kind of value - [X] Basic types for each kind of value
- [X] Generalized types: `num`, `any` - [X] Generalized types: `num`, `any`, `none`
- [ ] `struct` types - [ ] `struct` types
- [ ] `enum` types - [ ] `enum` types
- [ ] Type aliases - [ ] Type aliases
@ -44,7 +60,6 @@ Dust is still in development. This list may change as the language evolves.
- [ ] Function returns - [ ] Function returns
- [X] If/Else branches - [X] If/Else branches
- [ ] Instruction arguments - [ ] Instruction arguments
- [ ] Runtime type checking for debug compilation modes
- Variables - Variables
- [X] Immutable by default - [X] Immutable by default
- [X] Block scope - [X] Block scope
@ -62,32 +77,23 @@ Dust is still in development. This list may change as the language evolves.
- [ ] `loop` - [ ] `loop`
- [X] `while` - [X] `while`
- [ ] Match - [ ] Match
- Instructions
- [X] Arithmetic
- [X] Boolean
- [X] Call
- [X] Constant
- [X] Control flow
- [X] Load
- [X] Store
- [X] Return
- [X] Stack
- [X] Unary
## Implementation ## Implementation
Dust is implemented in Rust and is divided into several parts, most importantly the lexer, compiler, Dust is implemented in Rust and is divided into several parts, most importantly the lexer, compiler,
and virtual machine. All of Dust's components are designed with performance in mind and the codebase and virtual machine. All of Dust's components are designed with performance in mind and the codebase
uses as few dependencies as possible. uses as few dependencies as possible. The code is tested by integration tests that compile source
code and check the compiled chunk, then run the source and check the output of the virtual machine.
It is important to maintain a high level of quality by writing meaningful tests and preferring to
compile and run programs in an optimal way before adding new features.
### Lexer ### Lexer and Tokens
The lexer emits tokens from the source code. Dust makes extensive use of Rust's zero-copy The lexer emits tokens from the source code. Dust makes extensive use of Rust's zero-copy
capabilities to avoid unnecessary allocations when creating tokens. A token, depending on its type, capabilities to avoid unnecessary allocations when creating tokens. A token, depending on its type,
may contain a reference to some data from the source code. The data is only copied in the case of an may contain a reference to some data from the source code. The data is only copied in the case of an
error, because it improves the usability of the codebase for errors to own their data when possible. error. In a successfully executed program, no part of the source code is copied unless it is a
In a successfully executed program, no part of the source code is copied unless it is a string string literal or identifier.
literal or identifier.
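
The zero-copy approach described above can be illustrated with a minimal sketch. The `Token` variants and the `lex` function here are hypothetical and are not taken from Dust's codebase. The sketch also collects tokens into a `Vec` for brevity, whereas Dust's lexer emits tokens one at a time.

```rust
/// Hypothetical token type (not Dust's actual definition). Variants that
/// carry text hold a `&'src str` that borrows from the source code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token<'src> {
    Identifier(&'src str),
    Integer(&'src str),
    Plus,
}

/// Hypothetical lexer for a tiny subset of tokens. Every token that carries
/// text is a slice of `source`, so no string data is allocated or copied.
fn lex(source: &str) -> Vec<Token<'_>> {
    let bytes = source.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < bytes.len() {
        let start = i;
        let byte = bytes[i];

        if byte == b'+' {
            tokens.push(Token::Plus);
            i += 1;
        } else if byte.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Integer(&source[start..i]));
        } else if byte.is_ascii_alphabetic() || byte == b'_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            tokens.push(Token::Identifier(&source[start..i]));
        } else {
            // Whitespace and anything this sketch does not recognize is skipped.
            i += 1;
        }
    }

    tokens
}
```

Because the tokens borrow from the source, the compiler's borrow checker guarantees that the source string outlives every token created from it.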
### Compiler ### Compiler
@ -95,8 +101,6 @@ The compiler creates a chunk, which contains all of the data needed by the virtu
Dust program. It does so by emitting bytecode instructions, constants and locals while parsing the Dust program. It does so by emitting bytecode instructions, constants and locals while parsing the
tokens, which are generated one at a time by the lexer. tokens, which are generated one at a time by the lexer.
Types are checked during parsing and each emitted instruction is associated with a type.
#### Parsing #### Parsing
Dust's compiler uses a custom Pratt parser, a kind of recursive descent parser, to translate a Dust's compiler uses a custom Pratt parser, a kind of recursive descent parser, to translate a
@ -119,6 +123,16 @@ optimize the generated code by using fewer instructions or fewer registers. Whil
output optimal code in the first place, it is not always possible. Dust's compiler uses simple output optimal code in the first place, it is not always possible. Dust's compiler uses simple
functions that modify isolated sections of the instruction list through a mutable reference. functions that modify isolated sections of the instruction list through a mutable reference.
#### Type Checking
Dust's compiler associates each emitted instruction with a type. This allows the compiler to enforce
compatibility when values are used in expressions. For example, the compiler will not allow a string
to be added to an integer, but it will allow either to be added to another of the same type. Aside
from instruction arguments, the compiler also checks the types of function arguments and the blocks
of `if`/`else` statements.
The compiler always checks types on the fly, so there is no need for a separate type-checking pass.
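
As a rough illustration of the addition rule described above, the following sketch checks operand types while an instruction would be emitted. The names `Type`, `TypeConflict` and `check_add` are assumptions made for this example and do not come from Dust's source.

```rust
/// Hypothetical subset of types, limited to the two named in the text above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Integer,
    String,
}

/// Hypothetical error describing two operand types that cannot be combined.
#[derive(Debug, PartialEq)]
struct TypeConflict {
    left: Type,
    right: Type,
}

/// Returns the type of `left + right`, or an error if the operands differ.
/// A compiler could call a check like this at the moment it emits an `Add`
/// instruction, which is why no separate type-checking pass is required.
fn check_add(left: Type, right: Type) -> Result<Type, TypeConflict> {
    match (left, right) {
        (Type::Integer, Type::Integer) | (Type::String, Type::String) => Ok(left),
        _ => Err(TypeConflict { left, right }),
    }
}
```

In this sketch, `check_add(Type::String, Type::Integer)` returns an error, while adding two integers or two strings yields the shared type.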
### Instructions ### Instructions
Dust's virtual machine is register-based and uses 64-bit instructions, which encode nine pieces of Dust's virtual machine is register-based and uses 64-bit instructions, which encode nine pieces of
@ -127,24 +141,38 @@ information:
Bit | Description Bit | Description
----- | ----------- ----- | -----------
0-8 | The operation code. 0-8 | The operation code.
9 | Boolean flag indicating whether the B argument is a constant. 9 | Boolean flag indicating whether the second argument is a constant
10 | Boolean flag indicating whether the C argument is a constant. 10 | Boolean flag indicating whether the third argument is a constant
11 | Boolean flag indicating whether the A argument is a local. 11 | Boolean flag indicating whether the first argument is a local
12 | Boolean flag indicating whether the B argument is a local. 12 | Boolean flag indicating whether the second argument is a local
13 | Boolean flag indicating whether the C argument is a local. 13 | Boolean flag indicating whether the third argument is a local
17-32 | The A argument, 17-32 | First argument, usually the destination register or local where a value is stored
33-48 | The B argument. 33-48 | Second argument, a register, local, constant or boolean flag
49-63 | The C argument. 49-63 | Third argument, a register, local, constant or boolean flag
Because each argument field is at most 16 bits wide, the maximum number of registers is 2^16, which
is more than enough even for very large programs. For the same reason, a chunk can store up to 2^16
constants and locals.
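
The bit layout in the table can be sketched with shifts and masks. This is only an illustration: the constant and function names are hypothetical, and the sketch assumes that bit 0 is the least significant bit, which the table does not state.

```rust
// Field positions taken from the table above; bit 0 is assumed to be the
// least significant bit. All names in this sketch are hypothetical.
const OPCODE_MASK: u64 = 0x1FF; // bits 0-8
const B_IS_CONSTANT: u64 = 1 << 9;
const C_IS_CONSTANT: u64 = 1 << 10;
const A_IS_LOCAL: u64 = 1 << 11;
const B_IS_LOCAL: u64 = 1 << 12;
const C_IS_LOCAL: u64 = 1 << 13;
const FLAG_MASK: u64 = 0b1_1111 << 9; // bits 9-13
const A_SHIFT: u32 = 17; // bits 17-32
const B_SHIFT: u32 = 33; // bits 33-48
const C_SHIFT: u32 = 49; // bits 49-63
const A_B_MASK: u64 = 0xFFFF; // 16-bit fields
const C_MASK: u64 = 0x7FFF; // bits 49-63 leave room for 15 bits

/// Packs an operation code, flags and three arguments into one 64-bit word.
fn encode(opcode: u16, flags: u64, a: u16, b: u16, c: u16) -> u64 {
    (opcode as u64 & OPCODE_MASK)
        | (flags & FLAG_MASK)
        | ((a as u64) << A_SHIFT)
        | ((b as u64) << B_SHIFT)
        | ((c as u64 & C_MASK) << C_SHIFT)
}

fn opcode(instruction: u64) -> u16 {
    (instruction & OPCODE_MASK) as u16
}

fn a_argument(instruction: u64) -> u16 {
    ((instruction >> A_SHIFT) & A_B_MASK) as u16
}

fn b_argument(instruction: u64) -> u16 {
    ((instruction >> B_SHIFT) & A_B_MASK) as u16
}

fn c_argument(instruction: u64) -> u16 {
    ((instruction >> C_SHIFT) & C_MASK) as u16
}

fn has_flag(instruction: u64, flag: u64) -> bool {
    instruction & flag != 0
}
```

Keeping every field at a fixed position means that decoding an argument is a single shift and mask, with no branching on the operation code.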
### Virtual Machine ### Virtual Machine
The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
pointers. Pointers can point to values in the constant list, locals list, or the stack itself. If it
points to a local, the VM must consult its local definitions to find which register holds the
value. Those local definitions are stored as a simple list of register indexes.
While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with
a `match` statement to execute instructions. When it reaches a `Return` instruction, it breaks the
loop and optionally returns a value.
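
The loop described above might look roughly like the following sketch. It is a simplification under stated assumptions: the `Instruction` enum, the `run` signature and the use of plain `i64` registers are all hypothetical, whereas the real VM decodes 64-bit instructions and its registers hold values or pointers.

```rust
/// Hypothetical decoded instructions. An enum is used here only to keep the
/// sketch readable.
#[derive(Debug, Clone, Copy)]
enum Instruction {
    LoadConstant { destination: usize, constant: usize },
    Add { destination: usize, left: usize, right: usize },
    Return { register: Option<usize> },
}

/// Executes instructions in a `while` loop with a `match`, breaking on
/// `Return` and optionally yielding the value of a register.
fn run(instructions: &[Instruction], constants: &[i64], register_count: usize) -> Option<i64> {
    let mut registers = vec![0i64; register_count];
    let mut ip = 0; // instruction pointer
    let mut return_value = None;

    while ip < instructions.len() {
        let instruction = instructions[ip];
        ip += 1;

        match instruction {
            Instruction::LoadConstant { destination, constant } => {
                registers[destination] = constants[constant];
            }
            Instruction::Add { destination, left, right } => {
                registers[destination] = registers[left] + registers[right];
            }
            Instruction::Return { register } => {
                return_value = register.map(|index| registers[index]);
                break;
            }
        }
    }

    return_value
}
```

A program that loads two constants, adds them and returns the destination register yields `Some` of their sum, while a `Return` without a register yields `None`.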
## Previous Implementations ## Previous Implementations
Dust has gone through several iterations, each with its own unique features and design choices. It Dust has gone through several iterations, each with its own design choices. It was originally
was originally implemented with a syntax tree generated by an external parser, then a parser implemented with a syntax tree generated by an external parser, then a parser generator, and finally
generator, and finally a custom parser. Eventually the language was rewritten to use bytecode a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual
instructions and a virtual machine. The current implementation is by far the most performant and the machine. The current implementation is by far the most performant and the general design is unlikely
general design is unlikely to change. to change.
Dust previously had a more complex type system with type arguments (or "generics") and a simple Dust previously had a more complex type system with type arguments (or "generics") and a simple
model for asynchronous execution of statements. Both of these features were removed to simplify the model for asynchronous execution of statements. Both of these features were removed to simplify the
@ -164,7 +192,7 @@ about the instruction's arguments.
[The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar [The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar
Celes was a great resource for understanding how a compiler and VM tie together. Dust's compiler's Celes was a great resource for understanding how a compiler and VM tie together. Dust's compiler's
optimization functions were inspired by Lua optimizations covered in this paper. optimization functions are inspired by Lua optimizations covered in this paper.
[Crafting Interpreters]: https://craftinginterpreters.com/ [Crafting Interpreters]: https://craftinginterpreters.com/
[The Implementation of Lua 5.0]: https://www.lua.org/doc/jucs05.pdf [The Implementation of Lua 5.0]: https://www.lua.org/doc/jucs05.pdf


@@ -1,15 +1,18 @@
//! An operation and its arguments for the Dust virtual machine. //! An operation and its arguments for the Dust virtual machine.
//! //!
//! Each instruction is a 64-bit unsigned integer that is divided into nine fields: //! Each instruction is a 64-bit unsigned integer that is divided into nine fields:
//! - Bits 0-8: The operation code. //!
//! - Bit 9: Boolean flag indicating whether the B argument is a constant. //! Bit | Description
//! - Bit 10: Boolean flag indicating whether the C argument is a constant. //! ----- | -----------
//! - Bit 11: Boolean flag indicating whether the A argument is a local. //! 0-8 | The operation code.
//! - Bit 12: Boolean flag indicating whether the B argument is a local. //! 9 | Boolean flag indicating whether the B argument is a constant.
//! - Bit 13: Boolean flag indicating whether the C argument is a local. //! 10 | Boolean flag indicating whether the C argument is a constant.
//! - Bits 17-32: The A argument, //! 11 | Boolean flag indicating whether the A argument is a local.
//! - Bits 33-48: The B argument. //! 12 | Boolean flag indicating whether the B argument is a local.
//! - Bits 49-63: The C argument. //! 13 | Boolean flag indicating whether the C argument is a local.
//! 17-32 | The A argument,
//! 33-48 | The B argument.
//! 49-63 | The C argument.
//! //!
//! Be careful when working with instructions directly. When modifying an instruction, be sure to //! Be careful when working with instructions directly. When modifying an instruction, be sure to
//! account for the fact that setting the A, B, or C arguments to 0 will have no effect. It is //! account for the fact that setting the A, B, or C arguments to 0 will have no effect. It is