Add more README.md content
parent
3b23136eda
commit
98e9a5ed21
116
README.md
@@ -2,9 +2,18 @@
|
|||||||
|
|
||||||
Dust is a high-level interpreted programming language with static types that focuses on ease of use,
|
Dust is a high-level interpreted programming language with static types that focuses on ease of use,
|
||||||
performance and correctness. The syntax, safety features and evaluation model are inspired by Rust.
|
performance and correctness. The syntax, safety features and evaluation model are inspired by Rust.
|
||||||
Due to being interpreted, Dust's total time to execution is much lower than Rust's. Unlike other
|
The instruction set, optimization strategies and virtual machine are inspired by Lua. Unlike Rust
|
||||||
interpreted languages, Dust is type-safe, with a simple yet powerful type system that enhances the
|
and other compiled languages, Dust has a very low time to execution. Simple programs compile in
|
||||||
clarity and correctness of a program.
|
under a millisecond on a modern processor. Unlike Lua and most other interpreted languages, Dust is
|
||||||
|
type-safe, with a simple yet powerful type system that enhances clarity and prevents bugs.
|
||||||
|
|
||||||
|
```dust
|
||||||
|
write_line("Enter your name...")
|
||||||
|
|
||||||
|
let name = read_line()
|
||||||
|
|
||||||
|
write_line("Hello " + name + "!")
|
||||||
|
```
|
||||||
|
|
||||||
## Feature Progress
|
## Feature Progress
|
||||||
|
|
||||||
@@ -14,6 +23,7 @@ Dust is still in development. This list may change as the language evolves.
|
|||||||
- [X] Compiler
|
- [X] Compiler
|
||||||
- [X] VM
|
- [X] VM
|
||||||
- [ ] Formatter
|
- [ ] Formatter
|
||||||
|
- [X] Disassembler (for chunk debugging)
|
||||||
- CLI
|
- CLI
|
||||||
- [X] Run source
|
- [X] Run source
|
||||||
- [X] Compile to chunk and show disassembly
|
- [X] Compile to chunk and show disassembly
|
||||||
@@ -23,19 +33,25 @@ Dust is still in development. This list may change as the language evolves.
|
|||||||
- [ ] JSON
|
- [ ] JSON
|
||||||
- [ ] Postcard
|
- [ ] Postcard
|
||||||
- Values
|
- Values
|
||||||
- [X] Basic values: booleans, bytes, characters, integers, floats, UTF-8 strings
|
- [X] No `null` or `undefined` values
|
||||||
- [X] No `null` or `undefined`
|
- [X] Booleans
|
||||||
- [ ] Enums
|
- [X] Bytes (unsigned 8-bit)
|
||||||
|
- [X] Characters (Unicode scalar value)
|
||||||
|
- [X] Floats (64-bit)
|
||||||
- [X] Functions
|
- [X] Functions
|
||||||
- [X] Lists
|
- [X] Integers (signed 64-bit)
|
||||||
- [ ] Maps
|
|
||||||
- [ ] Ranges
|
- [ ] Ranges
|
||||||
|
- [X] Strings (UTF-8)
|
||||||
|
- [X] Concrete lists
|
||||||
|
- [X] Abstract lists (optimization)
|
||||||
|
- [ ] Concrete maps
|
||||||
|
- [ ] Abstract maps (optimization)
|
||||||
|
- [ ] Tuples (fixed-size constant lists)
|
||||||
- [ ] Structs
|
- [ ] Structs
|
||||||
- [ ] Tuples
|
- [ ] Enums
|
||||||
- [ ] Runtime-efficient abstract values for lists and maps
|
|
||||||
- Types
|
- Types
|
||||||
- [X] Basic types for each kind of value
|
- [X] Basic types for each kind of value
|
||||||
- [X] Generalized types: `num`, `any`
|
- [X] Generalized types: `num`, `any`, `none`
|
||||||
- [ ] `struct` types
|
- [ ] `struct` types
|
||||||
- [ ] `enum` types
|
- [ ] `enum` types
|
||||||
- [ ] Type aliases
|
- [ ] Type aliases
|
||||||
@@ -44,7 +60,6 @@ Dust is still in development. This list may change as the language evolves.
|
|||||||
- [ ] Function returns
|
- [ ] Function returns
|
||||||
- [X] If/Else branches
|
- [X] If/Else branches
|
||||||
- [ ] Instruction arguments
|
- [ ] Instruction arguments
|
||||||
- [ ] Runtime type checking for debug compilation modes
|
|
||||||
- Variables
|
- Variables
|
||||||
- [X] Immutable by default
|
- [X] Immutable by default
|
||||||
- [X] Block scope
|
- [X] Block scope
|
||||||
@@ -62,32 +77,23 @@ Dust is still in development. This list may change as the language evolves.
|
|||||||
- [ ] `loop`
|
- [ ] `loop`
|
||||||
- [X] `while`
|
- [X] `while`
|
||||||
- [ ] Match
|
- [ ] Match
|
||||||
- Instructions
|
|
||||||
- [X] Arithmetic
|
|
||||||
- [X] Boolean
|
|
||||||
- [X] Call
|
|
||||||
- [X] Constant
|
|
||||||
- [X] Control flow
|
|
||||||
- [X] Load
|
|
||||||
- [X] Store
|
|
||||||
- [X] Return
|
|
||||||
- [X] Stack
|
|
||||||
- [X] Unary
|
|
||||||
|
|
||||||
## Implementation
|
## Implementation
|
||||||
|
|
||||||
Dust is implemented in Rust and is divided into several parts, most importantly the lexer, compiler,
|
Dust is implemented in Rust and is divided into several parts, most importantly the lexer, compiler,
|
||||||
and virtual machine. All of Dust's components are designed with performance in mind and the codebase
|
and virtual machine. All of Dust's components are designed with performance in mind and the codebase
|
||||||
uses as few dependencies as possible.
|
uses as few dependencies as possible. The code is tested by integration tests that compile source
|
||||||
|
code and check the compiled chunk, then run the source and check the output of the virtual machine.
|
||||||
|
It is important to maintain a high level of quality by writing meaningful tests and by prioritizing
|
||||||
|
optimal compilation and execution of existing programs over adding new features.
|
||||||
|
|
||||||
### Lexer
|
### Lexer and Tokens
|
||||||
|
|
||||||
The lexer emits tokens from the source code. Dust makes extensive use of Rust's zero-copy
|
The lexer emits tokens from the source code. Dust makes extensive use of Rust's zero-copy
|
||||||
capabilities to avoid unnecessary allocations when creating tokens. A token, depending on its type,
|
capabilities to avoid unnecessary allocations when creating tokens. A token, depending on its type,
|
||||||
may contain a reference to some data from the source code. The data is only copied in the case of an
|
may contain a reference to some data from the source code. The data is only copied in the case of an
|
||||||
error, because it improves the usability of the codebase for errors to own their data when possible.
|
error. In a successfully executed program, no part of the source code is copied unless it is a
|
||||||
In a successfully executed program, no part of the source code is copied unless it is a string
|
string literal or identifier.
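
The zero-copy idea can be sketched roughly as follows. This is a minimal illustration, not Dust's lexer: the `Token` enum, its variants and the `lex` function are hypothetical names chosen for the example. Each token that carries text borrows a slice of the source string, so no string is allocated while lexing.

```rust
/// A token that borrows its text from the source instead of owning a copy.
/// `Token` and `lex` are hypothetical names, not Dust's actual API.
#[derive(Debug, PartialEq)]
pub enum Token<'src> {
    Identifier(&'src str),
    Integer(&'src str),
    Plus,
}

/// Splits `source` into tokens. Only the `Vec` of tokens is allocated;
/// the token text itself is never copied.
pub fn lex(source: &str) -> Vec<Token<'_>> {
    let bytes = source.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < bytes.len() {
        let start = i;
        let byte = bytes[i];

        if byte.is_ascii_whitespace() {
            i += 1;
        } else if byte == b'+' {
            tokens.push(Token::Plus);
            i += 1;
        } else if byte.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            // Borrow the digits directly from the source.
            tokens.push(Token::Integer(&source[start..i]));
        } else if byte.is_ascii_alphabetic() || byte == b'_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            // Borrow the identifier directly from the source.
            tokens.push(Token::Identifier(&source[start..i]));
        } else {
            // Unknown bytes are skipped in this sketch; a real lexer would report an error.
            i += 1;
        }
    }

    tokens
}
```

Calling `lex("x + 42")` yields an identifier, a plus and an integer token whose string slices point into the original input. The lifetime parameter `'src` is what lets the compiler guarantee that tokens never outlive the source they borrow from.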
|
||||||
literal or identifier.
|
|
||||||
|
|
||||||
### Compiler
|
### Compiler
|
||||||
|
|
||||||
@@ -95,8 +101,6 @@ The compiler creates a chunk, which contains all of the data needed by the virtu
|
|||||||
Dust program. It does so by emitting bytecode instructions, constants and locals while parsing the
|
Dust program. It does so by emitting bytecode instructions, constants and locals while parsing the
|
||||||
tokens, which are generated one at a time by the lexer.
|
tokens, which are generated one at a time by the lexer.
|
||||||
|
|
||||||
Types are checked during parsing and each emitted instruction is associated with a type.
|
|
||||||
|
|
||||||
#### Parsing
|
#### Parsing
|
||||||
|
|
||||||
Dust's compiler uses a custom Pratt parser, a kind of recursive descent parser, to translate a
|
Dust's compiler uses a custom Pratt parser, a kind of recursive descent parser, to translate a
|
||||||
@@ -119,6 +123,16 @@ optimize the generated code by using fewer instructions or fewer registers. Whil
|
|||||||
output optimal code in the first place, it is not always possible. Dust's compiler uses simple
|
output optimal code in the first place, it is not always possible. Dust's compiler uses simple
|
||||||
functions that modify isolated sections of the instruction list through a mutable reference.
|
functions that modify isolated sections of the instruction list through a mutable reference.
|
||||||
|
|
||||||
|
#### Type Checking
|
||||||
|
|
||||||
|
Dust's compiler associates each emitted instruction with a type. This allows the compiler to enforce
|
||||||
|
compatibility when values are used in expressions. For example, the compiler will not allow a string
|
||||||
|
to be added to an integer, but it will allow either to be added to another of the same type. Aside
|
||||||
|
from instruction arguments, the compiler also checks the types of function arguments and the blocks
|
||||||
|
of `if`/`else` statements.
|
||||||
|
|
||||||
|
The compiler always checks types on the fly, so there is no need for a separate type-checking pass.
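
As a rough illustration of the addition rule described above, the check could look something like this. The `Type` enum, `AddTypeError` and `check_add` are hypothetical names for this sketch, and the choice to reject booleans is an assumption rather than something the README states.

```rust
/// The kinds of values used in this sketch. Hypothetical, not Dust's real type list.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Type {
    Boolean,
    Float,
    Integer,
    String,
}

/// Error produced when two operand types cannot be added together.
#[derive(Debug, PartialEq)]
pub struct AddTypeError {
    pub left: Type,
    pub right: Type,
}

/// Returns the type of `left + right`, or an error if the operands are incompatible.
/// Intended to be called while the addition is being parsed, so no separate
/// type-checking pass is needed.
pub fn check_add(left: Type, right: Type) -> Result<Type, AddTypeError> {
    match (left, right) {
        (Type::Integer, Type::Integer) => Ok(Type::Integer),
        (Type::Float, Type::Float) => Ok(Type::Float),
        (Type::String, Type::String) => Ok(Type::String),
        // Mixed operands (and booleans, by assumption) are rejected.
        _ => Err(AddTypeError { left, right }),
    }
}
```

Because the function returns the resulting type, the caller can attach that type to the instruction it emits for the addition.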
|
||||||
|
|
||||||
### Instructions
|
### Instructions
|
||||||
|
|
||||||
Dust's virtual machine is register-based and uses 64-bit instructions, which encode nine pieces of
|
Dust's virtual machine is register-based and uses 64-bit instructions, which encode nine pieces of
|
||||||
@@ -127,24 +141,38 @@ information:
|
|||||||
Bit | Description
|
Bit | Description
|
||||||
----- | -----------
|
----- | -----------
|
||||||
0-8 | The operation code.
|
0-8 | The operation code.
|
||||||
9 | Boolean flag indicating whether the B argument is a constant.
|
9 | Boolean flag indicating whether the second argument is a constant
|
||||||
10 | Boolean flag indicating whether the C argument is a constant.
|
10 | Boolean flag indicating whether the third argument is a constant
|
||||||
11 | Boolean flag indicating whether the A argument is a local.
|
11 | Boolean flag indicating whether the first argument is a local
|
||||||
12 | Boolean flag indicating whether the B argument is a local.
|
12 | Boolean flag indicating whether the second argument is a local
|
||||||
13 | Boolean flag indicating whether the C argument is a local.
|
13 | Boolean flag indicating whether the third argument is a local
|
||||||
17-32 | The A argument,
|
17-32 | First argument, usually the destination register or local where a value is stored
|
||||||
33-48 | The B argument.
|
33-48 | Second argument, a register, local, constant or boolean flag
|
||||||
49-63 | The C argument.
|
49-63 | Third argument, a register, local, constant or boolean flag
|
||||||
|
|
||||||
|
Because each argument field is at most 16 bits wide, the maximum number of registers is 2^16, which
|
||||||
|
is more than enough, even for very large programs. This also means that chunks can store up to 2^16
|
||||||
|
constants and locals.
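
The bit layout in the table can be sketched with plain shifts and masks. This is only an illustration under stated assumptions: bit 0 is taken to be the least significant bit, the helper names (`encode`, `opcode`, `arg_a` and so on) are hypothetical, and only the two constant flags are modeled. Note that, per the table, the third argument spans bits 49-63 and is therefore 15 bits wide in this sketch.

```rust
// Field positions follow the table; all names here are hypothetical.
const OPCODE_MASK: u64 = 0x1FF; // bits 0-8 (9 bits)
const B_IS_CONSTANT: u64 = 1 << 9; // bit 9
const C_IS_CONSTANT: u64 = 1 << 10; // bit 10
const A_SHIFT: u32 = 17; // bits 17-32
const B_SHIFT: u32 = 33; // bits 33-48
const C_SHIFT: u32 = 49; // bits 49-63
const ARG_MASK: u64 = 0xFFFF; // 16-bit fields (A and B)
const C_MASK: u64 = 0x7FFF; // 15-bit field (C)

/// Packs an operation code, three arguments and two flags into one 64-bit word.
/// Values that do not fit their field are truncated by the masks.
pub fn encode(opcode: u16, a: u16, b: u16, c: u16, b_is_constant: bool, c_is_constant: bool) -> u64 {
    let mut bits = opcode as u64 & OPCODE_MASK;
    if b_is_constant {
        bits |= B_IS_CONSTANT;
    }
    if c_is_constant {
        bits |= C_IS_CONSTANT;
    }
    bits |= (a as u64 & ARG_MASK) << A_SHIFT;
    bits |= (b as u64 & ARG_MASK) << B_SHIFT;
    bits |= (c as u64 & C_MASK) << C_SHIFT;
    bits
}

/// Extracts the operation code from bits 0-8.
pub fn opcode(bits: u64) -> u16 {
    (bits & OPCODE_MASK) as u16
}

/// Extracts the first argument from bits 17-32.
pub fn arg_a(bits: u64) -> u16 {
    ((bits >> A_SHIFT) & ARG_MASK) as u16
}

/// Extracts the second argument from bits 33-48.
pub fn arg_b(bits: u64) -> u16 {
    ((bits >> B_SHIFT) & ARG_MASK) as u16
}

/// Extracts the third argument from bits 49-63.
pub fn arg_c(bits: u64) -> u16 {
    ((bits >> C_SHIFT) & C_MASK) as u16
}

/// Reports whether the second argument refers to a constant.
pub fn b_is_constant(bits: u64) -> bool {
    bits & B_IS_CONSTANT != 0
}

/// Reports whether the third argument refers to a constant.
pub fn c_is_constant(bits: u64) -> bool {
    bits & C_IS_CONSTANT != 0
}
```

Encoding and then decoding a word returns the original fields, which makes the layout easy to check in a unit test. The local flags in bits 11-13 would follow the same pattern as the constant flags.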
|
||||||
|
|
||||||
### Virtual Machine
|
### Virtual Machine
|
||||||
|
|
||||||
|
The virtual machine is simple and efficient. It uses a stack of registers, which can hold values or
|
||||||
|
pointers. Pointers can point to values in the constant list, locals list, or the stack itself. If it
|
||||||
|
points to a local, the VM must consult its local definitions to find which register holds the
|
||||||
|
value. Those local definitions are stored as a simple list of register indexes.
|
||||||
|
|
||||||
|
While the compiler has multiple responsibilities that warrant more complexity, the VM is simple
|
||||||
|
enough to use a very straightforward design. The VM's `run` function uses a simple `while` loop with
|
||||||
|
a `match` statement to execute instructions. When it reaches a `Return` instruction, it breaks the
|
||||||
|
loop and optionally returns a value.
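
The shape of such a loop might look like the sketch below. It is heavily simplified and makes several assumptions: instructions are modeled as a hypothetical Rust `enum` rather than encoded 64-bit words, registers hold only `i64` values instead of values or pointers, and the instruction names are invented for the example.

```rust
/// A tiny, hypothetical instruction set used only for this sketch.
#[derive(Debug, Clone, Copy)]
pub enum Instruction {
    LoadInteger { destination: usize, value: i64 },
    Add { destination: usize, left: usize, right: usize },
    Return { register: Option<usize> },
}

/// Executes instructions in order until a `Return` is reached or the list ends.
/// Panics if an instruction reads a register that was never written.
pub fn run(instructions: &[Instruction]) -> Option<i64> {
    let mut registers: Vec<i64> = Vec::new();
    let mut ip = 0; // instruction pointer
    let mut result = None;

    while ip < instructions.len() {
        let instruction = instructions[ip];
        ip += 1;

        match instruction {
            Instruction::LoadInteger { destination, value } => {
                set(&mut registers, destination, value);
            }
            Instruction::Add { destination, left, right } => {
                let sum = registers[left] + registers[right];
                set(&mut registers, destination, sum);
            }
            Instruction::Return { register } => {
                // Break the loop and optionally hand back a value.
                result = register.map(|index| registers[index]);
                break;
            }
        }
    }

    result
}

/// Writes `value` into the register at `index`, growing the register stack if needed.
fn set(registers: &mut Vec<i64>, index: usize, value: i64) {
    if index >= registers.len() {
        registers.resize(index + 1, 0);
    }
    registers[index] = value;
}
```

A program that loads 40 and 2 into two registers, adds them into a third and returns that register produces `Some(42)`; a bare `Return { register: None }` produces `None`.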
|
||||||
|
|
||||||
## Previous Implementations
|
## Previous Implementations
|
||||||
|
|
||||||
Dust has gone through several iterations, each with its own unique features and design choices. It
|
Dust has gone through several iterations, each with its own design choices. It was originally
|
||||||
was originally implemented with a syntax tree generated by an external parser, then a parser
|
implemented with a syntax tree generated by an external parser, then a parser generator, and finally
|
||||||
generator, and finally a custom parser. Eventually the language was rewritten to use bytecode
|
a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual
|
||||||
instructions and a virtual machine. The current implementation is by far the most performant and the
|
machine. The current implementation is by far the most performant and the general design is unlikely
|
||||||
general design is unlikely to change.
|
to change.
|
||||||
|
|
||||||
Dust previously had a more complex type system with type arguments (or "generics") and a simple
|
Dust previously had a more complex type system with type arguments (or "generics") and a simple
|
||||||
model for asynchronous execution of statements. Both of these features were removed to simplify the
|
model for asynchronous execution of statements. Both of these features were removed to simplify the
|
||||||
@@ -164,7 +192,7 @@ about the instruction's arguments.
|
|||||||
|
|
||||||
[The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar
|
[The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar
|
||||||
Celes was a great resource for understanding how a compiler and VM tie together. Dust's compiler's
|
Celes was a great resource for understanding how a compiler and VM tie together. Dust's compiler's
|
||||||
optimization functions were inspired by Lua optimizations covered in this paper.
|
optimization functions are inspired by Lua optimizations covered in this paper.
|
||||||
|
|
||||||
[Crafting Interpreters]: https://craftinginterpreters.com/
|
[Crafting Interpreters]: https://craftinginterpreters.com/
|
||||||
[The Implementation of Lua 5.0]: https://www.lua.org/doc/jucs05.pdf
|
[The Implementation of Lua 5.0]: https://www.lua.org/doc/jucs05.pdf
|
||||||
|
@@ -1,15 +1,18 @@
|
|||||||
//! An operation and its arguments for the Dust virtual machine.
|
//! An operation and its arguments for the Dust virtual machine.
|
||||||
//!
|
//!
|
||||||
//! Each instruction is a 64-bit unsigned integer that is divided into nine fields:
|
//! Each instruction is a 64-bit unsigned integer that is divided into nine fields:
|
||||||
//! - Bits 0-8: The operation code.
|
//!
|
||||||
//! - Bit 9: Boolean flag indicating whether the B argument is a constant.
|
//! Bit | Description
|
||||||
//! - Bit 10: Boolean flag indicating whether the C argument is a constant.
|
//! ----- | -----------
|
||||||
//! - Bit 11: Boolean flag indicating whether the A argument is a local.
|
//! 0-8 | The operation code.
|
||||||
//! - Bit 12: Boolean flag indicating whether the B argument is a local.
|
//! 9 | Boolean flag indicating whether the B argument is a constant.
|
||||||
//! - Bit 13: Boolean flag indicating whether the C argument is a local.
|
//! 10 | Boolean flag indicating whether the C argument is a constant.
|
||||||
//! - Bits 17-32: The A argument,
|
//! 11 | Boolean flag indicating whether the A argument is a local.
|
||||||
//! - Bits 33-48: The B argument.
|
//! 12 | Boolean flag indicating whether the B argument is a local.
|
||||||
//! - Bits 49-63: The C argument.
|
//! 13 | Boolean flag indicating whether the C argument is a local.
|
||||||
|
//! 17-32 | The A argument.
|
||||||
|
//! 33-48 | The B argument.
|
||||||
|
//! 49-63 | The C argument.
|
||||||
//!
|
//!
|
||||||
//! Be careful when working with instructions directly. When modifying an instruction, be sure to
|
//! Be careful when working with instructions directly. When modifying an instruction, be sure to
|
||||||
//! account for the fact that setting the A, B, or C arguments to 0 will have no effect. It is
|
//! account for the fact that setting the A, B, or C arguments to 0 will have no effect. It is
|
||||||
|