Edit README; Begin 64-bit instruction set

Jeff 2025-01-10 12:54:33 -05:00
parent de426d814a
commit 61f4093da0
3 changed files with 81 additions and 225 deletions

223
README.md

@ -1,24 +1,9 @@
# The Dust Programming Language
# Dust Programming Language
A **fast**, **safe** and **easy to use** language for general-purpose programming.
Dust is **statically typed** to ensure that each program is valid before it is run. Compiling is
fast due to the purpose-built lexer and parser. Execution is fast because Dust uses a custom
bytecode that runs in a multi-threaded VM. Dust combines compile-time safety guarantees and
optimizations with negligible compile times and satisfying runtime speed to deliver a unique set of
features. It offers the best qualities of two disparate categories of programming language: the
highly optimized but slow-to-compile languages like Rust and C++ and the quick-to-start but often
slow and error-prone languages like Python and JavaScript.
Dust's syntax, safety features and evaluation model are based on Rust. Its instruction set,
optimization strategies and virtual machine are based on Lua. Unlike Rust and other languages that
compile to machine code, Dust has a very low time to execution. Unlike Lua and most other
interpreted languages, Dust enforces static typing to improve clarity and prevent bugs.
**Dust is under active development and is not yet ready for general use.**
**Fast**, **safe** and **easy-to-use** general-purpose programming language.
```rust
// "Hello, world" using Dust's built-in I/O functions
// An interactive "Hello, world" using Dust's built-in I/O functions
write_line("Enter your name...")
let name = read_line()
@ -38,7 +23,23 @@ fn fib (n: int) -> int {
write_line(fib(25))
```
## Goals
## 🌣 Highlights
- Easy to read and write
- Single-pass, self-optimizing compiler
- Static typing with extensive type inference
- Multi-threaded register-based virtual machine with concurrent garbage collection
- Beautiful, helpful error messages from the compiler
- Safe execution: runtime errors are treated as bugs
## 🛈 Overview
Dust's syntax, safety features and evaluation model are based on Rust. Its instruction set
and optimization strategies are based on Lua. Unlike Rust and other languages that compile to
machine code, Dust has a very low time to execution. Unlike Lua and most other interpreted
languages, Dust enforces static typing to improve clarity and prevent bugs.
### Project Goals
This project's goal is to deliver a language with features that stand out due to a combination of
design choices and a high-quality implementation. As mentioned in the first sentence, Dust's general
@ -56,10 +57,12 @@ aspirations are to be **fast**, **safe** and **easy**.
superior development experience despite some additional constraints. Like any good statically
typed language, users should feel confident in the type-consistency of their code and not want
to go back to a dynamically typed language.
- **Null-Free** Dust has no "null" or "undefined" values. All values are initialized and have a
type. This eliminates a whole class of bugs that are common in other languages.
- **Memory Safety** Dust should be free of memory bugs. Being implemented in Rust makes this easy
but, to accommodate long-running programs, Dust still requires a memory management strategy.
Dust's design is to use a separate thread for garbage collection, allowing the main thread to
continue executing code while the garbage collector looks for unused memory.
Dust's design is to use a separate thread for garbage collection, allowing other threads to
continue executing instructions while the garbage collector looks for unused memory.
- **Easy**
- **Simple Syntax** Dust should be easier to learn than most programming languages. Its syntax
should be familiar to users of other C-like languages to the point that even a new user can read
@ -72,178 +75,22 @@ aspirations are to be **fast**, **safe** and **easy**.
- **Relevant Documentation** Users should have the resources they need to learn Dust and write
code in it. They should know where to look for answers and how to reach out for help.
## Language Overview
### Author
This is a quick overview of Dust's syntax features. It skips over the aspects that are familiar to
most programmers such as creating variables, using binary operators and printing to the console.
Eventually there should be a complete reference for the syntax.
I'm Jeff and I started this project to learn more about programming languages by implementing a
simple expression evaluator. Initially, the project used an external parser and a tree-walking
interpreter. After several books, papers and a lot of experimentation, Dust has evolved into an
ambitious project that aims to implement lucrative features with a high-quality implementation that
competes with established languages.
### Syntax and Evaluation
## Usage
Dust belongs to the C-like family of languages[^5], with an imperative syntax that will be familiar
to many programmers. Dust code looks a lot like Ruby, JavaScript, TypeScript and other members of
the family but Rust is its primary point of reference for syntax. Rust was chosen as a syntax model
because its imperative code is *obvious by design* and *widely familiar*. Those qualities are
aligned with Dust's emphasis on usability.
**Dust is under active development and is not yet ready for general use.**
However, some differences exist. Dust *evaluates* all the code in the file while Rust only initiates
from a "main" function. Dust's execution model is more like one found in a scripting language. If we
put `42 + 42 == 84` into a file and run it, it will return `true` because the outer context is, in a
sense, the "main" function.
## Installation
So while the syntax is by no means compatible, it is superficially similar, even to the point that
syntax highlighting for Rust code works well with Dust code. This is not a design goal but a happy
coincidence.
### Statements and Expressions
Dust is composed of statements and expressions. If a statement ends in an expression without a
trailing semicolon, the statement evaluates to the value produced by that expression. However, if
the expression's value is suppressed with a semicolon, the statement does not evaluate to a value.
This is identical to Rust's evaluation model. That means that the following code will not compile:
```rust
// !!! Compile Error !!!
let a = { 40 + 2; }
```
The `a` variable is assigned to the value produced by a block. The block contains an expression that
is suppressed by a semicolon, so the block does not evaluate to a value. Therefore, the `a` variable
would have to be uninitialized (which Dust does not allow) or result in a runtime error (which Dust
avoids at all costs). We can fix this code by moving the semicolon to the end of the block. In this
position it suppresses the value of the entire `let` statement. As we saw above, a `let` statement
never evaluates to a value, so the semicolon has no effect on the program's behavior and could be
omitted altogether.
```rust
let a = { 40 + 2 }; // This is fine
let a = { 40 + 2 } // This is also fine
```
Only the final expression in a block is returned. When a `let` statement is combined with an
`if/else` statement, the program can perform conditional side effects before assigning the variable.
```rust
let random: int = random(0..100)
let is_even = if random == 99 {
write_line("We got a 99!")
false
} else {
random % 2 == 0
}
is_even
```
If the above example were passed to Dust as a complete program, it would return a boolean value and
might print a message to the console (if the user is especially lucky). However, note that the
program could be modified to return no value by simply adding a semicolon at the end.
Compared to JavaScript, Dust's evaluation model is more predictable, less error-prone and will never
trap the user into a frustrating hunt for a missing semicolon. Compared to Rust, Dust's evaluation
model is more accommodating without sacrificing expressiveness. In Rust, semicolons are *required*
and *meaningful*, which provides excellent consistency but lacks flexibility. In JavaScript,
semicolons are *optional* and inserted automatically, which is a source of confusion for many developers.
Dust borrowed Rust's approach to semicolons and their effect on evaluation and relaxed the rules to
accommodate different styles of coding. Rust isn't designed for command lines or REPLs but Dust is
well-suited to those applications. Dust needs to work in a source file or in an ad-hoc one-liner
sent to the CLI. Thus, semicolons are optional in most cases.
There are two things you need to know about semicolons in Dust:
- Semicolons suppress the value of whatever they follow. The preceding statement or expression will
have the type `none` and will not evaluate to a value.
- If a semicolon does not change how the program runs, it is optional.
This example shows three statements with semicolons. The compiler knows that a `let` statement
cannot produce a value and will always have the type `none`. Thanks to static typing, it also knows
that the `write_line` function has no return value so the function call also has the type `none`.
Therefore, these semicolons are optional.
```rust
let a = 40;
let b = 2;
write_line("The answer is ", a + b);
```
Removing the semicolons does not alter the execution pattern or the return value.
```rust
let x = 10
let y = 3
write_line("The remainder is ", x % y)
```
### Type System
All variables have a type that is established when the variable is declared. This usually does not
require that the type be explicitly stated because Dust can infer the type from the value.
The next example produces a compiler error because the `if` block evaluates to an `int` but the
`else` block evaluates to a `str`. Dust does not allow branches of the same `if/else` statement to
have different types.
```rust
// !!! Compile Error !!!
let input = read_line()
let reward = if input == "42" {
write_line("You got it! Here's your reward.")
777 // <- This is an int
} else {
write_line(input, " is not the answer.")
"777" // <- This is a string
}
```
### Basic Values
Dust supports the following basic values:
- Boolean: `true` or `false`
- Byte: An unsigned 8-bit integer
- Character: A Unicode scalar value
- Float: A 64-bit floating-point number
- Function: An executable chunk of code
- Integer: A signed 64-bit integer
- String: A UTF-8 encoded byte sequence
Dust's "basic" values are conceptually similar because they are singular as opposed to composite.
Most of these values are stored on the stack but some are heap-allocated. A Dust string is a
sequence of bytes that are encoded in UTF-8. Even though it could be seen as a composite of byte
values, strings are considered "basic" because they are parsed directly from tokens and behave as
singular values. Shorter strings are stored on the stack while longer strings are heap-allocated.
Dust offers built-in native functions that can manipulate strings by accessing their bytes or
reading them as a sequence of characters.
There is no `null` or `undefined` value in Dust. All values and variables must be initialized to one
of the supported value types. This eliminates a whole class of bugs that permeate many other
languages.
> I call it my billion-dollar mistake. It was the invention of the null reference in 1965.
> - Tony Hoare
Dust *does* have a `none` type, which should not be confused for being `null`-like. Like the `()` or
"unit" type in Rust, `none` exists as a type but not as a value. It indicates the lack of a value
from a function, expression or statement. A variable cannot be assigned to `none`.
## Previous Implementations
Dust has gone through several iterations, each with its own design choices. It was originally
implemented with a syntax tree generated by an external parser, then a parser generator, and finally
a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual
machine. The current implementation, which compiles to bytecode with custom lexing and parsing for a
register-based VM, is by far the most performant and the general design is unlikely to change.
Dust previously had a more complex type system with type arguments (or "generics") and a simple
model for asynchronous execution of statements. Both of these features were removed to simplify the
language when it was rewritten to use bytecode instructions. Both features are planned to be
reintroduced in the future.
Eventually, Dust should be available via package managers and as an embeddable library. For now,
the only way to use Dust is to clone the repository and build it from source.
## Inspiration


@ -1,16 +1,17 @@
//! Instructions for the Dust virtual machine.
//! The Dust instruction set.
//!
//! Each instruction is 32 bits and uses up to seven distinct fields:
//! Each instruction is 64 bits and uses up to eight distinct fields. The instruction's layout is:
//!
//! Bit | Description
//! Bits | Description
//! ----- | -----------
//! 0-4 | Operation code
//! 5 | Flag indicating if the B field is a constant
//! 6 | Flag indicating if the C field is a constant
//! 7 | D field (boolean)
//! 8-15 | A field (unsigned 8-bit integer)
//! 16-23 | B field (unsigned 8-bit integer)
//! 24-31 | C field (unsigned 8-bit integer)
//! 8-15 | Type specifier
//! 16-31 | A field (unsigned 16-bit integer)
//! 32-47 | B field (unsigned 16-bit integer)
//! 48-63 | C field (unsigned 16-bit integer)
//!
//! **Be careful when working with instructions directly**. When modifying an instruction's fields,
//! you may also need to modify its flags. It is usually best to remove instructions and insert new
@ -71,9 +72,9 @@
//! # );
//! // Let's read an instruction and see if it performs addition-assignment,
//! // like in one of the following examples:
//! // - `a += 2`
//! // - `a = a + 2`
//! // - `a = 2 + a`
//! // - `a += 2`
//! // - `a = a + 2`
//! // - `a = 2 + a`
//!
//! let operation = mystery_instruction.operation();
//! let is_add_assign = match operation {
@ -115,6 +116,7 @@ mod set_local;
mod subtract;
mod test;
mod test_set;
mod type_code;
pub use add::Add;
pub use call::Call;
@ -152,25 +154,27 @@ use crate::NativeFunction;
///
/// See the [module-level documentation](index.html) for more information.
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord, Serialize, Deserialize)]
pub struct Instruction(u32);
pub struct Instruction(u64);
impl Instruction {
pub fn new(
operation: Operation,
a: u8,
b: u8,
c: u8,
type_specifier: u8,
a: u16,
b: u16,
c: u16,
d: bool,
b_is_constant: bool,
c_is_constant: bool,
d: bool,
) -> Instruction {
let bits = operation.0 as u32
| ((b_is_constant as u32) << 5)
| ((c_is_constant as u32) << 6)
| ((d as u32) << 7)
| ((a as u32) << 8)
| ((b as u32) << 16)
| ((c as u32) << 24);
let bits = operation.0 as u64
| ((b_is_constant as u64) << 5)
| ((c_is_constant as u64) << 6)
| ((d as u64) << 7)
| ((type_specifier as u64) << 8)
| ((a as u64) << 16)
| ((b as u64) << 32)
| ((c as u64) << 48);
Instruction(bits)
}
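
The layout table in the module documentation fixes the bit where each field starts, so packing and unpacking reduce to shifts and casts. The following is a minimal stand-alone sketch of that layout, not the project's code: `pack` and the getter functions are hypothetical free functions, the operation is passed as a raw `u8`, and the operation code is assumed to fit in bits 0-4.

```rust
// Hypothetical sketch of the 64-bit layout described in the table:
// bits 0-4 operation, 5 B-is-constant, 6 C-is-constant, 7 D,
// 8-15 type specifier, 16-31 A, 32-47 B, 48-63 C.
fn pack(
    operation: u8,
    type_specifier: u8,
    a: u16,
    b: u16,
    c: u16,
    d: bool,
    b_is_constant: bool,
    c_is_constant: bool,
) -> u64 {
    // Each field is widened to u64 and shifted to its lowest bit.
    (operation as u64 & 0b1_1111)
        | ((b_is_constant as u64) << 5)
        | ((c_is_constant as u64) << 6)
        | ((d as u64) << 7)
        | ((type_specifier as u64) << 8)
        | ((a as u64) << 16)
        | ((b as u64) << 32)
        | ((c as u64) << 48)
}

// Getters shift the field down to bit 0; the cast truncates to its width.
fn operation(bits: u64) -> u8 { (bits & 0b1_1111) as u8 }
fn d_field(bits: u64) -> bool { ((bits >> 7) & 1) == 1 }
fn type_specifier(bits: u64) -> u8 { (bits >> 8) as u8 }
fn a_field(bits: u64) -> u16 { (bits >> 16) as u16 }
fn b_field(bits: u64) -> u16 { (bits >> 32) as u16 }
fn c_field(bits: u64) -> u16 { (bits >> 48) as u16 }
```

A round trip through `pack` and the getters should return every field unchanged, which is a cheap property to test whenever the layout is widened again.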
@ -206,29 +210,24 @@ impl Instruction {
}
pub fn set_a_field(&mut self, bits: u8) {
self.0 = (self.0 & 0xFFFF00FF) | ((bits as u32) << 8);
self.0 = (self.0 & 0xFFFFFFFF0000FFFF) | ((bits as u64) << 16);
}
pub fn set_b_field(&mut self, bits: u8) {
self.0 = (self.0 & 0xFFFF00FF) | ((bits as u32) << 16);
self.0 = (self.0 & 0xFFFF0000FFFFFFFF) | ((bits as u64) << 32);
}
pub fn set_c_field(&mut self, bits: u8) {
self.0 = (self.0 & 0xFF00FFFF) | ((bits as u32) << 24);
}
pub fn set_c_field(&mut self, bits: u8) {
self.0 = (self.0 & 0x0000FFFFFFFFFFFF) | ((bits as u64) << 48);
}
pub fn decode(self) -> (Operation, InstructionData) {
(
self.operation(),
InstructionData {
a_field: self.a_field(),
b_field: self.b_field(),
c_field: self.c_field(),
b_is_constant: self.b_is_constant(),
c_is_constant: self.c_is_constant(),
d_field: self.d_field(),
},
)
(self.operation(), InstructionData {
a_field: self.a_field(),
b_field: self.b_field(),
c_field: self.c_field(),
b_is_constant: self.b_is_constant(),
c_is_constant: self.c_is_constant(),
d_field: self.d_field(),
})
}
pub fn point(from: u8, to: u8) -> Instruction {
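
Field setters on a packed word follow a clear-then-set pattern: mask out the old field, then OR in the new value at the field's offset. The sketch below illustrates that pattern for the 16-bit A, B and C fields at bits 16, 32 and 48. The names `set_field_u16`, `set_a`, `set_b` and `set_c` are hypothetical, and the functions take and return a plain `u64` rather than the `Instruction` type.

```rust
// Hypothetical helper: overwrite the 16-bit field that starts at `shift`
// without disturbing any other bit of the instruction word.
fn set_field_u16(bits: u64, shift: u32, value: u16) -> u64 {
    // Ones over the target field, zeros everywhere else.
    let mask = 0xFFFF_u64 << shift;
    // Clear the old field, then place the new value at the same offset.
    (bits & !mask) | ((value as u64) << shift)
}

fn set_a(bits: u64, value: u16) -> u64 { set_field_u16(bits, 16, value) }
fn set_b(bits: u64, value: u16) -> u64 { set_field_u16(bits, 32, value) }
fn set_c(bits: u64, value: u16) -> u64 { set_field_u16(bits, 48, value) }
```

Deriving the mask from the shift keeps the two from drifting apart, which is the usual failure mode of hand-written hexadecimal masks.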


@ -0,0 +1,10 @@
pub struct TypeCode(pub u8);
impl TypeCode {
const INTEGER: u8 = 0;
const FLOAT: u8 = 1;
const STRING: u8 = 2;
const BOOLEAN: u8 = 3;
const CHARACTER: u8 = 4;
const BYTE: u8 = 5;
}
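
One way these codes could be consumed is to turn the 8-bit type specifier of an instruction into a readable name, for example in a disassembler. This is only a sketch: the `name` method and the lowercase names it returns are assumptions, the constants are made `pub` here so the example is self-contained, and nothing in this commit shows how `TypeCode` is actually used.

```rust
// Sketch only: mirrors the constants in the diff and adds a hypothetical
// `name` method that the commit itself does not define.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TypeCode(pub u8);

impl TypeCode {
    pub const INTEGER: u8 = 0;
    pub const FLOAT: u8 = 1;
    pub const STRING: u8 = 2;
    pub const BOOLEAN: u8 = 3;
    pub const CHARACTER: u8 = 4;
    pub const BYTE: u8 = 5;

    // Returns a readable name, or None for a code with no assigned type.
    pub fn name(self) -> Option<&'static str> {
        match self.0 {
            TypeCode::INTEGER => Some("integer"),
            TypeCode::FLOAT => Some("float"),
            TypeCode::STRING => Some("string"),
            TypeCode::BOOLEAN => Some("boolean"),
            TypeCode::CHARACTER => Some("character"),
            TypeCode::BYTE => Some("byte"),
            _ => None,
        }
    }
}
```

Returning `Option` rather than panicking keeps an unrecognized specifier a recoverable condition, which matters because the field has room for 256 values and only six are assigned.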