Edit README; Begin 64-bit instruction set
parent de426d814a
commit 61f4093da0
223
README.md
@ -1,24 +1,9 @@
|
|||||||
# The Dust Programming Language
|
# ✭ Dust Programming Language
|
||||||
|
|
||||||
A **fast**, **safe** and **easy to use** language for general-purpose programming.
|
**Fast**, **safe** and **easy-to-use** general-purpose programming language.
|
||||||
|
|
||||||
Dust is **statically typed** to ensure that each program is valid before it is run. Compiling is
|
|
||||||
fast due to the purpose-built lexer and parser. Execution is fast because Dust uses a custom
|
|
||||||
bytecode that runs in a multi-threaded VM. Dust combines compile-time safety guarantees and
|
|
||||||
optimizations with negligible compile times and satisfying runtime speed to deliver a unique set of
|
|
||||||
features. It offers the best qualities of two disparate categories of programming language: the
|
|
||||||
highly optimized but slow-to-compile languages like Rust and C++ and the quick-to-start but often
|
|
||||||
slow and error-prone languages like Python and JavaScript.
|
|
||||||
|
|
||||||
Dust's syntax, safety features and evaluation model are based on Rust. Its instruction set,
|
|
||||||
optimization strategies and virtual machine are based on Lua. Unlike Rust and other languages that
|
|
||||||
compile to machine code, Dust has a very low time to execution. Unlike Lua and most other
|
|
||||||
interpreted languages, Dust enforces static typing to improve clarity and prevent bugs.
|
|
||||||
|
|
||||||
**Dust is under active development and is not yet ready for general use.**
|
|
||||||
|
|
||||||
```rust
|
```rust
|
||||||
// "Hello, world" using Dust's built-in I/O functions
|
// An interactive "Hello, world" using Dust's built-in I/O functions
|
||||||
write_line("Enter your name...")
|
write_line("Enter your name...")
|
||||||
|
|
||||||
let name = read_line()
|
let name = read_line()
|
||||||
@ -38,7 +23,23 @@ fn fib (n: int) -> int {
|
|||||||
write_line(fib(25))
|
write_line(fib(25))
|
||||||
```
|
```
|
||||||
|
|
||||||
## Goals
|
## 🌣 Highlights
|
||||||
|
|
||||||
|
- Easy to read and write
|
||||||
|
- Single-pass, self-optimizing compiler
|
||||||
|
- Static typing with extensive type inference
|
||||||
|
- Multi-threaded register-based virtual machine with concurrent garbage collection
|
||||||
|
- Beautiful, helpful error messages from the compiler
|
||||||
|
- Safe execution: runtime errors are treated as bugs
|
||||||
|
|
||||||
|
## 🛈 Overview
|
||||||
|
|
||||||
|
Dust's syntax, safety features and evaluation model are based on Rust. Its instruction set
|
||||||
|
and optimization strategies are based on Lua. Unlike Rust and other languages that compile to
|
||||||
|
machine code, Dust has a very low time to execution. Unlike Lua and most other interpreted
|
||||||
|
languages, Dust enforces static typing to improve clarity and prevent bugs.
|
||||||
|
|
||||||
|
### Project Goals
|
||||||
|
|
||||||
This project's goal is to deliver a language with features that stand out due to a combination of
|
This project's goal is to deliver a language with features that stand out due to a combination of
|
||||||
design choices and a high-quality implementation. As mentioned in the first sentence, Dust's general
|
design choices and a high-quality implementation. As mentioned in the first sentence, Dust's general
|
||||||
@ -56,10 +57,12 @@ aspirations are to be **fast**, **safe** and **easy**.
|
|||||||
superior development experience despite some additional constraints. Like any good statically
|
superior development experience despite some additional constraints. Like any good statically
|
||||||
typed language, users should feel confident in the type-consistency of their code and not want
|
typed language, users should feel confident in the type-consistency of their code and not want
|
||||||
to go back to a dynamically typed language.
|
to go back to a dynamically typed language.
|
||||||
|
- **Null-Free** Dust has no "null" or "undefined" values. All values are initialized and have a
|
||||||
|
type. This eliminates a whole class of bugs that are common in other languages.
|
||||||
- **Memory Safety** Dust should be free of memory bugs. Being implemented in Rust makes this easy
|
- **Memory Safety** Dust should be free of memory bugs. Being implemented in Rust makes this easy
|
||||||
but, to accommodate long-running programs, Dust still requires a memory management strategy.
|
but, to accommodate long-running programs, Dust still requires a memory management strategy.
|
||||||
Dust's design is to use a separate thread for garbage collection, allowing the main thread to
|
Dust's design is to use a separate thread for garbage collection, allowing other threads to
|
||||||
continue executing code while the garbage collector looks for unused memory.
|
continue executing instructions while the garbage collector looks for unused memory.
|
||||||
- **Easy**
|
- **Easy**
|
||||||
- **Simple Syntax** Dust should be easier to learn than most programming languages. Its syntax
|
- **Simple Syntax** Dust should be easier to learn than most programming languages. Its syntax
|
||||||
should be familiar to users of other C-like languages to the point that even a new user can read
|
should be familiar to users of other C-like languages to the point that even a new user can read
|
||||||
@ -72,178 +75,22 @@ aspirations are to be **fast**, **safe** and **easy**.
|
|||||||
- **Relevant Documentation** Users should have the resources they need to learn Dust and write
|
- **Relevant Documentation** Users should have the resources they need to learn Dust and write
|
||||||
code in it. They should know where to look for answers and how to reach out for help.
|
code in it. They should know where to look for answers and how to reach out for help.
|
||||||
|
|
||||||
## Language Overview
|
### Author
|
||||||
|
|
||||||
This is a quick overview of Dust's syntax features. It skips over the aspects that are familiar to
|
I'm Jeff and I started this project to learn more about programming languages by implementing a
|
||||||
most programmers such as creating variables, using binary operators and printing to the console.
|
simple expression evaluator. Initially, the project used an external parser and a tree-walking
|
||||||
Eventually there should be a complete reference for the syntax.
|
interpreter. After several books, papers and a lot of experimentation, Dust has evolved into an
|
||||||
|
ambitious project that aims to implement lucrative features with a high-quality implementation that
|
||||||
|
competes with established languages.
|
||||||
|
|
||||||
### Syntax and Evaluation
|
## Usage
|
||||||
|
|
||||||
Dust belongs to the C-like family of languages[^5], with an imperative syntax that will be familiar
|
**Dust is under active development and is not yet ready for general use.**
|
||||||
to many programmers. Dust code looks a lot like Ruby, JavaScript, TypeScript and other members of
|
|
||||||
the family but Rust is its primary point of reference for syntax. Rust was chosen as a syntax model
|
|
||||||
because its imperative code is *obvious by design* and *widely familiar*. Those qualities are
|
|
||||||
aligned with Dust's emphasis on usability.
|
|
||||||
|
|
||||||
However, some differences exist. Dust *evaluates* all the code in the file while Rust only initiates
|
## Installation
|
||||||
from a "main" function. Dust's execution model is more like one found in a scripting language. If we
|
|
||||||
put `42 + 42 == 84` into a file and run it, it will return `true` because the outer context is, in a
|
|
||||||
sense, the "main" function.
|
|
||||||
|
|
||||||
So while the syntax is by no means compatible, it is superficially similar, even to the point that
|
Eventually, Dust should be available via package managers and as an embeddable library. For now,
|
||||||
syntax highlighting for Rust code works well with Dust code. This is not a design goal but a happy
|
the only way to use Dust is to clone the repository and build it from source.
|
||||||
coincidence.
|
|
||||||
|
|
||||||
### Statements and Expressions
|
|
||||||
|
|
||||||
Dust is composed of statements and expressions. If a statement ends in an expression without a
|
|
||||||
trailing semicolon, the statement evaluates to the value produced by that expression. However, if
|
|
||||||
the expression's value is suppressed with a semicolon, the statement does not evaluate to a value.
|
|
||||||
This is identical to Rust's evaluation model. That means that the following code will not compile:
|
|
||||||
|
|
||||||
```rust
|
|
||||||
// !!! Compile Error !!!
|
|
||||||
let a = { 40 + 2; }
|
|
||||||
```
|
|
||||||
|
|
||||||
The `a` variable is assigned to the value produced by a block. The block contains an expression that
|
|
||||||
is suppressed by a semicolon, so the block does not evaluate to a value. Therefore, the `a` variable
|
|
||||||
would have to be uninitialized (which Dust does not allow) or result in a runtime error (which Dust
|
|
||||||
avoids at all costs). We can fix this code by moving the semicolon to the end of the block. In this
|
|
||||||
position it suppresses the value of the entire `let` statement. As we saw above, a `let` statement
|
|
||||||
never evaluates to a value, so the semicolon has no effect on the program's behavior and could be
|
|
||||||
omitted altogether.
|
|
||||||
|
|
||||||
```rust
|
|
||||||
let a = { 40 + 2 }; // This is fine
|
|
||||||
let a = { 40 + 2 } // This is also fine
|
|
||||||
```
|
|
||||||
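For comparison, here is a minimal sketch in plain Rust (not Dust) of the block rule described above. One difference is worth noting: in Rust, a block whose final expression is suppressed by a semicolon still type-checks and evaluates to the unit value `()`, whereas the text above says the Dust equivalent is rejected at compile time. The function names are hypothetical.

```rust
/// The block's final expression has no trailing semicolon,
/// so the block evaluates to that expression's value.
fn block_with_value() -> i32 {
    let a = { 40 + 2 };
    a
}

/// The semicolon suppresses the value, so in Rust the block
/// evaluates to the unit value `()` instead of an integer.
fn block_without_value() {
    let a: () = { let _ = 40 + 2; };
    a
}
```

Calling `block_with_value()` yields `42`, while `block_without_value()` yields `()`.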
|
|
||||||
Only the final expression in a block is returned. When a `let` statement is combined with an
|
|
||||||
`if/else` statement, the program can perform conditional side effects before assigning the variable.
|
|
||||||
|
|
||||||
```rust
|
|
||||||
let random: int = random(0..100)
|
|
||||||
let is_even = if random == 99 {
|
|
||||||
write_line("We got a 99!")
|
|
||||||
|
|
||||||
false
|
|
||||||
} else {
|
|
||||||
random % 2 == 0
|
|
||||||
}
|
|
||||||
|
|
||||||
is_even
|
|
||||||
```
|
|
||||||
|
|
||||||
If the above example were passed to Dust as a complete program, it would return a boolean value and
|
|
||||||
might print a message to the console (if the user is especially lucky). However, note that the
|
|
||||||
program could be modified to return no value by simply adding a semicolon at the end.
|
|
||||||
|
|
||||||
Compared to JavaScript, Dust's evaluation model is more predictable, less error-prone and will never
|
|
||||||
trap the user into a frustrating hunt for a missing semicolon. Compared to Rust, Dust's evaluation
|
|
||||||
model is more accommodating without sacrificing expressiveness. In Rust, semicolons are *required*
|
|
||||||
and *meaningful*, which provides excellent consistency but lacks flexibility. In JavaScript,
|
|
||||||
semicolons are *required* and *meaningless*, which is a source of confusion for many developers.
|
|
||||||
|
|
||||||
Dust borrowed Rust's approach to semicolons and their effect on evaluation and relaxed the rules to
|
|
||||||
accommodate different styles of coding. Rust isn't designed for command lines or REPLs but Dust is
|
|
||||||
well-suited to those applications. Dust needs to work in a source file or in an ad-hoc one-liner
|
|
||||||
sent to the CLI. Thus, semicolons are optional in most cases.
|
|
||||||
|
|
||||||
There are two things you need to know about semicolons in Dust:
|
|
||||||
|
|
||||||
- Semicolons suppress the value of whatever they follow. The preceding statement or expression will
|
|
||||||
have the type `none` and will not evaluate to a value.
|
|
||||||
- If a semicolon does not change how the program runs, it is optional.
|
|
||||||
|
|
||||||
This example shows three statements with semicolons. The compiler knows that a `let` statement
|
|
||||||
cannot produce a value and will always have the type `none`. Thanks to static typing, it also knows
|
|
||||||
that the `write_line` function has no return value so the function call also has the type `none`.
|
|
||||||
Therefore, these semicolons are optional.
|
|
||||||
|
|
||||||
```rust
|
|
||||||
let a = 40;
|
|
||||||
let b = 2;
|
|
||||||
|
|
||||||
write_line("The answer is ", a + b);
|
|
||||||
```
|
|
||||||
|
|
||||||
Removing the semicolons does not alter the execution pattern or the return value.
|
|
||||||
|
|
||||||
```rust
|
|
||||||
let x = 10
|
|
||||||
let y = 3
|
|
||||||
|
|
||||||
write_line("The remainder is ", x % y)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Type System
|
|
||||||
|
|
||||||
All variables have a type that is established when the variable is declared. This usually does not
|
|
||||||
require that the type be explicitly stated because Dust can infer the type from the value.
|
|
||||||
|
|
||||||
The next example produces a compiler error because the `if` block evaluates to an `int` but the
|
|
||||||
`else` block evaluates to a `str`. Dust does not allow branches of the same `if/else` statement to
|
|
||||||
have different types.
|
|
||||||
|
|
||||||
```rust
|
|
||||||
// !!! Compile Error !!!
|
|
||||||
let input = read_line()
|
|
||||||
let reward = if input == "42" {
|
|
||||||
write_line("You got it! Here's your reward.")
|
|
||||||
|
|
||||||
777 // <- This is an int
|
|
||||||
} else {
|
|
||||||
write_line(input, " is not the answer.")
|
|
||||||
|
|
||||||
"777" // <- This is a string
|
|
||||||
}
|
|
||||||
```
|
|
||||||
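Rust applies the same rule to `if/else` expressions, so one way to repair a mismatch like the one above is to make both branches produce the same type. The sketch below is plain Rust rather than Dust, and the `reward` function is a hypothetical stand-in for the example; it converts the integer to a string so that both branches yield a `String`.

```rust
/// Both branches of the `if/else` must have the same type.
/// Here the integer branch is converted so both yield a `String`.
fn reward(input: &str) -> String {
    if input == "42" {
        let prize: i64 = 777; // an integer, as in the first branch above
        prize.to_string()
    } else {
        format!("{input} is not the answer.")
    }
}
```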
|
|
||||||
### Basic Values
|
|
||||||
|
|
||||||
Dust supports the following basic values:
|
|
||||||
|
|
||||||
- Boolean: `true` or `false`
|
|
||||||
- Byte: An unsigned 8-bit integer
|
|
||||||
- Character: A Unicode scalar value
|
|
||||||
- Float: A 64-bit floating-point number
|
|
||||||
- Function: An executable chunk of code
|
|
||||||
- Integer: A signed 64-bit integer
|
|
||||||
- String: A UTF-8 encoded byte sequence
|
|
||||||
|
|
||||||
Dust's "basic" values are conceptually similar because they are singular as opposed to composite.
|
|
||||||
Most of these values are stored on the stack but some are heap-allocated. A Dust string is a
|
|
||||||
sequence of bytes that are encoded in UTF-8. Even though it could be seen as a composite of byte
|
|
||||||
values, strings are considered "basic" because they are parsed directly from tokens and behave as
|
|
||||||
singular values. Shorter strings are stored on the stack while longer strings are heap-allocated.
|
|
||||||
Dust offers built-in native functions that can manipulate strings by accessing their bytes or
|
|
||||||
reading them as a sequence of characters.
|
|
||||||
|
|
||||||
There is no `null` or `undefined` value in Dust. All values and variables must be initialized to one
|
|
||||||
of the supported value types. This eliminates a whole class of bugs that permeate many other
|
|
||||||
languages.
|
|
||||||
|
|
||||||
> I call it my billion-dollar mistake. It was the invention of the null reference in 1965.
|
|
||||||
> - Tony Hoare
|
|
||||||
|
|
||||||
Dust *does* have a `none` type, which should not be confused for being `null`-like. Like the `()` or
|
|
||||||
"unit" type in Rust, `none` exists as a type but not as a value. It indicates the lack of a value
|
|
||||||
from a function, expression or statement. A variable cannot be assigned to `none`.
|
|
||||||
|
|
||||||
## Previous Implementations
|
|
||||||
|
|
||||||
Dust has gone through several iterations, each with its own design choices. It was originally
|
|
||||||
implemented with a syntax tree generated by an external parser, then a parser generator, and finally
|
|
||||||
a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual
|
|
||||||
machine. The current implementation, compiling to bytecode with custom lexing and parsing for a
|
|
||||||
register-based VM, is by far the most performant and the general design is unlikely to change.
|
|
||||||
|
|
||||||
Dust previously had a more complex type system with type arguments (or "generics") and a simple
|
|
||||||
model for asynchronous execution of statements. Both of these features were removed to simplify the
|
|
||||||
language when it was rewritten to use bytecode instructions. Both features are planned to be
|
|
||||||
reintroduced in the future.
|
|
||||||
|
|
||||||
## Inspiration
|
## Inspiration
|
||||||
|
|
||||||
|
@ -1,16 +1,17 @@
|
|||||||
//! Instructions for the Dust virtual machine.
|
//! The Dust instruction set.
|
||||||
//!
|
//!
|
||||||
//! Each instruction is 32 bits and uses up to seven distinct fields:
|
//! Each instruction is 64 bits and uses up to eight distinct fields. The instruction's layout is:
|
||||||
//!
|
//!
|
||||||
//! Bit | Description
|
//! Bits | Description
|
||||||
//! ----- | -----------
|
//! ----- | -----------
|
||||||
//! 0-4 | Operation code
|
//! 0-4 | Operation code
|
||||||
//! 5 | Flag indicating if the B field is a constant
|
//! 5 | Flag indicating if the B field is a constant
|
||||||
//! 6 | Flag indicating if the C field is a constant
|
//! 6 | Flag indicating if the C field is a constant
|
||||||
//! 7 | D field (boolean)
|
//! 7 | D field (boolean)
|
||||||
//! 8-15 | A field (unsigned 8-bit integer)
|
//! 8-15 | Type specifier
|
||||||
//! 16-23 | B field (unsigned 8-bit integer)
|
//! 16-31 | A field (unsigned 16-bit integer)
|
||||||
//! 24-31 | C field (unsigned 8-bit integer)
|
//! 32-47 | B field (unsigned 16-bit integer)
|
||||||
|
//! 48-63 | C field (unsigned 16-bit integer)
|
||||||
//!
|
//!
|
||||||
//! **Be careful when working with instructions directly**. When modifying an instruction's fields,
|
//! **Be careful when working with instructions directly**. When modifying an instruction's fields,
|
||||||
//! you may also need to modify its flags. It is usually best to remove instructions and insert new
|
//! you may also need to modify its flags. It is usually best to remove instructions and insert new
|
||||||
@ -71,9 +72,9 @@
|
|||||||
//! # );
|
//! # );
|
||||||
//! // Let's read an instruction and see if it performs addition-assignment,
|
//! // Let's read an instruction and see if it performs addition-assignment,
|
||||||
//! // like in one of the following examples:
|
//! // like in one of the following examples:
|
||||||
//! // - `a += 2`
|
//! // - `a += 2`
|
||||||
//! // - `a = a + 2`
|
//! // - `a = a + 2`
|
||||||
//! // - `a = 2 + a`
|
//! // - `a = 2 + a`
|
||||||
//!
|
//!
|
||||||
//! let operation = mystery_instruction.operation();
|
//! let operation = mystery_instruction.operation();
|
||||||
//! let is_add_assign = match operation {
|
//! let is_add_assign = match operation {
|
||||||
@ -115,6 +116,7 @@ mod set_local;
|
|||||||
mod subtract;
|
mod subtract;
|
||||||
mod test;
|
mod test;
|
||||||
mod test_set;
|
mod test_set;
|
||||||
|
mod type_code;
|
||||||
|
|
||||||
pub use add::Add;
|
pub use add::Add;
|
||||||
pub use call::Call;
|
pub use call::Call;
|
||||||
@ -152,25 +154,27 @@ use crate::NativeFunction;
|
|||||||
///
|
///
|
||||||
/// See the [module-level documentation](index.html) for more information.
|
/// See the [module-level documentation](index.html) for more information.
|
||||||
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord, Serialize, Deserialize)]
|
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord, Serialize, Deserialize)]
|
||||||
pub struct Instruction(u32);
|
pub struct Instruction(u64);
|
||||||
|
|
||||||
impl Instruction {
|
impl Instruction {
|
||||||
pub fn new(
|
pub fn new(
|
||||||
operation: Operation,
|
operation: Operation,
|
||||||
a: u8,
|
type_specifier: u8,
|
||||||
b: u8,
|
a: u16,
|
||||||
c: u8,
|
b: u16,
|
||||||
|
c: u16,
|
||||||
|
d: bool,
|
||||||
b_is_constant: bool,
|
b_is_constant: bool,
|
||||||
c_is_constant: bool,
|
c_is_constant: bool,
|
||||||
d: bool,
|
|
||||||
) -> Instruction {
|
) -> Instruction {
|
||||||
let bits = operation.0 as u32
|
let bits = operation.0 as u64
|
||||||
| ((b_is_constant as u32) << 5)
|
| ((b_is_constant as u64) << 5)
|
||||||
| ((c_is_constant as u32) << 6)
|
| ((c_is_constant as u64) << 6)
|
||||||
| ((d as u32) << 7)
|
| ((d as u64) << 7)
|
||||||
| ((a as u32) << 8)
|
| ((type_specifier as u64) << 8)
|
||||||
| ((b as u32) << 16)
|
| ((a as u64) << 16)
|
||||||
| ((c as u32) << 24);
|
| ((b as u64) << 32)
|
||||||
|
| ((c as u64) << 48);
|
||||||
|
|
||||||
Instruction(bits)
|
Instruction(bits)
|
||||||
}
|
}
|
||||||
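The layout table in the module docs places the type specifier at bit 8 and the A, B and C fields at bits 16, 32 and 48. Below is a stand-alone sketch of packing and unpacking under that layout. The shift constants, `pack` and the accessor functions are hypothetical names, and a bare `u8` stands in for the crate's `Operation`.

```rust
// Bit offsets taken from the documented layout; the names are illustrative.
const TYPE_SHIFT: u32 = 8;
const A_SHIFT: u32 = 16;
const B_SHIFT: u32 = 32;
const C_SHIFT: u32 = 48;

/// Packs the eight fields into one 64-bit word.
#[allow(clippy::too_many_arguments)]
pub fn pack(
    opcode: u8,
    type_specifier: u8,
    a: u16,
    b: u16,
    c: u16,
    d: bool,
    b_is_constant: bool,
    c_is_constant: bool,
) -> u64 {
    ((opcode & 0x1F) as u64) // bits 0-4
        | ((b_is_constant as u64) << 5)
        | ((c_is_constant as u64) << 6)
        | ((d as u64) << 7)
        | ((type_specifier as u64) << TYPE_SHIFT)
        | ((a as u64) << A_SHIFT)
        | ((b as u64) << B_SHIFT)
        | ((c as u64) << C_SHIFT)
}

// Each accessor shifts its field down to bit 0; the cast then
// truncates away everything above the field's width.
pub fn opcode(bits: u64) -> u8 { (bits & 0x1F) as u8 }
pub fn d_field(bits: u64) -> bool { (bits & (1 << 7)) != 0 }
pub fn type_specifier(bits: u64) -> u8 { (bits >> TYPE_SHIFT) as u8 }
pub fn a_field(bits: u64) -> u16 { (bits >> A_SHIFT) as u16 }
pub fn b_field(bits: u64) -> u16 { (bits >> B_SHIFT) as u16 }
pub fn c_field(bits: u64) -> u16 { (bits >> C_SHIFT) as u16 }
```

Because every field starts at the bit offset given in the table, no two fields overlap and a value packed with `pack` reads back unchanged through the accessors.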
@ -206,29 +210,24 @@ impl Instruction {
|
|||||||
}
|
}
|
||||||
|
|
||||||
pub fn set_a_field(&mut self, bits: u8) {
|
pub fn set_a_field(&mut self, bits: u8) {
|
||||||
self.0 = (self.0 & 0xFFFF00FF) | ((bits as u32) << 8);
|
self.0 = (self.0 & 0xFFFFFFFF0000FFFF) | ((bits as u64) << 16);
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn set_b_field(&mut self, bits: u8) {
|
pub fn set_b_field(&mut self, bits: u8) {
|
||||||
self.0 = (self.0 & 0xFFFF00FF) | ((bits as u32) << 16);
|
self.0 = (self.0 & 0xFFFF0000FFFFFFFF) | ((bits as u64) << 32);
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn set_c_field(&mut self, bits: u8) {
|
pub fn set_c_field(&mut self, bits: u8) {}
|
||||||
self.0 = (self.0 & 0xFF00FFFF) | ((bits as u32) << 24);
|
|
||||||
}
|
|
||||||
|
|
||||||
pub fn decode(self) -> (Operation, InstructionData) {
|
pub fn decode(self) -> (Operation, InstructionData) {
|
||||||
(
|
(self.operation(), InstructionData {
|
||||||
self.operation(),
|
a_field: self.a_field(),
|
||||||
InstructionData {
|
b_field: self.b_field(),
|
||||||
a_field: self.a_field(),
|
c_field: self.c_field(),
|
||||||
b_field: self.b_field(),
|
b_is_constant: self.b_is_constant(),
|
||||||
c_field: self.c_field(),
|
c_is_constant: self.c_is_constant(),
|
||||||
b_is_constant: self.b_is_constant(),
|
d_field: self.d_field(),
|
||||||
c_is_constant: self.c_is_constant(),
|
})
|
||||||
d_field: self.d_field(),
|
|
||||||
},
|
|
||||||
)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn point(from: u8, to: u8) -> Instruction {
|
pub fn point(from: u8, to: u8) -> Instruction {
|
||||||
|
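Overwriting one field of a packed word is usually done by clearing that field's bits and then OR-ing in the new value. Below is a minimal sketch, assuming the documented layout (A in bits 16-31, B in bits 32-47, C in bits 48-63). `Word` and `set_field` are hypothetical and are not the crate's `Instruction` API.

```rust
/// Hypothetical stand-in for a packed 64-bit instruction word.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Word(pub u64);

impl Word {
    /// Clears the 16-bit field starting at `shift`, then writes `value` into it.
    fn set_field(&mut self, shift: u32, value: u16) {
        let mask = 0xFFFF_u64 << shift;
        self.0 = (self.0 & !mask) | ((value as u64) << shift);
    }

    pub fn set_a_field(&mut self, value: u16) {
        self.set_field(16, value)
    }

    pub fn set_b_field(&mut self, value: u16) {
        self.set_field(32, value)
    }

    pub fn set_c_field(&mut self, value: u16) {
        self.set_field(48, value)
    }
}
```

Deriving the mask from the shift keeps each setter's mask and offset in agreement, which is easy to get wrong when both are written out as separate literals.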
10
dust-lang/src/instruction/type_code.rs
Normal file
@ -0,0 +1,10 @@
|
|||||||
|
pub struct TypeCode(pub u8);
|
||||||
|
|
||||||
|
impl TypeCode {
|
||||||
|
const INTEGER: u8 = 0;
|
||||||
|
const FLOAT: u8 = 1;
|
||||||
|
const STRING: u8 = 2;
|
||||||
|
const BOOLEAN: u8 = 3;
|
||||||
|
const CHARACTER: u8 = 4;
|
||||||
|
const BYTE: u8 = 5;
|
||||||
|
}
|
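One way the type specifier could be consumed is through typed associated constants with a readable name. This is a sketch only: the `pub const` values of type `TypeCode`, the derives and the `name` method are assumptions and are not part of the commit, which declares private `u8` constants.

```rust
/// Hypothetical variant of `TypeCode` with typed, public constants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TypeCode(pub u8);

impl TypeCode {
    pub const INTEGER: TypeCode = TypeCode(0);
    pub const FLOAT: TypeCode = TypeCode(1);
    pub const STRING: TypeCode = TypeCode(2);
    pub const BOOLEAN: TypeCode = TypeCode(3);
    pub const CHARACTER: TypeCode = TypeCode(4);
    pub const BYTE: TypeCode = TypeCode(5);

    /// Returns a readable name, or `None` for codes outside the known range.
    pub fn name(self) -> Option<&'static str> {
        match self {
            TypeCode::INTEGER => Some("integer"),
            TypeCode::FLOAT => Some("float"),
            TypeCode::STRING => Some("string"),
            TypeCode::BOOLEAN => Some("boolean"),
            TypeCode::CHARACTER => Some("character"),
            TypeCode::BYTE => Some("byte"),
            _ => None,
        }
    }
}
```

Returning `Option` makes an unrecognized byte an explicit case for the caller rather than a panic.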