# The Dust Programming Language A **fast**, **safe** and **easy to use** language for general-purpose programming. Dust is **statically typed** to ensure that each program is valid before it is run. Compiling is fast due to the purpose-built lexer and parser. Execution is fast because Dust uses instructions in a highly optimizable, custom bytecode format that runs in a multi-threaded VM. Dust combines compile-time safety guarantees and optimizations with negligible compile times and satisfying speed to deliver a unique set of features. It offers the best qualities of two disparate categories of programming language: the highly optimized but slow-to-compile languages like Rust and C++ and the quick-to-start but often slow and error-prone languages like Python and JavaScript. Dust's syntax, safety features and evaluation model are based on Rust. Its instruction set, optimization strategies and virtual machine are based on Lua. Unlike Rust and other languages that compile to machine code, Dust has a very low time to execution. Unlike Lua and most other interpreted languages, Dust enforces static typing to improve clarity and prevent bugs. Dust is developed with an emphasis on achieving foundational soundness before adding new features. Dust's planned features and design favor programmers who prefer their code to be simple and clear rather than clever and complex. **Dust is under active development and is not yet ready for general use.** **Features discussed in this README may be unimplemented, partially implemented or temporarily removed.** ```rust // "Hello, world" using Dust's built-in I/O functions write_line("Enter your name...") let name = read_line() write_line("Hello " + name + "!") ``` ```rust // The classic, unoptimized Fibonacci sequence fn fib (n: int) -> int { if n <= 0 { return 0 } else if n == 1 { return 1 } fib(n - 1) + fib(n - 2) } write_line(fib(25)) ``` ## Goals This project's goal is to deliver a language that not only *works* but that offers genunine value due to a unique combination of design choices and a high-quality implementation. As mentioned in the first sentence, Dust's general aspirations are to be **fast**, **safe** and **easy**. - **Easy** - **Simple Syntax** Dust should be easier to learn than most programming languages. Its syntax should be familiar to users of other C-like languages to the point that even a new user can read Dust code and understand what it does. Rather than being dumbed down by a lack of features, Dust should be powerful and elegant in its simplicity, seeking a maximum of capability with a minimum of complexity. When advanced features are added, they should never obstruct existing features, including readability. Even the advanced type system should be clear and unintimidating. - **Excellent Errors** Dust should provide helpful error messages that guide the user to the source of the problem and suggest a solution. Errors should be a helpful learning ressource for users rather than a source of frustration. - **Relevant Documentation** Users should have the resources they need to learn Dust and write code in it. They should know where to look for answers and how to reach out for help. - **Safe** - **Static Types** Typing should prevent runtime errors and improve code quality, offering a superior development experience despite some additional constraints. Like any good statically typed language, users should feel confident in the type-consistency of their code and not want to go back to a dynamically typed language. - **Memory Safety** Dust should be free of memory bugs. Being implemented in Rust makes this easy but, to accomodate long-running programs, Dust still requires a memory management strategy. Dust's design is to use a separate thread for garbage collection, allowing the main thread to continue executing code while the garbage collector looks for unused memory. - **Fast** - **Fast Compilation** Despite its compile-time abstractions, Dust should compile and start executing quickly. The compilation time should feel negligible to the user. - **Fast Execution** Dust should be generally faster than Python, Ruby and NodeJS. It should be competitive with highly optimized, modern, register-based VM languages like Lua. Dust should be benchmarked during development to inform decisions about performance. - **Low Resource Usage** Despite its performance, Dust's use of memory and CPU power should be conservative and predictable enough to accomodate a wide range of devices. ## Language Overview This is a quick overview of Dust's syntax features. It skips over the aspects that are familiar to most programmers such as creating variables, using binary operators and printing to the console. Eventually there should be a complete reference for the syntax. ### Syntax and Evaluation Dust belongs to the C-like family of languages[^5], with an imperative syntax that will be familiar to many programmers. Dust code looks a lot like Ruby, JavaScript, TypeScript and other members of the family but Rust is its primary point of reference for syntax. Rust was chosen as a syntax model because its imperative code is *obvious by design* and *widely familiar*. Those qualities are aligned with Dust's emphasis on usability. However, some differences exist. Dust *evaluates* all of the code in the file while Rust only initiates from a "main" function. Dust's execution model is more like one found in a scripting language. If we put `42 + 42 == 84` into a file and run it, it will return `true` because the outer context is, in a sense, the "main" function. So while the syntax is by no means compatible, it is superficially similar, even to the point that syntax highlighting for Rust code works well with Dust code. This is not a design goal but a happy coincidence. ### Statements and Expressions Dust is composed of statements and expressions. If a statement ends in an expression without a trailing semicolon, the statement evaluates to the value produced by that expression. However, if the expression's value is suppressed with a semicolon, the statement does not evaluate to a value. This is identical to Rust's evaluation model. That means that the following code will not compile: ```rust // !!! Compile Error !!! let a = { 40 + 2; } ``` The `a` variable is assigned to the value produced by a block. The block contains an expression that is suppressed by a semicolon, so the block does not evaluate to a value. Therefore, the `a` variable would have to be uninitialized (which Dust does not allow) or result in a runtime error (which Dust avoids at all costs). We can fix this code by moving the semicolon to the end of the block. In this position it suppresses the value of the entire `let` statement. As we saw above, a `let` statement never evaluates to a value, so the semicolon has no effect on the program's behavior and could be omitted altogether. ```rust let a = { 40 + 2 }; // This is fine let a = { 40 + 2 } // This is also fine ``` Only the final expression in a block is returned. When a `let` statement is combined with an `if/else` statement, the program can perform side effects before evaluating the value that will be assigned to the variable. ```rust let random: int = random(0..100) let is_even = if random == 99 { write_line("We got a 99!") false } else { random % 2 == 0 } is_even ``` If the above example were passed to Dust as a complete program, it would return a boolean value and might print a message to the console (if the user is especially lucky). However, note that the program could be modified to return no value by simply adding a semicolon at the end. Compared to JavaScript, Dust's evaluation model is more predictable, less error-prone and will never trap the user into a frustating hunt for a missing semicolon. Compared to Rust, Dust's evaluation model is more accomodating without sacrificing expressiveness. In Rust, semicolons are *required* and *meaningful*, which provides excellent consistency but lacks flexibility. In JavaScript, semicolons are *required* and *meaningless*, which is a source of confusion for many developers. Dust borrowed Rust's approach to semicolons and their effect on evaluation and relaxed the rules to accomated different styles of coding. Rust isn't designed for command lines or REPLs but Dust is well-suited to those applications. Dust needs to work in a source file or in an ad-hoc one-liner sent to the CLI. Thus, semicolons are optional in most cases. There are two things you need to know about semicolons in Dust: - Semicolons suppress the value of whatever they follow. The preceding statement or expression will have the type `none` and will not evaluate to a value. - If a semicolon does not change how the program runs, it is optional. This example shows three statements with semicolons. The compiler knows that a `let` statement cannot produce a value and will always have the type `none`. Thanks to static typing, it also knows that the `write_line` function has no return value so the function call also has the type `none`. Therefore, these semicolons are optional. ```rust let a = 40; let b = 2; write_line("The answer is ", a + b); ``` Removing the semicolons does not alter the execution pattern or the return value. ```rust let x = 10 let y = 3 write_line("The remainder is ", x % y) ``` The next example produces a compiler error because the `if` block returns a value of type `int` but the `else` block does not return a value at all. Dust does not allow branches of the same `if/else` statement to have different types. In this case, adding a semicolon after the `777` expression fixes the error by supressing the value. ```rust // !!! Compile Error !!! let input = read_line() let reward = if input == "42" { write_line("You got it! Here's your reward.") 777 // <- We need a semicolon here } else { write_line(input, " is not the answer.") } ``` #### Type System All variables have a type that is established when the variable is declared. This usually does not require that the type be explicitly stated, Dust can infer the type from the value. Types are also associated with the arms of `if/else` statements and the return values of functions, which prevents different runtime scenarios from producing different types of values. #### Null-Free There is no `null` or `undefined` value in Dust. All values and variables must be initialized to one of the supported value types. This eliminates a whole class of bugs that permeate many other languages. "I call it my billion-dollar mistake. It was the invention of the null reference in 1965." - Tony Hoare Dust *does* have a `none` type, which should not be confused for being `null`-like. Like the `()` or "unit" type in Rust, `none` exists as a type but not as a value. It indicates the lack of a value from a function, expression or statement. A variable cannot be assigned to `none`. ### Basic Values Dust supports the following basic values: - Boolean: `true` or `false` - Byte: An unsigned 8-bit integer - Character: A Unicode scalar value - Float: A 64-bit floating-point number - Function: An executable chunk of code - Integer: A signed 64-bit integer - String: A UTF-8 encoded byte sequence Dust's "basic" values are conceptually similar because they are singular as opposed to composite. Most of these values are stored on the stack but some are heap-allocated. A Dust string is a sequence of bytes that are encoded in UTF-8. Even though it could be seen as a composite of byte values, strings are considered "basic" because they are parsed directly from tokens and behave as singular values. Shorter strings are stored on the stack while longer strings are heap-allocated. Dust offers built-in native functions that can manipulate strings by accessing their bytes or reading them as a sequence of characters. ## Previous Implementations Dust has gone through several iterations, each with its own design choices. It was originally implemented with a syntax tree generated by an external parser, then a parser generator, and finally a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual machine. The current implementation: compiling to bytecode with custom lexing and parsing for a register-based VM, is by far the most performant and the general design is unlikely to change, although it has been optimized and refactored several times. For example, the VM was refactored to manage multiple threads. Dust previously had a more complex type system with type arguments (or "generics") and a simple model for asynchronous execution of statements. Both of these features were removed to simplify the language when it was rewritten to use bytecode instructions. Both features are planned to be reintroduced in the future. ## Inspiration [Crafting Interpreters] by Bob Nystrom was a great resource for writing the compiler, especially the Pratt parser. The book is a great introduction to writing interpreters. Had it been discovered sooner, some early implementations of Dust would have been both simpler in design and more ambitious in scope. [The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes was a great resource for understanding register-based virtual machines and their instructions. This paper was recommended by Bob Nystrom in [Crafting Interpreters]. [A No-Frills Introduction to Lua 5.1 VM Instructions] by Kein-Hong Man has a wealth of detailed information on how Lua uses terse instructions to create dense chunks that execute quickly. This was essential in the design of Dust's instructions. Dust uses compile-time optimizations that are based on Lua optimizations covered in this paper. [A Performance Survey on Stack-based and Register-based Virtual Machines] by Ruijie Fang and Siqi Liup was helpful for a quick yet efficient primer on getting stack-based and register-based virtual machines up and running. The included code examples show how to implement both types of VMs in C. The performance comparison between the two types of VMs is worth reading for anyone who is trying to choose between the two. Some of the benchmarks described in the paper inspired similar benchmarks used in this project to compare Dust to other languages. ## License Dust is licensed under the GNU General Public License v3.0. See the `LICENSE` file for details. ## References [^1]: [Crafting Interpreters](https://craftinginterpreters.com/) [^2]: [The Implementation of Lua 5.0](https://www.lua.org/doc/jucs05.pdf) [^3]: [A No-Frills Introduction to Lua 5.1 VM Instructions](https://www.mcours.net/cours/pdf/hasclic3/hasssclic818.pdf) [^4]: [A Performance Survey on Stack-based and Register-based Virtual Machines](https://arxiv.org/abs/1611.00467) [^5]: [List of C-family programming languages](https://en.wikipedia.org/wiki/List_of_C-family_programming_languages) [^6]: [ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}](https://blog.burntsushi.net/ripgrep/#mechanics)