The Dust Programming Language

A fast, safe and easy to use language for general-purpose programming.

Dust is statically typed to ensure that each program is valid before it is run. Compiling is fast due to the purpose-built lexer and parser. Execution is fast because Dust uses instructions in a highly optimizable, custom bytecode format that runs in a multi-threaded VM. Dust combines compile-time safety guarantees and optimizations with negligible compile times and satisfying speed to deliver a unique set of features. It offers the best qualities of two disparate categories of programming language: the highly optimized but slow-to-compile languages like Rust and C++ and the quick-to-start but often slow and error-prone languages like Python and JavaScript.

Dust's syntax, safety features and evaluation model are based on Rust. Its instruction set, optimization strategies and virtual machine are based on Lua. Unlike Rust and other languages that compile to machine code, Dust has a very low time to execution. Unlike Lua and most other interpreted languages, Dust enforces static typing to improve clarity and prevent bugs.

Dust is developed with an emphasis on achieving foundational soundness before adding new features. Dust's planned features and design favor programmers who prefer their code to be simple and clear rather than clever and complex.

Dust is under active development and is not yet ready for general use.

Features discussed in this README may be unimplemented, partially implemented or temporarily removed.

// "Hello, world" using Dust's built-in I/O functions
write_line("Enter your name...")

let name = read_line()

write_line("Hello " + name + "!")

// The classic, unoptimized Fibonacci sequence
fn fib (n: int) -> int {
    if n <= 0 {
        return 0
    } else if n == 1 {
        return 1
    }

    fib(n - 1) + fib(n - 2)
}

write_line(fib(25))

Dust is an interpreted language that is compiled to bytecode by a hand-written lexer and parser. It uses a custom multi-threaded register-based virtual machine with separate-thread garbage collection. Competing with the runtime performance of Rust or C++ is not a goal, but competing with the approachability and simplicity of those languages is. Dust does intend to be faster than Python, Ruby and Node.js while also offering a superior development experience and more reliable code due to its static typing. Dust's development approach is informed by some books[1] and academic research[2] as well as practical insight from papers[3] written by language authors. See the Inspiration section for more information or keep reading to learn about Dust's features.
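
For readers unfamiliar with the term, the following Rust sketch shows the general shape of a register-based VM: each instruction names the registers it reads and writes instead of pushing and popping a stack. The instruction names, their layout and the run function are hypothetical and chosen for illustration; they are not Dust's actual bytecode or VM.

```rust
// A minimal register-based VM sketch. Every name here is an assumption made
// for illustration and does not reflect Dust's real instruction set.

#[derive(Debug, Clone, Copy)]
enum Instruction {
    /// registers[dst] = value
    LoadConst { dst: usize, value: i64 },
    /// registers[dst] = registers[lhs] + registers[rhs]
    Add { dst: usize, lhs: usize, rhs: usize },
    /// registers[dst] = registers[lhs] * registers[rhs]
    Multiply { dst: usize, lhs: usize, rhs: usize },
    /// Stop and return the value held in registers[src]
    Return { src: usize },
}

/// Executes the program in order and returns the value of the first
/// `Return` instruction, or `None` if the program ends without one.
fn run(program: &[Instruction], register_count: usize) -> Option<i64> {
    let mut registers = vec![0i64; register_count];

    for instruction in program {
        match *instruction {
            Instruction::LoadConst { dst, value } => registers[dst] = value,
            Instruction::Add { dst, lhs, rhs } => {
                registers[dst] = registers[lhs] + registers[rhs]
            }
            Instruction::Multiply { dst, lhs, rhs } => {
                registers[dst] = registers[lhs] * registers[rhs]
            }
            Instruction::Return { src } => return Some(registers[src]),
        }
    }

    None
}

/// Bytecode for the expression `(40 + 2) * 2`, using three registers.
fn example_program() -> Vec<Instruction> {
    vec![
        Instruction::LoadConst { dst: 0, value: 40 },
        Instruction::LoadConst { dst: 1, value: 2 },
        Instruction::Add { dst: 2, lhs: 0, rhs: 1 },
        Instruction::Multiply { dst: 2, lhs: 2, rhs: 1 },
        Instruction::Return { src: 2 },
    ]
}
```

Calling run(&example_program(), 3) evaluates the five instructions and yields Some(84). A stack-based VM would need extra push and pop instructions to express the same arithmetic.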

Goals

This project's goal is to deliver a language that not only works but that offers genuine value due to a unique combination of design choices and a high-quality implementation. As mentioned in the first sentence, Dust's general aspirations are to be fast, safe and easy.

  • Easy
    • Simple Syntax Dust should be easier to learn than most programming languages. Its syntax should be familiar to users of other C-like languages to the point that even a new user can read Dust code and understand what it does. Rather than being dumbed down by a lack of features, Dust should be powerful and elegant in its simplicity, seeking a maximum of capability with a minimum of complexity. When advanced features are added, they should never obstruct existing features, including readability. Even the advanced type system should be clear and unintimidating.
    • Excellent Errors Dust should provide helpful error messages that guide the user to the source of the problem and suggest a solution. Errors should be a helpful learning resource for users rather than a source of frustration.
    • Relevant Documentation Users should have the resources they need to learn Dust and write code in it. They should know where to look for answers and how to reach out for help.
  • Safe
    • Static Types Typing should prevent runtime errors and improve code quality, offering a superior development experience despite some additional constraints. Like any good statically typed language, users should feel confident in the type-consistency of their code and not want to go back to a dynamically typed language.
    • Memory Safety Dust should be free of memory bugs. Being implemented in Rust makes this easy but, to accommodate long-running programs, Dust still requires a memory management strategy. Dust's design is to use a separate thread for garbage collection, allowing the main thread to continue executing code while the garbage collector looks for unused memory.
  • Fast
    • Fast Compilation Despite its compile-time abstractions, Dust should compile and start executing quickly. The compilation time should feel negligible to the user.
    • Fast Execution Dust should be generally faster than Python, Ruby and Node.js. It should be competitive with highly optimized, modern, register-based VM languages like Lua. Dust should be benchmarked during development to inform decisions about performance.
    • Low Resource Usage Despite its performance, Dust's use of memory and CPU power should be conservative and predictable enough to accommodate a wide range of devices.

These are the project's general design goals. There are many more implementation goals. Among them are:

  • Effortless Concurrency: Dust should offer an excellent experience for writing multi-threaded programs. The language's native functions should offer an API for spawning threads, sending messages and waiting for results. When using these features, Dust should be much faster than any single-threaded language. However, Dust should be fast even when running on a single thread. Single-threaded performance is the best predictor of multi-threaded performance, so continuing to optimize how each thread executes instructions, accesses memory and moves pointers is the best way to ensure that Dust is fast in all scenarios.
  • Embeddability: The library should be easy to use so that Dust can be built into other applications. Dust should compile to WebAssembly and offer examples of how to use it in a web application. The user should be able to query the VM for information about the program's state and control the program's execution. It should be possible to view and modify the value of a variable and inspect the call stack.
  • Data Fluency: Dust's value type should support conversion to and from arbitrary data in formats like JSON, YAML, TOML and CSV. Pulling data into a Dust program should be easy, with built-in functions offering conversion for the most widely used formats.
  • Portability: Dust should run on as many architectures and operating systems as possible. Using fewer dependencies and avoiding platform-specific code will help Dust achieve this goal. The Dust library should be available as a WebAssembly module.
  • Developer Experience: Dust should be fun and easy to use. That implies easy installation and the availability of tutorials and how-to guides. The CLI should be predictable and feature-rich, with features that make it easy to write and debug Dust code like formatting, bytecode disassembly and logging.
  • Advanced Type System: Dust should implement composite types, aliases and generics. The type system should use a descriptive syntax that is easy to understand. Dust's type system should be static, meaning that types are checked before a program reaches the VM. Dust is not a gradually typed language; its VM is and should remain type-agnostic.
  • Thorough Testing: Primarily, the output of Dust's compiler and VM should be tested with programs that cover all of the language's features. The tests should be actively maintained and should be changed frequently to reflect a growing project that is constantly discovering new optimizations and opportunities for improvement.
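
The concurrency goal above names three operations: spawning threads, sending messages and waiting for results. Dust's own API for these is not specified in this README, so the sketch below shows the same pattern with Rust's standard library instead. The parallel_sum function is a hypothetical example and not part of Dust.

```rust
use std::sync::mpsc;
use std::thread;

/// Splits `numbers` into roughly `worker_count` chunks, sums each chunk on its
/// own thread and combines the partial sums received over a channel.
/// This is an illustrative Rust sketch, not Dust's concurrency API.
fn parallel_sum(numbers: Vec<i64>, worker_count: usize) -> i64 {
    let worker_count = worker_count.max(1);
    // Round up so that every element lands in some chunk; never use a size of 0.
    let chunk_size = ((numbers.len() + worker_count - 1) / worker_count).max(1);
    let (sender, receiver) = mpsc::channel();
    let mut handles = Vec::new();

    for chunk in numbers.chunks(chunk_size) {
        let chunk = chunk.to_vec();
        let sender = sender.clone();

        // Spawn a thread that sends its result back as a message.
        handles.push(thread::spawn(move || {
            let partial: i64 = chunk.iter().sum();
            sender.send(partial).expect("receiver should still be alive");
        }));
    }

    // Drop the original sender so the receiver's iterator ends once all
    // worker threads have finished and dropped their clones.
    drop(sender);

    // Wait for every worker to finish.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    receiver.iter().sum()
}
```

For example, parallel_sum((1..=100).collect(), 4) spawns four workers and returns 5050.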

Project Status

This project is maintained by a single developer. For now, its primary home is on a private git server. The GitHub mirror is updated automatically and should carry the latest branches. There are no other contributors at this time but the project is open to feedback and should eventually accept contributions.

For now, both the library API and the implementation details are freely changed and the CLI has not been published. Dust is both an ambitious project and a continuous experiment in language design. Features may be redesigned and reimplemented at will when they do not meet the project's performance or usability goals. This approach maximizes the development experience as a learning opportunity and enforces a high standard of quality but slows down the process of delivering features to users. Eventually, Dust will reach a stable release and will be ready for general use. As the project approaches this milestone, the experimental nature of the project will be reduced and replaced with a focus on stability and improvement.

Language Overview

This is a quick overview of Dust's syntax features. It skips over the aspects that are familiar to most programmers such as creating variables, using binary operators and printing to the console. Eventually there should be a complete reference for the syntax.

Syntax and Evaluation

Dust belongs to the C-like family of languages[4], with an imperative syntax that will be familiar to many programmers. Dust code looks a lot like Ruby, JavaScript, TypeScript and other members of the family but Rust is its primary point of reference for syntax. Rust was chosen as a syntax model because its imperative code is obvious by design and widely familiar. Those qualities are aligned with Dust's emphasis on usability.

However, some differences exist. Dust evaluates all of the code in the file while a Rust program begins execution at a "main" function. Dust's execution model is more like one found in a scripting language. If we put 42 + 42 == 84 into a file and run it, it will return true because the outer context is, in a sense, the "main" function.

So while the syntax is by no means compatible, it is superficially similar, even to the point that syntax highlighting for Rust code works well with Dust code. This is not a design goal but a happy coincidence.

Semicolons

Dust borrowed Rust's approach to semicolons and their effect on evaluation and relaxed the rules to accommodate different styles of coding. Rust isn't designed for command lines or REPLs but Dust could be well-suited to those applications. Dust needs to work in a source file or in an ad-hoc one-liner sent to the CLI. Thus, semicolons are optional in most cases.

There are two things you need to know about semicolons in Dust:

  • Semicolons suppress the value of whatever they follow. The preceding statement or expression will have the type none and will not evaluate to a value.
  • If a semicolon does not change how the program runs, it is optional.

This example shows three statements with semicolons. The compiler knows that a let statement cannot produce a value and will always have the type none. Thanks to static typing, it also knows that the write_line function has no return value so the function call also has the type none. Therefore, these semicolons are optional.

let a = 40;
let b = 2;

write_line("The answer is ", a + b);

Removing the semicolons does not alter the execution pattern or the return value.

let x = 10
let y = 3

write_line("The remainder is ", x % y)

The next example produces a compiler error because the if block returns a value of type int but the else block does not return a value at all. Dust does not allow branches of the same if/else statement to have different types. In this case, adding a semicolon after the 777 expression fixes the error by suppressing the value.

// !!! Compile Error !!!
let input = read_line()
let reward = if input == "42" {
    write_line("You got it! Here's your reward.")

    777 // <- We need a semicolon here
} else {
    write_line(input, " is not the answer.")
}

Statements and Expressions

Dust is composed of statements and expressions. If a statement ends in an expression without a trailing semicolon, the statement evaluates to the value produced by that expression. However, if the expression's value is suppressed with a semicolon, the statement does not evaluate to a value. This is identical to Rust's evaluation model. That means that the following code will not compile:

// !!! Compile Error !!!
let a = { 40 + 2; }

The a variable is assigned to the value produced by a block. The block contains an expression that is suppressed by a semicolon, so the block does not evaluate to a value. Therefore, the a variable would have to be uninitialized (which Dust does not allow) or result in a runtime error (which Dust avoids at all costs). We can fix this code by moving the semicolon to the end of the block. In this position it suppresses the value of the entire let statement. As we saw above, a let statement never evaluates to a value, so the semicolon has no effect on the program's behavior and could be omitted altogether.

let a = { 40 + 2 }; // This is fine
let a = { 40 + 2 }  // This is also fine
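
Because this evaluation model is shared with Rust, the same rule can be observed in Rust code. The sketch below is Rust, not Dust, and its function names are hypothetical. One difference is worth noting: Rust lets a variable hold the unit value (), while Dust rejects the equivalent assignment of none.

```rust
/// A helper with a real return value, used by both blocks below.
fn answer() -> i32 {
    40 + 2
}

/// The block's final expression has no semicolon, so the block evaluates
/// to the `i32` that `answer()` returns.
fn block_value() -> i32 {
    let a = { answer() };

    a
}

/// The trailing semicolon suppresses the value, so the block has the unit
/// type `()`. Rust accepts this binding; the Dust equivalent is a compile error.
fn suppressed_value() {
    let unit: () = { answer(); };

    unit
}
```

Here block_value() returns 42, while suppressed_value() returns only the unit value.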

Only the final expression in a block is returned. When a let statement is combined with an if/else statement, the program can perform side effects before evaluating the value that will be assigned to the variable.

let random: int = random(0..100)
let is_even = if random == 99 {
    write_line("We got a 99!")

    false
} else {
    random % 2 == 0
}

is_even

If the above example were passed to Dust as a complete program, it would return a boolean value and might print a message to the console (if the user is especially lucky). However, note that the program could be modified to return no value by simply adding a semicolon at the end.

Compared to JavaScript, Dust's evaluation model is more predictable, less error-prone and will never trap the user into a frustrating hunt for a missing semicolon. Compared to Rust, Dust's evaluation model is more accommodating without sacrificing expressiveness. In Rust, semicolons are required and meaningful, which provides excellent consistency but lacks flexibility. In JavaScript, semicolons are mostly optional because the language inserts them automatically, but the insertion rules can occasionally change what a program means, which is a source of confusion for many developers.

Control Flow

-- TODO --

Functions

-- TODO --

Type System

All variables have a type that is established when the variable is declared. This usually does not require that the type be explicitly stated; Dust can infer the type from the value. Types are also associated with the arms of if/else statements and the return values of functions, which prevents different runtime scenarios from producing different types of values.
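
Rust applies the same two rules, so a short Rust sketch can illustrate them: local types are inferred from values and both arms of an if/else must produce the same type. The classify function is a hypothetical example written in Rust, not Dust.

```rust
/// Labels a number as "even" or "odd".
fn classify(n: i64) -> &'static str {
    // No annotation needed: the comparison makes `is_even` a `bool`.
    let is_even = n % 2 == 0;

    // Both arms produce a `&'static str`, so `label` has that type in every
    // runtime scenario. Returning an integer from one arm would not compile.
    let label = if is_even { "even" } else { "odd" };

    label
}
```

For example, classify(4) returns "even" and classify(7) returns "odd".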

Null-Free

There is no null or undefined value in Dust. All values and variables must be initialized to one of the supported value types. This eliminates a whole class of bugs that permeate many other languages. "I call it my billion-dollar mistake. It was the invention of the null reference in 1965." - Tony Hoare

Dust does have a none type, which should not be mistaken for a null-like value. Like the () or "unit" type in Rust, none exists as a type but not as a value. It indicates the lack of a value from a function, expression or statement. A variable cannot be assigned to none.
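
This README does not say how Dust will represent a value that may be absent. As a point of comparison only, Rust, which is also null-free, makes absence explicit in the type with Option. The functions below are hypothetical Rust examples and say nothing about Dust's eventual design.

```rust
/// Returns the index of the first even number, or `None` when there is none.
/// The possibility of absence is part of the return type.
fn first_even_index(numbers: &[i64]) -> Option<usize> {
    numbers.iter().position(|n| n % 2 == 0)
}

/// Callers must handle both cases before they can use the index, so a
/// forgotten "null check" is a compile error instead of a runtime failure.
fn describe(numbers: &[i64]) -> String {
    match first_even_index(numbers) {
        Some(index) => format!("first even number at index {}", index),
        None => String::from("no even numbers"),
    }
}
```

For example, describe(&[1, 3, 4]) reports index 2, while describe(&[1, 3]) reports that there are no even numbers.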

Immutability by Default

TODO

Memory Safety

TODO

Basic Values

Dust supports the following basic values:

  • Boolean: true or false
  • Byte: An unsigned 8-bit integer
  • Character: A Unicode scalar value
  • Float: A 64-bit floating-point number
  • Function: An executable chunk of code
  • Integer: A signed 64-bit integer
  • String: A UTF-8 encoded byte sequence

Dust's "basic" values are conceptually similar because they are singular as opposed to composite. Most of these values are stored on the stack but some are heap-allocated. A Dust string is a sequence of bytes that are encoded in UTF-8. Even though it could be seen as a composite of byte values, strings are considered "basic" because they are parsed directly from tokens and behave as singular values. Shorter strings are stored on the stack while longer strings are heap-allocated. Dust offers built-in native functions that can manipulate strings by accessing their bytes or reading them as a sequence of characters.
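
Rust's str is also a UTF-8 encoded byte sequence, so it can illustrate the difference between viewing a string as bytes and as characters. The helper names below are hypothetical and are not Dust's native functions; they only wrap Rust standard library methods.

```rust
/// Number of bytes in the string's UTF-8 encoding.
fn byte_length(text: &str) -> usize {
    text.len()
}

/// Number of Unicode scalar values (characters) in the string.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

/// The first byte of the encoding, if the string is not empty.
fn first_byte(text: &str) -> Option<u8> {
    text.bytes().next()
}

/// The first character, if the string is not empty.
fn first_char(text: &str) -> Option<char> {
    text.chars().next()
}
```

The two views diverge as soon as a character needs more than one byte. The string "h\u{e9}llo" (with U+00E9, a Latin small letter e with acute accent) has 5 characters but 6 bytes, because that one character is encoded as the two bytes 0xC3 0xA9.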

Composite Values

TODO

Previous Implementations

Dust has gone through several iterations, each with its own design choices. It was originally implemented with a syntax tree generated by an external parser, then a parser generator, and finally a custom parser. Eventually the language was rewritten to use bytecode instructions and a virtual machine. The current implementation, which compiles to bytecode with custom lexing and parsing for a register-based VM, is by far the most performant. Its general design is unlikely to change, although it has been optimized and refactored several times. For example, the VM was refactored to manage multiple threads.

Dust previously had a more complex type system with type arguments (or "generics") and a simple model for asynchronous execution of statements. Both of these features were removed to simplify the language when it was rewritten to use bytecode instructions. Both features are planned to be reintroduced in the future.

Inspiration

[Crafting Interpreters] by Bob Nystrom was a great resource for writing the compiler, especially the Pratt parser. The book is a great introduction to writing interpreters. Had it been discovered sooner, some early implementations of Dust would have been both simpler in design and more ambitious in scope.

[The Implementation of Lua 5.0] by Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes was a great resource for understanding register-based virtual machines and their instructions. This paper was recommended by Bob Nystrom in [Crafting Interpreters].

[A No-Frills Introduction to Lua 5.1 VM Instructions] by Kein-Hong Man has a wealth of detailed information on how Lua uses terse instructions to create dense chunks that execute quickly. This was essential in the design of Dust's instructions. Dust uses compile-time optimizations that are based on Lua optimizations covered in this paper.

[A Performance Survey on Stack-based and Register-based Virtual Machines] by Ruijie Fang and Siqi Liu was helpful for a quick yet efficient primer on getting stack-based and register-based virtual machines up and running. The included code examples show how to implement both types of VMs in C. The performance comparison between the two types of VMs is worth reading for anyone who is trying to choose between the two. Some of the benchmarks described in the paper inspired similar benchmarks used in this project to compare Dust to other languages.

License

Dust is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

References