Commit 23bafa53ba ("Edit article") by Jeff, 2023-08-18 13:48:48 -04:00 (parent d869a421f3)
I recently wrote a command-line shell and programming language called [whale].
## The String Problem
Traditional shells (e.g. sh, bash, fish, and zsh) differ in meaningful ways: control flow, variable assignment, how they read startup scripts, and so on. But fundamentally, they all represent data the same way: as text strings. This is a problem because, unless the user entered it, a string is rarely the most meaningful way to present data.
String-based shells work well when you are invoking commands with flags and arguments, because those are all user-entered strings. But when you fetch some JSON data from the web or export a database as CSV, you are suddenly left with an unwieldy wall of text that the shell can't handle on its own. This is not actually a design flaw: these shells were designed to solve problems related to text because they ran on systems that were entirely text-based.
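To see why a flat string is a poor carrier for structured data, here is a small sketch in Rust (the language whale itself is written in; the record below is invented for illustration). Splitting a CSV record on commas, the way a string-based pipeline through cut or awk would, silently miscounts fields the moment a quoted value contains a comma:

```rust
// naive, string-based CSV handling: split every record on ','
fn split_record(record: &str) -> Vec<&str> {
    record.split(',').collect()
}

fn main() {
    // a two-field record whose quoted second field contains a comma
    let record = "whale,\"do one thing, do it well\"";
    let fields = split_record(record);
    // the comma inside the quotes is indistinguishable from a delimiter,
    // so the naive split yields three fields instead of two
    assert_eq!(fields.len(), 3);
    assert_eq!(fields[1], "\"do one thing");
}
```

Nothing in the string itself tells the shell which commas are delimiters and which are data; that knowledge lives in the CSV format, which the shell does not model.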
> All the core UNIX system tools were designed so that they could operate
> together. The original text-based editors (and even TeX and LaTeX) use ASCII
> (the American text encoding standard; an open standard) and you can use tools
> such as; sed, awk, vi, grep, cat, more, tr and various other text-based tools
> in conjunction with these editors.
>
> [The Unix Tools Philosophy]
But structured data has come a long way since those shells were designed. Three of the most popular shells (sh, bash, and zsh) are older than both JSON and CSV. The CSV format was formally defined in 2005, the year that fish was released. And JSON did not become a formal Internet Standard until 2017. These shells use a single ambiguous data type (the string) because it is universal, and because ad-hoc parsing with sed and awk was the best that could be done.
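The kind of ad-hoc extraction a sed or awk one-liner performs can be sketched in Rust. The `extract` function below (a hypothetical name, not from whale) scans for a quoted value by raw string search alone, and it works only until the document's formatting stops cooperating:

```rust
// ad-hoc extraction: find a JSON value by raw string search, sed/awk style
fn extract(json: &str, key: &str) -> Option<String> {
    let pattern = format!("\"{}\":", key);
    let start = json.find(&pattern)? + pattern.len();
    let rest = json[start..].trim_start().strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}

fn main() {
    // works on one exact formatting of the document...
    assert_eq!(extract(r#"{"name": "whale"}"#, "name").as_deref(), Some("whale"));
    // ...but an escaped quote inside the value defeats the scan
    assert_ne!(extract(r#"{"name": "wh\"ale"}"#, "name").as_deref(), Some("wh\"ale"));
}
```

A real JSON parser handles escapes, nesting, and whitespace because it models the format; a string scan, like a sed pattern, only models one way of writing it down.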
## The Unix Tool Problem
The whole point of a command-line shell, in my estimation, is to allow the user to pass data to and between single-purpose programs. The Unix tool philosophy is to do one thing and do it well. One writes programs by processing input into output. This is a simple yet monumental computing and programming paradigm.
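That paradigm fits in a few lines. The sketch below (my own illustration, with invented names) is a complete Unix tool in Rust: it reads lines from standard input, applies one transformation, and writes the result to standard output, knowing nothing about its neighbors in a pipeline:

```rust
use std::io::{self, BufRead, Write};

// the one thing this tool does: transform a single line of input
fn process(line: &str) -> String {
    line.to_uppercase()
}

// read stdin, process, write stdout: the whole Unix tool contract
fn main() {
    let stdin = io::stdin();
    let mut out = io::stdout().lock();
    for line in stdin.lock().lines() {
        let line = line.expect("failed to read stdin");
        writeln!(out, "{}", process(&line)).expect("failed to write stdout");
    }
}
```

Compiled to a hypothetical binary `upcase`, it composes with anything: `cat notes.txt | upcase | sort`.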
Is it possible to disambiguate data on the command line by using more than just strings without violating this philosophy? In this paradigm, it may seem that the shell *should* only recognize data in the form of an ambiguous type like the string because the meaningful work is supposed to happen inside of programs, not between them.
## Writing Unix Tools for Structured Data
Rust users are lucky to have [serde], a library that can create in-memory values from data formats and vice versa. If you are compiling a program to be used as a command-line tool, I would urge you to think about how you might use structured data as input and output. [Pandoc], for example, can use a YAML file as an alternative to command-line arguments, but I would prefer it to have an additional option to pass that YAML to standard input. In the meantime, it is easy enough to implement Pandoc's features in whale by translating a data structure into command-line arguments. It is not a perfect solution, but it goes a long way toward proving that the command-line shell is still the gold standard of computing.
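The translation described here, from a data structure to command-line arguments, can be sketched with a hypothetical `to_args` helper. The names and flag shapes below are illustrative only, not whale's or Pandoc's actual interface:

```rust
// hypothetical: render key/value options as GNU-style long flags
fn to_args(opts: &[(&str, &str)]) -> Vec<String> {
    opts.iter()
        .map(|(key, value)| {
            if value.is_empty() {
                format!("--{}", key) // boolean switch
            } else {
                format!("--{}={}", key, value) // valued option
            }
        })
        .collect()
}

fn main() {
    let opts = [("toc", ""), ("output", "article.html")];
    assert_eq!(to_args(&opts), vec!["--toc", "--output=article.html"]);
}
```

A shell that holds options as a structured value can render them for any string-based tool this way, while keeping the structure intact on its own side of the pipe.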
[Pandoc]: https://pandoc.org/
[serde]: https://serde.rs/
[nushell]: https://www.nushell.sh/
[whale]: https://git.jeffa.io/jeff/whale