
2023-05-05

Iframe Shell Architecture

Separating the terminal emulator from the shell will make it possible to re-use Puter's terminal emulator for containers, emulators, and tunnels.

Puter's shell will follow modern approaches for handling data; this means:

  • Commands will typically operate on streams of objects rather than streams of bytes.
  • Rich UI capabilities, such as images, will be possible.

To use Puter's shell, this terminal emulator will include an adapter shell. The adapter shell will delegate to the real Puter shell and provide a sensible interface for an end-user using an ANSI terminal emulator.

This means the scope of this terminal emulator is to be compatible with two types of shells:

  • Legacy ANSI-compatible shells
  • Modern Puter-compatible shells

To avoid duplicate effort, the ANSI adapter for the Puter shell will be accessed by the terminal emulator through cross-document messaging, as though it were any other ANSI shell provided by a third-party service. This will also keep these things loosely coupled so we can separate the adapter in the future and allow other terminal emulators to take advantage of it.

2023-05-06

The Context

In creating the state processor I made a variable called ctx, representing contextual information for state functions.

The context has a few properties on it:

  • constants
  • locals
  • vars
  • externs

constants

Constants are immutable values tied to the context. They can be overridden when a context is constructed but cannot be overwritten within an instance of the context.

variables

Variables are mutable context values which the caller providing the context might be able to access.

locals

Locals are the same as variables, but the state processor exports them. This might not have been a good idea; maybe to the user of a context these should appear to be the same as variables, because the code using a context doesn't care about the longevity of locals vs variables.

Perhaps locals could be a useful concept for values that only change under a sub-context, but this is already true about constants since sub-contexts can override them. After all, I can't think of a compelling reason not to allow overriding constants when you're creating a sub-context.

externs

Externs are like constants in that they're not mutable to the code using a context. However, unlike constants they're not limited to primitive values. They can be objects and these objects can have side-effects.
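
As a rough illustration of these four categories, here's a minimal sketch; the names and structure are hypothetical, not the actual implementation:

class Context {
    constructor ({ constants = {}, vars = {}, externs = {} } = {}) {
        // Constants may be overridden at construction time, but are frozen so
        // they can't be overwritten within an instance of the context.
        this.constants = Object.freeze({ ...constants });
        // Variables are mutable and may be read back by the caller.
        this.vars = vars;
        // Externs are objects which may have side effects (e.g. an outputter).
        this.externs = externs;
    }

    // A sub-context can override constants, since overriding at construction
    // time is allowed even though mutation within an instance is not.
    sub (overrides = {}) {
        return new Context({
            constants: { ...this.constants, ...overrides.constants },
            vars: overrides.vars ?? this.vars,
            externs: { ...this.externs, ...overrides.externs },
        });
    }
}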

How to make the context better moving forward?

Composing contexts

The ability to compose contexts would be useful. For example, the readline function could have a context that's a composition of the ANSI context (containing ANSI constants, and maybe also library functions in the future), an outputter context since it outputs characters to the terminal, as well as a context specific to handlers under the readline utility.

Additional reflection

This idea of contexts and composing contexts is actually something I've been thinking about for a long time. Contexts are an essential component in FOAM, for example. However, this idea of separating constants, imports, and side-effect variables (that is, variables something else is able to access) is not something I thought about until I looked at the source code for ash (an implementation of sh), and considered how I might make that source code more portable by representing it as language-agnostic data.

2023-05-07

Conclusion of Context Thing from Yesterday

I just figured something out after re-reading yesterday's devlog entry.

While the State Processor needs a separate concept of variables vs locals, even the state functions don't care about this distinction. It's only there so certain values are cleared at each iteration of the state processor.

This means a context can be composed at each iteration containing both the instance variables and the transient variables.

When Contexts are Equivalent to Pure Functions

In pure-functional logic functions do not have side effects. This means they would never change a value by reference, but they would return a value.

When a subcontext is created prior to a function call, this is equivalent to a pure function under certain conditions (see the sketch after this list):

  • the values which may be changed must be explicitly stated
  • the immediate consequences of updating any value are known
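
A tiny illustration of the equivalence; readToken, readTokenPure, and the shape of ctx are all made up for the sake of the comparison:

// The side-effect version writes its result to an explicitly stated variable
// on a sub-context; the pure version returns it instead.
const readToken = subCtx => { subCtx.vars.result = 'ls'; };
const readTokenPure = () => 'ls';

const ctx = { vars: {}, sub: overrides => ({ vars: { ...overrides.vars } }) };

// Side-effect style: the only value that may change is explicitly stated.
const sub = ctx.sub({ vars: { result: null } });
readToken(sub);
const a = sub.vars.result;

// Pure-functional equivalent: the same effect expressed as a return value.
const b = readTokenPure();
// a === b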

2023-05-08

Sending authorization information to the shell

Separating the terminal emulator from the shell currently means that the terminal is a Puter app and the shell is a service being used by a Puter app, rather than natively being a Puter app.

This may change in the future, but currently it means the terminal emulator needs to configure the shell with authorization information - not because it's the terminal emulator, but because it's the Puter application.

There are a few different approaches to this:

  • pass query string parameters onto the shell via url
  • send a non-binary postMessage with configuration
  • send an ANSI escape code followed by a binary-encoded configuration message
  • construct a Form object in javascript and do a POST request to the target iframe

The last option seems like it could be a CORS nightmare since right now I'm testing in a situation where the shell happens to be under the same domain name as the terminal emulator, but this may not always be the case.

Passing query string parameters over means authorization tokens end up inside the DOM. While this is already true for the parent iframe, I'd like to avoid it in case we find security issues with this approach under different situations. For example, the parent iframe is in a situation where user-select and the default context menu are disabled, which may be preventing a user from accidentally putting sensitive HTML attributes in their clipboard.

That leaves the two options for sending a postMessage: either binary, or a non-binary message. The binary approach would require adding an OSC escape sequence handler and creating some conventions for how to communicate with Puter's API using ANSI escape codes. While this might be useful in the future, it seems more practical to create a higher-level message protocol first and then eventually create an adapter for OSC codes if a need is found for one.

So with that, here are window messages between Puter's ANSI terminal emulator and Puter's ANSI adapter for Puter's shell:

Ready message

Sent by shell when it's loaded.

{ $: 'ready' }

Config message

Sent by terminal emulator after shell is loaded.

{
  $: 'config',
  ...variables
}

All variables are currently keys from the querystring but this may change as the authorization mechanism and available shell features mature.
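
A rough sketch of this handshake from the terminal emulator's side; shellFrame, shellOrigin, and configVariables are assumed names for illustration:

// Wait for the shell iframe to announce it's loaded, then send configuration.
window.addEventListener('message', event => {
    if ( event.source !== shellFrame.contentWindow ) return;

    if ( event.data && event.data.$ === 'ready' ) {
        shellFrame.contentWindow.postMessage({
            $: 'config',
            ...configVariables,     // currently the query-string values
        }, shellOrigin);
    }
});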

2023-05-09

Parsing CLI arguments

Node has a built-in utility for this, but using it would be unreliable because it's allowed to depend on system-provided APIs which won't be available in a browser.

There's a polyfill which doesn't appear to depend on any node builtins. It does not support sub-commands, nor does it generate helptext, but it's a starting point.

If each command specifies a parser for its CLI arguments, and also provides configuration in a format specific to that parser, there are a few advantages (a sketch of such a command definition follows this list):

  • easy to migrate away from this polyfill later by creating an adapter or updating the commands which use it.
  • easy to add custom argument processors for commands which have an interface that isn't strictly adherent to convention.
  • auto-complete and help can be generated with knowledge of how CLI arguments are processed by a particular command.
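
Here's roughly what such a command definition could look like, assuming a parseArgs-style options format; the property names ($, args, ctx.args) are illustrative, not the actual interface:

const tail = {
    name: 'tail',
    args: {
        // Which argument processor this command uses...
        $: 'simple-parser',
        // ...and configuration in that processor's own format.
        options: {
            n: { type: 'string', short: 'n', default: '10' },
        },
    },
    async execute (ctx) {
        // The shell runs the named parser and provides the result.
        const { values, positionals } = ctx.args;
        // ... command behaviour ...
    },
};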

2023-05-10

Kind of tangential, but synonyms are annoying

The left side of a UNIX pipe is the

  • source, faucet, producer, upstream

The right side of a UNIX pipe is the

  • target, sink, consumer, downstream

I'm going to go with source and target for any cases like this because they have the same number of letters, and I like when similar lines of code are the same length because it's easier to spot errors.

2023-05-14

Retro: Terminal Architecture

class: PreparedCommand

A prepared command contains information about a command which will be invoked within a pipeline, including:

  • the command to be invoked
  • the arguments for the command (as tokens)
  • the context that the command will be run under

A prepared command is created using the static method PreparedCommand.createFromTokens. It does not have a context until setContext is later called.

class Pipeline

A pipeline contains PreparedCommand instances which represent the commands that will be run together in a pipeline.

A pipeline is created using the static method Pipeline.createFromTokens, which accepts a context under which the pipeline will be constructed. The pipeline's execute method will also be passed a context when the pipeline should begin running, and this context can be different. (this is also the context that will be passed to each PreparedCommand instance before each respective execute method is called).

class Pipe

A pipe is composed of a readable stream and a writable stream. A reader and a writer are exposed as out and in respectively.

The readable stream and writable stream are tied together.
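
A minimal sketch of such a pipe, assuming web streams; this may not match the actual implementation:

// Whatever is written to `in` becomes readable from `out`.
class Pipe {
    constructor () {
        const { readable, writable } = new TransformStream();
        this.out = readable.getReader();
        this.in = writable.getWriter();
    }
}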

class Coupler

A coupler aggregates a reader and a writer and begins actively reading, relaying all items to the writer.

This behaviour allows a coupler to be used as a way of connecting two pipes together.

At the time of writing this, it's used to tie the pipe that is created after the last command in a pipeline to the writer of the pseudo terminal target, instead of giving the last command this writer directly. This allows the command to close its output pipe without affecting subsequent functionality of the terminal.
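
A rough sketch of the coupler's behaviour (illustrative only):

// Actively read from `reader` and relay every item to `writer`.
class Coupler {
    constructor (reader, writer) {
        this.reader = reader;
        this.writer = writer;
        this.listen_();
    }

    async listen_ () {
        for (;;) {
            const { value, done } = await this.reader.read();
            if ( done ) break;
            await this.writer.write(value);
        }
    }
}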

Behaviour of echo escapes

behaviour of \\ should be verified

Based on experimentation in Bash:

  • \\ is always seen by echo as \
    • this means \\a and \a are the same

difference between \x and \0

In echo, \0 initiates an octal escape while \x initiates a hexadecimal escape.

However, \0 without being followed by a valid octal sequence is considered to be NULL, while \x will be outputted literally if not followed with a valid hexadecimal sequence.

If either of these escapes has at least one valid character for its respective numeric base, it will be processed with that value. So, for example, echo -en "\xag" | hexdump -C shows bytes 0A 67, as does the same with \x0ag instead of \xag.
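
A sketch of this rule - not the actual echo implementation, and it only covers the \x and \0 cases described above:

// str[i] is the character after the backslash ('x' or '0').
const parseNumericEscape = (str, i) => {
    const isHex = str[i] === 'x';
    const digitPattern = isHex ? /^[0-9a-fA-F]{1,2}/ : /^[0-7]{1,3}/;
    const match = digitPattern.exec(str.slice(i + 1));

    if ( match ) {
        // At least one valid digit: process the escape with that value.
        const code = parseInt(match[0], isHex ? 16 : 8);
        return { text: String.fromCharCode(code), consumed: 1 + match[0].length };
    }

    // No valid digits: \0 becomes NUL, but \x is output literally.
    return isHex
        ? { text: '\\x', consumed: 1 }
        : { text: '\0', consumed: 1 };
};

// parseNumericEscape('xag', 0)  => { text: '\n', consumed: 2 }   (0x0A, as in "\xag")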

2023-05-15

Synchronization bug in Coupler

This was caused by the Coupler between a pipeline and stdout still writing after the command was completed. This happens because the listener loop in the Coupler is async and might defer writing until after the pipeline has returned.

This was fixed by adding a member to Coupler called isDone which provides a promise that resolves when the Coupler receives the end of the stream. As a consequence of this it is very important to ensure that the stream gets closed when commands are finished executing; right now the PreparedCommand class is responsible for this behaviour, so all commands should be executed via PreparedCommand.
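
Extending the earlier Coupler sketch, the fix might look roughly like this (again illustrative, not the actual code):

class Coupler {
    constructor (reader, writer) {
        // Resolves once the coupler has read the end of its source stream.
        this.isDone = new Promise(resolve => { this.resolveIsDone_ = resolve; });
        this.listen_(reader, writer);
    }

    async listen_ (reader, writer) {
        for (;;) {
            const { value, done } = await reader.read();
            if ( done ) break;
            await writer.write(value);
        }
        this.resolveIsDone_();
    }
}

// After running a pipeline, wait for output to settle before prompting again:
// await coupler.isDone;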

tail, and echo output chunking

Right now tail outputs the last two items sent over the stream, and doesn't care if these items contain line breaks. For this implementation to work the same as the "real" tail, it must be asserted that each item over the stream is a separate line.

Since ls is outputting each line on a separate call to out.write it is working correctly with tail, but echo is not. This could be fixed in tail itself, having it check each item for line breaks while iterating backwards, but I would rather have metadata on each command specifying how it expects its input to be chunked, so that the shell can accommodate. Although this isn't how it works in "real" bash, it wouldn't affect the behaviour of shell scripts or input, and it's closer to the model of Puter's shell for JSON-like structured data, which may help with interoperability and code reuse.
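
A purely hypothetical sketch of that idea; the property names are made up:

const tail = {
    name: 'tail',
    // The command declares how it expects input to be chunked, so the shell
    // can split incoming items into lines before tail ever sees them.
    input: { chunking: 'lines' },
    async execute (ctx) {
        // ... keep the last n items, each now guaranteed to be one line ...
    },
};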

2023-05-22

Catching Up

There hasn't been much to log in the past few days; most updates to the terminal have been basic command additions.

The next step is adding the redirect operators (< and >), which should involve some information written in this dev log.

Multiple Output Redirect

In Bash, the redirect operator has precedence over pipes. This is sensible but also results in some situations where a prompt entry has dormant pieces, for example two output redirects (only one of them will be used), or an output redirect and a pipe (the pipe will receive nothing from stdout of the left-hand process).

Here's an example with two output redirects:

some-command > a_file.txt > b_file.txt

In Puter's ANSI shell we could allow this as a way of splitting the output. Although, this only really makes sense if stdout will also be passed through the pipeline instead of consumed by a redirect, otherwise the behaviour is counterintuitive.

Maybe for this purpose we can have a couple modes of interpretation, one where the Puter ANSI Shell behaves how Bash would and another where it behaves in a more convenient way. Shell scripts with no hashbang would be interpreted the Bash-like way while shell scripts with a puter-specific hashbang would be interpreted in this more convenient way.

For now I plan to prioritize the way that seems more logical as it will help keep the logic of the shell less complicated. I think it's likely that we'll reach full POSIX compatibility via Bash running in containers or emulators before the Puter ANSI shell itself reaches full POSIX compatibility, so for this reason it makes sense to prioritize making the Puter ANSI shell convenient and powerful over making it behave like Bash. Additionally, we have a unique situation where we're not so bound to backwards compatibility as is a distribution of a personal computer operating system, so we should take advantage of that where we can.

2023-05-23

Adding more coreutils

  • clear was very easy; it's just an escape code
  • printenv was also very easy; most of the effort was already done

First steps to handling tab-completion

Getting desired tab-completion behaviour from input state

Tab-completion needs information about the type of command arguments. Since commands are modelled, it's possible the model of a command can provide this information. For example a registered command could implement getTabCompleterFor(ARG_SPEC).

ARG_SPEC would be an identifier for an argument that is understood by readline. Ex: { $: 'positional', pos: 0 } for the first positional argument, or { $: 'named', name: 'verbose' } for a named parameter called verbose.

The command model already has a nested model specifying how arguments are parsed, so this model could describe the behaviour for a getArgSpecFromInputState(input, i), where input is the current text in readline's buffer and i is the cursor position. This separates the concern of knowing what parameter the user is typing in from readline, allowing modelled commands to support tab completion for arbitrary syntaxes.
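
A sketch of those two halves; the method names come from the text above, but everything else, including the completer, is illustrative:

// Hypothetical completer for path arguments.
const pathCompleter = async prefix => {
    // e.g. readdir the parent of `prefix` and filter by the last segment
    return [];
};

const exampleCommand = {
    // Map readline's input state (buffer text and cursor index) to an ARG_SPEC.
    getArgSpecFromInputState (input, i) {
        // e.g. the cursor is inside the first positional argument:
        return { $: 'positional', pos: 0 };
    },

    // Map an ARG_SPEC to the completer that handles that argument.
    getTabCompleterFor (argSpec) {
        if ( argSpec.$ === 'positional' ) return pathCompleter;
        return null;
    },
};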

revision

It's better if the command model has just one method which readline needs to call, ex: getTabCompleterFromInputState. I've left the above explanation as-is, however, because it's easier to explain the two halves of its functionality separately.

Trigger background readdir call on PWD change

When working on the FUSE driver for Puter's filesystem I noticed that tab completion makes a readdir call upon the user pressing tab which blocks the tab completion behaviour until the call is finished. While this works fine on local filesystems, it's very confusing on remote filesystems where the ping delay will - for a moment - make it look like tab completion isn't working at all.

Puter's shell can handle this a bit better. Triggering a readdir call whenever PWD changes will allow tab-completion to have a quicker response time. However, there's a caveat; the information about what nodes exist in that directory might be outdated by the time the user tries to use tab completion.

My first thought was for "tab twice" to invoke a readdir to get the most recent result, however this conflicts with pressing tab once to get the completed path and then pressing tab a second time to get a list of files within that path.

My second thought is using ctrl + tab. The terminal will need to provide some indication to the user that they can do this and what is happening.

Here are a few thoughts on how to do this with ideal UX:

  • after pressing tab:
    • complete the text if possible
    • highlight the completed portion in a bright color
      • a dim colour would convey that the completion wasn't input yet
    • display in a hint bar the following items:
      • [Ctrl+Tab]: re-complete with recent data
      • [Ctrl+Enter]: more options

Implementation of background readdir

The background readdir could be invoked in two ways:

  • when the current working directory changes
  • at a poll interval

This means the action of invoking background readdir needs to be separate from the method by which it is called.

Also, results from a previous readdir need to be marked invalid when the current working directory changes.

There is a possibility that the user might use tab completion before the first readdir is called for a given pwd, which means the method to get path completions must be async.

If readdir is called because of a pwd change, the poll timer should be reset so that it's not called again too quickly or at the same time.
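
A rough sketch of how these pieces might fit together; all names here are hypothetical:

class DirCache {
    constructor ({ readdir, pollInterval = 5000 }) {
        this.readdir = readdir;          // async (path) => array of names
        this.pollInterval = pollInterval;
        this.pwd = null;
        this.entries = null;             // cached listing for this.pwd
        this.timer = null;
    }

    // Called whenever the current working directory changes.
    setPwd (pwd) {
        this.pwd = pwd;
        this.entries = null;             // previous results are now invalid
        this.refresh_();                 // also resets the poll timer
    }

    // Must be async: tab may be pressed before the first readdir finishes.
    async getCompletions (prefix) {
        if ( this.entries === null ) await this.refresh_();
        return (this.entries ?? []).filter(name => name.startsWith(prefix));
    }

    async refresh_ () {
        if ( this.timer ) clearTimeout(this.timer);
        const pwd = this.pwd;
        const entries = await this.readdir(pwd);
        if ( pwd !== this.pwd ) return;  // pwd changed mid-flight; discard
        this.entries = entries;
        this.timer = setTimeout(() => this.refresh_(), this.pollInterval);
    }
}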

Concern Mapping

  • PuterANSIShell
    • does not need to be aware of this feature
  • readline
    • needs to trap Tab
    • needs to recognize what command is being entered
    • needs to delegate tab completion logic to the command's model
    • does not need to be aware of how tab completion is implemented
  • readdir action
    • needs WRITE to cached dir lists
  • readdir poll timer
    • needs READ to cached dir lists to check when they were updated
    • needs the path to be polled

Order of implementation

  • First implementation will not have background readdir.
    • Interfaces should be appropriate to implement this after.
  • When tab completion is working for paths, then readdir caching can be implemented.

2023-05-25

Revising the boundary between ANSI view and Puter Shell

Now there are several coreutil commands and a few key shell features, so it's a good time to take a look at the architecture and see if the boundary between the ANSI view and Puter Shell corresponds to the original intention.

Shell         I/O instructions
ANSI Adapter  TTY text
Puter Shell   JSON logical tree

Note from the above table that the Puter Shell itself should be "syntax agnostic" - i.e. it needs the ANSI adapter or a GUI on top of it to be useful at the UI boundary.

Pipelines

The ANSI view should be concerned with pipe syntax, while pipeline execution should be a concern of the syntax-agnostic shell. However, currently the ANSI view is responsible for both. This is because there is no intermediate format for parsed pipeline instructions.

to improve
  • create intermediate representation of pipelines and redirects

Command IO

The ANSI shell does IO in terms of either bytes or strings. When commands output strings instead of bytes, their output is adapted to the Uint8Array type to prevent commands further in the pipeline from misbehaving due to an unexpected input type.

Since pipeline I/O should be handled at the Puter shell, this kind of adapting will happen at that level also.

to improve

  • ANSI view should send full pipeline to Puter Shell
  • Puter Shell protocol should be improved so that the client/view can specify a desired output format (i.e. streams vs objects)

Pipeline IR

The following is an intermediate representation for pipelines which separates the concern of the ANSI shell syntax from the logical behaviour that it represents.

{
  $: 'pipeline',
  nodes: [
    {
      $: 'command',
      id: 'ls',
      positionals: [
        '/ed/Documents'
      ]
    },
    {
      $: 'command',
      id: 'tail',
      params: {
        n: 2
      }
    }
  ]
}

The $ property identifies the type of a particular node. Other property names that include the $ symbol are reserved for meta information about nodes; for example, properties like $origin and $whitespace could turn this AST into a CST.

For the sake of easier explanation here, I'm going to coin the term "Abstract Logic Tree" (ALT) and use it along with the conventional terms as follows:

Abrv  Name                  Represents
ALT   Abstract Logic Tree   What it does
AST   Abstract Syntax Tree  How it was described
CST   Concrete Syntax Tree  How it was formatted

The pipeline format described above is an AST for the input that was understood by the ANSI shell adapter. It could be converted to an ALT if the Puter Shell is designed to understand pipelines a little differently.

{
  $: 'tail',
  subject: {
    $: 'list',
    subject: {
      $: 'filepath',
      id: '/ed/Documents'
    }
  }
}

This is not final, but shows how the AST for pipeline syntax can be developed in the ANSI shell adapter without constraining how the Puter Shell itself works.

Syntaxes

Why CST tokenization in a shell would be useful

There are a lot of decisions to make at every single level of syntax parsing. For example, consider the following:

ls | tail -n 2 > "some \"data\".txt"

Tokens can be interpreted at different levels of detail. A standard shell tokenizer would likely eliminate information about escape characters within quoted strings at this point. For example, right now the Puter ANSI shell adapter takes after what a standard shell does and goes for the second option shown in the comment below:

[
  'ls', '|', 'tail', '-n', '2', '>',
  // now do we do ["some ", "\\\"", ...],
  // or do we do ["some \"data\".txt"] ?
]

This is great for processing and executing commands because this information is no longer relevant at that stage.

However, suppose you wanted to add support for syntax highlighting, or tell a component responsible for a specific context of tab completion where the cursor is with respect to the tokenized information. This is no longer feasible.

For the latter case, the ANSI shell adapter works around this issue by only parsing the commandline input up to the cursor location - meaning the last token will always represent the input up to the cursor location. The input after the cursor is truncated, however, leading to the familiar inconvenient situation seen in many terminals where tab completion does something illogical with respect to the text after your cursor.

i.e. the following, with the cursor position represented by X:

echo "hello" > some_Xfile.txt

will be transformed into the following:

echo "hello" > some_file.txtXfile.txt

What would be more helpful:

  • terminal bell, because some_file.txt is already complete
  • some_other_Xfile.txt if some_other_file.txt exists

So syntax highlighting and tab completion are two reasons why the CST is useful. There may be other uses as well that I haven't thought of. So this seems like a reasonable idea.

Choosing monolithic or composite lexers

Next step, there are also a lot of decisions to make about processing the text into tokens.

For example, we can take after the very feature that makes shells so versatile - pipelines - and apply this concept to the lexer.

Level 1 lexer produces:
  ls, |, tail, -n, 2, >, ", some , \", data, \", .txt

Level 2 lexer produces:
  ls, |, tail, -n, 2, >, "some \"data\".txt"

This creates another decision fork, actually. It raises the question of how to associate the token "some \"data\".txt" with the tokens it was composed from at the previous level of lexing, if this should be done at all, and otherwise whether CST information should be stored with the composite token.

If lexers provide verbose meta information there might be a concern about efficiency; however, lexers could be configurable in this respect. Furthermore, lexers could be defined separately from their implementation and JIT-compiled based on configuration, so you actually get executable bytecode which doesn't produce metadata (for when it's not needed).

While designing JIT-compilable lexer definitions is incredibly out of scope for now, the knowledge that it's possible justifies the decision to have lexers produce verbose metadata.

If the "Level 1 lexer" in the example above stores CST information in each token, the "Level 2 lexer" can simply let this information propagate as it stores information about what tokens were composed to produce a higher-level token. This means concern about whitespace and formatting is limited to the lowest-level lexer which makes the rest of the lexer stack much easier to maintain.

An interesting philosophical point about lexers and parsers

Consider a stack of lexers that builds up to high-level constructs like "pipeline", "command", "condition", etc. The line between a parser and a lexer becomes blurry, as this is in fact a bottom-up parser composed of layers, each of which behaves like a lexer.

I'm going to call the layers PStrata (singular: PStratum) to avoid collision with these concepts.

The "Implicit Interface Aggregator"

Vanilla javascript doesn't have interfaces, which sometimes seems to make it difficult to have guarantees about what methods an object will implement, what values they'll be able to handle, etc.

To solve some of the drawbacks of not having interfaces, I'm going to use a pattern which Chat GPT just named the Implicit Interface Aggregator Pattern.

The idea is simple. Instead of having an interface, you have a class which acts as the user-facing API, and holds the real implementation by aggregation. While this doesn't fix everything, it leaves the doors open for multiple options in the future, such as using typescript or a modelling framework, without locking either of these doors too early. Since we're potentially developing on a lot of low-level concepts, perhaps we'll even have our own technology that we'd like to use to describe and validate the interfaces of the code we write at some point in the future.

This class can handle concerns such as adapting different types of inputs and outputs; things which an implementation doesn't need to be concerned with. Generally this kind of separation of concerns would be done using an abstract class, but this is an imperfect separation of concerns because the implementor needs to be aware of the abstract class. Granted, this isn't usually a big deal, but what if the abstract class and implementors are compiled separately? It may be advantageous that implementors don't need to have all the build dependencies of the abstract class.

The biggest drawback of this approach is that while the aggregating class can implement runtime assertions, it doesn't solve the issue of the lack of build-time assertions, which are able to prevent type errors from getting into releases entirely. However, it does leave room to add type definitions for this class and its implementors (turning it into the facade pattern), or apply model definitions (or schemas) to the aggregator and the output of a static analysis to the implementors (turning it into a model definition).
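
A minimal sketch of the pattern with made-up names; Thing is the user-facing aggregator, and the implementation only provides the core behaviour:

class Thing {
    constructor (impl) {
        this.impl = impl;
    }

    process (input) {
        // Input adaptation is handled once, here, rather than in every impl.
        if ( typeof input === 'string' ) {
            input = new TextEncoder().encode(input);
        }
        const output = this.impl.process(input);
        // Runtime assertion standing in for a build-time interface check.
        if ( output === undefined ) {
            throw new Error('implementation did not return a value');
        }
        return output;
    }
}

// An implementation doesn't need to extend any abstract class.
const upperCaseImpl = {
    process: bytes => new TextDecoder().decode(bytes).toUpperCase(),
};

const thing = new Thing(upperCaseImpl);
// thing.process('hello') === 'HELLO'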

Where this will be used

The first use of this pattern will be PStratum. PStratum is a facade which aggregates a PStratumImplementor using the pattern described above.

The following layers will exist for the shell:

  • StringPStratum will take a string and provide bytes.
  • LowLexPStratum will take bytes and identify all syntax tokens and whitespace.
  • HiLexPStratum will create composite tokens for values such as string literals
  • LogicPStratum will take tokens as input and produce AST nodes. For example, this is when successive instances of the | (pipe) operator will be converted into a pipeline construct.

First results from the parser

It appears that the methods I described above are very effective for implementing a parser with support for concrete syntax trees.

By wrapping implementations of Parser and PStratum in facades it was possible to provide additional functionality for all implementations in one place:

  • fork and join is implemented by PStratum; each implementation does not need to be aware of this feature.
  • the look function (AKA "peek" behaviour) is implemented by PStratum as well.
  • A PStratum implementation can implement the behaviour to reach for previous values, but PStratum has a default implementation. The BytesPStratumImpl overrides this to provide Uint8Arrays instead of arrays of Number values.
  • If parser implementations don't return a value, Parser will create the ParseResult that represents an unrecognized input.

It was also possible to add a Parser factory which adds additional functionality to the sub-parsers that it creates:

  • track the tokens each parser gets from the delegate PStratum and keep a record of what lower-level tokens were composed to produce higher-level tokens
  • track how many tokens each parser has read for CST metadata

A layer called MergeWhitespacePStratumImpl completes this by reading the source bytes for each token and using them to compute a line and column number. After this, the overall parser is capable of reporting the start byte, end byte, line number, and column number for each token, as well as preserving this information for each composite token created at higher levels.

The following parser configuration with a hard-coded input was tested:

sp.add(
    new StringPStratumImpl(`
        ls | tail -n 2 > "test \\"file\\".txt"
    `)
);
sp.add(
    new FirstRecognizedPStratumImpl({
        parsers: [
            cstParserFac.create(WhitespaceParserImpl),
            cstParserFac.create(LiteralParserImpl, { value: '|' }, {
                assign: { $: 'pipe' }
            }),
            cstParserFac.create(UnquotedTokenParserImpl),
        ]
    })
);
sp.add(
    new MergeWhitespacePStratumImpl()
)

Note that the multiline string literal begins with whitespace. It is therefore expected that each token will start on line 1, and ls will start on column 8.

The following is the output of the parser:

[
  {
    '$': 'symbol',
    text: 'ls',
    '$cst': { start: 9, end: 11, line: 1, col: 8 },
    '$source': Uint8Array(2) [ 108, 115 ]
  },
  {
    '$': 'pipe',
    text: '|',
    '$cst': { start: 12, end: 13, line: 1, col: 11 },
    '$source': Uint8Array(1) [ 124 ]
  },
  {
    '$': 'symbol',
    text: 'tail',
    '$cst': { start: 14, end: 18, line: 1, col: 13 },
    '$source': Uint8Array(4) [ 116, 97, 105, 108 ]
  },
  {
    '$': 'symbol',
    text: '-n',
    '$cst': { start: 19, end: 21, line: 1, col: 18 },
    '$source': Uint8Array(2) [ 45, 110 ]
  },
  {
    '$': 'symbol',
    text: '2',
    '$cst': { start: 22, end: 23, line: 1, col: 21 },
    '$source': Uint8Array(1) [ 50 ]
  }
]

No errors were observed in this output, so I can now continue adding more layers to the parser to get higher-level representations of redirects, pipelines, and other syntax constructs that the shell needs to understand.

2023-05-28

Abstracting away communication layers

As of now the ANSI shell layer and terminal emulator are separate from each other. To recap, the ANSI shell layer and object-oriented shell layer are also separate from each other, but the ANSI shell layer currently holds more functionality than is ideal; most commands have been implemented at the ANSI shell layer in order to get more functionality earlier in development.

Although the ANSI shell layer and object-oriented shell layer are separate, they are both coupled with the communication layer that's currently used between them: cross-document messaging. This is ideal for communication between the terminal emulator and ANSI shell, but less ideal between the ANSI shell and OO shell. The terminal emulator is a web app and will always be run in a browser environment, which makes the dependency on cross-document messaging acceptable. Furthermore, it's a small body of code and it can easily be extended to support multiple protocols of communication in the future rather than just cross-document messaging. The ANSI shell, on the other hand, which currently communicates with the OO shell using cross-document messaging, will not always be run in a browser environment. It is also completely dependent on the OO shell, so it would make sense to bundle the OO shell with it in some environments.

The dependency between the ANSI shell and OO shell is not bidirectional. The OO shell layer is intended to be useful even without the ANSI shell layer; for example, a GUI for constructing and executing pipelines would be more elegantly built upon the OO shell than the ANSI shell, since there wouldn't be a layer of text processing between two layers of object-oriented logic. Considering also that in Puter any alternative layer on top of the OO shell is likely to be built to run in a browser environment, it makes sense to allow the OO shell to be communicated with via cross-document messaging.

The following ASCII diagram describes the communication relationships between various components described above:

note: "XD" means cross-document messaging

[web terminal]
    |
   (XD)
    |
    |- (stdio) --- [local terminal]
    |
[ANSI Shell]
    |
  (direct calls / XD)
    |
    |-- (XD) --- [web power tool]
    |
 [OO Shell]

It should be interpreted as follows:

  • OO shell can communicate with a web power tool via cross-document messaging
  • the OO shell and ANSI shell should communicate via either direct calls (when bundled) or cross-document messaging (when not bundled together), as sketched after this list
  • the ANSI shell can be used under a web terminal via cross-document messaging, or a local terminal via the standard I/O mechanism of the host operating system.
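
A hypothetical sketch of what abstracting that communication could look like: both sides talk to a small "channel" object, and only the channel knows whether it's direct calls or cross-document messaging. The ooShell methods here are made up:

// When bundled together: direct calls.
const makeDirectChannel = ooShell => ({
    send: message => ooShell.handleMessage(message),         // hypothetical method
    onMessage: handler => ooShell.setClientHandler(handler),  // hypothetical method
});

// When not bundled: cross-document messaging.
const makeXDChannel = (targetWindow, targetOrigin) => ({
    send: message => targetWindow.postMessage(message, targetOrigin),
    onMessage: handler => window.addEventListener('message', event => {
        if ( event.origin === targetOrigin ) handler(event.data);
    }),
});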

2023-05-29

Interfacing with structured data

Right now all of the implemented coreutils commands output byte streams. However, allowing commands to output objects instead solves some problems with traditional shells:

  • text processing everywhere
    • it's needed to get a desired value from structured data
    • commands are often concerned with the formatting of data rather than the significance of the data
    • commands like awk are archaic and difficult to use, but are often necessary
  • information which a command had to obtain is often lost
    • a good example of this is how ls colourizes different inode types but this information goes away when you pipe it to a command like tail

printing structured data

Users used to a POSIX system will have some expectations about the output of commands. Sometimes the way an item is formatted depends on some input arguments, but does not change the significance of the item itself.

A good example of this is the ls command. It prints the names of files. The object equivalent of this would be for it to output CloudItem objects. Where it gets tricky is that ls with no arguments will display just the name, while ls -l will display details about each file such as the mode, owner, group, size, and date modified.

per-command outputters

If the definition for the ls command included an output formatter this could work - if ls' standard output is attached to the PTT instead of another command it would format the output according to the flags.

This still isn't ideal though. If ls is piped to tail, this information would be lost. This differs from the expected behaviour on POSIX systems; for example:

ls -l | tail -n 2 > last_two_lines.txt

this command would output all the details about the last two files to the text file, rather than just the names.

composite output objects with formatter + data

A command outputting objects could also attach a formatter to each object. This has the advantage that an object can move through a pipeline and then be formatted at the end, but it does have a drawback that sometimes the formatter will be the same for every object, and sending a copy of the formatter with each object would be redundant.
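
As an illustration of this idea (the property names are made up), each object could carry the formatter it needs at the end of the pipeline:

const item = {
    name: 'notes.txt',
    type: 'file',
    size: 2048,
    // The formatter knows how the generating command (e.g. `ls -l`) wants
    // this object rendered when it reaches the terminal.
    $format: (obj, { long }) =>
        long ? `-rw-r--r-- ${obj.size} ${obj.name}` : obj.name,
};

// Only the terminal-facing end of the pipeline calls the formatter:
const text = item.$format(item, { long: true });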

using a formatter registry

A transient registry of object formatters, existing for the lifespan of the pipeline, could contain each unique formatter that any command in the pipeline produced for one or more of its output objects. Each object that it outputs now just needs to refer to an existing formatter, which solves the problem of redundant information passing through the pipeline.

keeping it simple

This idea of a transient registry for unique implementations of some interface could be useful in a general sense. So, I think it makes sense to actually implement formatters using the more redundant behaviour first (formatter is coupled with each object), and then later create an abstraction for obtaining the correct formatter for an object so that this optimization can be implemented separately from this specific use of the optimization.

2024-02-01

StrataParse and Tokens with Command Substitution

note: this devlog entry was written in pieces as I made significant changes to the parser, so information near the beginning is less accurate than information towards the end.

In the "first half" portion of the terminal parser, which builds a "lexer"* (*not a pure lexer) for parsing, there currently exists an implementation of parsing for quoted strings. I have in the past implemented a quoted string parser at least two different ways - a state machine parser, and with composable parsers. The string parser in buildParserFirstHalf uses the second approach. This is what it looks like represented as a lisp-ish pseudo-code:

sequence(
  literal('"')
  repeat(
    choice(
      characters_until('\\' or '"')
      sequence(
        literal('\\')
        choice(
          literal('"'),
          ...escape_substitutions))))
  literal('"'))

In a BNF grammar, this might be assigned to a symbol name like "quoted-string". In strataparse this is represented by having a layer which recognizes the components of a string (like each sequence of characters between escapes, each escape, and the closing quotation mark), and then a higher-level layer which composes those to create a single node representing the string.

I really like this approach because the result is a highly configurable parser that will let you control how much information is kept as you advance to higher-level layers (ex: CST instead of AST for tab-completion checks), and only parse to a certain level if desired (ex: only "first half" of the parser is used for tab-completion checks).

The trouble is the POSIX Shell Command Language allows part of a token to be a command substitution, which means a stack needs to be maintained to track nested states. Implementing this in the current hand-written parser was very tricky.

Partway through working on this I took a look at existing shell syntax parsers for javascript. The results weren't very promising. None of the available parsers could produce a CST, which is needed for tab completion and will aid in things like syntax highlighting in the future.

Between the modules shell-parse and bash-parser, the first was able to parse this syntax while the second threw an error:

echo $TEST"something to $($(echo echo) do)"with-this another-token

Another issue with existing parsers, which makes me wary of even using pegjs (what shell-parse uses) directly is that the AST they produce requires a lot of branching in the interpreter. For example it's not known when parsing a token whether you'll get a literal, or a concatenation with an array of "pieces" which might contain literals. This is a perfectly valid representation of the syntax considering what I mentioned above about command substitution, but if there can be an array of pieces I would rather always have an array of pieces. I'm much more concerned with the simplicity and performance of the interpreter than the amount of memory the AST consumes.

Finally, my "favourite" part: when you run a script in bash it doesn't parse the entire script and then run it; it either parses just one line or, if the line begins a compound command (a structure like if ...; then ...; fi), it parses multiple lines until it has parsed a valid compound command. This means any parser that can only parse complete inputs with valid syntax would need to repeatedly parse (1 line, 2 lines, 3 lines...) at each line until one of the parses is successful, if we wish to mimic the behaviour of a real POSIX shell.

In conclusion, I'm keeping the hand-written parser and solving command substitution by maintaining state via stacks in both halves of the parser, and we will absolutely need to do static analysis and refactoring to simplify the parser some time in the future.

2024-02-04

Platform Support and Deprecation of separate puter-shell repo

To prepare for releasing the Puter Shell under an open-source license, it makes sense to move everything that's currently in puter-shell into this repo. The separation of concerns makes sense, but it belongs in a place called "platform support" inside this repo rather than in another repo (that was an oversight on my part earlier on).

This change can be made incrementally as follows:

  • Expose an object which implements support for the current platform to all the commands in coreutils.
  • Incrementally update commands as follows:
    • add the necessary function(s) to puter platform support
      • while doing this, use the instance of the Puter SDK owned by dev-ansi-terminal instead of delegating to the wrapper in the puter-shell repo via postMessage
    • update the command to use the new implementation
  • Once all commands are updated, the XDocumentPuterShell class will be dormant and can safely be removed.