9P2020 (or 9P200X?)

9P200X

List of ideas for the 9P2000 successor.

Name ideas

Rejected ideas

General / misc changes

Open/create changes

Stat changes

Walk changes

Read changes

Write changes

9P chaining

All messages contain two extra fields: chnbits[1] chntag[2].

If chntag is different from NOTAG, the server promises not to execute the message before the request with the specified tag completes (not necessarily successfully). A message with that tag must already have been sent. If CHNFAIL is set in chnbits, the request is aborted if the original request returned an error, was flushed, or was itself aborted.

Aborted requests are answered with a new message type, Rabort, which includes a reason[1] bitfield.

The server is expected to translate chntag into pointers to the right requests on receipt. Clients can reuse tags as soon as they know they do not wish to issue further requests chained to the original message.

The main problem with this is that the server needs to remember which requests completed successfully. Solution ideas:

9P grouping

(Alternative to chaining)

All messages contain two extra fields: grpbits[1] grptag[2]. Servers guarantee not to reorder messages with identical grptag. If grpbits contains GRPKILL then a failed or flushed message will mark the grptag as killed. If grpbits contains GRPMORTAL then the message is aborted if grptag is killed by a previous message.

NOTAG refers to no group (or rather, each use of it refers to a separate group), i.e. different messages with NOTAG can be reordered freely and grpbits has no effect.

A new message, Tclunkgrp grptag[2], resets the killed state of that group tag. Servers can assume that the group tag refers to a new group after this request (i.e. the reordering dependency chain is broken).
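A minimal sketch of how a server might apply the grouping rules, assuming hypothetical bit values GRPKILL=0x01 and GRPMORTAL=0x02 and modelling each request as a callable that returns True on success. Flushes are not modelled; in the proposal a flushed GRPKILL message would mark the group as killed just like a failed one.

```python
from collections import defaultdict, deque

NOTAG = 0xFFFF
GRPKILL = 0x01    # hypothetical bit values; the proposal assigns none
GRPMORTAL = 0x02


class GroupScheduler:
    """Sketch of server-side grouping: strict in-order execution per grptag."""

    def __init__(self):
        self.queues = defaultdict(deque)  # grptag -> requests not yet executed
        self.killed = set()               # grptags marked as killed

    def submit(self, tag, grpbits, grptag, op):
        """Accept a request; `op` is a callable returning True on success."""
        if grptag == NOTAG:
            # Each NOTAG message is its own group, so grpbits have no effect.
            return [(tag, "ok" if op() else "error")]
        self.queues[grptag].append((tag, grpbits, op))
        return []

    def drain(self, grptag):
        """Execute the queued requests of one group in arrival order."""
        results = []
        q = self.queues[grptag]
        while q:
            tag, bits, op = q.popleft()
            if (bits & GRPMORTAL) and grptag in self.killed:
                results.append((tag, "aborted"))
                continue
            ok = op()
            if not ok and (bits & GRPKILL):
                self.killed.add(grptag)
            results.append((tag, "ok" if ok else "error"))
        return results

    def clunkgrp(self, grptag):
        """Tclunkgrp: clear the killed state so the tag starts a fresh group."""
        self.killed.discard(grptag)
```

A walk sent with GRPKILL followed by an open sent with GRPMORTAL in the same group gives "walk, then open only if the walk worked" without waiting for a round trip in between.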

Chaining / grouping comments

Both chaining and grouping exist to allow clients to send out batch requests. Chaining is potentially more powerful, but grouping seems like it might be vastly simpler to implement: to implement chaining, all servers and intermediaries need to keep track of the last N messages sent.

It's unclear how either would be exposed to user space right now. They can, however, be used to chain fixed sequences within syscalls, the most common example being walk followed by open.

9P streaming

size[4] Treads tag[2] {chaining/grouping bits} fid[4] offset[8] count[4] mcount[4]
size[4] Rreads tag[2] final[1] offset[8] count[4] data[count]
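A sketch of encoding these two messages with Python's `struct` module, using 9P2000's little-endian byte order and a type[1] byte after size as in 9P2000. The type numbers are hypothetical, and the {chaining/grouping bits} are assumed to be the grouping variant, grpbits[1] grptag[2]. The offset is packed as a signed 64-bit value so that the -1 used in reset requests can be represented.

```python
import struct

TREADS, RREADS = 128, 129  # hypothetical type numbers; none are assigned

# size type tag grpbits grptag fid offset count mcount
TREADS_FMT = "<IBHBHIqII"
# size type tag final offset count; data[count] follows
RREADS_HDR = "<IBHBqI"


def pack_treads(tag, grpbits, grptag, fid, offset, count, mcount):
    size = struct.calcsize(TREADS_FMT)
    return struct.pack(TREADS_FMT, size, TREADS, tag, grpbits, grptag,
                       fid, offset, count, mcount)


def pack_rreads(tag, final, offset, data):
    size = struct.calcsize(RREADS_HDR) + len(data)
    return struct.pack(RREADS_HDR, size, RREADS, tag, final, offset,
                       len(data)) + data


def unpack_rreads(msg):
    """Return (tag, final, offset, data) from an encoded Rreads."""
    hdr = struct.calcsize(RREADS_HDR)
    size, typ, tag, final, offset, count = struct.unpack_from(RREADS_HDR, msg)
    if typ != RREADS or size != len(msg) or size != hdr + count:
        raise ValueError("malformed Rreads")
    return tag, final, offset, msg[hdr:hdr + count]
```

Under these assumptions a Treads is a fixed 30 bytes and an Rreads header is 20 bytes.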

Sending this message is equivalent to performing multiple Tread operations of size mcount. The server will continue to perform read operations (equivalent to Tread) until count bytes have been read, an error occurs or EOF has been reached. Each read operation will generate at least one Rreads response. Unlike other requests, this request continues to be active until it is explicitly terminated.

The offset for the read operations starts at offset and increases by the returned number of bytes after each response. Servers must follow this even if the actual read operation ignores the offset field.

As long as the request is active, clients can send another request with the same tag to reset the offset, count and mcount variables to specified values. If offset is -1, the offset is unchanged and count is added to the existing count. The server should complete the reads it has already issued with the old values and return their results normally.
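The reset rule is small but easy to misread, so here is a sketch of the state update a server might perform. The `ReadsState` type and `reset_reads` function are hypothetical names; the sketch assumes mcount always takes the new value, since the proposal only special-cases offset and count.

```python
from dataclasses import dataclass

KEEP_OFFSET = -1  # offset value meaning "leave the offset unchanged"


@dataclass(frozen=True)
class ReadsState:
    offset: int  # offset of the next underlying read
    count: int   # bytes still to be read
    mcount: int  # size of each underlying read


def reset_reads(state, offset, count, mcount):
    """Apply a repeated Treads (same tag) to an active request."""
    if offset == KEEP_OFFSET:
        # Offset unchanged; the new count extends the remaining budget.
        return ReadsState(state.offset, state.count + count, mcount)
    return ReadsState(offset, count, mcount)
```

The -1 form lets a client keep a stream flowing indefinitely by topping up count, without knowing how far the server has already read.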

Unlike Rread, an Rreads with count=0 carries no special meaning (in particular, it does not signal end-of-file).

FINBNDRY in final denotes a "message boundary", which means that this is the last Rreads response corresponding to an underlying read operation. If there are two message boundaries with no data in between, this constitutes an end-of-file signal. After end-of-file, the server must perform no more reads and must explicitly terminate the request.
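A sketch of the client-side end-of-file test, assuming a hypothetical FINBNDRY value of 0x01 and treating the start of the stream as not being a boundary (the proposal does not say either way). Replies are given as (final, data) pairs.

```python
FINBNDRY = 0x01  # hypothetical bit value; the proposal assigns none


def reached_eof(replies):
    """True if (final, data) Rreads replies contain two boundaries with no data between."""
    boundary_pending = False  # a boundary was seen and no data has followed it
    for final, data in replies:
        if data:
            boundary_pending = False
        if final & FINBNDRY:
            if boundary_pending:
                return True
            boundary_pending = True
    return False
```

Empty replies without FINBNDRY are simply skipped, matching the rule that an empty Rreads means nothing by itself.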

FINDONE in final indicates that the server terminates the request and will send no more responses. This is not an error and does not imply end-of-file has been reached. FINDONE is processed after the rest of the message.
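A sketch of the server side of Treads as a generator of (final, offset, data) replies. It assumes hypothetical bit values (FINBNDRY=0x01, FINDONE=0x02), a hypothetical backend `read(offset, n)` that returns at most n bytes (b"" at end-of-file), and a `chunk` limit standing in for whatever caps one reply's size. On an empty first read it sends two bare boundaries so the client's end-of-file test does not depend on how the start of the stream is interpreted.

```python
FINBNDRY = 0x01  # hypothetical bit values; the proposal assigns none
FINDONE = 0x02


def serve_treads(read, offset, count, mcount, chunk):
    """Yield (final, offset, data) Rreads replies for one Treads request."""
    if count <= 0:
        yield FINDONE, offset, b""
        return
    after_boundary = False  # has any read ended with a boundary yet?
    while count > 0:
        data = read(offset, min(mcount, count))
        if not data:
            # End-of-file: ensure two boundaries with no data between them,
            # then terminate the request (FINDONE is processed last).
            if not after_boundary:
                yield FINBNDRY, offset, b""
            yield FINBNDRY | FINDONE, offset, b""
            return
        count -= len(data)
        # One underlying read may span several replies; only the last
        # reply of the read carries the message boundary.
        for i in range(0, len(data), chunk):
            piece = data[i:i + chunk]
            final = FINBNDRY if i + chunk >= len(data) else 0
            if final and count == 0:
                final |= FINDONE  # budget used up: terminate after this reply
            yield final, offset, piece
            offset += len(piece)
        after_boundary = True
```

Note that the offset advances by the number of bytes returned, as the proposal requires, regardless of whether the backend itself cares about offsets.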

Clients can terminate the request with Tflush. The server may respond with more data before sending Rflush.

If an error occurs, Rerror is sent and the request is terminated.

The server must guarantee that no data is lost by sending outstanding data before an Rflush or Rerror response if necessary. In this case the server is permitted to return up to count+mcount bytes.

size[4] Twrites tag[2] {chaining/grouping bits} fid[4] offset[8] count[4] data[count]
size[4] Rwrites tag[2] offset[8] count[4] window[4]

Twrites functions like Twrite, but the request remains "active" as long as it has not been explicitly terminated. As long as it's active, clients can send further Twrites messages with the same tag, which execute in order.

Clients are required to keep a local window counter and subtract from this counter every time they send a message. The window counter starts at 64 KB and must never become negative. The window field from responses is added to this counter. Servers can send Rwrites with count=0 solely to add to the counter.

Servers can terminate Twrites by setting window to -1 in an Rwrites. This is not an error. Clients can terminate Twrites with Tflush.
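A sketch of the client-side window accounting for one Twrites request. It assumes that what gets subtracted is each message's data count (the proposal does not pin this down) and that window=-1 arrives as 0xFFFFFFFF in the unsigned window[4] field. The class and method names are hypothetical.

```python
INITIAL_WINDOW = 64 * 1024  # 64 KB, as the proposal specifies
TERMINATE = 0xFFFFFFFF      # assumed encoding of window=-1 in window[4]


class WritesWindow:
    """Client-side flow-control counter for one active Twrites request."""

    def __init__(self):
        self.window = INITIAL_WINDOW
        self.active = True

    def can_send(self, count):
        """True if a Twrites with `count` data bytes keeps the window >= 0."""
        return self.active and count <= self.window

    def sent(self, count):
        """Account for a Twrites carrying `count` bytes (assumed unit)."""
        if not self.can_send(count):
            raise ValueError("Twrites would exceed the window")
        self.window -= count

    def received(self, window):
        """Process the window field of an Rwrites reply."""
        if window == TERMINATE:
            self.active = False  # server ended the request; not an error
        else:
            self.window += window
```

This is ordinary credit-based flow control: the server bounds how much unacknowledged data it can be handed, and Rwrites with count=0 exist only to return credit.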

Unlike Rwrite, partial writes do not indicate an error. If servers wish to signal an error, they should send an Rerror message.

Streaming comments

Streaming + chaining/grouping allows clients to do "read file" in one round trip (for short files). Longer files are transmitted at potentially the full bandwidth.

Streaming can be exposed to userspace by adding a new ORBUF flag to open(2), indicating that the kernel is allowed to read ahead, and an OWBUF flag, indicating that the kernel can buffer writes. bio(2) should be able to set these flags.

The idea is that any program can use ORBUF as long as it only ever does fixed reads until EOF is reached.
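A sketch of the read-ahead a kernel might do behind a hypothetical ORBUF open, with `fetch(offset, n)` standing in for the Treads stream and `ahead` as an assumed prefetch size. Data fetched beyond what the program asked for is handed out on later calls, which is only correct if the program keeps doing sequential reads until EOF.

```python
class ReadAhead:
    """Serve sequential reads from a buffer filled ahead of the caller."""

    def __init__(self, fetch, ahead=8192):
        self.fetch = fetch       # fetch(offset, n) -> at most n bytes, b"" at EOF
        self.ahead = ahead       # bytes requested per fetch
        self.buf = bytearray()   # fetched but not yet returned to the program
        self.next_offset = 0     # offset of the first byte not yet fetched
        self.eof = False

    def read(self, n):
        """Return up to n bytes; b"" once EOF is reached and the buffer is empty."""
        while len(self.buf) < n and not self.eof:
            data = self.fetch(self.next_offset, self.ahead)
            if not data:
                self.eof = True
            self.buf += data
            self.next_offset += len(data)
        out = bytes(self.buf[:n])
        del self.buf[:n]
        return out
```

A program that seeks, or that expects a later read to see data written after the prefetch, would break this model, which is why ORBUF would be opt-in.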