9P2020 (or 9P200X?)
List of ideas for the 9P2000 successor.
Name ideas
- 9P2020: most boring name (benefit: "hindsight is 2020")
- 9P200X: see picture at top; downside: too much like C1X/C++1X and other monstrosities
- 9PX: sounds like a Harvey invention
- 9P2001: A Spec Odyssey; cool, but given it'd be just 9P2001 most of the time, kind of lame.
- anything not starting with 9P: needlessly confusing to the unfamiliar; pretentious (it's more of an iteration of 9P than a new protocol); possibly breaks Tversion assumptions
Rejected ideas
- Tmove: allowing unprivileged (whatever that means) users to move directories around might break program assumptions, leading to all sorts of crazy exploits. Probably not worth it. (Also the classic argument: why put it in 9P if it doesn't work with most file servers?)
General / misc changes
- Rerror contains an optional errno field that clients can use to check against common errors (e.g. EPERM)
  - alternative: error strings are standardised in the spec, possibly by defining prefixes such as "Eperm:"
- Servers are required to reply to undefined messages with Rerror
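The two Rerror alternatives above look much the same from the client's side. The following is a minimal sketch, not a proposed encoding: only the "Eperm:" prefix comes from the notes, while the "Enoent:" prefix, the `PREFIXES` table and the `classify` helper are assumptions made up for illustration.

```python
import errno

# Hypothetical prefix table: only "Eperm:" appears in the notes;
# "Enoent:" is invented here for illustration.
PREFIXES = {
    "Eperm": errno.EPERM,
    "Enoent": errno.ENOENT,
}

def classify(ename, err=None):
    """Return an errno value for an Rerror, or None if it is not a known common error.

    err models the optional numeric field; ename is the usual error string.
    """
    if err is not None:
        # Alternative 1: the server filled in the optional errno field.
        return err
    # Alternative 2: a standardised prefix in front of the free-form text.
    prefix, sep, _ = ename.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix]
    return None
```

Either way, an old-style error string without a recognised prefix simply yields no classification, so existing servers keep working.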
Open/create changes
- Tcreate without OEXCL functions like Topen if the file exists
Stat changes
- atime and mtime are now 64 bits and in nanoseconds.
- mode DMSANE / qid.type QTSANE: the file server promises that Tread respects offset (unless DMDIR is set) and has no side effects.
- mode DMCACHE / qid.type QTCACHE: the file server honors qid version (the file contents are guaranteed to be unchanged if qid version is unchanged)
- If some flag (explicit byte in Twstat? funny character in name?) is set, changing the name with Twstat will delete an existing file with the new name, if possible (atomic replace).
- If the new name starts with a slash, the file is moved rather than renamed. Support for this is optional. (see Tmove)
- Add stat and error fields to Rwstat; if a server can only do a partial change (which servers are strongly advised to avoid for atomicity reasons), it can signal this by returning a stat field with all successfully changed fields set to -1, together with an error [this change is mostly to support file servers that translate to another protocol that doesn't make the strong guarantees on attribute changes that 9P2000 makes]
Walk changes
- remove 16 element limit?
- Rwalk contains an error field
- For use with chaining/grouping: A new field badqid[s] is added. It encodes a bitmap of variable length n bits (which has to be a multiple of 8). If the hash of the qid (hash function tbc) modulo n has the corresponding bit set, walks will stop at that qid and fail.
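The badqid bitmap can be sketched as follows. The notes leave the hash function open, so CRC-32 is used purely as a stand-in; the bit order within each byte (least significant bit first) and the helper names are likewise assumptions. The qid layout type[1] version[4] path[8] is the 9P2000 one.

```python
import struct
import zlib

def qid_bytes(qtype, version, path):
    # 9P2000 wire layout of a qid: type[1] version[4] path[8], little-endian.
    return struct.pack("<BIQ", qtype, version, path)

def qid_hash(qid):
    # Placeholder: the hash function is undecided in the notes; CRC-32 is an assumption.
    return zlib.crc32(qid)

def make_badqids(qids, nbits):
    """Build a bitmap of nbits bits with one bit set per bad qid."""
    if nbits <= 0 or nbits % 8 != 0:
        raise ValueError("nbits must be a positive multiple of 8")
    bitmap = bytearray(nbits // 8)
    for q in qids:
        bit = qid_hash(q) % nbits
        bitmap[bit // 8] |= 1 << (bit % 8)   # assumed bit order: LSB first
    return bytes(bitmap)

def walk_must_stop(bitmap, qid):
    """Server-side check: does the walk have to stop and fail at this qid?"""
    nbits = len(bitmap) * 8
    bit = qid_hash(qid) % nbits
    return bool(bitmap[bit // 8] & (1 << (bit % 8)))
```

Because several qids can map to the same bit, a walk may also stop at a qid the client never listed. A larger n lowers the chance of such false positives at the cost of a longer message.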
Read changes
- EOF flag in Rread (only used with DMSANE): another read right now would return 0
- Reading a directory with offset –1 overrides the "offset equals last offset plus count" check [maybe get rid of it entirely?]
- Reading a file with offset –1 is equivalent to reading with the last offset + last returned count
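The offset –1 rule for files means the server has to remember, per fid, where the previous read started and how much it returned. A minimal sketch, assuming –1 is encoded as the all-ones value of offset[8]; `FidReadState` and `read_file` are hypothetical names.

```python
NOOFFSET = 0xFFFFFFFFFFFFFFFF  # offset -1 as an unsigned 64-bit value (assumed encoding)

class FidReadState:
    """Per-fid bookkeeping a server would need for the offset -1 rule."""

    def __init__(self):
        self.last_offset = 0
        self.last_count = 0

    def resolve(self, offset):
        # Offset -1 means "continue where the previous read ended".
        if offset == NOOFFSET:
            return self.last_offset + self.last_count
        return offset

    def record(self, offset, returned):
        self.last_offset = offset
        self.last_count = returned

def read_file(data, state, offset, count):
    """Serve one Tread against an in-memory file, honouring offset -1."""
    start = state.resolve(offset)
    chunk = data[start:start + count]
    state.record(start, len(chunk))
    return chunk
```

With this bookkeeping a client can read a file sequentially without tracking the offset itself, which matters once requests are sent in batches and earlier counts are not yet known.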
Write changes
- Rwrite contains an error field
9P chaining
All messages contain two extra fields: chnbits[1] chntag[2].
If chntag is different from NOTAG, the server promises not to execute the message before the request with the specified tag completes (not necessarily successfully).
A message with that tag must have been sent already.
If CHNFAIL is set in chnbits, the request is aborted if the original request returned an error, was flushed or aborted.
Aborted requests generate a new Rabort message which includes a reason[1] bitfield.
The server is expected to translate chntag into pointers to the right requests on receipt. Clients can reuse tags as soon as they know they do not wish to issue further requests chained to the original message.
The main problem with this is that the server needs to remember which requests completed successfully. Solution ideas:
- Servers can respond to a chained request with Rabort and the reason bit ABTDUNNO, implying it does not know about a message with tag chntag. The client should resend the request in that case.
- Servers guarantee to remember the last N (64?) requests in a stream. Clients make sure they don't chain requests longer than this. This complicates intermediates because they need to make similar guarantees.
- Messages also include a reference count, indicating the number of future chained messages the server can expect.
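To make the bookkeeping cost concrete, here is a rough server-side sketch that combines the first two ideas: remember the outcome of the last N completed requests and answer ABTDUNNO for anything older. The bit values, the `ABTCHAIN` reason and the `ChainTracker` class are all hypothetical; only NOTAG (0xFFFF) is taken from 9P2000.

```python
from collections import OrderedDict

NOTAG = 0xFFFF      # as in 9P2000
CHNFAIL = 0x01      # hypothetical bit value in chnbits
ABTDUNNO = 0x01     # hypothetical reason bit: server does not know chntag
ABTCHAIN = 0x02     # hypothetical reason bit: the chained-to request failed

class ChainTracker:
    def __init__(self, history=64):
        self.history = history
        self.done = OrderedDict()   # tag -> True if the request succeeded

    def complete(self, tag, ok):
        """Record the outcome of a finished request, forgetting the oldest."""
        self.done.pop(tag, None)    # a reused tag replaces its old entry
        self.done[tag] = ok
        while len(self.done) > self.history:
            self.done.popitem(last=False)

    def decide(self, chnbits, chntag, pending=()):
        """Return "run", "wait", or ("abort", reason) for an incoming message."""
        if chntag == NOTAG:
            return "run"
        if chntag in pending:
            return "wait"           # original request still executing
        if chntag not in self.done:
            return ("abort", ABTDUNNO)
        if (chnbits & CHNFAIL) and not self.done[chntag]:
            return ("abort", ABTCHAIN)
        return "run"
```

The history size is exactly the guarantee an intermediate would also have to provide, which is where the complication mentioned above comes from.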
9P grouping
(Alternative to chaining)
All messages contain two extra fields: grpbits[1] grptag[2].
Servers guarantee not to reorder messages with identical grptag.
If grpbits contains GRPKILL then a failed or flushed message will mark the grptag as killed.
If grpbits contains GRPMORTAL then the message is aborted if grptag is killed by a previous message.
NOTAG refers to no group (or rather, each use of it refers to a separate group), i.e. different messages with NOTAG can be reordered freely and grpbits has no effect.
A new message Tclunkgrp[2] resets the kill tag. Servers can assume that the group tag refers to a new group after this request (i.e. the reordering dependency chain is broken).
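The state a server needs for grouping is small: one "killed" flag per group tag. A minimal sketch of that state, with hypothetical bit values for GRPKILL and GRPMORTAL and a hypothetical `GroupTable` class; ordering within a group is assumed to be enforced elsewhere.

```python
NOTAG = 0xFFFF     # as in 9P2000
GRPKILL = 0x01     # hypothetical bit values in grpbits
GRPMORTAL = 0x02

class GroupTable:
    def __init__(self):
        self.killed = set()   # group tags currently marked as killed

    def should_abort(self, grpbits, grptag):
        """Checked before a message executes."""
        if grptag == NOTAG:
            return False      # grpbits has no effect without a group
        return bool(grpbits & GRPMORTAL) and grptag in self.killed

    def finished(self, grpbits, grptag, ok):
        """Called when a message completes; ok is False on error or flush."""
        if grptag != NOTAG and not ok and (grpbits & GRPKILL):
            self.killed.add(grptag)

    def clunkgrp(self, grptag):
        """Tclunkgrp: the tag refers to a fresh group from now on."""
        self.killed.discard(grptag)
```

For the walk-and-open case, the client would send Twalk with GRPKILL and Topen with GRPMORTAL in the same group, so a failed walk aborts the open without an extra round trip.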
Chaining / grouping comments
Both chaining and grouping exist to allow clients to send out batch requests. Chaining is potentially more powerful but grouping seems like it might be vastly simpler to implement. To implement chaining all servers and intermediates need to keep track of the last N messages sent.
It's unclear how either would be accessible from user space right now. They can however be used to chain fixed sequences in syscalls, the most common example being walk and open.
9P streaming
size[4] Treads tag[2] {chaining/grouping bits} fid[4] offset[8] count[4] mcount[4]
size[4] Rreads tag[2] final[1] offset[8] count[4] data[count]
Sending this message is equivalent to performing multiple Tread operations of size mcount. The server will continue to perform read operations (equivalent to Tread) until count bytes have been read, an error occurs or EOF has been reached. Each read operation will generate at least one Rreads response. Unlike other requests, this request continues to be active until it is explicitly terminated.
The offset for the read operations starts at offset and increases by the returned number of bytes after each response. Servers must follow this even if the actual read operation ignores the offset field.
As long as the request is active, clients can send another request with the same tag to reset the offset, count and mcount variables to specified values. If offset is –1, offset is unchanged and count is added to the existing count. The server should complete outstanding requests it made with the old values and return them normally.
Unlike Rread, Rreads with an empty count does not carry a special meaning.
FINBNDRY in final denotes a "message boundary", which means that this is the last Rreads response corresponding to an underlying read operation.
If there are two message boundaries with no data in between, this constitutes an end-of-file signal.
After end-of-file the server must perform no more reads and explicitly terminate the request.
FINDONE in final indicates that the server terminates the request and will send no more responses.
This is not an error and does not imply end-of-file has been reached.
FINDONE is processed after the rest of the message.
Clients can terminate the request with Tflush. The server may respond with more data before sending Rflush.
If an error occurs, Rerror is sent and the request is terminated.
The server must guarantee that no data is lost, by sending outstanding data before a Rflush or Rerror response if necessary. In this case it is permitted for the server to return up to count+mcount bytes.
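The Treads rules above can be traced with a small simulation of the server loop. This is a sketch under several assumptions: the bit values for FINBNDRY and FINDONE are invented, each underlying read produces exactly one Rreads, the last read is clamped to the remaining count, and the server chooses to terminate with FINDONE once count is exhausted (it could equally leave the request active and wait for the client to add more count).

```python
FINBNDRY = 0x01   # hypothetical bit values in final
FINDONE = 0x02

def serve_treads(read, offset, count, mcount):
    """Yield (final, offset, data) tuples, one per Rreads response.

    read(offset, n) stands in for the underlying Tread-like operation.
    """
    remaining = count
    while remaining > 0:
        data = read(offset, min(mcount, remaining))
        if not data:
            # Empty read: a boundary with no data since the previous one
            # signals end-of-file, after which the server must terminate.
            yield (FINBNDRY | FINDONE, offset, b"")
            return
        remaining -= len(data)
        final = FINBNDRY
        if remaining == 0:
            final |= FINDONE   # count exhausted; this is not an EOF signal
        yield (final, offset, data)
        # The offset advances by the returned byte count, even if the
        # underlying read ignores offsets.
        offset += len(data)
```

Reading a 10-byte file with a large count and mcount of 4 yields three data responses followed by an empty boundary, which is how the client tells end-of-file apart from a server that merely stopped early.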
size[4] Twrites tag[2] {chaining/grouping bits} fid[4] offset[8] count[4] data[count]
size[4] Rwrites tag[2] offset[8] count[4] window[4]
Twrites functions like Twrite, but the request remains "active" as long as it has not been explicitly terminated. As long as it's active, clients can send further Twrites messages with the same tag, which execute in order.
Clients are required to keep a local window counter and subtract from this counter every time they send a message. The window counter starts at 64 KB and should never become negative. The window field from responses should be added to this counter. Servers can send messages with count=0 to add to the counter.
Servers can terminate Twrites by setting window to –1. This is not an error. Clients can terminate Twrites with Tflush.
Unlike Rwrite, partial writes do not indicate an error. If servers wish to signal an error, they should send a Rerror message.
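The client-side window accounting can be sketched as below. The notes do not say what exactly is subtracted per message; this sketch assumes it is the number of data bytes (count), and that –1 is encoded as the all-ones value of window[4]. `WriteWindow` is a hypothetical name.

```python
class WriteWindow:
    INITIAL = 64 * 1024          # the window counter starts at 64 KB
    TERMINATED = 0xFFFFFFFF      # window = -1 as an unsigned 32-bit value (assumed encoding)

    def __init__(self):
        self.window = self.INITIAL
        self.closed = False

    def can_send(self, nbytes):
        # The counter must never become negative.
        return not self.closed and nbytes <= self.window

    def sent(self, nbytes):
        """Account for a Twrites carrying nbytes of data (assumed unit)."""
        if not self.can_send(nbytes):
            raise ValueError("Twrites would exceed the window")
        self.window -= nbytes

    def on_rwrites(self, window):
        """Apply the window field of an Rwrites response."""
        if window == self.TERMINATED:
            self.closed = True       # server terminated the request; not an error
        else:
            self.window += window    # may arrive on a count=0 response
```

A client that finds `can_send` false has to wait for the next Rwrites before sending more data, which gives the server simple flow control.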
Streaming comments
Streaming + chaining/grouping allows clients to do "read file" in one round trip (for short files). Longer files are transmitted at potentially the full bandwidth.
Streaming can be implemented in userspace by adding a new ORBUF flag to open(2) that indicates the kernel is allowed to read ahead and an OWBUF flag to indicate the kernel can buffer writes.
bio(2) should be able to set this flag.
The idea is that any program can use ORBUF as long as it only ever does fixed reads until EOF is reached.