SSB Log Entry 747

Pigeon Protocol Daily Update, June 3rd 2020

Note to folks just tuning in: I do daily updates related to #pigeon-protocol - a project with similar goals to SSB.

Discussion

Yesterday I had an interesting discussion with @Christian Bundy about serialization formats. It brought up a good question: "Is it worth it to have a custom serialization format?" Pigeon is my response to a project I attempted with SSB a year or so ago. The project ran in an offline environment where JavaScript was not appropriate and where data transfer happened over a medium that did not have access to the public internet or DNS.

In attempting to write my own SSB implementation, I noticed that JSON was so flexible that it allowed a lot of room for error. It also allowed the use of arrays and nested objects, which led to things like "dot.notation.to.access.stuff", which I didn't like from an ergonomic standpoint.

The world has a lot of serialization formats. JSON is ubiquitous and well understood. Despite this, I felt that SSB's use of JSON was not a win for discoverability or comprehension. One of the things that has made JSON so popular (and basically the first choice for most projects) is its flexibility. There are no schemas, the delimiters only take up a few bytes, there's no "XML tax" to pay with opening/closing tags, stuff is infinitely nestable, etc…

For a use case like cryptographic signing, I would argue that flexibility is a liability: it invites human error and makes implementation harder for third-party devs.
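To make the signing concern concrete, here is a minimal Python sketch (not taken from SSB or Pigeon) showing that the same logical JSON object can be serialized to different bytes, and therefore to different hashes. The message fields (author, sequence, text) are hypothetical and only there for illustration.

```python
import hashlib
import json

# Two dictionaries with identical contents but different key order.
# The field names are made up for this example.
message_a = {"author": "alice", "sequence": 1, "text": "hello"}
message_b = {"text": "hello", "sequence": 1, "author": "alice"}


def digest(serialized: str) -> str:
    """Return the SHA-256 hex digest of a serialized message."""
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# json.dumps keeps insertion order unless sort_keys=True is passed,
# so the two compact forms differ even though the data is equal.
compact_a = json.dumps(message_a, separators=(",", ":"))
compact_b = json.dumps(message_b, separators=(",", ":"))

# Same data again, but with different whitespace.
pretty_a = json.dumps(message_a, indent=2)

print("same data:      ", message_a == message_b)
print("same bytes:     ", compact_a == compact_b)
print("compact digest: ", digest(compact_a))
print("reordered:      ", digest(compact_b))
print("pretty-printed: ", digest(pretty_a))
```

Any signer and verifier that share JSON have to agree on key order and whitespace out of band before a signature means anything. A format with only one legal spelling of each message removes that step.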

For those reasons, I decided it would be best to give Pigeon its own serialization format. I wanted to re-invent the wheel with the hopes of creating a round one (quote stolen, sorry not sorry).

The format:

Does not allow the use of flexible lists, which lessens the chance of out-of-order elements and makes life easier for users of languages with manual memory management.
Does not allow nesting of objects within objects, leading to schemas that can be understood in a single glance and which do not allow for infinite scope creep (at some point, you will be forced to create a new schema outside of the current message).
Does not allow the use of comments, which forces developers to either put data in the app or not (as opposed to allowing "magic comments", pragmas and other half-baked ideas).
Has a focus on ease of implementation, e.g., it's not XML.
Allows for first-class implementation of protocol-specific data types. For example, we allow for blob literals instead of forcing everything to be a string that requires application-layer schema validation (the lexer / parser will catch invalid blob/feed/message literals).
Discourages (but does not prevent entirely) ambiguity by having strict rules about order and whitespace (important for signature verification).

With that being said, I think, if this idea is worth pursuing at all, the INI file format is the closest thing to what I want. TOML looked good at first, but I had not taken into account its arrays and infinitely nestable tables. It would need to be a strict subset of INI, however. I would not want to allow arbitrary use of whitespace, or comments, which inevitably creep into out-of-band schemas.
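As a rough illustration of what "a strict subset of INI" could look like, here is a small Python sketch. It is not Pigeon's actual grammar. The `key=value` line syntax, the lowercase key pattern, the trailing-newline rule, and the names `parse_strict` and `StrictFormatError` are all assumptions made for this example.

```python
import re

# Assumed key syntax: lowercase identifier, no dots, so no "dot.notation".
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


class StrictFormatError(ValueError):
    """Raised when the input deviates from the single allowed spelling."""


def parse_strict(text: str) -> dict:
    """Parse flat key=value lines, rejecting anything optional in INI."""
    if not text.endswith("\n"):
        raise StrictFormatError("document must end with a newline")
    entries = {}
    for number, line in enumerate(text[:-1].split("\n"), start=1):
        if line == "":
            raise StrictFormatError(f"line {number}: blank lines not allowed")
        if line[0] in "#;":
            raise StrictFormatError(f"line {number}: comments not allowed")
        if line[0] == "[":
            raise StrictFormatError(f"line {number}: sections not allowed")
        if line != line.strip():
            raise StrictFormatError(f"line {number}: stray whitespace")
        key, separator, value = line.partition("=")
        if not separator:
            raise StrictFormatError(f"line {number}: expected key=value")
        if not KEY_PATTERN.fullmatch(key):
            raise StrictFormatError(f"line {number}: invalid key {key!r}")
        if value == "" or value != value.strip():
            raise StrictFormatError(f"line {number}: invalid value")
        if key in entries:
            raise StrictFormatError(f"line {number}: duplicate key {key!r}")
        entries[key] = value
    return entries


if __name__ == "__main__":
    print(parse_strict("kind=note\ntitle=hello world\n"))
    for bad in ("kind = note\n", "# comment\nkind=note\n", "a.b=1\n"):
        try:
            parse_strict(bad)
        except StrictFormatError as error:
            print("rejected:", error)
```

The parser fits on one screen because every optional feature is an error rather than a branch. That is the "ease of implementation" trade-off in miniature.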

With that, I think I might not investigate third-party serialization formats. Interested to hear feedback, though. Especially if anyone knows of a format that fits the requirements and would reduce the amount of work required to author a third-party Pigeon implementation :thinking_face:

Done

Fairly certain the sigil change is complete; the last remaining piece is updating the test suite.

In Progress

Change @, %, & to FEED., TEXT., FILE., respectively.
Make ED25519 and SHA256 the default and only options for signing and hashing, respectively. Deprecate "footers" like .sig.ed25519, .sha256.
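Here is a hedged Python sketch of what the two changes above amount to when applied to an old-style identifier: swap the sigil for its prefix and drop the algorithm footer. The function name `migrate_identifier`, the exact footer list (including a bare .ed25519), and the placeholder payloads are assumptions for illustration. The payload encoding itself is left untouched here.

```python
# Sigil-to-prefix mapping as listed above: @, %, & become FEED., TEXT., FILE.
SIGIL_TO_PREFIX = {"@": "FEED.", "%": "TEXT.", "&": "FILE."}

# Assumed footer spellings; longest first so ".sig.ed25519" wins
# over ".ed25519" when both would match.
LEGACY_FOOTERS = (".sig.ed25519", ".ed25519", ".sha256")


def migrate_identifier(legacy: str) -> str:
    """Convert a sigil-style identifier to the prefix style (sketch only)."""
    if not legacy or legacy[0] not in SIGIL_TO_PREFIX:
        raise ValueError(f"unknown sigil in {legacy!r}")
    prefix = SIGIL_TO_PREFIX[legacy[0]]
    payload = legacy[1:]
    # With one signing and one hashing algorithm, the footer carries
    # no information, so it is simply removed.
    for footer in LEGACY_FOOTERS:
        if payload.endswith(footer):
            payload = payload[: -len(footer)]
            break
    if not payload:
        raise ValueError(f"empty payload in {legacy!r}")
    return prefix + payload


if __name__ == "__main__":
    # Placeholder payloads, not real keys or hashes.
    for old in ("@EXAMPLEKEY.ed25519", "%EXAMPLEHASH.sha256", "&EXAMPLEHASH.sha256"):
        print(old, "->", migrate_identifier(old))
```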

TODO

Go back to writing documentation
After the Ruby codebase stabilizes, maybe write an implementation in Go that will essentially just be a copy/paste job on the Ruby version. Ruby was great for trying out ideas, but it is not very portable and I've had trouble sharing Ruby code with non-Rubyists in the past. More people would be able to try stuff out if static binaries were available, and Go will make that easy while not getting me too bogged down in things like memory management or learning what a borrow checker is.