Thoughts on Canonical S-Expressions
Datashards currently uses Canonical S-Expressions as a data format, and after using it for a few months, I have some thoughts.
First things first: if you aren't familiar with the format, let me give you a quick rundown. Canonical S-Expressions are a bit like regular S-Expressions, with a twist. If you already know Lisp, none of this will be new, but for the rest of you: an S-Expression is built from two kinds of items, lists and atoms. A list is what it sounds like: a sequence of things. An atom is a single thing. An S-Expression looks like:
(item1 item2 item3 item4)
That's roughly what you'd write in JSON as:
[item1, item2, item3, item4]
In Canonical S-Expressions (csexp), every atom is actually a byte string, and we specify its size by prefixing it with the number of bytes, followed by a colon:
(5:hello5:world)
That's a list of two items, 'hello' and 'world'. I'm putting these in quotes, but the values aren't strings; they're bytes. That means it's very efficient to put raw binary data in a csexp. If you put binary data in JSON, you'd first have to encode it as text, with something like base64. No need in csexp!
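To make the length-prefix rule concrete, here is a minimal encoder sketch in Python. It assumes atoms are represented as `bytes` and lists as Python lists; the name `encode` is hypothetical and not taken from Datashards.

```python
from typing import Union

# A csexp value is either an atom (raw bytes) or a list of values.
Sexp = Union[bytes, list]

def encode(value: Sexp) -> bytes:
    """Encode a value as a canonical S-expression."""
    if isinstance(value, (bytes, bytearray)):
        # Atom: decimal byte length, a colon, then the raw bytes verbatim.
        return str(len(value)).encode("ascii") + b":" + bytes(value)
    if isinstance(value, list):
        # List: parentheses around the concatenated encodings, no separators.
        return b"(" + b"".join(encode(item) for item in value) + b")"
    raise TypeError(f"cannot encode {type(value).__name__} as csexp")

print(encode([b"hello", b"world"]))        # b'(5:hello5:world)'
print(encode([b"data", b"\x00\xff\x10"]))  # b'(4:data3:\x00\xff\x10)'
```

Because the length is known up front, the binary atom goes in as-is: no escaping, no base64.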
You can also give a “type hint” in csexp, so if you have a binary object that represents an image, you can stick the mimetype in the csexp, such as:
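As a rough sketch, assuming the display-hint syntax from Rivest's S-expression draft (in canonical form the hint is itself a length-prefixed atom inside square brackets, placed directly before the atom it describes), attaching a MIME type might look like this. The helpers `atom` and `encode_hinted` are hypothetical names for illustration.

```python
def atom(data: bytes) -> bytes:
    """Encode raw bytes as a length-prefixed csexp atom."""
    return str(len(data)).encode("ascii") + b":" + data

def encode_hinted(hint: bytes, data: bytes) -> bytes:
    """Prefix an atom with a display ("type") hint: [len:hint] then len:data."""
    return b"[" + atom(hint) + b"]" + atom(data)

# The 8-byte PNG file signature stands in for real image data here.
png_signature = b"\x89PNG\r\n\x1a\n"
print(encode_hinted(b"image/png", png_signature))
# b'[9:image/png]8:\x89PNG\r\n\x1a\n'
```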
You can also store other lists inside of a csexp, such as:
The good thing about Canonical S-Expressions is how darn easy they are to write, and to write a parser for. You can write a csexp parser/generator in an afternoon. It's really that easy!
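To give a sense of how little code that takes, here is a minimal recursive-descent parser sketch in Python. It handles atoms and (nested) lists only; display hints are deliberately left out, and the names `parse` and `_parse_at` are hypothetical.

```python
from typing import Union

Sexp = Union[bytes, list]

def parse(data: bytes) -> Sexp:
    """Parse exactly one canonical S-expression (atoms and lists, no hints)."""
    value, pos = _parse_at(data, 0)
    if pos != len(data):
        raise ValueError(f"trailing bytes at offset {pos}")
    return value

def _parse_at(data: bytes, pos: int) -> tuple:
    """Parse one value starting at pos; return (value, next position)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    if data[pos:pos + 1] == b"(":
        items = []
        pos += 1
        # Keep reading values until the closing parenthesis.
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = _parse_at(data, pos)
            items.append(item)
        return items, pos + 1
    # Atom: decimal digits up to the colon, then exactly that many bytes.
    colon = data.find(b":", pos)
    if colon == -1 or not data[pos:colon].isdigit():
        raise ValueError(f"bad length prefix at offset {pos}")
    start = colon + 1
    end = start + int(data[pos:colon])
    if end > len(data):
        raise ValueError("atom runs past end of input")
    return data[start:end], end

print(parse(b"(5:hello5:world)"))  # [b'hello', b'world']
print(parse(b"(1:a(2:bc))"))       # [b'a', [b'bc']]
```

Since every atom announces its length, the parser never has to scan for delimiters or handle escapes inside the data, which is most of why it stays this short.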
It's also a very efficient format: on top of the raw bytes, each atom costs only its length prefix and a colon. You can store image data, text data, anything you want!
And it's extremely versatile. The simplicity is the power!
The second problem that I have with csexps is that they're not very useful for describing data. For example, in Datashards we represent a file size as an integer, such as 1000. But in csexp, this is represented as 4:1000, which means that my program has to know to convert the value from bytes to an integer.
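Here is a small sketch of the conversion the reader is stuck doing, assuming integers are stored as non-negative ASCII decimal digits (an assumption for illustration; the helper names are hypothetical). Nothing in the atom itself says it is a number, so the program has to know to call `decode_int`.

```python
def encode_int(n: int) -> bytes:
    """Write a non-negative integer as a csexp atom of its decimal digits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = str(n).encode("ascii")
    return str(len(digits)).encode("ascii") + b":" + digits

def decode_int(atom: bytes) -> int:
    """Interpret an atom's bytes as a decimal integer; the caller must know to."""
    if not atom.isdigit():
        raise ValueError(f"not a decimal integer: {atom!r}")
    return int(atom)

print(encode_int(1000))     # b'4:1000'
print(decode_int(b"1000"))  # 1000
```

To a generic parser, b"1000" is just four bytes, indistinguishable from the text "1000".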
I could use type hints for the type of data, such as [3:int]4:1000, but this doesn't help in practice because the program reading