emacsen

EDIT March 26th I've edited this post for clarity since my original point was lost in some of the details. I've also provided more citations.

Introduction

Just over a year ago, Chris Webber gave a talk at CopyleftConf about how the AGPL is incompatible with a style of computing.

If you want to read the slides, they're at: https://dustycloud.org/misc/boundaries-on-network-copyleft.pdf

Sadly there hasn't been much discussion about it since, so I'm going to throw my hat into this rodeo- or some metaphor to that effect.

Before we wrestle with bulls, let's talk about the goal of the AGPL and why it's important in the Free Software ecosystem.

As most people reading this probably already know, the GNU GPL is a license that says that if you have a program, you're entitled to use it, copy it and modify it, and that if you distribute it to others, you must do so under the same terms that you received it. It's “Share and Share Alike”.

But what does this mean when we have applications that run remotely, such as web applications where executing the program means executing code on someone else's computer? The AGPL states that if you make an AGPL-licensed program available to others, you have the same obligation to share its source with them whether you hand them a binary or let them run it over a network.

This is a good thing in my opinion. Running a program in a networked way to get around the GPL is an anti-social thing to do.

With that out of the way, let's dive in.

A simple program

Let's first begin with the idea of a program where state is captured inside execution, rather than in variables. If you know what a closure is, then you can skim or skip this part.

If you don't know what a closure is, you might be wondering what the heck I'm talking about, but it's really not that hard to imagine. Let's take an example from Chris's own work.

Chris wrote their code in Scheme. I think the use of a Lisp can lead people to come to the conclusion that this is somehow a Lisp related issue, so I'm going to write my code in Python in order to show that the issue is universal.

Chris proposes that some programs may contain private data but at the same time be stateless. This was hard for me to wrap my head around at first, but we can write a program like this fairly easily:

    def make_greeter(greeter_name):
        return lambda guest_name: print(f"Hi {guest_name}, I'm {greeter_name}!")

With this, we can construct a greeter named Alice:

    alice = make_greeter("Alice")
    alice("Bob")

And we'd get back “Hi Bob, I'm Alice”. What's important here is that the alice function doesn't maintain state. The “Aliceness” is constructed at the time the function is defined.

The data in this case is actually the “Bob” string and not the “Alice” string. The “Alice” string is part of the alice function's executable code.
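
One way to see this is to look inside the function object. The sketch below is a small variation on the greeter above (it returns the greeting instead of printing it, which is my change so the result can be inspected) and uses Python's standard `__closure__` and `__code__` attributes to show where the name lives.

```python
def make_greeter(greeter_name):
    # Return the greeting rather than print it, so it can be inspected.
    return lambda guest_name: f"Hi {guest_name}, I'm {greeter_name}!"

alice = make_greeter("Alice")
greeting = alice("Bob")

# The captured name lives in the function's closure cell. It is not held in
# a module-level variable or an external data store.
captured = alice.__closure__[0].cell_contents
free_names = alice.__code__.co_freevars

print(greeting)
print(free_names, captured)
```

Nothing outside the `alice` function object holds the name. If you hand someone the function, you hand them the name with it.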

It's a nifty trick, but it has some deeper implications.

Turning our program into a service

Imagine that instead of being generated on the Python shell, there was some external database, and instead of just being a name, the function also contained private information.

Let's rewrite our program with that in mind. We'll create a database of people and their favorite colors.

    db = {
        'alice': 'red',
        'bob': 'blue'}

    def make_person(name, color):
        return lambda guest_name: print(f"Hi {guest_name}, I'm {name} and I like {color}")

    people = [make_person(*record) for record in db.items()]

Remember, our secrets aren't contained within our database- they're contained within the functions themselves. While this example is trivial, we're starting to see how this could become interesting.

Let's up the ante a bit by turning this into a network application.

    from flask import Flask, abort, request
    app = Flask(__name__)

    db = {
        'Alice': 'red',
        'Bob': 'blue'}

    def make_person(name, color):
        return lambda guest_name: f"Hi {guest_name}, I'm {name} and I like {color}.\n"

    people = {name: make_person(name, color) \
              for (name, color) in db.items()}

    @app.route('/<person>')
    def show_greeting(person):
        # Unknown names get a 404 instead of an unhandled KeyError.
        if person not in people:
            abort(404)
        guest = request.args.get('guest')
        return people[person](guest)

And run it:

    serge@laptop:~$ curl http://localhost:5000/Alice?guest=Bob
    Hi Bob, I'm Alice and I like red.

Nifty, but not especially different from the previous example, except as it applies to the AGPL.

We can take this example in one of two directions, both of which I believe break the AGPL.

The first is that we might imagine the database contains some other secrets, but that we're encoding these secrets as code. Let's imagine that we have a service that gives doctors, and other services that we explicitly permit, access to health-related data about us.

As privacy-oriented developers, we may want to self-host this application. I certainly feel better about running my own services, especially where sensitive/private data is concerned.

As far as the standard GPL is concerned, this is no problem. My private version of my application that only runs on my computer is entirely mine. But the AGPL is different- the network accessibility of the service places the program under the same distribution terms as we would have if we were to distribute the program.

Configuration as Code

How realistic is this scenario of using code for configuration? It's far more common than you might originally think. As Chris's talk points out, it's extremely common in Lisp to use this method- but it's not limited to Lisp by any means. Several popular Python web frameworks use a config.py file, and PHP developers use config.php.

This matters because, while the licenses do not cover the environment a program runs in, these configuration systems turn the configuration “data” into executable code. That is distinct from, for example, pulling data from a YAML or config.ini file, because a config.py file is interpreted as code and becomes part of the program itself.

This is largely a non-issue because in a vast majority of cases there is a distinction between the types of static variables placed inside a configuration file and the dynamic code that's inside the program files, but this doesn't have to be the case. It's possible to write configuration that contains executable code, and if that executable code modifies the behavior of the application itself, then it is indistinguishable from program code.
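
To make the distinction concrete, here is a minimal sketch contrasting the two styles. The INI text, the `config_py` string and the `render_title` function are all hypothetical and exist only for illustration. A real application would import an actual `config.py` file rather than `exec` a string, but the effect is the same: the configuration runs as part of the program.

```python
import configparser

# Configuration as data: the file is parsed into strings and never executed.
ini_text = """
[site]
title = My Blog
"""
parser = configparser.ConfigParser()
parser.read_string(ini_text)
title_from_data = parser["site"]["title"]

# Configuration as code: a hypothetical config.py, held in a string here so
# the example is self-contained. It defines behavior as well as values.
config_py = '''
TITLE = "My Blog"

def render_title(title):
    return title.upper() + "!"
'''
namespace = {}
exec(config_py, namespace)  # roughly what "import config" does
title_from_code = namespace["render_title"](namespace["TITLE"])

print(title_from_data)
print(title_from_code)
```

The first half can only ever supply values. The second half can change what the application does, which is where the line between configuration and program starts to blur.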

Does this mean you can't write a Python application that uses config.py or a PHP program that uses config.php under the AGPL? In most cases, whether a variable is stored statically in one file or another makes no difference. But as configuration grows complex enough to include functionality, that line begins to blur. While I'm not a lawyer, I believe that unless you relicense the configuration files, configuration that is complex enough to be indistinguishable from code will need to be published as code under the AGPL.

Obviously this is not the intent of the AGPL, and this specific scenario is easily remedied by separating out and separately licensing the config files, but this is a conscious action that the developer must take.

Plugins

Let's take on a more complex version of this problem: What happens when applications are not simply monolithic, stand-alone things, but when they include components that are external in some way?

Chris, in a Reddit reply to this post, mentions browsers- so let's use that as an example. If you're reading this, you most likely are doing so on a web browser. You're also likely to have one or more plugins. Plugins are application logic that extends the functionality of your application in some way. The plugins may be under a variety of licenses- anything from extremely permissive to entirely proprietary.

If your browser is under the GPL, the waters become very murky as it relates to the licensing requirements of plugins. WordPress, the popular CMS and blogging platform, has stated that WordPress plugins should (or possibly must) be released under the GPL. That is because a plugin is not a stand-alone work. A plugin depends on the WordPress application framework, and thus plugins are derived from (or as GPLv3 calls it, “based on”) the original program.

For GNU GPL applications, this is a bit of an oddity, as while WordPress may require plugins be under the GPL, they cannot compel users running proprietary plugins to provide source code to them. With the AGPL, a network user of the program has the same rights as a person downloading the program.

This is a lot to take in, but we're not quite done yet. In Spritely Goblins, the system Chris is developing, there is no distinction between a local program execution and one that runs on the network. While some developers may be used to thinking about remote procedure calls and remote APIs, the Goblins model makes this distinction largely invisible to the user and even the developer- program logic may be run locally, on a nearby server owned by the same person, or halfway around the world by someone they've never met.

Goblins, by design, erases the distinction for a programmer about whether the code is being run internally or externally. It erases the distinction for a programmer about whether or not the code is being run at arm's length.

Under the GPL, this is no problem- network services are at arm's length and thus there's no problem with integrating your GPLed internal code with some external proprietary service. But under the AGPL, network services are explicitly included.

A brief review

...That was a lot to cover, so let's review briefly.

  • Some programs are going to be Free Software, but contain “proprietary parts” because they need to for privacy reasons.

  • Plugins that are written for an AGPLed system must be AGPLed, even if they operate across the network

  • Therefore we have an impedance mismatch between the intent of the AGPL (to protect Software Freedom) and personal privacy, which is amplified on a system that makes no distinction between local and network code

In the land of tomorrow...

Now that this is covered, let's get weird...

Spritely Goblins has the potential to do more than just provide remote procedure calls for remote applications- it's designed so that it could also take object code and safely execute it locally.

This may seem strange at first, but a longer-term goal of Spritely appears to be to take in-memory object code and ship it to another machine where it can be safely executed. I say “appears” here because I don't see mention of this in the Spritely docs, but it is something Chris and I have discussed privately.

In terms of functionality, this is extremely powerful, but it gets complicated when we talk about source code requirements. As people who have done work in the field of Reproducible Builds know, making software reproducible is not trivial, and if instead of shipping object code, we had to ship source code around, this would be a large burden on the recipient system to then need to not only build the source but possibly also to replicate the remote environment.

Even if we were able to replicate the remote build environment for every single program we might encounter, requiring us to build software just to use it is a high barrier of entry. We in the Free Software world most often distribute programs through binaries because we know what a burden it would be to require every program to be compiled.

Even if we could build every program, it might be practically impossible to do so. We are seeing the beginning of artificial intelligence systems that build models or sometimes build software itself. A model, or software built by artificial intelligence, is replicable in principle but impractical to replicate by virtue of its sheer size.

In a system like Spritely Goblins, the peer-to-peer network design allows us to integrate programs into our own safely by using the OCAP security model. With the security addressed, and the ability to run code either remotely or locally from anywhere, the possibilities for computing start to seem infinite, but if we had to build every single program we encountered, it would be a major wet blanket.

Where does this leave us?

I care deeply about software and user freedom. Heck, I do a podcast about it with Chris. I've mentioned on multiple episodes that Free Software has saved my life. It's a part of me and important.

The goal of the AGPL is noble, and I agree with it, but it's clearly not compatible with the type of programming that is coming down the pike.

So what do we do?

Chris's suggestion is that the GPL is sufficient, but I don't agree.

Instead, I think that we need to capture the spirit of the AGPL in a new license or new revision of the AGPL that can accommodate this new model.

Let the discussion begin!

Listening to the news about the Democratic party can be disheartening at best. This week a story in the New York Times came out discussing how DNC leadership is willing to disenfranchise up to half the party in order to prevent Bernie Sanders from getting the nomination.

They claim that this is in order to solidify a win. They claim that it's the swing voters that they're courting and that those voters would never vote for Sanders. It's policies, they claim, or occasionally they'll claim it's those mid-westerners and their anti-Semitism, usually while engaging in anti-Semitic tropes.

Meanwhile, On the Media put out a story this weekend about the disenfranchized progressive voter, just how many progressives are turned away from voting, or vote for a third party rather than vote for a moderate.

On its face, these two situations don't reconcile. The Democratic Party must want to win, mustn't it? Instead of courting Republicans who might somehow be persuaded to vote for a Democrat (despite Trump's 80% approval rating amongst Republicans) why wouldn't they work to energize the voter base- to register more underprivileged, undercounted, underrepresented people and energize the youth?

Why wouldn't the DNC want to show the country that Trump is wrong in his “Do nothing Democrats” taunt, that the Democratic Party does have a grand vision as a counter to the grand vision of Republicans?

The answer is simpler than it seems... The DNC's fear-mongering about Sanders not being a viable candidate is not for Republicans or the moderates amongst its ranks, but rather for the leadership themselves.

We see this reflected not just in political circles, but the corporate “liberal media” where Sanders is consistently painted in a negative light even on self-described liberal news outlets.

The fact is that the critique that many Republicans have had over the hypocrisy of the Democratic party is real. There is a “Limousine Liberal” with a vested interest in the status quo, who decries Trump's “Make America Great Again” slogan, but who pines for the days of the Clinton era, where public programs were cut, but since corporate growth was high, only poor/brown people noticed.

Sanders makes the DNC uncomfortable because he forces the Democratic party to come face to face with the reality that it isn't for poor people, brown people or the youth, but rather to keep things simmering just enough below the surface to keep the lid from popping off.

With Trump in office, the lid has popped off and now the DNC leadership is scrambling to figure out how to keep control of the narrative. They've invented a make-believe voter, a Joe or Jane Republican who watches Fox News but will be persuaded to vote for a “moderate” Democrat.

It's time for the DNC leadership to get honest with itself and the American people. The Democratic coalition is breaking apart at the seams. The party is split between two very different ideas, one where we dream of the 90s and the other where we live in the present and present the people with a comprehensive plan to enact sweeping changes that will save our children, help heal our environment and repair our decaying infrastructure.

I've lived through the 90s and I don't want another Bill Clinton. I want another Franklin Roosevelt.

Datashards is finally getting traction in the world and so it's time to reflect on where we are and where the project is going.

Datashards is a project that offers up a new storage primitive for secure data storage and transmission. With Datashards the data at rest is encrypted and also protected against data shape attacks. Datashards is designed to work either online or offline and even lets you store your data on someone's machine even if you don't trust them.

Datashards has the opportunity to be an entirely transformational technology in terms of being able to safely store and transmit data.

We've already proven the concept works and we can implement it in multiple languages as we have Fixed Datashards (previously Immutable Datashards) implemented in Racket and Python, and we have Updatable Datashards (previously Mutable Datashards) in Racket.

In the next few months, we'll be working to get Updatable Datashards implemented in Python.

We're also working with a talented and dedicated software developer to get a Javascript implementation of Datashards (both Fixed and Updatable), which we hope will open up many new opportunities.

We will be highlighting these libraries on the Datashards website, along with documentation on how Datashards works and implementation guidelines.

In even more exciting news, we're starting work on a protocol built on top of Datashards designed to enable Datashards servers to communicate.

Datashards is a storage primitive. In that way, it's a bit like the concept of a file- useful as a concept but without implementations and application, nothing more than an interesting idea. The protocols that we're building on top of Datashards are akin to a filesystem built on top of those primitives and that will allow developers to build interesting things using Datashards.

In order to accomplish this task well, Chris and I have been working with possible users of the technology, as well as researching similar systems from the past and various peer-to-peer messaging technologies and patterns, in order to build something that is practical, scalable and built on solid engineering principles.

Thoughts on Canonical S-Expressions

Datashards currently uses Canonical S-Expressions as a data format and after using it for a few months, I have some thoughts.

First things first: If you aren't familiar with the format, let me give you a quick rundown. Canonical S-Expressions are a bit like regular S-Expressions, with a twist. If you already know Lisp, none of this will be new, but for the rest of you, there are two kinds of items in an S-Expression- a list and an atom. A list is what it sounds like- a sequence of things. And an atom is a thing. An S-Expression looks like:

(item1 item2 item3 item4)

If you're familiar with Python or Javascript, you can think of that as the same as:

[item1, item2, item3, item4]

In Canonical S-Expressions (csexp), every atom is actually a byte object, and we say the size of the byte object by prepending it with the number of bytes, followed by a colon:

(5:hello5:world)

That's a list of two items, 'hello' and 'world'. I'm putting these in quotes but the values aren't strings, they're bytes. That means it's very efficient to put raw binary data in a csexp. If you put binary data in JSON, you'd have to do something like base64 encode it. No need in csexp!

You can also give a “type hint” in csexp, so if you have a binary object that represents an image, you can stick the mimetype in the csexp, such as:

([image/jpeg]1024:<bytes>)

You can also store other lists inside of a csexp, such as

(9:groceries(4:milk5:bread))
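
The rules above are small enough to sketch as an encoder. This is a minimal illustration, not the Datashards implementation: the `encode` name is my own, it handles only byte strings and nested lists, and it leaves type hints out entirely.

```python
def encode(item):
    """Encode bytes, or (nested) lists of bytes, as a Canonical S-Expression."""
    if isinstance(item, bytes):
        # An atom is its length in decimal, a colon, then the raw bytes.
        return str(len(item)).encode("ascii") + b":" + item
    if isinstance(item, (list, tuple)):
        # A list is its encoded members wrapped in parentheses, no separators.
        return b"(" + b"".join(encode(child) for child in item) + b")"
    raise TypeError(f"cannot encode {type(item).__name__} as csexp")

print(encode([b"hello", b"world"]))
print(encode([b"groceries", [b"milk", b"bread"]]))
```

Because every atom carries its own length, the bytes inside it never need escaping, which is why raw binary data fits so comfortably.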

The Good

The good thing about Canonical S-Expressions is how darn easy they are to write and to write a parser for. You can write a csexp parser/generator in an afternoon. It's really that easy!

It's also a very efficient format. You can store image data, text data, anything you want!

And it's extremely versatile. The simplicity is the power!

The Bad

The worst problem I have with csexp is that despite its simplicity, if you want to use it, you're probably going to end up writing your own parser/generator for it. I found a library for Python 2.7, but it didn't work for Python 3, so I had to write my own. My friend Chris Webber wrote the implementation for Racket. As of the time of writing, I don't know of an implementation for Javascript, Ruby, Golang or Rust. Writing your own library for something this fundamental isn't fun, even if it's not hard.

The second problem that I have with csexps is that they're not very useful for describing data. For example in Datashards, we will represent a file size by an integer, 1000, for example. But in csexp, this is represented as 4:1000 which means that my program has to know to convert the value from bytes to an integer.

I could use type hints for the type of data, such as [int]4:1000 but this doesn't help in practice because the program reading

What I Like

There's a lot to like about Canonical S-Expressions. They're extremely space efficient, very flexible and super easy to parse. Writing a reader for a csexp is fairly trivial. And even if your language doesn't already have a csexp library, you can easily write one in a day, if not an afternoon.
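
To give a sense of how small such a reader can be, here is a minimal recursive parser sketch. The `parse` and `_parse_at` names are my own, type hints are not handled, and error handling is deliberately thin. Treat it as an illustration rather than a reference implementation.

```python
def parse(data):
    """Parse one csexp value from bytes into nested lists of bytes."""
    value, pos = _parse_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after expression")
    return value

def _parse_at(data, pos):
    """Parse the value starting at pos; return (value, next position)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            item, pos = _parse_at(data, pos)
            items.append(item)
        return items, pos + 1  # step over the closing parenthesis
    # Otherwise this is an atom: decimal length, a colon, then that many bytes.
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    end = start + length
    if end > len(data):
        raise ValueError("atom is shorter than its declared length")
    return data[start:end], end

print(parse(b"(9:groceries(4:milk5:bread))"))
```

Note that the parser only ever returns bytes and lists. Turning `b"100"` into the integer 100 is left to the application.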

The other thing I like about Canonical S-Expressions is that they do what they claim to do and nothing else. They're a binary format that only does byte strings and lists.

What's Not to Like About Canonical S-Expressions

Working with CSEXP data can be a pain. You're always stuck writing a reader for your data. Your reader will take the resulting abstract parsed data and convert it into something your application will actually consume. In some cases this conversion is easy, 3:100 becomes the integer 100. If you want to store more complex data structures, such as associative arrays, however, then you'll need to think about it.

Since CSEXP doesn't have associative arrays, only lists, you'll end up writing the serialization/deserialization format on your own. You could store them as lists of lists, ((key val) (key val)) or the more compact form of (key val key val) or you could (ab)use the hint system, such as ([key]value [key]value). Whatever choice you make, it will be specific to your application and someone who reads the document will need to think about the choices you made beforehand. Or if you're inheriting data in this format, you may end up having to guess at the meaning of the data structure.
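
As a rough illustration of the first two conventions, the sketch below turns already-parsed csexp data (nested lists of bytes) into a Python dict. The function names and the sample keys are hypothetical. The two functions are not interchangeable, so a reader has to know in advance which layout the writer chose.

```python
def pairs_to_dict(parsed):
    """Read the ((key val) (key val)) layout."""
    return {key: value for key, value in parsed}

def flat_to_dict(parsed):
    """Read the more compact (key val key val) layout."""
    if len(parsed) % 2 != 0:
        raise ValueError("flat layout needs an even number of items")
    # Even positions are keys, odd positions are their values.
    return dict(zip(parsed[0::2], parsed[1::2]))

as_pairs = [[b"name", b"Alice"], [b"color", b"red"]]
as_flat = [b"name", b"Alice", b"color", b"red"]

print(pairs_to_dict(as_pairs))
print(flat_to_dict(as_flat))
```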

This type of step is necessary for many serialization formats. In some, like Protobufs, it's a requirement. In XML, it was not strictly necessary but almost always done, and in some applications using JSON, it may not be necessary at all.

Canonical S-Expressions occupy a strange middle ground where having a formal schema is not strictly necessary, as it's schema-less, but it's also challenging to work without one.

Flexible (Schema-less) Data Serialization Formats

Flexible data formats are a topic of deep discussion and debate. In the 90s, it seemed that the world had converged on XML as the One Format to Rule Them All. The problem with XML is that even though the format is self-documenting in some ways, i.e. <tag></tag>, the value inside the tags needed to be converted by a secondary reader, separate from the parser.

Since this distinction isn't always clear: the parser parses the raw data into a machine-readable data structure, while a reader converts that (usually already parsed) data into application-specific data structures.

Canonical S-Expressions have the same problem in regards to needing a reader that XML does, but unlike XML, you don't have the storage or bandwidth issues of the tags.

JSON seems to have won the generic data format wars by offering some types, making writing a reader trivial (or in some cases, unnecessary), but anyone who has ever had to work with JSON knows that its thin layer of types is misleading. As an example, “How do you store a date in JSON?”

You could store it as Unix time, seconds after the epoch, or you could store it in an ISO 8601 formatted string, i.e. "2008-09-15T15:53:00+05:00" or an RFC 822 date format, or something else entirely. Your parser will happily give you a string, but you're stuck needing a reader to do that final conversion, just like you did with XML.
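
Here is a short sketch of that extra reader step, using only Python's standard library. The document and its field names (`as_string`, `as_unix`) are made up for illustration. Both fields describe the same instant.

```python
import json
from datetime import datetime, timezone

# A made-up document storing the same instant in two common ways.
document = '{"as_string": "2008-09-15T15:53:00+05:00", "as_unix": 1221475980}'

# The JSON parser stops at strings and numbers. It knows nothing about dates.
parsed = json.loads(document)

# The "reader" step: the application has to know which convention was used.
from_string = datetime.fromisoformat(parsed["as_string"])
from_unix = datetime.fromtimestamp(parsed["as_unix"], tz=timezone.utc)

print(type(parsed["as_string"]).__name__, type(parsed["as_unix"]).__name__)
print(from_string == from_unix)
```

Nothing in the JSON itself says that either field is a date. That knowledge lives entirely in the code that reads it.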

JSON-LD solves some of this by giving your values semantic meaning, but it makes the parser more complex.

And neither XML nor JSON handles binary data well. To store binary data in either format, you typically first convert it to Base64, which adds roughly a third to the size of the data in storage and transmission.

Canonical S-Expressions offer none of the overhead of XML and don't claim to do type conversions. Since you'll need a reader anyway, you can do your type conversions in that step.
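
The size difference is easy to measure. This sketch compares Base64 with a length-prefixed csexp atom for the same payload. The 3000-byte random payload is arbitrary and chosen only for illustration.

```python
import base64
import os

raw = os.urandom(3000)  # arbitrary binary payload

# Base64 turns every 3 input bytes into 4 output characters.
encoded = base64.b64encode(raw)

# A csexp atom only adds the decimal length and a colon in front of the bytes.
csexp_atom = str(len(raw)).encode("ascii") + b":" + raw

print(len(raw), len(encoded), len(csexp_atom))
print(f"base64 overhead: {len(encoded) / len(raw) - 1:.0%}")
```

For this payload the Base64 form is 4000 bytes, while the csexp atom is 3005.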

Further Thoughts and Alternatives

In practice, having some type data assistance does offer benefits. It makes your reader simpler, and it makes the format more pleasant to work with, and so while I appreciate csexp's simplicity, I find working with it to be more challenging than it should be.

One thought that I keep having while I'm using csexp is to use the type hints to store information such as the data type. Imagine if instead of:

20:2019-10-02T07:11:07Z

We instead stored:

[iso8601]20:2019-10-02T07:11:07Z

That would give us the data type and we could let the reader take some of the work off of our programming logic. This is similar to JSON-LD's method of storing semantic data.

I personally like this idea, but it requires changes to the readers to recognize a new “Semantic Canonical S-Expression”.
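
A rough sketch of what such a hint-aware reader might look like follows. Everything here is an assumption made for illustration: the `iso8601` and `int` hint names, the `CONVERTERS` table and the `read_hinted_atom` function are not part of any csexp library. The hint is written bare, following the notation used in this post, and a strict canonical encoding may length-prefix the hint as well.

```python
from datetime import datetime, timezone

def _read_iso8601(raw):
    # Only handles the UTC "Z" form used in the example above.
    parsed = datetime.strptime(raw.decode("ascii"), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

# Hypothetical hint names mapped to converters.
CONVERTERS = {
    b"iso8601": _read_iso8601,
    b"int": int,
}

def read_hinted_atom(data):
    """Read one atom, optionally preceded by a bare [hint], and convert it."""
    hint = None
    pos = 0
    if data[:1] == b"[":
        close = data.index(b"]")
        hint = data[1:close]
        pos = close + 1
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    raw = data[colon + 1:colon + 1 + length]
    # Unknown or missing hints fall back to returning the raw bytes.
    convert = CONVERTERS.get(hint, lambda value: value)
    return convert(raw)

print(read_hinted_atom(b"[iso8601]20:2019-10-02T07:11:07Z"))
print(read_hinted_atom(b"4:1000"))
```

The catch is visible in the `CONVERTERS` table: writer and reader still have to agree on what each hint name means.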

A simpler idea would be to store some type information alongside the data, so instead of 3:253, you might store I3:253, with “I” indicating that the value is an integer. This is close to what the Bencoding format does: it writes the integer 253 as i253e, while byte strings keep the familiar length-prefixed form. Bencoding offers many of the same benefits of CSEXP, but because it also supports types, is a bit easier to work with. The downside, as always, is that this helpfulness comes at the cost of storage and bandwidth.
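
For comparison, Bencoding's rules fit in a few lines. Integers are written as `i<number>e`, byte strings as `<length>:<bytes>`, lists as `l...e` and dictionaries as `d...e` with sorted byte-string keys. The encoder below is a minimal sketch with a function name of my own choosing, not a complete or validated implementation.

```python
def bencode(value):
    """Encode ints, bytes, lists and dicts using Bencoding's four types."""
    if isinstance(value, bool):
        raise TypeError("Bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        if not all(isinstance(key, bytes) for key in value):
            raise TypeError("dictionary keys must be byte strings")
        # Keys are emitted in sorted order so the encoding is deterministic.
        body = b"".join(bencode(key) + bencode(value[key]) for key in sorted(value))
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

print(bencode(253))
print(bencode(b"253"))
print(bencode({b"name": b"Alice", b"color": b"red"}))
```

Unlike csexp, the integer 253 and the byte string "253" have different encodings, and associative arrays are built in rather than left to convention.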

Other formats exist as well. I previously mentioned Bencoding, but there is also MessagePack, ASN.1, CBOR, and the newest, Preserves. Each of these has a different approach, though they center around the same problem- making it easy to store arbitrary data, especially binary data, on disk and on the network.

It's beyond the scope of this post to delve into each of them. I think Preserves is the most interesting of the formats. It's certainly the most expressive despite being compact, but since I haven't used it I don't know if that expressiveness will be something I need or if I could simply use Bencoding or MessagePack to the same effect.

Conclusion

Canonical S-Expressions are a great, flexible, compact data format. They're very fast and efficient. If you have straightforward needs, they're certainly worth checking out. In my use case, Datashards, the format fits our current needs. If we end up wanting to store more complex data structures in it, such as associative arrays, that will be the time to re-evaluate the format choice to see if something else would be a better fit.

On Long-Form Blogging

In 1995, I got my first taste of the World Wide Web. That's a funny thing to think about now, but at the time it was very new and most websites that I found were weird, off the wall, and amazingly amateurish. I found sites about Bonsai Kittens, connecting soda machines to the internet, lucid dreaming and a bunch of vanity websites from people just wanting other people to know they existed.

In 1997, I ran my very own website from my dorm room. It was thanks to Microsoft Personal Web Server, and it let my humble desktop PC present me to the world. I used it to host essays for school... before it crashed.

In the early 2000s, I found LiveJournal and at the time, LiveJournal filled the same role in my life that the Fediverse does now. I had real life friends who followed me on LiveJournal. I had friends from LiveJournal, I met people through people on LiveJournal and was exposed to new thoughts, ideas and experiences through reading about others' lives.

I loved it so much, I was not only a paid subscriber, but I paid for a lifetime membership. ...Until the site was bought out by a Russian company and I closed my account.

When Twitter came onto my radar, it was through geeky friends who had seen it at a conference. It was a Rails project and it was a bridge to SMS texting. It felt more like IRC than blogging. Blogging was at least a few paragraphs, and they spoke to something about the person's experience. They might be personal, or technical, but they felt intimate and connecting. Twitter was 140 characters.

Mastodon made a choice to be 500 characters, which was more than three times better! But as time has gone on, I've found myself writing posts that span three, four or five toots. This isn't a limitation of ActivityPub- it's a design choice of Mastodon itself to limit itself to microblogging.

But I miss blogging, and if Medium has taught me anything, it's taught me that other people miss it too, and they're even willing to put up with Medium to have it!

So I'm using Write Freely/Write.as to blog again. With ActivityPub, people can subscribe to my posts just as easily as they could on LiveJournal, either from ActivityPub or RSS. And who knows, maybe this whole thing will take off and I'll be able to feel like I really know people's thoughts and feelings again. Maybe we can bring the humanity back to social networking.