A place to be (re)educated in Newspeak

Tuesday, April 19, 2022

Bitrot Revisited: Local First Software and Orthogonal Synchronization

This post is based on an invited talk I gave recently at the Programming 22 conference.

The talk wasn't recorded but I've recorded a reprise at: https://youtu.be/qx6ekxXdidI


The definition of insanity notwithstanding, I decided to revisit a topic I have discussed many times before: Objects as Software Services. In particular, I wanted to relate it to recent work others have been doing.

The earliest public presentation I gave on this was at the DLS in 2005. There are recordings of talks I gave at Google and Microsoft Research, as well as several blog posts (March 2007, April 2008, April 2009, January 2010 and April 2010). You can also download the original write up.

The goal is software that combines the advantages of old-school personal computing and modern web-based applications. There are two parts to this.

First, software should be available at all times. Like native apps, software should be available even if the network is slow, unreliable or absent, or if the cloud is otherwise inaccessible (say due to denial-of-service, natural disaster, war or whatever). And, like a cloud app, it should be accessible from any machine at any location (modulo network access if it hasn't run there before). Recently, this idea has started to get more attention under the name local-first software.

Tangent: I've proposed a number of terms for closely related ideas in older posts such as Software Objects, Rich Network Enabled Clients, Network Serviced Applications and Full-Service Computing, but whatever name gets traction is fine with me.

Second, software should always be up-to-date (this is where Bitrot comes in). That means we always want to run the latest version available, just like a web page. This implies automatically updating application code over the network without disrupting the end-user. This latter point goes well beyond the idea of local-first software as I've seen it discussed.

Let's take these two goals in order.
  • For offline availability, one has to store the application and its data locally on the client device. However, unlike classical personal computing, the data has to be made available, locally, to multiple clients. Now we have multiple replicas of our data, and they have to be kept in sync somehow. My proposal was to turn that responsibility over to the programming language via a concept I dubbed Orthogonal Synchronization. The idea was to extend the concept of orthogonal persistence, which held that the program would identify which fields in every data structure were deemed persistent, and the system would take care of serializing and deserializing their contents, recursively. With orthogonal synchronization, the data would not only be persisted automatically, but synchronized.
  • To keep the software up-to-date without disrupting the user, we want good support for dynamic software update. When the application code changes, we update the app live. How do we know when the code changes? Well, code is just data, albeit of a particular kind. Hence we sync it, just like any other persistent data. We reuse much of the same orthogonal synchronization mechanism, and since we sync both code and data at the same time, we can migrate data seamlessly whenever the code and data format changes. As I've discussed in the past, this has potentially profound implications for versioning, release cycles and software development. All this goes well beyond the focus of local-first software, and is way outside the scope of this post. See the original materials cited above for more on that aspect.
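To make the first idea concrete, here is a minimal Python sketch of what orthogonal synchronization might look like from the programmer's side: fields are simply marked as synced, and the runtime (not the application) tracks changes and ships snapshots to other replicas. The `Synced` descriptor and `_dirty` flag are invented for illustration; they are not part of any real system described in this post.

```python
# Hypothetical sketch of orthogonal synchronization: fields marked as
# synced are tracked and serialized by the runtime, not the application.
import json

class Synced:
    """Descriptor marking a field as persistent and synchronized."""
    def __set_name__(self, owner, name):
        self.name = "_" + name
    def __get__(self, obj, objtype=None):
        return getattr(obj, self.name)
    def __set__(self, obj, value):
        setattr(obj, self.name, value)
        obj._dirty = True          # the runtime notices every mutation

class Document:
    title = Synced()
    body = Synced()
    def __init__(self, title, body):
        self.title = title
        self.body = body
        self._dirty = False        # a freshly loaded replica starts clean
    def snapshot(self):
        """What the runtime would serialize and ship to other replicas."""
        return json.dumps({"title": self.title, "body": self.body})

doc = Document("notes", "hello")
assert doc._dirty is False         # nothing to sync yet
doc.title = "todo"
assert doc._dirty                  # the runtime now knows a sync is due
```

The point of the sketch is that the application code never mentions serialization or networking; marking the fields is the entire programmer-facing API.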
There's only one small problem: merge conflicts. The natural tendency is to diff the persistent representations to compute a set of changes and detect conflicts. An alternative is to record changes directly, whenever setters of persistent objects are called. Either way, we are comparing the application state at the level of individual objects. This is very low level; it is an extensional approach, which yields no insight into the intention of the changes. As an example, consider a set, represented as an array of elements and an integer indicating the cardinality of the set. If two clients each add a distinct object to the set, we find that they both have the same set object, but the arrays differ. The system has no way to resolve the conflict in a satisfactory manner: choosing either replica is wrong. If one understands the intention of the change, one could decide to resolve the conflict by performing both additions on the original set.
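The set example above can be sketched in a few lines of Python. Comparing replica states (the extensional view) yields two arrays that cannot be reconciled automatically, while replaying the recorded intentions (the two `add` calls) merges cleanly:

```python
# The set from the text: elements stored in an array plus a count.
base = {"elems": ["a"], "count": 1}
replica1 = {"elems": ["a", "b"], "count": 2}   # client 1 added "b"
replica2 = {"elems": ["a", "c"], "count": 2}   # client 2 added "c"

# Extensional view: the states differ, and picking either replica
# silently drops the other client's addition.
assert replica1 != replica2

# Intentional view: merge by replaying both recorded additions
# against the original set.
changes = [("add", "b"), ("add", "c")]
merged = set(base["elems"])
for op, arg in changes:
    if op == "add":
        merged.add(arg)
assert merged == {"a", "b", "c"}
```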

Local-first computing approaches this problem differently. It still needs to synchronize the replicas. However, the problem of conflicts is elegantly defined away. The idea is to use Conflict-free Replicated Data Types (CRDTs) for all shareable data, and so conflicts cannot arise. This is truly brilliant as far as it goes. And CRDTs go further than one might think.

CRDT libraries record intentional changes at the level of the CRDT object (in our example, the set, assuming we use a CRDT implementation of a set); sync is then just the union of the change sets, and no conflicts arise. However, the fact that no formal conflict occurs does not necessarily mean that the result is what we actually expect. And CRDTs don't provide a good solution for code update.
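For readers unfamiliar with CRDTs, here is one of the simplest examples, a grow-only counter (G-Counter), sketched in Python. Each replica increments only its own slot, and merging takes the element-wise maximum, which is commutative, associative and idempotent, so replicas converge no matter in what order merges happen:

```python
# A minimal grow-only counter (G-Counter) CRDT: each replica increments
# its own slot; merge is an element-wise max, so merging is commutative,
# associative, and idempotent -- no conflicts can arise by construction.
class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.slots = [0] * n_replicas
    def increment(self):
        self.slots[self.id] += 1
    def value(self):
        return sum(self.slots)
    def merge(self, other):
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()       # replica 0 counts twice
b.increment()                      # replica 1 counts once
a.merge(b); b.merge(a)             # sync in both directions
assert a.value() == b.value() == 3
```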

Can we apply lessons from CRDTs to orthogonal synchronization? The two approaches seem quite contradictory: CRDTs fly in the face of orthogonal persistence/synchronization. The 'orthogonal' in these terms means that persistence/synchronization is orthogonal to the datatype being persisted/synced. You can persist/sync any datatype. In contrast, using CRDTs for sync means you have to use specific datatypes. One conclusion might be that orthogonal sync is just a bad idea. Maybe we should build software services by using CRDTs for data, and structured source control for code. However, perhaps there's another way.

Notice that the concept of capturing intentional changes is distinct from the core idea of CRDTs. It's just that, once you have intentional changes, CRDTs yield an exceptionally simple merge strategy. So perhaps we can use orthogonal sync, but incorporate intentional change data and then use custom merge functions for specific datatypes. CRDTs could fit into this framework; they'd just use a specific merge strategy that happens to be conflict-free. However, we now have additional options. For example, we can merge code with a special strategy that works a bit like traditional source control (we can do better, but that's not my point here). As a default merge strategy when no intent is specified, we could treat setter operations on persistent slots as changes and just ask the user for help in case of conflict. We always have the option to specify an alternate strategy such as last-write-wins (LWW).
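A hypothetical sketch of what such pluggable merge strategies could look like: each synced datatype names a merge function, and a CRDT-style conflict-free merge is just one entry in the table alongside last-write-wins. The table, function names and tuple representation here are all invented for illustration.

```python
# Hypothetical pluggable merge strategies; values are (data, timestamp).
def lww(local, remote):
    """Last-write-wins: keep whichever value was written later."""
    return local if local[1] >= remote[1] else remote

def set_union(local, remote):
    """Conflict-free merge for sets: union both replicas' elements."""
    return (local[0] | remote[0], max(local[1], remote[1]))

MERGE_STRATEGIES = {"lww": lww, "set": set_union}

def merge(strategy, local, remote):
    return MERGE_STRATEGIES[strategy](local, remote)

# Two replicas edited the same title: the later write wins.
assert merge("lww", ("draft2", 10), ("draft1", 5)) == ("draft2", 10)
# Two replicas each added a tag: the merged set keeps both.
assert merge("set", ({"a"}, 3), ({"b"}, 4)) == ({"a", "b"}, 4)
```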

How might we specify what constitutes an intentional change, and what merge strategy to use? One idea is to annotate mutator methods with metadata indicating that they are changes associated with a given merge strategy. Here is what this might look like for a simple counter CRDT:

class Counter = (| count ::= 0. |) (
  public value = ( ^count )
  public increment (* :crdt_change: *) = (
    count: count + 1
  )
  public decrement (* :crdt_change: *) = (
    count: count - 1
  )
)
The metadata tag (crdt_change in this case) identifies a tool that modifies the annotated method so that calls are recorded as change records with salient information (name of called method, timestamp, arguments) as well as a merge method that processes such changes according to a standardized API.
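As a rough Python analogue of that mechanism, a decorator can play the role of the `crdt_change` tag: it wraps each annotated mutator so calls are logged as change records (method name, timestamp, arguments), and a merge function replays a remote replica's log. The decorator name and merge API here are illustrative, not the actual Newspeak tooling:

```python
# Python analogue of the crdt_change metadata tag: a decorator records
# each call as a change record (method name, timestamp, args) that the
# sync layer can ship to, and replay on, other replicas.
import time
from functools import wraps

def crdt_change(method):
    @wraps(method)
    def wrapper(self, *args):
        self.changes.append((method.__name__, time.time(), args))
        return method(self, *args)
    return wrapper

class Counter:
    def __init__(self):
        self.count = 0
        self.changes = []          # change log consumed by the sync layer

    @crdt_change
    def increment(self):
        self.count += 1

    @crdt_change
    def decrement(self):
        self.count -= 1

def merge(counter, remote_changes):
    """Replay a remote replica's change log. For a counter this is
    conflict-free, since increments and decrements commute. (Replayed
    ops are re-logged here; a real system would de-duplicate them.)"""
    for name, _ts, args in remote_changes:
        getattr(counter, name)(*args)

c = Counter()
c.increment(); c.increment()
merge(c, [("decrement", 0.0, ())])   # another replica decremented once
assert c.count == 1
```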

Now, to what extent is this orthogonal sync anymore? Unlike orthogonal persistence, we can't just mark slots as persistent and be done; we have to provide merge strategies. Well, since we have a default, we can still argue that sync is supported regardless of datatype. Besides, quibbling over terminology is not the point. We've gained real flexibility, in that we can support both CRDTs and non-CRDTs like code. And CRDT implementations don't need to incorporate special code for serialization and change reporting. The system can do that for them based on the metadata.

I've glossed over many details. If you watch the old talks, you'll see many issues discussed and answered. Of course, the proof of the pudding is in creating such a system and building working applications on top. I only managed to gather funding for such work once, which is how we created Newspeak, but that funding evaporated before we got very far with the sync problem. Sebastián Krynski worked on some prototypes, but again, without funding it's hard to make much progress. Nevertheless, there is more recognition that there is a problem with traditional cloud-based apps. As the saying goes: this time it's different.

4 comments:

patrickdlogan said...

I'm trying to build up my stamina for blogging. Here's a small intro and pointer to Paul Dourish's thesis on divergence and synchrony.

https://patrickdlogan.blogspot.com/2022/05/open-implementation-and-flexibility-in.html

Gilad Bracha said...

Thanks for this!

John Cowan said...

Sometimes you don't want the latest version of the application, because the programmer has broken it either functionally or UI-wise. The top complaint I hear from non-professional app users is "The latest version has changed everything, and I don't know how to use it any more." Treating the local cache as first-class helps to solve this problem, as the user gets to say that for themselves the cached version is the current version. (Obvs this won't work if the client-server or peer-to-peer protocol has changed.)

Gilad Bracha said...

Indeed, the idea that applications are always up-to-date is not a tenet of local-first. On the contrary, one of the suggested benefits is that users can keep their old apps unchanged, as you say. My approach differs in that respect.

There is a fundamental tension here between collaboration and autonomy. Local first seeks to resolve this, but if you don't update your code, the ability to collaborate is likely to atrophy, as collaborators end up with distinct versions that may not be able to agree on the kinds of data they share and how they are represented. You are back to versioning hell. The web avoids that, and that is an advantage I wish to keep.