Room 101

A place to be (re)educated in Newspeak

Monday, August 15, 2022

What You Want Is What You Get

How do we resolve the classic tension between WYSIWYG and markup? Alas, one can't explain that properly in Blogger, but if you follow this link, you'll see what I mean.

Thursday, June 30, 2022

The Prospect of an Execution: The Hidden Objects Among Us

Depend upon it, Sir, when a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully.
-- Samuel Johnson

I wish to concentrate your mind, gentle reader, by focusing on an execution (not yours of course! I see you are already losing focus - no matter ...). My goal is to make you see the objects that are in front of you every day, hiding in plain sight.

So who are we executing? Or what? The condemned operates under a wide variety of aliases that obscure its true nature: a.out alias .exe alias ELF file alias binary and more. I mean to expose the identity that hides beneath these guises: it is an object!

When we run an executable file, we call a function in that file, and that function accesses the data in the file, possibly calling other functions in the same file recursively. Replace function with method, call with invoke and file with object in the previous sentence and you will begin to see what I mean:

When we run an executable object, we invoke a method in that object, and that method accesses the data in the object, possibly invoking other methods in the same object recursively.

The initial function, the entry point, is often called main(). Consider the venerable a.out: it is a serialized object on disk. When the system loads it, it is deserializing that object. The system then invokes the object's main() method; essentially, the system expects the executable to have an interface:

interface Executable {main(argc: Integer, argv: Array[String])}

ELF can also be viewed as a serialization format for objects in this way. We aren't used to thinking of loading and running in this way, but that doesn't detract from the point. Once you see it, you cannot unsee it.

Newspeak makes this point of view explicit. In Newspeak, an application is an object which supports the method main:args:. This method takes two arguments: a platform object, and an array object whose elements are any specific arguments required. The platform object provides access to standard Newspeak platform functionality that is not part of the application itself. To deploy an application, we serialize it using conventional object serialization. Objects reference their class, and classes reference mixins which reference methods. All of these are objects, and get serialized when we serialize the application object. So all the application's code gets serialized with it. Running the deployed app is a matter of deserializing it and calling main:args:.

The advantage of recognizing this explicitly is conceptual parsimony, which yields an economy of mechanism. You can literally reuse your object serializer as a deployment format. Serializing data and serializing code are one and the same.
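
To make this concrete, here is a minimal sketch - in Python with the third-party cloudpickle library rather than in Newspeak, purely to illustrate the shape of the idea; the CounterApp class and its main signature are made up for the example:

import cloudpickle

class CounterApp:
    """A toy application object exposing an Executable-style interface."""
    def main(self, platform, args):
        platform(f"Hello from a deserialized app, args={args}")

# Deploy: serialize the application object. cloudpickle captures the class
# definition by value, so the code travels with the data.
image = cloudpickle.dumps(CounterApp())

# Run: deserialize and invoke the entry point, handing it a "platform" capability.
app = cloudpickle.loads(image)
app.main(print, ["--verbose"])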

Executables aren't the only objects that aren't recognized as such. Libraries are also objects. It doesn't matter if we are talking about DLLs at the operating system level or about packages/modules/units at the programming language level, or packages in the package-manager sense. The key point about all these things is that they support an API - an Application Programming Interface. We'll dispense with the acronym-happy jargon and just say interface. In all these cases, we have a set of named procedures that are made accessible to callers. They may make use of additional procedures, some publicly available via the interface, and some not. They may access data encapsulated behind the interface; that data may be mutable or not. The key thing is the notion of an interface.
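
As a tiny illustration (a Python sketch using ctypes on a Unix-like system; the particular library and call are just convenient examples), an OS-level library really does behave like an object whose named procedures we invoke through its interface:

import ctypes, ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))  # load ("deserialize") the library object
print(libc.abs(-42))                               # invoke one of its named procedures: 42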

Even if you are programming in a pure functional setting, such objects will make an appearance. The packages of Haskell, and certainly the structures of ML, are not all that different. They may be statically typed or they may not be. They may be statically bound at some level - but as long as we have separate compilation, this is just an optimization that relies on certain rigidities of the programming model. That is, your language may not treat these things as first-class values, but your compilation units can bind to different implementations of the same package interface, even if they can only bind to one at a time. So even if the language itself does not treat these entities as true objects with dynamically bound properties, they have to act as objects in the surrounding environment.

In many systems, the API might expose variables directly, and very often it may expose classes directly as well. Nevertheless, these are all late-bound at the level of linking across compilation units or OS libraries.

The notion of an interface is what truly characterizes objects - not classes, not inheritance, not mutable state. Read William Cook's classic essay for a deep discussion on this.

So the next time someone tells you that they don't believe in objects, that objects are bad and one shouldn't and needn't use them, you can politely inform them that they shouldn't confuse objects with Java and the like, or even with imperative programming. Objects are always with us, because the concept of abstracting over implementations via an interface is immensely valuable.

Tuesday, April 19, 2022

Bitrot Revisited: Local First Software and Orthogonal Synchronization

This post is based on an invited talk I gave recently at the Programming 22 conference.

The talk wasn't recorded but I've recorded a reprise at: https://youtu.be/qx6ekxXdidI


The definition of insanity notwithstanding, I decided to revisit a topic I have discussed many times before: Objects as Software Services. In particular, I wanted to relate it to recent work others have been doing.

The earliest public presentation I gave on this was at the DLS in 2005. There are recordings of talks I gave at Google and Microsoft Research, as well as several blog posts (March 2007, April 2008, April 2009, January 2010 and April 2010). You can also download the original write-up.

The goal is software that combines the advantages of old-school personal computing and modern web-based applications. There are two parts to this.

First, software should be available at all times. Like native apps, software should be available even if the network is slow, unreliable or absent, or if the cloud is otherwise inaccessible (say due to denial-of-service, natural disaster, war or whatever). And, like a cloud app, it should be accessible from any machine at any location (modulo network access if it hasn't run there before). Recently, this idea has started to get more attention under the name local-first software.

Tangent: I've proposed a number of terms for closely related ideas in older posts such as Software Objects, Rich Network Enabled Clients, Network Serviced Applications and Full-Service Computing, but whatever name gets traction is fine with me.

Second, software should always be up-to-date (this is where Bitrot comes in). That means we always want to run the latest version available, just like a web page. This implies automatically updating application code over the network without disrupting the end-user. This latter point goes well beyond the idea of local-first software as I've seen it discussed.

Let's take these two goals in order.
  • For offline availability, one has to store the application and its data locally on the client device. However, unlike classical personal computing, the data has to be made available, locally, to multiple clients. Now we have multiple replicas of our data, and they have to be kept in sync somehow. My proposal was to turn that responsibility over to the programming language via a concept I dubbed Orthogonal Synchronization. The idea was to extend the concept of orthogonal persistence, which held that the program would identify which fields in every data structure were deemed persistent, and the system would take care of serializing and deserializing their contents, recursively. With orthogonal synchronization, the data would not only be persisted automatically, but synchronized.
  • To keep the software up-to-date without disrupting the user, we want good support for dynamic software update. When the application code changes, we update the app live. How do we know when the code changes? Well, code is just data, albeit of a particular kind. Hence we sync it, just like any other persistent data. We reuse much of the same orthogonal synchronization mechanism, and since we sync both code and data at the same time, we can migrate data seamlessly whenever the code and data format changes. As I've discussed in the past, this has potentially profound implications for versioning, release cycles and software development. All this goes well beyond the focus of local-first software, and is way outside the scope of this post. See the original materials cited above for more on that aspect.
There's only one small problem: merge conflicts. The natural tendency is to diff the persistent representations to compute a set of changes and detect conflicts. An alternative is to record changes directly, whenever setters of persistent objects are called. Either way, we are comparing the application state at the level of individual objects. This is very low level; it is an extensional approach, which yields no insight into the intention of the changes. As an example, consider a set, represented as an array of elements and an integer indicating the cardinality of the set. If two clients each add a distinct object to the set, we find that they both have the same set object, but the arrays differ. The system has no way to resolve the conflict in a satisfactory manner: choosing either replica is wrong. If one understands the intention of the change, one could decide to resolve the conflict by performing both additions on the original set.
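
Here is a minimal sketch of that failure mode in Python, with the set represented extensionally as an element array plus a cardinality, as in the example above:

original = {"elems": ["a"], "size": 1}
replica1 = {"elems": ["a", "b"], "size": 2}   # client 1 added "b"
replica2 = {"elems": ["a", "c"], "size": 2}   # client 2 added "c"

def field_conflicts(base, r1, r2):
    # A field conflicts if both replicas changed it, to different values.
    return [k for k in base if r1[k] != base[k] and r2[k] != base[k] and r1[k] != r2[k]]

print(field_conflicts(original, replica1, replica2))  # ['elems']
# "size" does not even register as a conflict - both replicas wrote 2 - yet
# silently accepting 2 is wrong: honoring both intents yields three elements.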

Local-first computing approaches this problem differently. It still needs to synchronize the replicas. However, the problem of conflicts is elegantly defined away. The idea is to use Conflict-free Replicated Data Types (CRDTs) for all shareable data, and so conflicts cannot arise. This is truly brilliant as far as it goes. And CRDTs go further than one might think.

CRDT libraries record intentional changes at the level of the CRDT object (in our example, the set, assuming we use a CRDT implementation of a set); sync is then just the union of the change sets, and no conflicts arise. However, the fact that no formal conflict occurs does not necessarily mean that the result is what we actually expect. And CRDTs don't provide a good solution for code update.
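
For contrast, here is a minimal Python sketch of one of the simplest CRDTs, a grow-only set, in which the recorded changes are the additions themselves and merging is simply the union of the change sets:

class GSet:
    def __init__(self):
        self.added = set()            # the recorded "add" changes

    def add(self, elem):              # an intentional change
        self.added.add(elem)

    def merge(self, other):
        merged = GSet()
        merged.added = self.added | other.added   # union of change sets: no conflicts
        return merged

r1, r2 = GSet(), GSet()
r1.add("a"); r1.add("b")              # client 1's changes
r2.add("a"); r2.add("c")              # client 2's changes
print(sorted(r1.merge(r2).added))     # ['a', 'b', 'c'] - both additions survive

A grow-only set cannot express removal, of course; richer CRDTs handle that, but, as noted, the absence of a formal conflict does not guarantee the merged result matches anyone's intent.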

Can we apply lessons from CRDTs to orthogonal synchronization? The two approaches seem quite contradictory: CRDTs fly in the face of orthogonal persistence/synchronization. The 'orthogonal' in these terms means that persistence/synchronization is orthogonal to the datatype being persisted/synced. You can persist/sync any datatype. In contrast, using CRDTs for sync means you have to use specific datatypes. One conclusion might be that orthogonal sync is just a bad idea. Maybe we should build software services by using CRDTs for data, and structured source control for code. However, perhaps there's another way.

Notice that the concept of capturing intentional changes is distinct from the core idea of CRDTs. It's just that, once you have intentional changes, CRDTs yield an exceptionally simple merge strategy. So perhaps we can use orthogonal sync, but incorporate intentional change data and then use custom merge functions for specific datatypes. CRDTs could fit into this framework; they'd just use a specific merge strategy that happens to be conflict-free. However, we now have additional options. For example, we can merge code with a special strategy that works a bit like traditional source control (we can do better, but that's not my point here). As a default merge strategy when no intent is specified, we could treat setter operations on persistent slots as changes and just ask the user for help in case of conflict. We always have the option to specify an alternate strategy such as last-write-wins (LWW).

How might we specify what constitutes an intentional change, and what merge strategy to use? One idea is to annotate mutator methods with metadata indicating that they are changes associated with a given merge strategy. Here is what this might look like for a simple counter CRDT:

class Counter = (| count ::= 0. |)(
  public value = (^count)
  public increment (* :crdt_change: *) = (
    count: count + 1
  )
  public decrement (* :crdt_change: *) = (
    count: count - 1
  )
)
The metadata tag (crdt_change in this case) identifies a tool that modifies the annotated method so that calls are recorded as change records with salient information (name of called method, timestamp, arguments) as well as a merge method that processes such changes according to a standardized API.
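
To give a flavor of what such a tool might generate - a hypothetical Python sketch, not Newspeak's actual machinery - the annotated mutators could be wrapped so that each call is logged as a change record, with a merge that replays the combined records on a common base:

import time, functools

def change(method):                   # stands in for the :crdt_change: metadata tag
    @functools.wraps(method)
    def wrapper(self, *args):
        self.changes.append((method.__name__, time.time(), args))  # change record
        return method(self, *args)
    return wrapper

class Counter:
    def __init__(self):
        self.count = 0
        self.changes = []             # recorded change records

    @change
    def increment(self): self.count += 1

    @change
    def decrement(self): self.count -= 1

def merge(base, r1, r2):
    # Replay both replicas' recorded changes, in timestamp order, on the base.
    merged = Counter()
    merged.count = base.count
    for name, _ts, args in sorted(r1.changes + r2.changes, key=lambda c: c[1]):
        getattr(merged, name)(*args)
    return merged

a, b = Counter(), Counter()
a.increment(); a.increment()          # replica a: +2
b.decrement()                         # replica b: -1
print(merge(Counter(), a, b).count)   # 1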

Now, to what extent is this orthogonal sync anymore? Unlike orthogonal persistence, we can't just mark slots as persistent and be done; we have to provide merge strategies. Well, since we have a default, we can still argue that sync is supported regardless of datatype. Besides, quibbling over terminology is not the point. We've gained real flexibility, in that we can support both CRDTs and non-CRDTs like code. And CRDT implementations don't need to incorporate special code for serialization and change reporting. The system can do that for them based on the metadata.

I've glossed over many details. If you watch the old talks, you'll see many issues discussed and answered. Of course, the proof of the pudding is in creating such a system and building working applications on top. I only managed to gather funding for such work once, which is how we created Newspeak, but that funding evaporated before we got very far with the sync problem. Sebastián Krynski worked on some prototypes, but again, without funding it's hard to make much progress. Nevertheless, there is more recognition that there is a problem with traditional cloud-based apps. As the saying goes: this time it's different.

Thursday, August 12, 2021

How is a Programmer Like a Pathologist?

Blogging platforms like Blogger are totally inadequate, because they don't support embedding interactive code in posts. So this is just an indirection for the real post at: https://blog.bracha.org/exemplarDemo/exemplar2021.html?snapshot=BankAccountExemplarDemo.vfuel#

Sunday, May 24, 2020

Bits of History, Words of Advice

"Why do you jackasses use these inferior linguistic vehicles when we have something here that’s so
precious, so elegant, which gives me so much pleasure? How can you be so blind and so foolish?"
That debate you’ll never win, and I don’t think you ought to try.

- Alan Perlis, 1978


In the late 1970s, researchers at Xerox PARC invented modern computing.  Of course, there were others
elsewhere - but PARC made a vastly disproportionate contribution.

A large part of that was done in, and based upon, the Smalltalk programming language. Forty years
ago, Smalltalk's dynamic update and reflection capabilities were more advanced than in any
mainstream language today. The language leveraged those capabilities to provide an IDE that in
many ways still puts the eclipses, black holes, red dwarfs and other travesties that currently
masquerade under that term to shame.  The Smalltalk image provided a much better Docker than
Docker.

Smalltalk and Smalltalkers invented not only IDEs, but window systems and their related paraphernalia
(pop-up menus, scroll bars, the bit-blt primitives that make them possible) as well as GUI builders,
unit testing, refactoring and agile development (ok, so nobody's perfect).

And yet, today Smalltalk is relegated to a small niche of true believers.  Whenever two or more
Smalltalkers gather over drinks, the question is debated: Why? 

The answer is unknowable, since we cannot run parallel universes and tweak things to see which
makes a difference.

I did describe such an alternate universe in a talk in 2016; it may be the best talk I ever gave.

Nevertheless, I think we can learn something from looking into this question. I'll relate parts of history
that I deem relevant, as I know them. I'm sure there are inaccuracies in the account below.
There are certainly people who were closer to the history than I.  My hope is that they'll expand on my
comments and correct me as needed. I'm sure I'll be yelled at for some of this. See if I care.

On with the show.

Lack of a Standard. Smalltalk had (and still has) multiple implementations - more so than much
more widely used languages. In a traditional business, having multiple sources for a technology
would be considered an advantage. However, in Smalltalk's case, things proved to be quite different. 

Each vendor had a slightly different version - not so much a different language, as a different platform.
In particular, Smalltalk classes do not have a conventional syntax; instead, they are defined via
reflective method invocation. Slight differences in the reflection API among vendors meant that the
program definitions themselves were not portable, irrespective of other differences in APIs used by the
programs. 

There were of course efforts to remedy this. Smalltalk standardization efforts go back to the late 80s,
but were pushed further in the 90s. Alas, in practice they had very little impact.

Newspeak, of course, fixed this problem thoroughly, along with many others. But we were poorly funded
after the 2008 crash, and never garnered much interest from the Smalltalk community.
The community's lack of interest in addressing weaknesses in the Smalltalk-80 model will be a
recurring theme throughout this post.

Business model. Smalltalk vendors had the quaint belief in the notion of "build a better mousetrap
and the world will beat a path to your door".  Since they had built a vastly better mousetrap, they
thought they might charge money for it. 

This was before the notion of open source was even proposed, though the Smalltalk compilers, tools
and libraries were provided in source form; only the VMs were closed source.

Alas, most software developers would rather carve their programs onto stone tablets using flint tools
held between their teeth than pay for tools, no matter how exquisite. Indeed, some vendors charged
not per-developer-seat, but per deployed instance of the software. Greedy algorithms are often
suboptimal, and this approach was greedier and less optimal than most. Its evident success speaks
for itself.

In one particularly egregious and tragic case, I'm told ParcPlace declined an offer from Sun
Microsystems to allow ParcPlace Smalltalk to be distributed on Sun workstations. Sun would pay a per
machine license fee, but it was nowhere near what ParcPlace was used to charging.

Eventually, Sun developed another language; something to do with beans, I forget. Fava, maybe?
Again, dwell on that and what alternative universe might have come about.


Performance and/or the illusion thereof.

Smalltalk was and is a lot slower than C, and more demanding in terms of memory. In the 1980s and
early 1990s, these were a real concern. In the mid-1990s, when we worked on Strongtalk, Swiss
banks were among our most promising potential customers. They already had Smalltalk applications
in the field. They could afford to do so where others could not. For example, they were willing to equip
their tellers with powerful computers that most companies found cost-prohibitive - IBM PCs with a
massive 32MB of memory!

It took a long time for implementation technology to catch up, and when it did, it got applied to lesser
languages. This too was a cruel irony. JITs originated in APL, but Smalltalk was also a pioneer in that
field (the Deutsch-Schiffman work), and even more so Self, where adaptive JITs were invented. 

Strongtalk applied Self's technology to Smalltalk, and made it practical. 

Examples: Self needed 64MB, preferably 96, and only ran on Sun workstations. Strongtalk ran in 8MB
on a PC. This mattered a lot. And Strongtalk had an FFI; see below.

Then, Java happened.  Strongtalk was faster than Java in 1997, but Strongtalk was acquired by Sun;
the VM technology was put in the service of making Java run fast.  

The Smalltalk component of Strongtalk was buried alive until it was too late. By the time I finally got it
open-sourced, bits had rotted or disappeared, the system had no support, and the world had moved on.
And yet, the fact that the Smalltalk community took almost no interest in the project is still telling.

Imagine if all the engineering efforts sunk into the JVM had focused on Smalltalk VMs.

It's also worth dwelling on the fact that raw speed is often much less relevant than people think.
Java was introduced as a client technology (anyone remember applets?). The vision was programs
running in web pages. Alas, Java was a terrible client technology. In contrast, even a Squeak
interpreter, let alone Strongtalk, had much better start up times than Java, and better interactive
response as well. It also had much smaller footprint. It was a much better basis for performant client
software than Java. The implications are staggering. 

On the one hand, Netscape developed a scripting language for the browser. After all, Java wouldn't cut
it. Sun gave them permission to use the Java name for their language.  You may have heard of this
scripting language; it's called JavaScript.

Eventually, people found a way to make JavaScript fast. Which people? Literally some of the same
people who made Strongtalk fast (Lars Bak), using much the same principles.

Imagine if Sun had a workable client technology. Maybe the HotJava web browser would still be
around.

On the other hand, the failure of Java on the client led to an emphasis on server side Java instead.
This seemed like a good idea at the time, but ultimately commoditized Sun's product and contributed
directly to Sun's downfall. Sun had a superb client technology in Strongtalk, but the company's
leadership would not listen. 

Of course, why would they? They had shut down the Self project some years earlier to focus on Java.
Two years later, they spent an order of magnitude more money than it cost to develop Self, to buy back
essentially the same technology so they could make Java performant.


Interaction with the outside world.

Smalltalk had its unique way of doing things. Often, though not always, these ways were much better
than mainstream practice. Regardless, it was difficult to interact with the surrounding software
environment. Examples:

FFIs. Smalltalk FFIs were awkward, restrictive and inefficient. After all, why would you want to reach
outside the safe, beautiful bubble into the dirty dangerous world outside?

We addressed this back in the mid-90s in Strongtalk, and much later, again, in Newspeak.  

Windowing. Smalltalk was the birthplace of windowing. Ironically, Smalltalks continued to run on top
of their own idiosyncratic window systems, locked inside a single OS window. 

Strongtalk addressed this too; occasionally, so did others, but the main efforts remained focused on
their own isolated world, graphically as in every other way. Later, we had a native UI for Newspeak as
well. 

Source control. The lack of a conventional syntax meant that Smalltalk code could not be managed
with conventional source control systems. Instead, there were custom tools. Some were great - but
they were very expensive.

In general, saving Smalltalk code in something so mundane as a file was problematic. Smalltalk used
something called file-out format, which is charitably described as a series of reflective API calls, along
with meta-data that includes things like times and dates when the code was filed out. This compounded
the source control problem.


Deployment. Smalltalk made it very difficult to deploy an application separate from the programming
environment. The reason for this is that Smalltalk was never a programming language in the traditional
sense. It was a holistically conceived programming system. In particular, the idea is that computation
takes place among communicating objects, which all exist in some universe, a "sea of objects". Some
of these objects know how to create new ones; we call them classes (and that is why there was no
syntax for declaring a class, see above).

What happens when you try to take some objects out of the sea in which they were created (the IDE)?
Well, it's a tricky serialization problem.  Untangling the object graph is very problematic.

If you want to deploy an application by separating it from the IDE (to reduce footprint, or protect your IP,
or avoid paying license fees for the IDE on each deployed copy) it turns out to be very hard.


The Self transporter addressed this problem in a clever way. Newspeak addressed it much more
fundamentally and simply, both by recognizing that the traditional linguistic perspective need not
contradict the Smalltalk model, and by making the language strictly modular.

The problem of IP exposure is much less of a concern today. It doesn't matter much for server based
applications, or for open source software. Wasted footprint is still a concern, though in many cases you
can do just fine. Avi Bryant once explained to me how he organized the server for the late, great
Dabble DB. It was so simple you could just cry, and it performed like a charm using Squeak images.
Another example of the often illusory focus on raw performance.


So why didn't Smalltalk take over the world? 

With 20/20 hindsight, we can see that from the pointy-headed boss perspective, the Smalltalk value
proposition was: 

Pay a lot of money to be locked in to slow software that exposes your IP, looks weird on screen and
cannot interact well with anything else; it is much easier to maintain and develop though!

On top of that, a fair amount of bad luck.

And yet, those who saw past that are still running Smalltalk systems today with great results; efforts to
replace them with modern languages typically fail at huge cost.

All of the problems I've cited have solutions and could have been addressed. 
Those of us who have tried to address them have found that the wider world did not want to listen -
even when it was in its own best interest. This was true not only of short-sighted corporate leadership,
but of the Smalltalk community itself.  

My good friends in the Smalltalk community effectively ignored both Strongtalk and Newspeak;
taking them up would have required commitment and a willingness to go outside their comfort zone.

I believe the community has been self-selected to consist of those who are not bothered by Smalltalk's
initial limitations, and so are unmotivated to address them or support those who do. In fact, they often
could not even see these limitations staring them in the face, causing them to adopt unrealistic
business policies that hurt them more than anyone else.

Perhaps an even deeper problem with Smalltalk is that it attracts people who are a tad too creative and
imaginative; organizing them into a cohesive movement is like herding cats.

Nevertheless, Smalltalk remains in use, much more so than most people realize. Brave souls continue
to work on Smalltalk systems, both commercial and open source. Some of the issues I cite have been
addressed to a certain degree, even if I feel they haven't been dealt with as thoroughly and effectively
as they might. More power to them. Likewise, we still spend time trying to bring Newspeak back to a
more usable state. Real progress is not made by the pedantic and mundane, but by the dreamers who
realize that we can do so much better.



Eppur si muove

Monday, January 06, 2020

The Build is Always Broken

Programmers are always talking about broken builds: "The build is broken", "I broke the build" etc. However, the real problem is that the very concept of the build is broken. The idea that every time an application is modified, it needs to be reconstructed from scratch is fundamentally flawed.

A practical problem with the concept is that it induces a very long and painful feedback loop during development. Some systems address this by being wicked fast. They argue that even if they compile the world when you make a change, it's not an issue since it'll be done before you know it. One problem is that some systems get so large that this doesn't work anymore. A deeper problem is that even if you rebuild instantly, you find that you need to restart your application on every change, and somehow get it back to the stage where you were when you found a problem and decided to make a change. In other words, builds are antithetical to live programming. The feedback loop will always be too long.

Fundamentally, one does not recreate the universe every time one changes something. You don't tear down and reconstruct a skyscraper every time you need to replace a light bulb. A build, no matter how optimized, will never give us true liveness.

It follows that tools like make and its ilk can never provide a solution. Besides, these tools have a host of other problems. For example, they force us to replicate dependency information that is already embedded in our code: things like imports, includes, uses or extern declarations give us the same file/module-level information that we manually enter into build tools. This replication is tedious and error prone. It is also too coarse-grained, done at the granularity of files. A compiler can manage these dependencies more precisely, tracking, for example, what functions are used where. Caveats: some tools, like GN, can be fed dependency files created by cooperating compilers. That is still too coarse-grained though.

In addition, the languages these tools provide have poor abstraction mechanisms (compare make to your favorite programming language) and tooling support (what kind of debugger does your build tool provide?). The traditional response to the ills of make is to introduce additional layers of tooling of a similar nature, like CMake. Enough!

A better response is to produce a better DSL for builds. Internal DSLs, based on a real programming language, are one way to improve matters. Examples are rake and scons, which use Ruby and Python respectively. These tools make defining builds easier - but they are still defining builds, which is the root problem I am concerned with here.

So, if we aren't going to use traditional build systems to manage our dependencies, what are we to do? We can start by realizing that many of our dependencies are not fundamental; things like executables, shared libraries, object files and binaries of whatever kind. The only thing one really needs to "build" is source code. After all, when you use an interpreter, you can create only the source you need to get started, and then incrementally edit/grow the source. Using interpreters allows us to avoid the problems of building binary artifacts.

The cost is performance. Compilation is an optimization, albeit an important, often essential, one. Compilation relies on a more global analysis than an interpreter, and on pre-computing the conclusions so we need not repeat work during execution. In a sense, the compiler is memoizing some of the work of the interpreter. This is literally the case for many dynamic JITs, but is fundamentally true for static compilation as well - you just memoize in advance. Seen in this light, builds are a form of staged execution, and the binary artifacts that we are constantly building are just caches.

One can address the performance difficulties of interpreters by mixing interpretation with compilation. Many systems with JIT compilers do exactly that. One advantage is that we don't have to wait for the optimization before starting our application. Another is that we can make changes, and have them take effect immediately by reverting to interpretation, while re-optimizing. Of course, not all JITs do that; but it has been done for decades, in, e.g., Smalltalk VMs. One of the many beauties of working in Smalltalk is that you rarely confront the ugliness of builds.

And yet, even assuming you have an engine with a JIT that incrementally (re)optimizes code as it evolves, you may still be confronted with barriers to live development, barriers that seem to require a build.

Types. What if your code is inconsistent, say, due to type errors? Again, there is no need for a build step to detect this. Incremental typecheckers should catch these problems the moment inconsistent code is saved. Of course, incremental typecheckers have traditionally been very rare; it is not a coincidence that live systems have historically been developed using dynamically typed languages. Nevertheless, there is no fundamental reason why statically typed languages cannot support incremental development. The techniques go back at least as far as Cecil; see this paper on Scala.js for an excellent discussion of incremental compilation in a statically typed language.

Tests. A lot of the time, the build process incorporates tests, and the broken build is due to a logical error in the application detected by the tests. However, tests are not themselves part of the build, and need not rely on one - the build is just one way to obtain an updated application. In a live system, the updated application immediately reflects the source code. In such an environment, tests can be run on each update, but the developer need not wait for them.

Resources. An application may incorporate resources of various kinds - media, documentation, data of varying kinds (source files or binaries, or tables or machine learning models etc.). Some of these resources may require computation of their own (say, producing PDF or HTML from documentation sources like TeX or markdown), adding stages that are seldom live or incremental. Even if the resources are ready to consume, we can induce problems through gratuitous reliance on file system structure. The resources are typically represented as files. The deployed structure may differ from the source repository. Editing components in the source repo won't change them in the built structure. It isn't easy to correct these problems, and software engineers usually don't even try. Instead, they lean on the build process more and more.

It doesn't have to be that way. We can treat the resources as cached objects and generate them on demand. When we deploy the application, we ensure that all the resources are precomputed and cached at locations that are fixed relative to the application - and these should be the same relative locations where the application will place them during development in case of a cache miss. The software should always be able to tell where it was installed, and therefore where cached resources stored at application-relative locations can be found.

The line of reasoning above makes sense when the resource is accessed via application logic. What about resources that are not used by the application, but made available to the user? In some cases, documentation, sample code and attached resources might fall under this category. The handling of such resources is not part of the application proper, and so it is not a build issue, but a deployment issue. That said, deployment is simply the computation of a suitable object, serialized to a given location, and should be viewed in much the same way as the build; maybe I'll elaborate on that in a separate post.

Dealing with Multiple Languages. Once we are dealing with multiple languages, we may be pushed into using a build system because some of the languages do not support incremental development. Assuming that the heart of our application is in a live language, we should treat other languages as resources; their binaries are resources to be dynamically computed during development and cached.
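
Here is a minimal sketch of the resources-as-cached-objects idea in Python; the application-relative cache directory and the toy render function are illustrative assumptions:

import pathlib

APP_ROOT = pathlib.Path(__file__).resolve().parent    # the app knows where it lives

def resource(relative_path, generate):
    cached = APP_ROOT / "cache" / relative_path
    if not cached.exists():                            # cache miss: compute on demand
        cached.parent.mkdir(parents=True, exist_ok=True)
        cached.write_text(generate())
    return cached.read_text()

# During development this renders on first use; a deployment step would simply
# precompute the same files at the same application-relative locations.
html = resource("docs/manual.html", lambda: "<h1>Manual</h1>")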

Summary


  • Builds kill liveness.
  • Compilation artifacts are a form of cached resource, the result of staged execution.
  • To achieve liveness in industrial settings, we need to structure our development environments so that any staging is strictly an optimization:
    • Staged results should be cached and invalidated automatically when the underlying basis for the cached value is out of date.
    • This applies regardless of whether the staged value is a resource, a shared library/binary or anything else. 
    • The data necessary to compute the cached value, and to determine the cache's validity, must be kept at a fixed location, relative to the application. 

It's high time we build a new, brave, build-free world.


Saturday, January 12, 2019

Much Ado About Nothing

What sweet nothing does the title refer to? It could be about null, but it will in fact say nothing about that. The nothing in question is whitespace in program text. Specifically, whether whitespace should be significant in a programming language.

My instinct has always been that it should not. Sadly, there are always foolish souls who will not accept my instinct as definitive evidence, and so one must stoop to logical arguments instead.

Significant whitespace, by definition, places the burden of formatting on the programmer. In return, it can be leveraged to reduce syntactic noise such as semicolons and matching braces. The alleged
benefit is that in practice, programmers often deal with both formatting and syntactic noise, so eliminating one of the two is a win.

However, this only holds in a world without civilized tooling, which in turn may explain the fondness for significant whitespace, as civilized tooling (and anything civilized, really) is scarce. Once you assume proper tooling support, a live pretty printer can deal with formatting as you type, so there is no reason for you to be troubled by formatting. So now you have a choice between two inconveniences. Either:

  • You use classical syntax, and programmers learn where to put the semicolons and braces and stop worrying about formatting; or
  • You make whitespace significant, clean up the syntax, and have programmers take care of the formatting.

At this point, you might say this is a matter of personal preference that can devolve into the kind of religious argument we all know and love. To tip the scales, pray consider the line of reasoning below. I don’t recall encountering it before, which is what motivated this post.

In the absence of significant whitespace, pretty printing (aka code formatting) is an orthogonal concern. We can choose whatever pretty printing style we like and implement a tool to enforce it.  Such a pretty-printer/code-formatter can be freely composed with any code source we have - a programmer typing into an editor, an old repository, and most importantly, other tools that spit out code - whether they transpile into our language or generate code in some other way.

Once whitespace is significant, all those code sources have to be cognizant of formatting.  The tool writer has to be worried about both syntax and formatting, whereas before only syntax was a concern.

You might argue that the whitespace is just another form of syntax; the problem is that it is not always context-free syntax. For example, using indentation to nest constructs is context sensitive, as the number of spaces/tabs (or backspaces/backtabs) depends on context.
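
To make the tax concrete, here is an illustrative Python sketch of a code generator (not any particular tool): emitting into a brace-delimited syntax requires no knowledge of layout, while emitting into an indentation-sensitive syntax means threading the current nesting depth - the context - through every emission:

def emit_braced_if(cond, body):
    return f"if ({cond}) {{ {body} }}"          # formatting is someone else's problem

def emit_indented_if(cond, body, depth):
    pad = "    " * depth                        # the generator must know its context
    return f"{pad}if {cond}:\n{pad}    {body}"

print(emit_braced_if("x > 0", "y = 1;"))
print(emit_indented_if("x > 0", "y = 1", depth=2))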

In short, significant whitespace (or at least significant indentation) is a tax on tooling. Taxing tooling not only wastes the time and energy of tool builders - it discourages tooling altogether. And so, rather than foist significant whitespace on a language, think in terms of a broader system which includes tools. Provide a pretty printer with your language (like in Go).  Ideally, there's a version of the pretty printer that live edits your code as you type.

As a bonus,  all the endless discussions about formatting Go away, as the designers of Go have noted.  Sometimes the best way to address a problem is to define it away.

There. That was mercifully brief, right? Nothing to it.