A place to be (re)educated in Newspeak

Saturday, October 31, 2009

Atomic Install

A few months ago, I wrote a post about the become: operation in the Smalltalk family of languages. The post elicited a good discussion. One of the interesting points that came up, was that you could implement become: by changing the class of objects, provided all the changes were atomic - changing the class, modifying the object’s schema accordingly, and copying the data between the objects.

Setting the discussion of become: aside, I observe that in general, reflective change should be atomic. What I mean by this, is that one should be able to group an arbitrary number of distinct reflective changes to the program, and install them into the running program all at once, in one atomic operation. I will use the term atomic install to refer to this ability.

Oddly enough, this point is not well understood. Most languages that support reflective change do not provide a way for atomically performing a set of changes.

Smalltalk provides a reflective API centered around classes. You can ask a class to modify itself in a variety of ways - adding or removing variables, methods or changing its superclass. There is, however, no way to change several classes at once.

The CLOS APIs are similar in this respect.

Scripting languages don’t really have a reflective API as such. Instead, the reflective changes may come about as the result of executing some code (e.g., an assignment may add a variable). In other cases, the reflective capacity of the language comes about directly from exposing the data structures of the language implementation back to the language.

In all these cases, one cannot perform a set of program changes as an atomic unit.

Why do you need to atomically apply a set of changes instead of sequentially applying one change after another?

One reason is that you’d like to apply the changes as a transaction. If a change fails (say, because you created a cycle in the class graph, or duplicated an instance variable) you don’t want any further changes to take place. It’s easier if you don’t have to catch exceptions etc. after each step.

Another reason is that the changes are often dependent on each other. Applying one change without the other leaves your program in a broken state. The fact that intermediate states may be inconsistent even though the overall set of changes is correct means that it isn't sufficient to wrap a series of reflective changes in a transaction.

Of course, most people don’t rely on reflective modification to develop their programs. Rather, they suffer through the classic edit-compile-run-debug cycle. In the absence of anything better, you typically edit source code in files. You then compile these files and load the resulting program. This actually has one big advantage: the load is atomic - all the changes that resulted from your edits are loaded as a unit.

Evolving your program reflectively has the advantage that you can make corrections to the running program. Often, this is done during fix-and-continue debugging. Even then, in most cases, the “program” in question is an application that is stopped in the debugger, and you can apply fixes sequentially. But as noted above, it’s still easier if you can apply the changes as a transaction.

More interesting cases are when you are modifying a program, and need to run it between the individual modifications. Unlike most people, I run into this often, when modifying the IDE I am using. However, there are additional situations of this nature.

Long lived applications that must provide continuous service have this flavor. Erlang allows you to replace an entire module as a unit in these situations. In Java, you use class loaders to get the desired effect; it’s complicated, but it’s your only way out. A scenario I'm especially interested in is software services: the service updates applications on the fly without shutting them down - and in particular, I may want to update the update mechanism itself.

If you can’t make all the changes in one go, you find that you have to break the transition into a series of steps, each of which leaves the system in a consistent state while leading to the desired final program. This is tricky and error prone: you need to apply the right changes in just the right order.

Scripting languages like Ruby often go through a series of program changes during program start up. Different modules are loaded, modifying the program in the process. This process is also order dependent, and therefore brittle. In many cases, this sort of reflective change isn’t actually essential; rather it’s an artifact of the language semantics. However, I suspect there are situations where it is used to real advantage.

Overall, one can live for a long time without the ability to atomically apply a set of program changes. And yet it seems that there are some situations where atomic install seems to be very useful. It also has another advantage: batching the changes is more performant. Often one doesn’t care, but it doesn’t hurt to be faster, and on some occasions it actually can matter. One might as well see if one can come up with a reflective API that supports atomic install.

In Strongtalk, the VM supported atomic install as a primitive operation in the VM. More recently, in the latest Newspeak update release, I added an atomic install facility written entirely in Newspeak. Exactly 372 lines of code (whitespace and copious comments included). It is a tribute to the Squeak design that this is doable without any privileged access to the VM .

Tangent1: Thanks to Peter Ahe and Eliot Miranda for the discussions that led to this scheme; and to Lars Bak, for the discussions that led to the original notion of atomic install.


Tangent2: Squeak’s existing mechanism for changing classes, the ClassBuilder, is, on the other hand, rather unattractive. It’s three times as long, vastly more complicated, and provides only a subset of the functionality. It shows how tricky this kind of reflective change can get if you don’t conceptualize it the right way.

Naturally, the actual atomic step here is done using a variant of become:. Specifically, it’s a one-way become: on an array of objects. Given two object arrays a and b, both of size n, all references to a[i] are changed to refer to b[i], for i =1 .. n.

The atomic installation process nice and simple. The input is a list of mirrors describing mixins, and a namespace listing which of those mixins already exist in the system. The namespace parameter is crucial BTW, as there is no global namespace in Newspeak.

We then produce a fresh set of mixins based on the input list. For any existing mixins, we locate all classes that are invocations of those mixins, and all subclasses thereof. We make fresh versions of these as well, reflecting the changed mixins involved. For each such class, we locate all its instances (does your system have allInstances?) and produce new instances based on the new descriptions. Each new object is associated with the old object by keeping them in parallel arrays. Then we just do the become: and we’re done.

Unlike Smalltalk, in Newspeak we never have to recompile code that hasn’t been modified at the source level - because Newspeak code is representation independent.

It’s clearly harder to do atomic install in a system that does sophisticated JIT compilation and inlining - but as I said, Strongtalk supported it (albeit at the VM level, largely because of the extar complexity involved) in the mid-90s.

How do we get this kind of reflective power on mainstream platforms? It is harder, because the widely used platforms don’t support the needed abstractions as well as Smalltalk VMs do.

At least on the web browser, I expect to be able to get similar effects with Javascript, albeit with a very different implementation.

In Java, the absence of the necessary primitives tends to force one to build one’s own custom object representation, which is costly in both developer time and machine time.

Ironically (but not coincidentally), the bulk of the necessary machinery already exists in the Hotspot JVM, which is capable of changing at least method implementations on the fly (via JVMDI), including deoptimizing compiled code that may have inlined a method that has been modified. The problem is exposing it to user - and especially exposing it securely. Mirrors can help here - but that is for a future post.

On .Net, the DLR helps one construct one’s own custom representation. Conversely, there’s little support for deoptimization etc. in the CLR itself.

Of course, one goal of this post is to encourage implementors to add such support, and do it in the right way.

11 comments:

Pascal Costanza said...

Context-oriented Programming can also help here: You package the changes you want to make into layers, which can contain modifications of several classes in one layer, and you can then 'activate' and 'deactivate' such layers in one go (either globally, or with dynamic scope). Using this for atomic software updates hasn't been fully worked out yet, but was definitely one of the motivating ideas behind it.

ahe said...

How is a "tribute to the Squeak design?" My understanding is that you can do this in any Smalltalk. I don't think Squeak does this particularly better than other Smalltalks. Also, why is it desirable to be able to implement atomic install without requiring privileges in a capability based system? Or in any secure system for that matter?

Why would having "allInstances" be desirable in a capability based system? It is a method on Class, for goodness sake. I believe you can find all instances of a class on most JVMs using some kind of debugger/monitoring API, but at least such functionality is not provided as a completely insecure mechanism directly accessible from any class.

How come allInstances is useful for doing atomic install? It would seem to have horrible performance if you where to change more than one class at a time as it is implemented by scanning the heap in Squeak. For toy examples it doesn't matter, but real world programs often have heaps in the GB range.

Gilad Bracha said...

Hi Peter,

Squeak was designed for flexibility not security. It does rate very highly for flexibility (and very low for security). Not every Smalltalk would give you direct access to the method tables - and if it did, it isn't clear that tampering them would interact well with the actions of the JIT, for example.

Of course, these features are not what you'd look for first in a secure system - at least you'd want access to them to be very carefully controllable. I did not mean to imply otherwise.

Take allInstances. I don't see it in conflict with security if access to it is mediated by a capability.

allInstances does have a performance problem when modifying classes with live instances; it isn't necessary, but it makes things a lot simpler. Otherwise, I think you have to go the lazy modification route.

BTW, I don't recall JVMDI supporting allInstances, and a quick review shows no such functionality. Any one who knows different - please correct me and provide a specific reference on how it's done; it could be really useful.

Gilad Bracha said...

One more point:

Atomic install, as implemented, only examines instances of a class if the layout of the class has changed. Hence, it should be at least as efficient (or inefficient) as the existing Smalltalk mechanism for reflective modification. It just does it all atomically.

For large heaps, there is a performance issue in any case. This was debated as part of the become: discussion 3 months back. There are still many scenarios where heaps are small - embedded devices etc.

In any case, it is the principle that I am focused on here: reflective change should be supported in any general purpose programming language, and it should be done atomically.

ahe said...

You say it is a tribute to Squeak that you can implement atomic install easily. I do not see what Squeak's major contribution in this area is, and furthermore the realization of these features in Squeak is completely insecure and slow.

You are hinting that other systems are somehow inferior for not having these features of Squeak, but somehow forgiving Squeak for realizing these features without any concerns about performance or security.

In Squeak, Class>>allInstances is not mediated by a capability, yet you ask "does your system have allInstances?" without pointing out that it really isn't the best primitive for implementing atomic install or the best model of how to provide such a feature in a security-conscientious system.

So I ask again, what is it that Squeak is contributing? Where these primitives not available in the original Smalltalk system Squeak was derived from?

PS: Sun's JDK supports tools like jmap and jhat, and thus is it conceivable that HotSpot has enough infrastructure to implement allInstances. But why should it? What would be the point? First, allInstances is not the right primitive for atomic install. Second, there is a whole set of other operations missing for you to be able to anything like atomic install in HotSpot.

Gilad Bracha said...

Peter,

Let me explain what I meant by my comment:

When software needs to deal with needs beyond planned requirements, it typically fails miserably. We're lucky if software works for the purpose it was designed for - even when it does, it usually doesn't do terribly well. So when a system succeeds in meeting an unanticipated requirement, it is a big deal.

Maybe I should have said "It is a tribute to the design of Smalltalk-80 ...". But the same person, Dan Ingalls, designed the core aspects of both. In any case, I see no reason to belabor the point any further.

Unknown said...

Hi Gilad,

it is no more difficult to do in a JIT than in an interpreter because in a JIT one can always throw all existing code away, right?

@Peter, yes, allInstances is problematic. In the VisualWorks VM I added an allInstancesOfAnyOf primitive that takes an Array of classes as its argument and answers all the instances of any of the classes in that array, so that one can collect the set of instances in a single pass over the heap. I added it for Steve Dahl who wrote VisualWorks' class builder precisely so that he could implement updating more than one class as a transaction.

Some simple refactorings require this transactional approach. For example, moving an instance variable from a subclass to a superclass or vice verce will only preserve the value of the instance variable in extant instances if done atomically.

Rob G said...

As usual a thought-provoking article. Keep them coming.

As an aside I was interested to see Newsqueak (sic) cited as an influence on Google's Go language (http://golang.org). I'm staggered - I see no evidenceof influence personally.

Gilad Bracha said...

Rob,

There is no influence. Newsqueak was an unrelated language developed by Rob Pike, one of the developers of Go. Nothing to with Newspeak.

Anyway, thanks for the nice comments!

Ismael Juma said...

Hi Gilad,

"BTW, I don't recall JVMDI supporting allInstances, and a quick review shows no such functionality. Any one who knows different - please correct me and provide a specific reference on how it's done; it could be really useful."

Is any of the following useful?

From the Eclipse Debug FAQ[1]:

"There are 3 significant debugging features that you can take advantage of when using a Java 6.0 or later VM (note that not all VMs may support all of these features).

All Instances - Queries the VM for all instantiated objects of a specific type and lists them in a popup dialog (which can be persisted to the Expressions view). This feature can be accessed from the run menu and several context menus. You must have a java variable selected in the variables or expressions views, or a java type or constructor of a type selected in the Java editor.

All References - Queries the VM for all objects that hold references to a specific object and displays them in a popup dialog (which can be persisted to the Expressions view). This feature can be accessed from the run menu and certain context menus when you have a java variable selected in the variables or expressions views."

Ismael

[1] http://wiki.eclipse.org/Debug_FAQ

Gilad Bracha said...

Isamel:

Thanks. That could be very useful, though the issue remains that it isn't universally supported by the Java platform.