Thursday, July 30, 2009

The Miracle of become:

One of Smalltalk’s most unique and powerful features is also one of the least known outside the Smalltalk community. It’s a little method called become: .

What become: does is swap the identities of its receiver and its argument. That is, after

a become: b

all references to the object denoted by a before the call point refer to the object that was denoted by b, and vice versa.

Take a minute to internalize this; you might misunderstand it as something trivial. This is not about swapping two variables - it is literally about one object becoming another. I am not aware of any other language that has this feature. It is a feature of enormous power - and danger.

Consider the task of extending your language to support persistent objects. Say you want to load an object from disk, but don’t want to load all the objects it refers to transitively (otherwise, it’s just plain object deserialization). So you load the object itself, but instead of loading its direct references, you replace them with husk objects.

The husks stand in for the real data on secondary storage. That data is loaded lazily. When you actually need to invoke a method on a husk, its doesNotUnderstand: method loads the corresponding data object from disk (but again, not transitively).

Then, it does a become:, replacing all references to the husk with references to the newly loaded object, and retries the call.

Some persistence engines have done this sort of thing for decades - but they usually relied on low level access to the representation. Become: lets you do this at the source code level.

Now go do this in Java. Or even in another dynamic language. You will recognize that you can do a general form of futures this way, and hence laziness. All without privileged access to the workings of the implementation. It’s also useful for schema evolution - when you add an instance variable to a class, for example. You can “reshape” all the instances as needed.

Of course, you shouldn’t use become: casually. It comes at a cost, which may be prohibitive in many implementations. In early Smalltalks, become: was cheap, because all objects were referenced indirectly by means of an object table. In the absence of an object table, become: traverses the heap in a manner similar to a garbage collector. The more memory you have, the more expensive become: becomes.

Having an object table takes up storage and slows down access; but it does buy you a great deal of flexibility. Hardware support could ease the performance penalty. The advantage is that many hard problems become quite tractable if you are willing to pay the cost of indirection via an object table up front. Remember: every problem in computer science can be solved with extra levels of indirection. Alex Warth has some very interesting work that fits in this category, for example.

Become: has several variations - one way become: changes the identity of an object A to that of another object B, so that references to A now point at B; references to B remain unchanged. It is often useful to do become: in bulk - transmuting the identities of all objects in an array (either unidirectionally or bidirectionally). A group become: which does it magic atomically is great for implementing reflective updates to a system, for example. You can change a whole set of classes and their instances in one go.

You can even conceive of type safe become: . Two way become: is only type safe if the type of A is identical to that of B, but one way become: only requires that the new object be a subtype of the old one.

It may be time to reconsider whether having an object table is actually a good thing.

Saturday, July 11, 2009

A Ban on Imports (continued)

In my previous post, I characterized imports as evil, and promised to expand upon non-evil (yay verily, even good) alternatives. First, to recap:

Imports are used for linking modules together. Unfortunately, they are embedded within the modules they link instead of being external to them. This embedding makes the modules containing the imports dependent on the specific linkage configuration the imports represent.

Workarounds like dependency injection are just that: workarounds (e.g., see this post). They are complex, cumbersome, heavyweight. OSGi even more so. Above all, they are unnecessary - provided the language has adequate modularity constructs.

So, which languages have sufficient modularity support? I know of only two such languages: Newspeak and PLT Scheme. ML has a very elaborate module system, but ultimately it does not meet my requirements.

Modules and their definitions (these are two distinct things) should be first class and support mutual recursion. This isn’t the case in ML, though some dialects do support mutual recursion.

Tangent: The difficulty in ML, incidentally, is rooted in the type system. It is very hard to typecheck the kind of abstractions we are talking about. Worse, if you want to make your type declarations modular, your modules end up having types as members. This can lead you into deep water with types of types (making your type system undecidable). To avoid that trap, ML opts to stratify the system, so that modules (that contain types) are not values (that have types).

Not surprisingly then, progress on these issues comes from the dynamically typed world. Over a decade ago, the Schemers introduced Units.

The biggest difference between Newspeak modularity constructs and units is probably the treatment of inheritance. In Newspeak our module definitions are exactly top level classes, which reduces the number of concepts while allowing module definitions to benefit from inheritance.

There are strong arguments against inheritance of module definitions. For example, you cannot reliably add members to a module definition, because they might conflict with identically named members in the heirs of that definition. Specifying a superclass (or super module definition) looks like a hardwired dependency as well.

On the other hand, being able to reuse module definitions via inheritance is very attractive. Especially if you can mix them in freely.

Ultimately, we decided that the benefits of unifying classes and module definitions outweighed the costs.

Take the argument above regarding extending module definitions with new members. Newspeak was designed with an eye toward a completely networked world, where software is a service, not an artifact. In such a world, you can find all your heirs - just as if you were working on your own private application in your IDE. So if you need to add a member to a module definition, you should be able check who is mixing it in and what names they have added.

Tangent: This may still sound radical today, but this world is moving into place as we speak:
V8 gives the web browser the performance needed to be a platform for serious client software.
HTML 5, Gears etc. provide such software with persistent storage on the client
Chrome OS makes it obvious (as if it wasn’t clear enough before) that this in turn commoditizes the OS, and that the missing pieces will keep coming.

Likewise, in the absence of a global namespace, top level classes do not inherit from any specific superclass (and nested classes don’t either because all names are late bound) . Overall, the downside of allowing inheritance on module definitions doesn't apply in Newspeak.

The upside compared to conventional constructs is huge. It means you can easily take entire libraries, create multiple instances of them (each with its own configuration), mix them into new definitions, write polymorphic code that can work simultaneously with different instances or even different implementations of the API etc. You can store the libraries and their instances in variables, pass them as parameters, return them from computations, hold them in data structures, serialize them to disk or over the wire - all with the same mechanisms you use for ordinary classes and objects.

This economy of mechanism is important. It means you don’t have to learn a variety of specialized and complex tools to build modular systems. The same basic tools you use to implement basic CS101 examples will serve across the board. This will carry through to other areas like tooling: an object inspector can be used to inspect a “package”, for example. Altogether, your system can be much smaller - which makes it easier to learn, faster to load, likelier to fit on small devices etc. Simplicity is an advantage in itself.

As I explained in the first half of this series, the only need for a global namespace is for configuration: linking the pieces of an application together. There are several ways you can deal with the configuration/linkage issue. It’s a tooling issue. We use the IDE, as I described in an older post. So using the running example from part 1, we can write:

class SoundSystem usingPlatform: platform andPlayer: player = {

|

(* dependencies on platform might include things like the following: *)

List = platform Collections List. (* You can see how this replaces an import *)

mp3Player = player usingPlatform: platform withDock: self.

|
}{ ... }


class IPhone usingPlatform: platform withDock: dock = {


|

(* dependencies on platform elided *)

myDock = dock.

|

}{ ... }


class Zune usingPlatform: platform withDock: dock = {


|


(* dependencies on platform elided *)

theirDock = dock.

|

}{ ... }

and then create instances in the IDE (which provides us with a namespace where SoundSystem, iPhone(tm) and Zune(tm) are all bound to the classes defined above):

sys1:: SoundSystem usingPlatform: Platform new andPlayer: IPhone.

sys2:: SoundSystem usingPlatform: Platform new andPlayer: Zune.

tm: Did you know? iPhone is trademark of Apple; Zune is a trademark of Microsoft.

Variations on the above are possible; hopefully, you get the idea. If not - well, don’t worry, I probably won’t explain it again.

The absence of a global namespace has additional advantages of course: there’s no static state, and it’s good for security (but that is for another day).