Tuesday, June 30, 2009

A Ban on Imports

This is not, of course, an essay on restricting free trade. Rather, this post is about the evils of the import clause, which occurs in one form or another across a wide array of programming languages, from Ada, Oberon and the various Modulas, through Java and C#, and on to F# and Haskell.

The import clause should be banned because it undermines modularity in a deep and insidious way. This is a point I’ve attempted to convey time and time again, with only limited success. I will now try to illustrate the problem via a hardware inspired example.

Consider the not-so-humble MP3 player. An MP3 player is a hardware module. The market is full of them, as well as other hardware modules they can plug in to. For example, sound systems where on can dock an MP3 player and have it play on stereo speakers.

Let’s try and describe the analog of such a sound system using programming language modularity constructs that rely on imports:

module SoundSystem

import MP3Player;

... wonderful functionality elided ...


end


I want to describe how my sound system works, separately from the description of how an MP3 player works. I would like to later plug in a particular MP3 player, say a Zune(tm) or an iPod(tm)

tm: Zune and iPod are trademarks of Microsoft and Apple respectively, two companies with armies of lawyers who might harass me if I do not state the obvious.

Now the first problem is that neither Zune or iPod are named MP3Player. If I want to connect my sound system to a Zune, I will have to edit the definition of SoundSystem to name the specific module I want to import.

If you’re very petty, you might say that Zune and iPod do not share a common interface and cannot be docked into the same sound system. Imagine that we wish to use our sound system with an iPhone (tm) and an iPod Touch (tm) of some compatible generation.

tm: iPhone and iPod Touch are trademarks of Apple.

Say I decide to go with a Zune.

module SoundSystem

import Zune;

... wonderful functionality elided ...


end


Later I change my mind for some reason, and want to hook up my system to an iPod. It’s easy: I just edit the definition of my system again, to import iPod:

module SoundSystem

import iPod;


... wonderful functionality elided ...


end


The question you should be asking is: Why should I edit the definition of my system each time I change the configuration? In reality, it is unlikely that I am actually the designer of SoundSystem. I probably don’t even have access to its definition. I just want to configure it to work with my MP3 player.

The problem is that import confounds module definition and module configuration. Module definition describes the design of a module; module configuration describes how one hooks up different modules. The former has to do with module internals; the latter should be done externally to the modules involved, to allow them to be used in any context where they could function.

We clearly want our sound system to abstract over the specific player being plugged in to it. Any player with a compatible interface will do. A well known mechanism for abstracting things is parameterization. We might be happier if we defined our sound system parametrically with respect to the MP3 player

module SoundSystem(anMP3Player){
... great wonders using anMP3Player ...
}


We could them configure our system to use an iPod:

SoundSystem(iPod);

or a Zune

SoundSystem(Zune);

without having to modify (or even have access to) the source code for the definition of SoundSystem. Hurray!

The module definition looks a lot like a function, and the configuration code looks like a function application. This is very suggestive. Indeed, ML introduced a module system based on function-like things called functors a quarter century ago. But there’s a bit more to this.
These hardware pieces tend to plug in to each other. For example, the definition of IPod is parametric too:

IPod(dockingStation){... even greater wonders ...}

Our sound system does its thing by behaving like a docking station. It and the MP3 player are mutually recursive modules. Configuration therefore requires support for mutual recursion (which is not allowed in Standard ML):

letrec {

dock = SoundSystem(mp3Player);


mp3Player = IPod(dock);


} in dock;


If this notation is unfamiliar, please brush up on your functional programming skills before you become unemployable. Basically, ignore the first and last line, and treat the two lines involving = as equations.

So the module definitions are a lot like functions that yield modules. You could also think of module definitions as classes yielding instances. The instances are like physical hardware modules.

Now we can add another sound system and use our old Zune

letrec {

dock2 = SoundSystem(oldMP3);


oldMP3 = Zune(dock2);


}
in dock2;

This is what is often called side by side deployment - multiple instances of the same design, configured differently.

Tangent: Yes, Virginia, you can achieve that sort of thing in Java despite imports, using class loaders. Imports hardwire names into your code, and class loaders can counteract that by letting you define multiple namespaces. These can have multiple copies of your code, potentially hardwired to different things (even though they all have the same name). If you think class loaders offer a simple, clean way of doing things that is easy to learn, use, understand and debug, this post is not for you. Nor will any amount of OSGi magic on top fundamentally change things.

We might also choose to define things differently

module SoundSystem(MP3Player) {

player = MP3Player(self);


...


}


Here, we are passing module definitions as parameters. We are also referring to SoundSystem’s current instance from within itself - a lot like classes, no? We might configure things thusly

SoundSystem(iPod);

So it looks like mutual recursion and first class module definitions are very natural things to have. And yet traditional languages do not support this - even though many languages have constructs like classes and functions that are first class values and can be defined in a mutually recursive fashion.

One problem with using these constructs to define modules is that they are usually able to access anything in the global namespace. This makes it very hard to avoid implicit dependencies.

Interestingly, the global namespace is exactly what import requires. Since we don’t need or want import, let’s do away with it and the global namespace. We clearly will get a much more modular system without it; but wait - there seems to be one place where we really want the global namespace. That is when we write our configuration code, the code that wires our modules together.

That’s fine - there are a number of solutions for that. It isn’t always clear that our configuration language is the same language as the programming language(s) that define our modules, for example. If you write a makefile, the global namespace is defined by your file system and accessible within the makefile. Not that I really want to recommend make and its ilk.

I think we do want to code our configuration in a nice general purpose high level programming language. One solution is to have our IDE provide us with an object representing the known global namespace, and write our configuration code with respect to that namespace object. This is essentially what we do in Newspeak.

In the next post, I’ll discuss more of the advantages of this approach, contrast how Newspeak handles things with other languages with powerful module systems, like Scheme (which for the past decade or so has had a system called Units that is quite close to what I’ve discussed so far) and ML, and show once more how one actually does configuration in Newspeak.

To be continued.