Tuesday, December 09, 2008

Living without Global Namespaces

Newspeak differs from most programming languages in that it doesn’t provide a global namespace. It also differs from most imperative programming languages in that it has no static state.

I’ve spoken and written a fair amount about why the absence of static state is a good thing. What I haven’t discussed much is how you actually organize programs in this way. There have been a lot of questions along these lines. This post is an attempt to answer some of them.

Caveat: Some of the details here differ in the current prototype. Some of the features are still incomplete. What's described here is how things are supposed to work. We're not far from that.

First, let’s tackle the question of static state. It should be obvious: anything that you expected to put in a static variable goes in an instance variable of a module. What about singleton classes? How do I ensure that there’s only one instance? The easiest way is to initialize a read-only slot of a module with an object literal. What happens if there are multiple instances of the module declaration? Well, each module instance has its own “singleton”. That’s exactly what happens with singleton classes in Java when they are defined by multiple class loaders.
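As a rough analogy in Python rather than Newspeak, the sketch below keeps the would-be static state in a slot of a module instance. The names `MyModule` and `Counter` are hypothetical, and a property with no setter stands in for a read-only slot.

```python
class Counter:
    """A stateful service; stands in for what would otherwise be a singleton."""

    def __init__(self):
        self.count = 0

    def next(self):
        self.count += 1
        return self.count


class MyModule:
    """Hypothetical module: state lives in instance slots, not in statics."""

    def __init__(self, platform):
        self._platform = platform
        self._counter = Counter()  # created once per module instance

    @property
    def counter(self):
        # Read-only slot: no setter is defined, so clients cannot rebind it.
        return self._counter


first = MyModule(platform=None)
second = MyModule(platform=None)
first.counter.next()
first.counter.next()
```

Each instance of `MyModule` carries its own counter, so `first` and `second` never share state. That mirrors the point above about each module instance having its own “singleton”.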

What if your class defines some service process and you need to be really sure there’s only one in the entire system? First, in many cases you may find that the system in question is your subsystem, defined by your modules, and the answer above applies.

Now if you really mean “the entire system”, then you need to control that via some state in the platform object - through its links to the world’s state (e.g., the file system) or by having some registry in the platform object. Of course, not all code may see the true platform object, so it isn’t really global either; but it won’t matter.
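One way to picture the registry option, again as a Python analogy and not as actual Newspeak platform API: the `Platform` class, its `service` method and the `"logger"` key are all assumptions made for illustration.

```python
class Platform:
    """Hypothetical platform object that owns a registry of shared services."""

    def __init__(self):
        self._registry = {}

    def service(self, name, factory):
        # Create the service on first request; hand back the same one afterwards.
        if name not in self._registry:
            self._registry[name] = factory()
        return self._registry[name]


class Logger:
    def __init__(self):
        self.lines = []


class ModuleA:
    def __init__(self, platform):
        self.log = platform.service("logger", Logger)


class ModuleB:
    def __init__(self, platform):
        self.log = platform.service("logger", Logger)


platform = Platform()
a = ModuleA(platform)
b = ModuleB(platform)

# Code handed a different platform object sees a different "system".
sandbox = Platform()
c = ModuleA(sandbox)
```

Modules that share a platform object share the logger. A module given the `sandbox` platform gets its own, which is why the registry is not really global either.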

Having no static state doesn’t preclude having a global namespace, as long as that namespace doesn’t contain any stateful objects. The original plan for Newspeak was to have a global namespace of pure values, structured as an inversion of the internet domain namespace. This would have been much like the convention for naming Java packages (except that the scopes of namespaces would nest properly, as you’d expect). It was the only idea from Java that I saw a use for in Newspeak. It’s a good idea, but it turns out to be unnecessary.

So, given no global namespace, what can I write at the top level? Remember, I can’t refer to any names, even things like Object or String that presumably exist in every implementation. This seems awkward. Not to worry - we won’t be writing SKI combinators or even plain old lambdas.

We might be able to write some literal expressions like 1 + 2, but that isn’t all that interesting, and isn’t even necessary. What we need to write are things that produce new kinds of objects, like classes.

Happily, we can write a top level class declaration, with one caveat: a top level class declaration cannot declare a superclass explicitly, because there is no enclosing namespace in which to name it. In that case, by special dispensation, the superclass will be the class Object provided by the underlying platform. Similar rules apply to object literals (which can be thought of as “anonymous classes done right”).

Ok, so now we can write a class, which can have other classes nested inside it, so it can be an entire library; and since there is no surrounding namespace, it is necessarily independent of any specifics of the environment - it is a module declaration. An example of such a module declaration would be the Newspeak AST

class NewspeakAST usingLib: platform = { ....

... lots of nested AST classes ....

}

A similar class would be the CombinatorialParsing library I’ve written about before.

There’s just one little problem. How do I use such a class? I gave it a name, but no one can refer to it, since there isn’t any surrounding namespace for the name to be bound!

Suppose I want to create a parser that builds an AST, using the two classes mentioned above. I need a grammar, which should be defined by a subclass of the parser library, and the parser class itself would in turn be a subclass of the grammar. Call these classes Grammar and Parser.
Since I can’t name the superclass of Grammar, I’ll just define it as a mixin, and worry about how to pair it with the superclass later.

class Grammar = { ....}

Likewise with Parser.

class Parser usingLib: platform astLib: ast = { ...}

That way I can define all the actual code required. The problem remaining is how to link all these pieces together.
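To make the mixin idea concrete, here is a minimal Python analogy in which a mixin is a function from a superclass to a class. The function names, the `token` and `keyword` methods, and the stand-in `CombinatorialParsing` body are all hypothetical. The last three lines do the pairing by hand, which is exactly the linking step still to be solved.

```python
class CombinatorialParsing:
    """Stand-in for the parser library; a real one would define combinators."""

    def __init__(self, platform):
        self.platform = platform

    def token(self, text):
        return ("token", text)


def grammar_mixin(superclass):
    """Grammar written without naming its superclass: the caller supplies it."""

    class Grammar(superclass):
        def keyword(self):
            return self.token("class")

    return Grammar


def parser_mixin(superclass):
    """Parser, likewise parameterized by the class it will extend."""

    class Parser(superclass):
        def __init__(self, platform, ast_lib):
            super().__init__(platform)
            self.ast = ast_lib

        def parse(self, source):
            return (self.keyword(), source)

    return Parser


# Pairing each mixin with a superclass happens only here, at link time.
MyGrammar = grammar_mixin(CombinatorialParsing)
MyParser = parser_mixin(MyGrammar)
result = MyParser(platform=None, ast_lib=None).parse("x")
```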

If I actually had a namespace where I could refer to the pieces, I could write linking code like:

“confused”

main: platform {

MyGrammar = Grammar |> CombinatorialParsing usingLib: platform.

MyParser = Parser usingLib: platform astLib: NewspeakAST |> MyGrammar.

return:: MyParser parse: 'a string in my language, perhaps?'

}

So how would I go about creating such a namespace? This is ultimately a question of tooling. Suppose my IDE lets me load class objects dynamically - say by reading in serialized class objects saved in files on disk. When it loads such a class object, it can reflect on it to find out its name, and store the class object in a slot of the same name in some new object it creates.

If I choose to load the classes Grammar, Parser, CombinatorialParsing and NewspeakAST, I can create an object that is precisely the namespace I needed. I can then modify its class by adding the main: method listed above. This object is now an application, whose behavior is defined by its main: method. I can serialize this application object to disk.
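A sketch of what such a tool might do, written in Python as an analogy: `make_application` and the empty stand-in classes are hypothetical. Python itself keeps these classes in a module-level scope, so only the shape of the resulting application object carries over.

```python
class CombinatorialParsing:
    pass


class NewspeakAST:
    pass


class Grammar:
    pass


class Parser:
    def parse(self, text):
        return text.split()


class Application:
    """Empty shell; its slots are filled in by the tool, not written by hand."""


def make_application(loaded_classes, main):
    # Give the namespace object a class of its own that carries main.
    app_class = type("App", (Application,), {"main": main})
    app = app_class()
    for cls in loaded_classes:
        # Reflect on each class to find its name, then bind a slot of that name.
        setattr(app, cls.__name__, cls)
    return app


def main(self, platform):
    # Names are resolved through the application object itself.
    return self.Parser().parse("a string in my language")


app = make_application(
    [Grammar, Parser, CombinatorialParsing, NewspeakAST], main
)
words = app.main(platform=None)
```

Inside `main`, the parser is reached as `self.Parser`, so the application object is the only namespace the linking code relies on.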

Running my program then amounts to deserializing the object, and invoking its main: method with an object representing the current platform.

I’ve glossed over some crucial details here. We don’t really want to serialize the entire object, as it points to objects in our IDE, like Object, Class and a few others. These are standard, and we can cut off the object graph with symbolic links at these standard points, and have the deserializer hook up their equivalents on the destination.
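Python's `pickle` module happens to offer a hook of this kind (`persistent_id` on the pickler, `persistent_load` on the unpickler), so the cut-off can be sketched as follows. This only illustrates the technique and is not how the Newspeak serializer works; the `STANDARD` table, the symbolic names and `DestinationObject` are assumptions.

```python
import io
import pickle

# Objects treated as standard: every destination is assumed to have its own.
STANDARD = {"Object": object, "Class": type}


class CuttingPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a symbolic name instead of serializing a standard object.
        for name, standard_obj in STANDARD.items():
            if obj is standard_obj:
                return name
        return None  # everything else is pickled as usual


class LinkingUnpickler(pickle.Unpickler):
    def __init__(self, file, standard):
        super().__init__(file)
        self._standard = standard

    def persistent_load(self, pid):
        # Hook the symbolic name up to the destination's equivalent object.
        try:
            return self._standard[pid]
        except KeyError:
            raise pickle.UnpicklingError(f"unknown standard object: {pid!r}")


def save(app):
    buffer = io.BytesIO()
    CuttingPickler(buffer).dump(app)
    return buffer.getvalue()


def load(data, standard):
    return LinkingUnpickler(io.BytesIO(data), standard).load()


class DestinationObject:
    """Stand-in for the destination platform's own root class."""


app = {"root": object, "meta": type, "greeting": "hello"}
data = save(app)
restored = load(data, {"Object": DestinationObject, "Class": type})
```

After loading, `restored["root"]` is whatever the destination supplied for `"Object"`, not the object that was reachable in the original environment.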

Is using the IDE this way cheating? After all, it ultimately resorts to using the namespace of the underlying file system (or the network, or a global IDE namespace, depending where the IDE fetches class objects from). I think not. The truth is that this is what any language in the world does at some level. Whether we rely on a compiler that uses a CLASSPATH environment variable to define a set of local directories, or on the IDE, or on makefiles in a given directory to link separately compiled files, it is ultimately the same: some tool uses the operating system to find pieces of program.

We don’t have to use the IDE; we could use a preprocessor that understood directives that referred to classes in the file system instead. It could even use something as inane as CLASSPATH. Of course, I’m not really recommending that.

My key point is that the language needs nothing more than objects to serve as its namespaces.

2 comments:

  1. "Happily, we can write a top level class declaration, with one caveat: A top level class declaration cannot declare a superclass explicitly since there is no way to name it, because there is no enclosing namespace. In that case, by special dispensation, the superclass will be the class Object provided by the underlying platform. Similar rules apply to object literals (which can be thought of as “anonymous classes done right”)."

    In terms of exposing capabilities to a module, I think of it in these terms: there are two ways you can expose capability to a module. The most obvious way is through module instantiation (the objects passed in expose capability). The second is through the compiler (i.e. the binding to Object for a basic superclass and the classes used for literal objects, etc). I suppose the compiler could actually be designed such that even it didn't wire these things up and instead, that happened at instantiation time using some instantiator object that binds literals and such.

    So, by changing the bindings in the compiler (or instantiator), you could also have control over the literals and the default object used for superclasses. This might come in handy for the truly paranoid.

  2. Stephen,

    What you say is quite true. The language compiler always provides some capabilities. The language and platform are designed so that these capabilities are stateless. A module can do no harm by using them beyond using up memory and processor cycles.

    Controlling the literals is something I hope to achieve by making them be the results of message sends as well (with appropriate caching semantics).

    Changing the default Object is indeed a compiler or platform change.
