Tuesday, December 09, 2008

Living without Global Namespaces

Newspeak differs from most programming languages in that it doesn’t provide a global namespace. It also differs from most imperative programming languages in that it has no static state.

I’ve spoken and written a fair amount about why the absence of static state is a good thing. What I haven’t discussed much is how you actually organize programs in this way. There have been a lot of questions along these lines. This post is an attempt to answer some of them.

Caveat: Some of the details here differ in the current prototype. Some of the features are still incomplete. What's described here is how things are supposed to work. We're not far from that.

First, let’s tackle the question of static state. It should be obvious: anything that you expected to put in a static variable goes in an instance variable of a module. What about singleton classes? How do I ensure that there’s only one instance? The easiest way is to initialize a read-only slot of a module with an object literal. What happens if there are multiple instances of the module declaration? Well, each module has its own “singleton”. That’s exactly what happens with singleton classes in Java when they are defined by multiple class loaders.
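To make this concrete, here is a minimal sketch in Python rather than Newspeak. The names ServiceModule and Logger are made up for illustration; a Python class stands in for a module declaration, and each instance of it stands in for a module.

```python
class Logger:
    """Stands in for the object literal that plays the role of a singleton."""

    def __init__(self):
        self.lines = []

    def log(self, message):
        self.lines.append(message)


class ServiceModule:
    """Stands in for a module declaration; each instance is a module."""

    def __init__(self):
        # What would have been static state lives in the module instance.
        self._logger = Logger()
        self.request_count = 0

    @property
    def logger(self):
        # Read-only slot: there is no setter, so the module keeps one Logger.
        return self._logger

    def handle(self, request):
        self.request_count += 1
        self.logger.log(f"handled {request}")


first = ServiceModule()
second = ServiceModule()
first.handle("a")
```

Each module instance has exactly one Logger, and two module instances have two independent ones, which is the "each module has its own singleton" behavior described above.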

What if your class defines some service process and you need to be really sure there’s only one in the entire system? First, in many cases you may find that the system in question is your subsystem, defined by your modules, and the answer above applies.

Now if you really mean “the entire system”, then you need to control that via some state in the platform object - through its links to the world’s state (e.g., the file system) or by having some registry in the platform object. Of course, not all code may see the true platform object, so it isn’t really global either; but it won’t matter.
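A registry in the platform object might look roughly like the following Python sketch. Platform, PrintSpooler and start_subsystem are hypothetical names chosen for the example; they are not part of Newspeak.

```python
class Platform:
    """Hypothetical platform object that owns a registry of system-wide services."""

    def __init__(self):
        self._services = {}

    def service(self, name, factory):
        # Create the service on first request; later requests get the same object.
        if name not in self._services:
            self._services[name] = factory()
        return self._services[name]


class PrintSpooler:
    """Stands in for a service process that must exist at most once."""


def start_subsystem(platform):
    # Code reaches the spooler only through the platform it was handed.
    return platform.service("spooler", PrintSpooler)


real_platform = Platform()
spooler_a = start_subsystem(real_platform)
spooler_b = start_subsystem(real_platform)

# Code that is handed a different platform object gets a different spooler.
sandbox_platform = Platform()
spooler_c = start_subsystem(sandbox_platform)
```

Uniqueness is relative to the platform object a piece of code can see, which is the sense in which the platform is not really global.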

Having no static state doesn’t preclude having a global namespace, as long as that namespace doesn’t contain any stateful objects. The original plan for Newspeak was to have a global namespace of pure values, structured as an inversion of the internet domain namespace. This would have been much like the convention for naming Java packages (except that the scopes of namespaces would nest properly, as you’d expect). It was the only idea from Java that I saw a use for in Newspeak. It’s a good idea, but it turns out to be unnecessary.

So, given no global namespace, what can I write at the top level? Remember, I can’t refer to any names, even things like Object or String that presumably exist in every implementation. This seems awkward. Not to worry - we won’t be writing SKI combinators or even plain old lambdas.

We might be able to write some literal expressions like 1 + 2, but that isn’t all that interesting, and isn’t even necessary. What we need to write are things that produce new kinds of objects, like classes.

Happily, we can write a top level class declaration, with one caveat: a top level class declaration cannot declare a superclass explicitly, since there is no enclosing namespace and hence no way to name it. In that case, by special dispensation, the superclass will be the class Object provided by the underlying platform. Similar rules apply to object literals (which can be thought of as “anonymous classes done right”).

Ok, so now we can write a class, which can have other classes nested inside it, so it can be an entire library; and since there is no surrounding namespace, it is necessarily independent of any specifics of the environment - it is a module declaration. An example of such a module declaration would be the Newspeak AST:

class NewspeakAST usingLib: platform { ....

... lots of nested AST classes ....

}

A similar class would be the CombinatorialParsing library I’ve written about before.

There’s just one little problem. How do I use such a class? I gave it a name, but no one can refer to it, since there isn’t any surrounding namespace in which the name could be bound!

Suppose I want to create a parser that builds an AST, using the two classes mentioned above. I need a grammar, which should be defined by a subclass of the parser library, and the parser class itself would in turn be a subclass of the grammar. Call these classes Grammar and Parser.
Since I can’t name the superclass of Grammar, I’ll just define it as a mixin, and worry about how to pair it with the superclass later.

class Grammar = { ....}

Likewise with Parser.

class Parser usingLib: platform astLib: ast = { ...}

That way I can define all the actual code required. The problem remaining is how to link all these pieces together.

If I actually had a namespace where I could refer to the pieces, I could write linking code like:

main: platform {

MyGrammar = Grammar |> CombinatorialParsing usingLib: platform.

MyParser = Parser usingLib: platform astLib: NewspeakAST |> MyGrammar.

return:: MyParser parse: ‘a string in my language, perhaps?’

}
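For readers who don't know Newspeak, the linking step can be approximated in Python by writing each mixin as a function from a superclass to a class. This is only a sketch under assumptions: grammar_mixin and parser_mixin are hypothetical names, the library and AST classes are toy stand-ins, and Python's module scope plays the part of the namespace we are pretending to have.

```python
class CombinatorialParsing:
    """Toy stand-in for the parser library; it receives the platform explicitly."""

    def __init__(self, platform):
        self.platform = platform

    def digits(self, text):
        # Return the leading run of digits in text.
        count = 0
        while count < len(text) and text[count].isdigit():
            count += 1
        return text[:count]


class NewspeakAST:
    """Toy stand-in for the AST module; the nested class is its only node type."""

    class Number:
        def __init__(self, value):
            self.value = value


def grammar_mixin(superclass):
    """Grammar, written without naming its superclass."""

    class Grammar(superclass):
        def number(self, text):
            return self.digits(text)

    return Grammar


def parser_mixin(superclass, ast_lib):
    """Parser, parameterized by its superclass and by the AST library."""

    class Parser(superclass):
        def parse(self, text):
            return ast_lib.Number(int(self.number(text)))

    return Parser


def main(platform):
    # Linking: pair each mixin with a concrete superclass.
    my_grammar = grammar_mixin(CombinatorialParsing)
    my_parser = parser_mixin(my_grammar, NewspeakAST)
    return my_parser(platform).parse("42 and more")


result = main(platform=object())
```

Only main knows how the pieces fit together; Grammar and Parser never name their superclasses.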

So how would I go about creating such a namespace? This is ultimately a question of tooling. Suppose my IDE lets me load class objects dynamically - say by reading in serialized class objects saved in files on disk. When it loads such a class object, it can reflect on it to find out its name, and store the class object in a slot of the same name in some new object it creates.

If I choose to load the classes Grammar, Parser, CombinatorialParsing and NewspeakAST, I can create an object that is precisely the namespace I needed. I can then modify its class by adding the main: method listed above. This object is now an application, whose behavior is defined by its main: method. I can serialize this application object to disk.

Running my program then amounts to deserializing the object, and invoking its main: method with an object representing the current platform.
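Here is a rough Python sketch of what such a tool does. The function load_class_objects is a hypothetical placeholder for reading serialized class objects from disk, and attaching main to the object only approximates "modifying its class".

```python
import types


def load_class_objects(names):
    """Hypothetical loader; a real tool would deserialize class objects from disk."""
    return [type(name, (), {}) for name in names]


def build_namespace(loaded_classes):
    """Reflect on each class object's name and store it in a slot of that name."""
    namespace = types.SimpleNamespace()
    for cls in loaded_classes:
        setattr(namespace, cls.__name__, cls)
    return namespace


def main(self, platform):
    # Names are resolved through the application object, not a global scope.
    return [self.Grammar, self.Parser, self.CombinatorialParsing, self.NewspeakAST]


app = build_namespace(
    load_class_objects(["Grammar", "Parser", "CombinatorialParsing", "NewspeakAST"])
)
# Approximates adding the main: method by binding main to the object.
app.main = types.MethodType(main, app)

# Running the program: invoke main with an object representing the platform.
linked = app.main(platform=object())
```

Note that the four classes are never bound to any name outside the application object; the object itself is the namespace.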

I’ve glossed over some crucial details here. We don’t really want to serialize the entire object, as it points to objects in our IDE, like Object, Class and a few others. These are standard, and we can cut off the object graph with symbolic links at these standard points, and have the deserializer hook up their equivalents on the destination.
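Python's pickle module happens to offer a hook that illustrates this kind of cut: persistent_id on the pickler and persistent_load on the unpickler. The sketch below is an analogy, not a description of the Newspeak serializer; StandardObject and the dictionaries of standard objects are assumptions made for the example.

```python
import io
import pickle


class StandardObject:
    """Stands in for a standard object such as Object or Class."""

    def __init__(self, host):
        self.host = host


def save(app, standard):
    """Serialize app, replacing standard objects by symbolic names."""
    names = {id(obj): name for name, obj in standard.items()}

    class CuttingPickler(pickle.Pickler):
        def persistent_id(self, obj):
            # A non-None result is written instead of the object itself.
            return names.get(id(obj))

    buffer = io.BytesIO()
    CuttingPickler(buffer).dump(app)
    return buffer.getvalue()


def load(data, standard):
    """Deserialize, hooking symbolic names up to the destination's objects."""

    class LinkingUnpickler(pickle.Unpickler):
        def persistent_load(self, pid):
            return standard[pid]

    return LinkingUnpickler(io.BytesIO(data)).load()


ide_standard = {"Object": StandardObject("ide")}
target_standard = {"Object": StandardObject("target")}

# The application's object graph points into the IDE's standard objects.
app = {"name": "demo", "superclass": ide_standard["Object"]}

data = save(app, ide_standard)
restored = load(data, target_standard)
```

The IDE's own Object never reaches the byte stream; only the symbolic name does, and the destination supplies its equivalent.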

Is using the IDE this way cheating? After all, it ultimately resorts to using the namespace of the underlying file system (or the network, or a global IDE namespace, depending where the IDE fetches class objects from). I think not. The truth is that this is what any language in the world does at some level. Whether we rely on a compiler that uses a CLASSPATH environment variable to define a set of local directories, or on the IDE, or on makefiles in a given directory to link separately compiled files, it is ultimately the same: some tool uses the operating system to find pieces of program.

We don’t have to use the IDE; we could use a preprocessor that understood directives that referred to classes in the file system instead. It could even use something as inane as CLASSPATH. Of course, I’m not really recommending that.

My key point is that the language needs nothing more than objects to serve as its namespaces.

Friday, December 05, 2008

Unidentified Foreign Objects (UFOs)

I recently found out that Newspeak’s basic foreign function interface (FFI), called Aliens, is being made available in Squeak (though that will require new VMs with the required primitives). Thanks to John McIntosh for doing this.

I should also thank Eliot Miranda for most of the original work on aliens, and Vassili Bykov, Peter Ahe and Bill Maddox for the rest. Also thanks to Lars Bak, whose work on the Strongtalk FFI inspired the VM level view of aliens; and to Dave Ungar, who was the first to understand that objects were all you needed on the language side of an FFI. Lastly, this post benefited immensely from conversations with Vassili.

So I figured I’d write a little bit about Aliens from a high level perspective. As usual, the ideas apply to programming languages in general.

In Smalltalk, there isn’t a standard FFI. Various dialects provide different solutions, with varying degrees of functionality, performance and ease of use. To be honest, they are usually a poor fit with the surrounding language and fairly awkward to use. This inhibits Smalltalk’s interoperability with the rest of the world. I’d argue that the absence of a good, standard FFI has cost the Smalltalk community dearly.

In Java, by contrast, native methods and JNI provide a standardized FFI. This mechanism is far from perfect, but at least there is a more or less standard solution.

What these and other systems have in common is support for a special construct (such as the native modifier for methods, or declarations like extern C, or the truly ugly ad hoc FFI syntax extensions used in various Smalltalks) for foreign functions.

Newspeak’s FFI was strongly influenced by the Strongtalk FFI; but unlike Strongtalk, Newspeak doesn’t have a special syntax for foreign calls. As Self showed many years ago, one doesn’t really need a special syntax for the FFI. The foreign functions, APIs, DLLs etc. can all be represented as objects. They just happen to be foreign objects.

The idea of a foreign object, which we call an alien, is at the foundation of the Newspeak FFI.

For starters, any decent language should be able to represent functions as values; and in an object-oriented language, these values are objects, accessed via a standard interface. Foreign functions are just a different implementation of that interface.

Another natural way to model a foreign function is as a method defined on a foreign object. For example, one can view an entire DLL as an object with a set of methods corresponding to the functions defined by the DLL. Better yet, we could represent an entire API as an object, independently of what DLLs actually defined it.
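Python's ctypes library follows a similar model, so it can serve as a small illustration. The sketch assumes a POSIX system, where ctypes.CDLL(None) gives access to the C library already loaded into the process; CLibraryAPI and apply_twice are hypothetical names.

```python
import ctypes


class CLibraryAPI:
    """Hypothetical wrapper: presents a C library as an object with methods."""

    def __init__(self, library):
        self._library = library

    def __getattr__(self, name):
        # Look up the C function of the same name; it is itself a callable object.
        return getattr(self._library, name)


def apply_twice(function, argument):
    # Works for any callable object, native or foreign.
    return function(function(argument))


def native_abs(value):
    return value if value >= 0 else -value


# Assumption: a POSIX system, where CDLL(None) exposes the C library.
libc = CLibraryAPI(ctypes.CDLL(None))

native_result = apply_twice(native_abs, -7)
foreign_result = apply_twice(libc.abs, -7)
```

The caller, apply_twice, cannot tell and does not care whether it was handed a native function or the C function abs.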

Aliens can be defined for different foreign languages; for example, while Alien is used to interface with C, we also have a class called ObjectiveCAlien that can be used to interface with Objective-C, which is the native language on Mac OS X. C Aliens and Objective-C Aliens do not interfere with each other, and when/if we need to add Java Aliens or CLR Aliens we can do that as well.

The alien approach is also a good fit with security: one need not be concerned that code may bypass high level language safety guarantees by calling out to C; untrusted code can be prevented from doing that, simply by not providing any Alien library objects to it.

Newspeak’s C Alien implementation is fast, but also dangerous. An alien is basically a blob of memory. The user of an Alien is responsible for interpreting and accessing that data correctly. There is no checking being done for you.
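To see what "responsible for interpreting that data correctly" means in practice, here is a small Python sketch in which a bytearray stands in for the memory behind an alien. This is not the Alien API. Python's struct module does check bounds, which a raw alien would not, but it is equally happy to reinterpret bytes as the wrong type.

```python
import struct

# A blob of raw bytes stands in for the memory behind an alien.
blob = bytearray(8)

# Store a 32-bit signed integer at offset 0 and a 32-bit float at offset 4.
struct.pack_into("<i", blob, 0, -2)
struct.pack_into("<f", blob, 4, 1.5)

# Correct interpretation: read each field with the type it was written with.
as_int = struct.unpack_from("<i", blob, 0)[0]
as_float = struct.unpack_from("<f", blob, 4)[0]

# Wrong interpretation: nothing stops us from reading the int as unsigned,
# or the float's bytes as an integer. We just get a different number.
misread_unsigned = struct.unpack_from("<I", blob, 0)[0]
misread_float_bits = struct.unpack_from("<i", blob, 4)[0]
```

No error is raised for the two misreads; the values are simply wrong, which is exactly the danger.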

Tangent: It's worth noting that the basic Alien layer may evolve further; for example, we aren't thrilled with the practice of subclassing Alien. It's not clear if the Alien class really needs to change, or just the pattern of using it.

On top of this foundation, safer and/or more convenient abstractions can be built. We have built objects that support not just methods corresponding to the functions of an API, but also methods that provide factories for the various datatypes used in the functions’ signatures, including those defined by macros. These objects wrap the basic alien API, and help with error-prone bookkeeping - converting between Newspeak types (e.g., Strings) and foreign types, freeing aliens after use etc.
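The flavor of such a wrapper can be sketched in Python. RawAlien and StringFactory are invented for this example and only simulate allocation and freeing; they are not the actual Newspeak classes.

```python
from contextlib import contextmanager


class RawAlien:
    """Stands in for a low level alien: raw bytes the user must free."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.freed = False

    def free(self):
        self.freed = True


class StringFactory:
    """Hypothetical higher level layer over RawAlien."""

    @contextmanager
    def c_string(self, text):
        encoded = text.encode("utf-8")
        alien = RawAlien(len(encoded) + 1)  # room for the trailing NUL byte
        alien.memory[: len(encoded)] = encoded
        try:
            yield alien
        finally:
            alien.free()  # freed even if the body raises

    def to_string(self, alien):
        # Convert back: read bytes up to the first NUL.
        end = alien.memory.index(0)
        return alien.memory[:end].decode("utf-8")


factory = StringFactory()
with factory.c_string("hello") as alien:
    round_trip = factory.to_string(alien)
    freed_inside = alien.freed
freed_after = alien.freed
```

The conversion and the call to free happen in one place, so client code cannot forget either.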

At the moment, both the declarations of low level aliens and higher level APIs are constructed manually, which is tedious and error prone. We’ve been planning on a higher level tool called CSlick, which would allow you to specify a set of .h files and the requisite DLLs, and obtain an object that supports the desired functions automatically.

As a first approximation, you could think of CSlick as a function:

CSlick: List -> List -> ForeignAPI

The signature above is deliberately curried, because you may actually want to be able to specify just the header files, and later bind different DLLs to provide the actual functionality, just as a .h file can be associated with different .c files.
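Since CSlick doesn't exist yet, the following Python sketch only illustrates the currying. Everything in it is an assumption: a "header" is modeled as a list of declared function names, a "DLL" as a dictionary from names to callables, and the resulting API as a plain object with one attribute per function.

```python
import types


def cslick(headers):
    """Hypothetical curried CSlick: headers first, DLLs later."""
    # Assumption: each header is just a list of declared function names.
    declared = [name for header in headers for name in header]

    def bind(dlls):
        # Build the API object by finding each declared function in some DLL.
        api = types.SimpleNamespace()
        for name in declared:
            implementation = next(
                (dll[name] for dll in dlls if name in dll), None
            )
            if implementation is None:
                raise LookupError(f"no DLL provides {name}")
            setattr(api, name, implementation)
        return api

    return bind


math_h = ["square", "negate"]
fast_dll = {"square": lambda x: x * x, "negate": lambda x: -x}
debug_dll = {"square": lambda x: x ** 2, "negate": lambda x: 0 - x}

# Specify the headers once, then bind different DLLs to the same declarations.
with_headers = cslick([math_h])
fast_api = with_headers([fast_dll])
debug_api = with_headers([debug_dll])
```

The partially applied with_headers plays the role of the .h file, and each call to it plays the role of linking against a particular implementation.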

When this will happen is anyone’s guess right now; but Vassili has done this before (in the context of Lisp) and I’m sure he can do it again.

The resulting foreign API should incorporate the low level alien API, and, as much as possible, a higher level API as well.

The CSlick implementation will need to know how to parse C header files, and how to reflectively manufacture the low level code that actually invokes the C functions. Fortunately we have a strong parsing infrastructure, so that isn’t as daunting as it sounds.

When I’ve told people about CSlick, they often mention SWIG. However, I believe CSlick can be made substantially easier to use than SWIG. SWIG has to cope with multiple languages, each with a pre-existing story on how to do foreign calls. In contrast, we can integrate CSlick more tightly with the language. Ultimately, that should translate to a simpler model for the user.

The key takeaway is that objects are all you really need to interact with foreign programming languages. They are better than built-in language constructs in terms of ease of use, security, and multiple language support. As usual, less is more.