A place to be (re)educated in Newspeak

Monday, December 31, 2007

More on Modules

My posts seem to raise more questions than they answer. This is as it should be, in accordance with the Computational Theologist Full Employment Act. In this post, I’ll try and answer some of the questions that arose from my last one.

How does one actually hook modules together and get something going? As I mentioned before, module definitions are top level classes - classes that are defined in a namespace, rather than in another class.

Defining a top level class makes its name available in the surrounding namespace. More precisely, it causes the compiler to define a getter method with the name of the class on the namespace; the method will return the class object.

Since a module definition is just a class, one needs to instantiate it, by calling a constructor - which is a class side method. Continuing with the example from the previous post:

MovieLister finderClass: ColonDelimitedMovieFinder

Now this isn’t quite realistic, since ColonDelimitedMovieFinder probably needs access to things like collections and files to do its job. So it’s probable that it takes at least one parameter itself. The typical situation is that a module definition takes a parameter representing the necessary parts of the platform libraries. It might look something like this:

class ColonDelimitedMovieFinder usingLib: platform = (
|
OrderedCollection = platform Collections OrderedCollection.
FileStream = platform Streams FileStream.
|
)...



So we’d really create the application this way:

MovieLister finderClass: (ColonDelimitedMovieFinder usingLib: Platform new)

where Platform is a predefined module that provides access to the built-in libraries.
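For readers more at home in a mainstream dynamic language, here is a rough Python analogue of that wiring. It is only a sketch, not Newspeak semantics: the Platform stand-in, its collections and streams attributes, and the keyword-to-argument mapping are all assumptions made for illustration.

```python
from types import SimpleNamespace


class Platform:
    """Hypothetical stand-in for the predefined Platform module."""

    def __init__(self):
        # Only the two library entries the finder asks for are modelled here.
        self.collections = SimpleNamespace(OrderedCollection=list)
        self.streams = SimpleNamespace(FileStream=open)


class ColonDelimitedMovieFinder:
    """Module definition: everything external arrives via the constructor."""

    def __init__(self, platform):  # plays the role of usingLib:
        self.OrderedCollection = platform.collections.OrderedCollection
        self.FileStream = platform.streams.FileStream


class MovieLister:
    def __init__(self, finder_class):  # plays the role of finderClass:
        self.finder_class = finder_class


# MovieLister finderClass: (ColonDelimitedMovieFinder usingLib: Platform new)
lister = MovieLister(ColonDelimitedMovieFinder(Platform()))
```

Neither class reaches out to a global for its libraries; whoever assembles the application decides what gets passed in.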

Bob Lee points out that if I change MovieLister so that it takes another parameter, I have to update all the creation sites for MovieLister, whereas using a good DIF I only declare what needs to be injected and where.

In many cases, I could address this issue by declaring a secondary constructor that feeds the second argument to the primary one.

Say we changed MovieLister because it too needed access to some platform library:

class MovieLister usingLib: platform finderClass: MovieFinder = ...


We might be able to define a secondary constructor

class MovieLister usingLib: platform finderClass: MovieFinder = (
...
): (
finderClass: MovieFinder = (
^usingLib: Platform new finderClass: MovieFinder
)
)
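The same idea can be sketched in Python, where a classmethod plays the part of the secondary constructor. This is an analogy under stated assumptions: Platform and StubFinder are hypothetical placeholders, and with_finder_class is a name invented for the example.

```python
class Platform:
    """Hypothetical placeholder for the platform library."""


class StubFinder:
    """Hypothetical finder, used only to exercise the constructors."""


class MovieLister:
    def __init__(self, platform, finder_class):  # primary constructor
        self.platform = platform
        self.finder_class = finder_class

    @classmethod
    def with_finder_class(cls, finder_class):  # secondary constructor
        # Supplies a default platform so that older call sites keep working.
        return cls(Platform(), finder_class)


# Old call sites still pass a single argument.
lister = MovieLister.with_finder_class(StubFinder)
```

Unlike a Newspeak module definition, the Python class can see Platform in its surrounding scope, which is exactly the access the module case lacks.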


There are however two situations where this won’t work in Newspeak.

One is inheritance, because subclasses must call the primary constructor. I showed how to deal with that in one of my August 2007 posts - don’t change the primary constructor - change the superclass.

class MovieLister finderClass: MovieFinder = NewMovieLister usingLib: Platform new finderClass: MovieFinder
(...)


The other problematic case is for module definitions. In most cases, the solutions above won’t help; they won’t be able to provide a good default for the additional parameter, because they won’t have access to the surrounding scope. For this last situation I have no good answer yet. I will say that the public API of a top level module definition should be pretty stable, and the number of calls relatively few.

So overall, I think Bob makes an important point - DIFs give you a declarative way of specifying how objects are to be created. On the other hand, it gets a bit complicated when different arguments are needed in different places, or if we don’t want to compute so many things up front at object creation time. Guice has mechanisms to help with that, but I find them a bit rich for my blood. In those cases, I really prefer to specify things naturally in my code.

Another advantage of abstracting freely over classes is that you can inherit from classes that are provided as parameters.

class MyCollections usingLib: platform = (
| ArrayList = platform ArrayList. |
)(
class ExtendedArrayList = ArrayList (...)
)


Now, depending on what library you actually provide to MyCollections as an argument, you can obtain distinct subclasses (in fact, there’s an easier way to do this, but this post is once again getting too long). Correct me if I’m wrong, but I don’t think a DIF helps here.
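A Python approximation may make this concrete. It assumes a platform object with an ArrayList attribute; CountingList, the second method and the SimpleNamespace platforms are hypothetical, chosen only to show two different superclasses arriving as arguments.

```python
from types import SimpleNamespace


class CountingList(list):
    """Hypothetical alternative ArrayList that counts its appends."""

    appends = 0

    def append(self, item):
        self.appends += 1
        super().append(item)


class MyCollections:
    def __init__(self, platform):
        array_list = platform.ArrayList

        # The superclass is whatever class the caller handed in.
        class ExtendedArrayList(array_list):
            def second(self):
                return self[1]

        self.ExtendedArrayList = ExtendedArrayList


plain = MyCollections(SimpleNamespace(ArrayList=list))
counting = MyCollections(SimpleNamespace(ArrayList=CountingList))
```

Each MyCollections instance ends up with its own ExtendedArrayList, inheriting from whichever ArrayList it was given.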

You can also do class hierarchy inheritance: modify an entire library embedded within a module by subclassing it and changing only what’s needed. This is somewhat less modular (inheritance always causes problems) but the tradeoff is well worth it in my opinion.
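Python can imitate a little of this if the library’s classes are looked up through the enclosing instance. The sketch below is an analogy only; Shapes, Shape, Circle and LoudShapes are hypothetical names, and Newspeak gets the late binding for free where Python needs the explicit self lookup.

```python
class Shapes:
    """A tiny 'library' whose classes are reached through the module instance."""

    class Shape:
        def describe(self):
            return "shape"

    def __init__(self):
        shape = self.Shape  # late-bound: a subclass of Shapes may replace it

        class Circle(shape):
            def describe(self):
                return "circle, a kind of " + super().describe()

        self.Circle = Circle


class LoudShapes(Shapes):
    # Change only the root class; Circle is rebuilt on top of it.
    class Shape(Shapes.Shape):
        def describe(self):
            return super().describe().upper()
```

Here LoudShapes overrides one class, and the Circle built inside it inherits from the new Shape without being touched.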

I spoke about class hierarchy inheritance at JAOO, and will likely speak about it again in one or more of my upcoming talks on Newspeak, at Google in Kirkland on January 8th, at FOOL on January 13th, or at Lang.Net 2008 in Redmond in late January.

I’m trying to make each of these talks somewhat different, but they will necessarily have some overlap. I hope that some of these talks will make it onto the net and make these ideas more accessible.

Sunday, December 16, 2007

Lethal Injection

Some months ago, I wrote a couple of posts about object construction and initialization. I made the claim that so-called dependency-injection frameworks (DIFs) are completely unnecessary in a language like Newspeak, and promised to expand on that point in a later post. Four months should definitely qualify as “later”, so here is the promised explanation.

I won’t explain DIFs in detail here - read Martin Fowler’s excellent overview if you need an introduction. The salient information about DIFs is that they are used to write code that does not have undue references to concrete classes embedded in it. These references are usually calls to constructors or static methods. These concrete references create undue intermodule dependencies.

The root of the problem DIFs address is that mainstream languages provide inadequate mechanisms to abstract over classes.

Terminology rant: DIFs should more properly be called dependee-injection frameworks. A dependency is a relationship between a dependent (better called depender) and a dependee. The dependencies are what we do not want in our code; we certainly don’t want to inject more of them. Instead, DIFs inject instances of the dependees, so the dependers don’t have to create them.

DIFs require you to write your code in a specific way, where you avoid creating instances of dependees. Instead, you make sure that there is a way to provide the dependee instance (in DIF terminology, to inject it) from outside the object. You then tell the framework where and what to inject. The reason injection traffics in instances rather than the classes themselves is that there’s no good way to abstract over the classes.

Having recapped DIFs, let’s move on to Newspeak. Newspeak modules are defined in namespaces. Namespaces are simply objects that are required to be deeply immutable; they are stateless.

Tangent: This ensures that there is no global or static state in Newspeak, which gives us many nice properties.

Namespaces are organized like Java packages, as an inversion of the internet domain namespace. Unlike Java packages, sub-namespaces can see their enclosing namespace.

A module is a top-level class, that is, a class defined directly within a namespace. Newspeak classes can nest arbitrarily, so a module can contain an entire class library or framework, which can in turn be subdivided into subsystems to any depth. Nested classes can freely access the lexical scope of their enclosing class.

Modules, like all classes, are reified as objects that support constructor methods. Recall that in Newspeak, a constructor invocation is indistinguishable from an ordinary method invocation. Objects are constructed by sending messages to (invoking virtual methods on) another object. That object may or may not be a class; it makes no difference. Hence all the usual abstraction mechanisms in the language apply to classes - in particular, parameterization.

Here is a trivial top level class, modeled after the motivating example for DIFs given in Fowler’s article:

public class MovieLister = (
|
private movieDB = ColonDelimitedMovieFinder from: 'movies.txt'.
|)
(
public moviesDirectedBy: directorName = (
^movieDB findAll select: [:m |
m director = directorName
].
)
)


The idea is that MovieLister supports one method, moviesDirectedBy:, which takes a string that contains the name of a director and returns a collection of movies directed by said director. The point of Fowler’s example is that there is an undesirable dependency on a class, ColonDelimitedMovieFinder, embedded in MovieLister. If we want to use a different database, we need to change the code.

However, this code won’t actually work in Newspeak. The reason is that the enclosing namespace is not visible inside a Newspeak module. Any external dependencies must be made explicit by passing them to the module as parameters to a constructor method. These parameters could be other modules, namespaces, or classes and instances of any kind.

In this specific case, ColonDelimitedMovieFinder cannot be referenced from MovieLister. If we try and create a MovieLister by writing: MovieLister new, creation will fail with a message not understood error on ColonDelimitedMovieFinder. We’d have to declare a constructor for MovieLister with the movie finder as a parameter:

public class MovieLister finderClass: MovieFinder = (
|
private movieDB = MovieFinder from: 'movies.txt'.
|)
(
public moviesDirectedBy: directorName = (
^movieDB findAll select: [:m |
m director = directorName
].
)
)


At this point, we can immediately see that we can replace ColonDelimitedMovieFinder with any class that supports the same interface, which was the object of the entire exercise. Newspeak won’t let you create a module with concrete external dependencies, because that wouldn’t really be a module, would it?
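As a point of comparison, the parameterized lister might look roughly like this in Python, where classes are also ordinary objects. Treat it as a sketch: Movie, InMemoryMovieFinder, find_all and the sample data are assumptions made for the example, and calling finder_class directly stands in for sending from: to the class.

```python
class Movie:
    def __init__(self, title, director):
        self.title = title
        self.director = director


class InMemoryMovieFinder:
    """Hypothetical finder: ignores its source and serves fixed sample data."""

    def __init__(self, source):
        self.source = source

    def find_all(self):
        return [
            Movie("Alien", "Ridley Scott"),
            Movie("Blade Runner", "Ridley Scott"),
            Movie("Heat", "Michael Mann"),
        ]


class MovieLister:
    def __init__(self, finder_class):
        # The lister never names a concrete finder; it uses what it was handed.
        self._movie_db = finder_class("movies.txt")

    def movies_directed_by(self, director_name):
        return [m for m in self._movie_db.find_all() if m.director == director_name]


lister = MovieLister(InMemoryMovieFinder)
titles = [m.title for m in lister.movies_directed_by("Ridley Scott")]
```

Any class that accepts a source and answers find_all could be passed instead. The difference is that Python merely permits this style, while Newspeak leaves a module no other way to reach a finder.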

In Newspeak, code in a module doesn’t have any concrete external dependencies, and no dependees need to be injected. What’s more, we can subclass or mix in any class coming in as a parameter - something a DIF won’t handle.

What about a subsystem within a module? What if I don’t want it using the same name binding as the enclosing module? I can explicitly parameterize my subsystem, though that requires pre-planning.

I can also override any class binding in a subclass. Newspeak is message-based, so all names are late-bound. Hence any reference to the name of a class can be overridden in a subclass. Classes can be overridden by methods or slots or other classes in any combination. So even if you do not explicitly parameterize your code to allow for another class to be used to construct an object, you can still override the binding of the class name as necessary.
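Python attribute lookup through self gives a rough feel for this, though only where the code opts in by writing self.Name rather than a bare name. In the sketch below every class name is hypothetical; it shows one binding overridden first by another class and then by a method.

```python
class Greeter:
    class Formatter:
        def render(self, name):
            return "Hello, " + name

    def greet(self, name):
        # self.Formatter is resolved at call time, not fixed at definition time.
        return self.Formatter().render(name)


class ShoutingFormatter:
    def render(self, name):
        return "HELLO, " + name.upper() + "!"


class ClassOverride(Greeter):
    Formatter = ShoutingFormatter  # binding overridden by another class


class MethodOverride(Greeter):
    def Formatter(self):  # binding overridden by a method
        return ShoutingFormatter()
```

Greeter was never parameterized over its formatter, yet both subclasses change which object gets constructed.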

In summary, Newspeak is designed to support, even induce, loose coupling. That’s the point of message-based programming languages. DIFs are an expedient technique to reduce code coupling in the sad world of mainstream development, but in a language like Newspeak, they are pointless.