A place to be (re)educated in Newspeak

Thursday, April 30, 2009

The Need for More Lack of Understanding

People are always claiming that if only there were more understanding in the world, it would be a better place. This post will argue that less is more: we need less understanding - specifically, more not understanding.

A couple of weeks ago I gave a talk at DSL Dev Con. One of the encouraging things that was evident there was the increased understanding that not understanding is important.

Tangent: While I'm advertising this talk, I might as well advertise my interview on Microsoft's Channel 9, which explains the motivation for Newspeak and its relation to cloud computing.


Several programming languages support a mechanism by which a class or object can declare a general-purpose handler for method invocations it does not explicitly support.

Smalltalk was, AFAIK, the first language to introduce this idea. You do it by declaring a method called doesNotUnderstand:. The method takes a single argument that represents a reification of the call. The argument tells us the name of the method that was invoked, and the actual arguments passed. If a method m is invoked on an object that does not have a member method m (that is, m is not declared by the class of the object or any of its superclasses), then the object’s doesNotUnderstand: method is invoked. The default implementation of doesNotUnderstand:, declared in class Object, is to throw an exception. By overriding doesNotUnderstand: one can control the system’s behavior when such calls are made. Similar mechanisms exist in several other dynamic languages (e.g., missingMethod in Ruby and Groovy, __noSuchMethod__ in some dialects of Javascript).
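To make the mechanism concrete for readers who don't use Smalltalk, here is a minimal sketch in Python, whose __getattr__ hook is a rough analog of doesNotUnderstand:. The names Message, Base, Recorder and does_not_understand are made up for illustration; they are not part of Python, Smalltalk or Newspeak.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Reified call: the selector that was sent plus its actual arguments."""
    selector: str
    arguments: tuple


class Base:
    """Plays the role of Object: the default handler raises."""

    def does_not_understand(self, message):
        raise AttributeError(
            f"{type(self).__name__} does not understand {message.selector}")

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed,
        # i.e. when neither the class nor its superclasses define `name`.
        if name.startswith("__"):
            raise AttributeError(name)  # leave Python's own protocols alone

        def send(*args):
            return self.does_not_understand(Message(name, args))
        return send


class Recorder(Base):
    """Overrides the handler to log messages instead of failing."""

    def __init__(self):
        self.log = []

    def does_not_understand(self, message):
        self.log.append(message)
        return message
```

One difference worth noting: in this sketch the handler runs when the returned function is called, whereas in Smalltalk it runs as part of the send itself.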

Aficionados of these languages know that this is an extremely useful mechanism. However, users of mainstream object-oriented languages typically lack an appreciation of the power this mechanism can provide. I hope this post can be a small step in rectifying that situation.

DoesNotUnderstand: helps implement orthogonal persistence, lazy loading, futures, and remote proxies, to name a few. Recently, there’s been a surge of interest in domain-specific languages, and doesNotUnderstand: can help there as well.
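As one illustration of the lazy loading case, here is a small Python sketch, again using __getattr__ as a stand-in for doesNotUnderstand:. LazyProxy and its factory parameter are hypothetical names chosen for this example.

```python
class LazyProxy:
    """Stands in for an expensive object and builds it on first use."""

    def __init__(self, factory):
        self._factory = factory   # zero-argument callable that builds the target
        self._target = None       # nothing has been loaded yet

    def __getattr__(self, name):
        # Reached only for names the proxy itself does not define,
        # so every "not understood" message is forwarded to the target.
        if self._target is None:
            self._target = self._factory()
        return getattr(self._target, name)
```

The proxy declares none of the target's methods, yet it can forward all of them. A remote proxy or a future could follow the same shape, with the forwarding step replaced by a network call or a wait.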

We’ll use an example from my talk at DSL Dev Con. Consider how to interact with an OS shell like bash or csh from within a general purpose programming language. We’ll use Newspeak as our general purpose language (what were you expecting?), because it works best (in my unbiased opinion).

Suppose you want a listing of the files in the current directory. You could view ls as a method on a shell object, and write: shell ls. Of course, we won’t do something like the following Java code:

class Shell {
    public Collection ls() {...}
    // ... an infinity of other stuff
}

There are any number of commands that a shell can understand, depending on the current path and the executables in the directories on that path. We cannot plausibly enumerate them all as a fixed set of methods in Shell.

Instead, we can define a class NewShell with a doesNotUnderstand: method that looks up the name of the message in the shell’s path and executes it.

shell ls
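For comparison, the same idea can be sketched in Python, with __getattr__ standing in for doesNotUnderstand:. This is not the NewShell implementation; the Shell class and its which and run parameters are assumptions made for this example, defaulting to the standard library's shutil.which and subprocess.run.

```python
import shutil
import subprocess


class Shell:
    """Any attribute that is not a real method is treated as a command name."""

    def __init__(self, which=shutil.which, run=subprocess.run):
        self._which = which   # resolves a name on PATH, or returns None
        self._run = run       # executes the resolved program

    def __getattr__(self, name):
        path = self._which(name)
        if path is None:
            # No such executable: fall back to the default behavior.
            raise AttributeError(f"Shell does not understand {name!r}")
        completed = self._run([path], capture_output=True, text=True, check=True)
        return completed.stdout
```

With this sketch, Shell().ls would run whatever ls resolves to on the current path and return its output as a string.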

If we write shell ls in the context of a subclass of NewShell, we can take advantage of Newspeak’s implicit receiver sends and just write

ls

Nice, but not quite good enough.

ls aFileName

doesn’t work at all. We don’t want to invoke ls immediately here - we need to gather its arguments in some way. One way to do this is to have doesNotUnderstand: return a function object that can be fed its arguments. This is in fact what we do in our implementation. We call this object a CommandSession. To get a CommandSession to actually run the command, you call one of its value methods, with the desired arguments:

ls value: aFileName

This is less convenient for the simple case, where we need to write

ls value

to get ls to do something - but it is much more general.
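Here is a rough Python rendering of that design, assuming a single variadic value method in place of Newspeak's value, value:, value:value: family. The class bodies are a sketch, not the actual CommandSession; path lookup is omitted, and the run parameter is an assumption that defaults to subprocess.run.

```python
import subprocess


class CommandSession:
    """A command that has been named but not yet run."""

    def __init__(self, name, run=subprocess.run):
        self.name = name
        self._run = run

    def value(self, *args):
        # Running is explicit: only now are the gathered arguments used.
        argv = [self.name, *map(str, args)]
        return self._run(argv, capture_output=True, text=True).stdout


class Shell:
    def __init__(self, run=subprocess.run):
        self._run = run

    def __getattr__(self, name):
        # Do not execute anything yet; hand back an object that can
        # collect arguments first.
        return CommandSession(name, self._run)
```

Here Shell().ls merely names the command, while Shell().ls.value('notes.txt') runs it.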

What about modifiers, as in ls -l ? We can make simple cases work slightly better by defining - as a method on CommandSession :

ls -'l'


This is what the current implementation does.
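The operator trick carries over to other languages with operator overloading. In the Python sketch below, __sub__ plays the role of the - method. Whether the real - method returns a new session or runs the command is not stated here, so returning a new session is an assumption, as is the argv helper.

```python
class CommandSession:
    """A command name plus the modifiers gathered so far."""

    def __init__(self, name, args=()):
        self.name = name
        self.args = tuple(args)

    def __sub__(self, flag):
        # `ls - 'l'` yields a new session carrying the modifier "-l".
        return CommandSession(self.name, self.args + ("-" + flag,))

    def argv(self, *more):
        """The argument vector that running the session would use."""
        return [self.name, *self.args, *more]
```

Because each - returns a fresh session, modifiers chain: CommandSession('ls') - 'l' - 'a' gathers both flags without touching the original session.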

The most general approach is to treat modifiers as arguments:

ls value: '-l'
ls value: '-l' value: aFileName

An alternative might be to leave ls as it was originally, but allow

ls: aFileName

as well. In this version, doesNotUnderstand: checks to see if the message takes an argument (i.e., it ends with a colon). If so, it strips the colon off the message name, creates a CommandSession for the result, and calls its value: method with the argument. This handles modifiers pretty well:

ls: '-l'

If there are multiple arguments, we can pass a tuple as the argument, and doesNotUnderstand: will unpack it as needed.

ls: {'-l'. aFileName}
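The selector handling in this variant is simple enough to sketch on its own. Python attribute names cannot contain a colon, so the sketch below takes the selector as a plain string; build_argv is a hypothetical helper that models only the colon stripping and tuple unpacking, not the creation of a CommandSession.

```python
def build_argv(selector, argument=None):
    """Turn a selector such as "ls" or "ls:" into an argument vector."""
    if not selector.endswith(":"):
        return [selector]                      # unary message: no arguments
    command = selector[:-1]                    # strip the trailing colon
    if isinstance(argument, (tuple, list)):    # a tuple carries several arguments
        return [command, *argument]
    return [command, argument]
```

So build_argv('ls:', ('-l', 'notes.txt')) yields the same argument vector as the tuple form above.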

Now how about pipes?

We could introduce pipeValue methods that produce an object that responds to the pipe operator. Or we could say that everything produces a CommandSession (and these understand “|”) and a special action is needed to get a result (sending it an evaluate or end message). This action is the analog of the newline that tells the shell to go ahead and evaluate. This could be dispensed with in a lazy setting.

Combining our second proposal above with this, we could say that value was used to derive a result. Then we can view the shell as a combinator library for CommandSessions. This does conflate two issues - the use of CommandSession to delay evaluation until a result is needed (the shell parses the input as a unit ensuring laziness) and the use of real combinators on byte streams.
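A minimal Python sketch of that combinator view follows, assuming that | only composes sessions and that value forces evaluation. The stages representation and the run parameter are made up for this example, and the stages run one after another rather than concurrently as in a real shell pipe.

```python
import subprocess


class CommandSession:
    """A lazily evaluated pipeline of argument vectors."""

    def __init__(self, *stages):
        self.stages = stages   # each stage is a list like ["ls", "-l"]

    def __or__(self, other):
        # Composing builds a longer pipeline; nothing runs yet.
        return CommandSession(*self.stages, *other.stages)

    def value(self, run=subprocess.run):
        # Forces evaluation: feed each stage's output to the next stage.
        data = ""
        for argv in self.stages:
            data = run(argv, input=data, capture_output=True, text=True).stdout
        return data
```

For instance, (CommandSession(['ls']) | CommandSession(['sort'])).value() would run both commands only at the point where value is called.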

We use NewShell in our IDE - for example, to manipulate subversion commands in the source control browser. It would be nice to refine it further, perhaps along the lines suggested above, but even in its current simplistic incarnation, it is quite useful.

As I noted at the beginning of this post, there are a host of other cool uses for doesNotUnderstand:. I may return to those in another post.

Of course, if you are a fan of mandatory static typing, you aren’t allowed to use doesNotUnderstand: in your language. In the general case, it simply cannot be statically typed - which is an argument against mandatory typing, not against doesNotUnderstand:.

Just as switch statements, catch clauses and regular expressions all need defaults/catch-alls/wildcards, so does method dispatch. There are situations where you cannot avoid uncertainty. Reality is dynamic.

8 comments:

ahe said...

Another fun thing you can do with doesNotUnderstand: is accessing attributes of XML elements.

Carl Gundel said...

I think that Glorp uses doesNotUnderstand: to permit the construction of database queries directly in the Smalltalk language. Very cool.

See http://www.glorp.org/howglorpworks.html

Colin said...

Actually, it is method_missing in Ruby, not missingMethod.

Sorry to be that guy, but... :P

logicalmind said...

My gripe about this is that the doesNotUnderstand method being overloaded in ways you've described is not an interface contract. You have changed the semantics of "doesNotUnderstand" to be "understandsCertainThings". Why is this preferable to either:

1. Providing means for client code to discover shell commands that are available.
2. Providing a command dispatch interface that throws when the attempted command is not understood.

Technically, you could have simply one method on all objects such as:
object doSomething(object o)

And then attempt casts or pattern matching on the input parameters to determine functionality that is supported. This doesn't seem like the best idea. Particularly when the object you're using is binary and you can't see the code.

Gilad Bracha said...

logicalmind:

There is a contract, and it is honored. For example, if no suitable shell command is found, then the default version that throws an exception is called.

akuhn said...

I do love doesNotUnderstand, it's great! But alas, as a singular extension point it doesn't play well in a multiprogrammer environment. When working with either Smalltalk or Ruby code, sooner or later more than one library wants to override the #DNU method of the very same class (typically this is the class Object). Boing, unresolvable conflict ahead!!! IMHO it should be up to the programming language to provide a better extension point, e.g., by keeping a list of DNU handlers...

Vassil Dichev said...

Hm, using a method name for what should otherwise be a method parameter looks like a code smell to me. Why not:

ls "filename"

or, in a more verbose form:

ls("filename")

In this particular case, I fail to see the benefit of dynamic methods. What if there is a file with the exact same name as an existing method of that object? You're bound to have at least a couple of those, e.g. clone, equals in Java. Suddenly you can't list those files. What if you have a file named doesNotUnderstand/method_missing?

What if you have characters, which are illegal for method names?

The problem here is that the method namespace is polluted with a list of unrelated names, where name clashes are possible.

In some cases, a similar DSL could even be a stability/security threat if e.g. a database column receives different treatment just because it happens to have the same name as an existing method.

Gilad Bracha said...

Vassil:

Valid points, but I don't see the problems you raise as critical.

ls: 'foo' doesn't suffer from the method namespace concerns you raise, nor ls value: 'foo'. There are many design variations, with trade offs between conciseness/convenience and generality.

As for security - any code one writes could have issues. Our philosophy is to deal with them via capabilities.