A place to be (re)educated in Newspeak

Wednesday, May 12, 2010

Patterns of Dynamic Typechecking

Dynamic type tests are ubiquitous in programming. Regardless of whether the programming language is statically or dynamically typed, object-oriented or functional, the need to establish what sort of value you are dealing with at run time is always there.

In object oriented languages constructs like Java’s instanceOf (or C# typeOf, or Smalltalk isKindOf:) serve this need - as do casts.

Some functional languages put a large emphasis on being “completely statically typed”. In fact, these languages rely heavily on dynamic typechecking in the guise of pattern matching. Somewhere in the middle lie constructs like typecase.

In most languages, constructs for dynamic typechecking undermine data abstraction. They allow one to test for the implementation type of a value. Checking whether an object is an instance of a particular class, or pattern matching against a specific datatype constructor suffer from this problem. In contrast, testing whether a value supports an interface is harmless. Dynamically typed languages don’t support that however.

An old trick is to add methods that support these kind of type queries. Where you might, in Java, define a class A that implements an interface T, you would simply define a method isT on A, that returns true. The problem is that you then need to define a version of isT that returns false for all other types that you might possibly encounter. To be safe, you’d add isT to Object. This is monkey patching of the highest order, and readers of this blog know my view “People don’t let Primitive Primates Program”.

However, there is an out. Define the default version of doesNotUnderstand: to check if the method name is of the form isX, where X has the form of a type identifier. If so, return false. Now all you need to do is define isT for the classes that actually support the T interface. Credit to Peter Ahe for proposing this. It’s in the current Newspeak implementation.

Pattern matching is much more general than isT however; using isT is equivalent to a simple nullary pattern - except for this annoying issue with data abstraction.

Recently, we’ve seen approaches that overcome the data abstraction problem: Scala and F# introduce pattern matching mechanism that allow you to preserve data abstraction. It was Scala’s influence, combined with demand from users at Cadence, that prompted me to examine the idea of supporting pattern matching in Newspeak.

One can’t just go graft a pattern matching construct on to Newspeak however. The only operations allowed in Newspeak are message sends. If we can’t express the matching constructs via messages, they cannot go into the language. This led me to the idea of pattern literals: a special syntax for patterns, that denotes a pattern object. We already have special syntax for number objects, string objects, closure objects etc., so adding pattern objects fits in quite naturally. The nice thing about such an approach is that patterns are first class values. Patterns can be abstracted over in methods, stored in slots and data structures, serialized to disk or what have you. You can’t do that with a pattern in ML or Haskell.

So what exactly would pattern matching in Newspeak look like? Well, Felix Geller has recently completed his masters thesis at HPI Potsdam on just that question. Felix has implemented an experimental language extension supporting pattern literals and integrated it into the Newspeak IDE. The new language features should be available in the next Newspeak refresh (coming to a computer near you, summer 2010).

In the new extension (NS3) patterns are matched against objects via a protocol very similar to Scala’s extractors, thereby preserving data abstraction. However, there is no special construct for pattern matching. Instead, complex patterns can be composed out simpler ones using combinators.

Of course, a quick scan of the literature shows that the functional community got there first. Mark Tullsen proposed first class patterns with combinators for Haskell a decade ago, and there are several other papers dealing with the idea. However, having a proposal does not guarantee that a feature is in fact included in a language; in practice, none of the popular functional languages supports first class patterns.

First class patterns combined with data abstraction are a potent combination. We mean to use them as first class queries that can be sent to objects of various sorts - ordinary in-memory collections, or databases, or logic programs. This is reminiscent of LINQ, except that the queries are first class so they can be abstracted over. You can write code that takes queries as parameters, accepts queries as input from the user, serializes and deserializes queries to persistent storage (allowing you to do meta-queries) and so on. And the beauty of a language like Newspeak is that all this requires but a fraction of the machinery you need in a mainstream setting.

Felix’s thesis will soon be available as a technical report. I am reluctant to steal his thunder before that, but once the report is out, I'll put out a follow up post that illustrates how this works and why it is neat.