A place to be (re)educated in Newspeak

Saturday, June 19, 2010

Patterns as Objects in Newspeak

I now want to show, concretely, how pattern matching works in the Newspeak extension I mentioned in an earlier post. The material from here on is more or less stolen from Felix Geller’s thesis.

Here are some simple pattern literals

<1> // matches 1


<‘a’> // matches ‘a’

< _> // matches anything

Each of these is simply a sugared syntax for an instance of class Pattern (or some subclass thereof). Pattern supports a protocol for matching. When a pattern is asked to match an object, it invokes the object’s match: method, sending itself (the pattern) as the argument. It’s then up to the the match: method to decide if the pattern is something that matches the object. This is somewhat similar to how extractors work in Scala; you should be able to see that such a protocol preserves data abstraction.

Any class can implement matching logic as needed, just as any class can declare support for an interface in a statically typed setting.

Patterns support methods that correspond to combinators, such as

p1 | p2 // matches anything that either p1 or p2 matches
p1 & p2 // matches whatever both p1 and p2 match
p not // matches if p doesn’t
p => actionBlock // matches if p matches; if so, execute actionBlock

As a simple example

fib: n = (
n case: < 1> | < 2> => [^n-1]
otherwise: [^(fib: n-1) + (fib: n-2)]
)

The method case:otherwise: is defined in Object. It takes a pattern as its first argument, and a closure as its second. If the pattern does not match the receiver, the closure is invoked .

In our example < 1 > | < 2 > matches 1 or 2, as you expect. < 1 > | < 2 > => [^n-1] matches 1 or 2 as well; if that succeeded, it will invoke the closure [^n-1]. Evaluating the closure [^n-1] will cause the enclosing fib: method to return with the result n-1. So fib: 1 yields 0, and fib: 2 yields 1, as expected. For k > 2, fib:k yields (fib: n-1) + (fib: n-2).

Note: much of the examples below are derived from the Scala extractor paper.

Here are some other pattern literals, known as keyword patterns:

< num: 1> // a keyword pattern (user-defined)

< multiply: left by: < num: 1>> // nested patterns - this is where you win over visitors

A keyword pattern literal evaluates to an instance of KeywordPattern, a subclass of Pattern. A keyword pattern literal specifies a method name (such as num: in the first example, and multiply:by: in the second); the resulting keyword pattern object supports a method of the same name. For example < num: 1> responds to num:. What this method does is check if its argument matches the argument specified in the pattern literal - in our example, checking that it equals 1. If the pattern specifies a nested pattern literal as an argument, the method matches the argument against that pattern recursively.

An object that wants to match < num: 1> will define its match: method thus:

match: p = (^p num: 1)

Similarly, < multiply:< _> by: 1> supports a method multiply:by: that tests its arguments to see if they match the pattern. In this example, the first argument of multiply:by: would be recursively matched against the nested pattern < _> (which always succeeds) and the second argument would be tested for equality against 1.

Imagine a class hierarchy for arithmetic expressions. There are several kinds of terms: numbers, products of terms, and likely other things, like variables. Assume numbers match patterns of the form < num: n> for some n, and assume products are represented using a class Product with two slots for the product term’s subtrees, operand1 and operand2. Instances of Product will match patterns of the form < multiply: x by: y > for some x and y. The match: method for Product can be written as

match: pat = (

^pat multiply: operand1 by: operand2

)

The result of a match is a binding. This is an object that can tell you what the original object being matched was and how it matched - what values are associated with the various names specified in the pattern.

We’ll use pattern matching to define a method simplify: that transforms product terms of the form X*1 to X.

simplify: expr = (

^expr case: < multiply: ?x by: < num: 1>> => [x]

otherwise: [expr].

)

If expr is a product of some term t and the number 1, simplify: will return t, otherwise it will return expr.

Note: The syntax ?id is allowed inside pattern literals and will make the corresponding matched value matched available under the name id in other parts of the pattern. How? Read the tech report.

How does x end up available in the closure? Well, the => combinator manipulates the scope of the closure to ensure that the desired accessors are available to it. Groovy programmers will recognize this idea, as will Smalltalkers familiar with GLORP. If it turns your stomach - well, I had reservations at first too, but all power corrupts, and dynamic languages give you a lot of power :-). This kind of trick must be used with great restraint. In Newspeak, the object capability model is intended to give us control over who can access the required reflective capabilities, so it is not a free-for-all.

The only language extension needed here are the pattern literals. All the rest is library code. We could dispense with the language extension entirely and just use the library, but things would be slightly more awkward - say

Pattern multiply: (Pattern variable: #x) by: (Pattern num: 1)

instead of

< multiply: ?x by:< num: 1>>

In principle, you could add such a library to a mainstream language such as Java. Of course, the typechecking, the absence of doesNotUnderstand:, the inability to add methods dynamically etc. would be crippling to the point where you wouldn’t really want to pursue the idea.

There is a lot of potential for refinement here. Patterns can be used as first class queries in LINQ like APIs that connect to databases or Prolog style rule engines, for example.

Check out Felix’s work if you want to understand it all. Or wait for the updated Newspeak documentation that will accompany the next release.

8 comments:

Dan Armak said...

The <...> tags in the post seem broken. It keeps saying things like, "Instances of Product will match patterns of the form <> for some x and y.". Surely there should be something between the <...>.

Gilad Bracha said...

Thanks. Fixed now.

lello said...

Say that slot pat refers to the literal <multiply: ?x by: <num: 1>>. That is, the pattern literal has been given a name, e.g., when passing the literal as argument to a method.

It's now visually hard to understand what the x in the expression pat => [x] refers to, without seeing what pat stands for. The full lexical environment for the implicit receiver send seems to be available only at runtime, when => does its duties.

Also, what if there is already an x lexically around when the closure is created, in that the programmer's intent is to access that x and not, accidentally, the pattern's x?

This is quite confusing to me and, as you note, hard to digest...

Mark Lee Smith said...

Cute Idea :), although I wonder if the sugar is really needed. Please pass my congradulations onto Felix if/when you see him Gilad. As always, I look forward to playing with the next version of Newspeak.

Gilad Bracha said...

Raffello:

Your point is well taken, and I am keenly aware of it. The extension is experimental and the rules may change. They certainly need to be pinned down.

Clearly, lexically visible names must have priority (this isn't really the case in the current implementation). One might even argue against playing any dynamic scope games, and requiring the closure passed to <= to take an explicit argument if access to the binding is needed.

In short, more experience is needed.

Gilad Bracha said...

Mark,

Thanks. One could probably live for a long time without this extension, but people I respect are quite adamant about the advantages. For now, it is just an experiment. The details will be adjusted and we will decide over time.

minobuzo said...

This looks really interesting! I'm a follower of your blog and my question is regarding literals (in this case the pattern objects) and your modular idea? How can you specified the classes involved in the compilation?

Gilad Bracha said...

minobuzo: I assume you are asking whether I can override the classes used to create the pattern literals?
Pattern literals are translated into calls of the form Pattern literal:, Pattern wildcard etc. Since Pattern si just a message send, you can override it to return a different class. Down th eline, the goal is to compile all literals this way.