Way back in January, I described a parser combinator library I’d built in Smalltalk. Since then, we’ve moved from Smalltalk to Newspeak, and refined the library in interesting ways.
The original parser combinator library has been a great success, but grammars built with it were still polluted by two solution-space artifacts. One is the need to use self sends, as in
id := self letter, [(self letter | [self digit]) star]
In Newspeak, self sends can be implicit, and so this problem goes away. We could write
id:: letter, [(letter | [digit]) star]
The other problem is the use of blocks. We’d much rather have written
id:: letter, (letter | digit) star
The solution is to have the framework pre-initialize all slots representing productions with a forward reference parser that acts as a stand-in for the real parser that will be computed later.
When the tree of parsers representing the grammar has been computed, it will contain stand-ins exactly in those places where forward references occurred due to mutual recursion. When such a parser gets invoked, it looks up the real parser (now stored in the corresponding slot) and forwards all parsing requests to it.
Our parsers are now entirely unpolluted by solution-space boilerplate, so I feel justified in calling them executable grammars. They really are executable specifications, that can be shared among all the tools that need access to a language’s grammar.
Below is a small but complete grammar in Newspeak:
class ExampleGrammar1 = ExecutableGrammar (
|
digit = self charBetween: $0 and: $9.
letter = (self charBetween: $a and: $z) | (self charBetween: $A and: $Z).
id = letter, (letter | digit) star.
identifier = tokenFor: id.
hat = tokenFromChar: $^.
expression = identifier.
returnStatement = hat, expression.
|
) ()
If you want to understand all the details, check out this paper; if you can, you might also look at my JAOO 2007 talk, which also speculates on how we can make things look even nicer, e.g.:
class ExampleGrammar3 = ExecutableGrammar (
|
digit = $0 - $9.
letter = ($a - $z) | ($A - $Z).
id = letter, (letter | digit) star.
identifier = tokenFor: id.
expression = identifier.
returnStatement = $^, expression.
|
)()