A place to be (re)educated in Newspeak

Saturday, December 11, 2010

Reflecting on Functional Programming

In this post, I wanted to make a case for reflection in the context of pure functional programming. I don't know that pure functional languages should be different from other languages in this regard, but in practice they are: they generally do not have reflection support.

To demonstrate the utility of reflection, I’m going to revisit one of my favorite examples, parser combinators. In particular, we’ll consider how to implement executable grammars. Executable grammars are a special flavor of a parser combinator library that allows semantic actions to be completely separated from the actual grammar. I introduced executable grammars as part of the Newspeak project.

Consider the following grammar:

statement -> ifStatement | returnStatement
ifStatement -> ‘if’ expression ‘then’ expression ‘else’ expression
returnStatement -> ‘return’ expression
expression -> identifier | number

In Newspeak, we’d write:

class G = ExecutableGrammar ( |
(* lexical rules for identifier, number, keywords elided *)
(* The actual syntactic grammar *)
statement = ifStatement | returnStatement.
ifStatement = if, expression, then, expression, else, expression.
returnStatement = returnSymbol, expression.
expression = identifier | number.
|)()

Now let's define some semantic action, say, creating an AST. The Newspeak library lets me do this in a subclass, by overriding the code for the production thus:

class P = G ()(
ifStatement = (
super ifStatement wrap:[:if :e1 :then :e2 :else :e3 |
IfStatementAST if: e1 then: e2 else: e3
].
)
returnStatement = (
super returnStatement wrap:[:return :e | ReturnStatementAST return: e].
)
)

No prior parser combinator library allowed me to achieve a similar separation of grammar and semantic action. In particular, I don’t quite see how to accomplish this in a functional language.

In the functional world, I would expect one function to define the actual grammar, and another to perform the semantic actions (in our example, building the AST). The latter function would transform the result of basic parsing, as defined by the grammar, into an AST. We'd use pattern matching to define this function. I'd want to write something like:

makeAST =
fun ifStatement(ifKw, e1, thenKw, e2, elseKw, e3) =
IfStatementAST(makeAST(e1), makeAST(e2), makeAST(e3)) |
fun returnStatement(returnKw, e) = ReturnStatementAST(makeAST(e)) |
fun identifier(id) = IdentifierAST(id) |
fun number(n) = NumberAST(n)

where makeAST maps a concrete parse tree into an abstract one, which in this case looks pretty easy.
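
To make this concrete, here is what the ideal might look like in Haskell, if the library could somehow generate a per-grammar datatype. This is only a sketch: the Token, CST and AST types and all their constructors are hypothetical, hand-written stand-ins for what the library would have to produce.

data Token = Token String Int   -- lexeme and its position in the input

data CST                        -- concrete parse tree: one constructor
  = IfStatement Token CST Token CST Token CST   -- per production
  | ReturnStatement Token CST
  | Identifier Token
  | Number Token

data AST                        -- the abstract syntax we want to build
  = IfStatementAST AST AST AST
  | ReturnStatementAST AST
  | IdentifierAST Token
  | NumberAST Token

makeAST :: CST -> AST
makeAST (IfStatement _if e1 _then e2 _else e3) =
  IfStatementAST (makeAST e1) (makeAST e2) (makeAST e3)
makeAST (ReturnStatement _ret e) = ReturnStatementAST (makeAST e)
makeAST (Identifier tok)         = IdentifierAST tok
makeAST (Number tok)             = NumberAST tok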

The question arises: where did the patterns ifStatement, returnStatement, number and identifier come from?

Presumably, our parser combinator library defined them based on our input grammar. The thing is, the library does not know the specifics of our grammar in advance. It cannot predefine data constructors for each conceivable production. Instead, it should create these data constructors dynamically each time it processes a specific grammar.

How does one create datatypes dynamically in a traditional functional language? I leave that as an exercise for the reader.

Ok, so while it is clear that creating datatypes on the fly would be very helpful here, it is also clear that it isn't easy to do in the context of such languages. How would you describe the type of the library? The datatype it returns is created per grammar, and depends on the names of the grammar's production functions. Not easy to characterize via Hindley-Milner. And yet, once the library has created the datatype, we could actually utilize it in writing type-safe clients.

Instead, our library will probably generate values of some generic datatype for parse trees. A possible representation is a pair, consisting of a tag of type string naming the production used to compute the tree, and a list of the elements of the tree, including vital lexical information such as where in the input stream a given token was found and exactly what string represented it. We cannot elide such lexical information, because some users of our library will need it (say, pretty printers). Then I can write:

makeAST =
fun parsetree("if", [ifKw, e1, thenKw, e2, elseKw, e3]) =
IfStatementAST(makeAST(e1), makeAST(e2), makeAST(e3)) |
fun parsetree("return", [returnKw, e]) = ReturnStatementAST(makeAST(e)) |
fun parsetree("id", [id]) = IdentifierAST(id) |
fun parsetree("number", [n]) = NumberAST(n)
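
In Haskell terms (reusing the hypothetical Token and AST types from the sketch above), the generic representation and the matching function might look like this:

data ParseTree
  = Node String [ParseTree]   -- production tag and children
  | Leaf String Int           -- lexeme and its offset in the input

makeAST :: ParseTree -> AST
makeAST (Node "if" [_ifKw, e1, _thenKw, e2, _elseKw, e3]) =
  IfStatementAST (makeAST e1) (makeAST e2) (makeAST e3)
makeAST (Node "return" [_retKw, e]) = ReturnStatementAST (makeAST e)
makeAST (Node "id"     [Leaf s i])  = IdentifierAST (Token s i)
makeAST (Node "number" [Leaf s i])  = NumberAST (Token s i)
makeAST _ = error "ill-formed parse tree"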

Obviously, we’ve lost the type safety of the previous version. Ironically, the inability of the language to generate types dynamically forces code to be less statically type safe.

Now ask yourself - how does our combinator library produce values of type parsetree with an appropriate tag? For each parsetree value p(tag, elements), the tag is a string corresponding to the name of the production that was used to compute p. How does our library know this tag? The tag is naturally specified via the name of the production function in the grammar. To get at it, one would need some introspection mechanism to get the name of a function at run time. Of course, no such mechanism exists in a standard functional language. It looks like you’d have to force the user to specify this information redundantly as a string, in addition to the function name (you still need the function name so that other productions can refer to it).
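
To illustrate, here is a deliberately tiny sketch of such a library in Haskell, reusing the ParseTree type from above. None of these combinators come from a real library; the point is the shape of the user-facing code, where every production must repeat its own name as a string tag:

import Data.Char (isAlpha, isDigit)

newtype Parser a =
  Parser { run :: [(String, Int)] -> Maybe (a, [(String, Int)]) }

-- Sequence the sub-parsers and wrap their results in a Node carrying
-- the given tag. The tag must be supplied by hand: the library has no
-- way to learn the name of the binding ('ifStatement', say) it is
-- being assigned to.
rule :: String -> [Parser ParseTree] -> Parser ParseTree
rule tag ps = Parser (go ps [])
  where
    go []     acc ts = Just (Node tag (reverse acc), ts)
    go (p:qs) acc ts = run p ts >>= \(t, ts') -> go qs (t:acc) ts'

orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser (\ts -> maybe (run q ts) Just (run p ts))

keyword :: String -> Parser ParseTree
keyword kw = Parser (\ts -> case ts of
  ((s, i) : rest) | s == kw -> Just (Leaf s i, rest)
  _                         -> Nothing)

token :: String -> (String -> Bool) -> Parser ParseTree
token tag ok = Parser (\ts -> case ts of
  ((s, i) : rest) | ok s -> Just (Node tag [Leaf s i], rest)
  _                      -> Nothing)

-- The user's grammar: note the redundant string next to every name.
identifier, number, expression, statement :: Parser ParseTree
identifier = token "id" (all isAlpha)
number     = token "number" (all isDigit)
expression = identifier `orElse` number
statement  = ifStatement `orElse` returnStatement

ifStatement, returnStatement :: Parser ParseTree
ifStatement = rule "if"
  [ keyword "if", expression, keyword "then", expression
  , keyword "else", expression ]
returnStatement = rule "return" [ keyword "return", expression ]

Nothing checks that the tag matches the binding's name, and renaming a production silently breaks every pattern that matches on the old tag.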

You might argue that we don’t really need the string tags - just return a concrete parse tree and distinguish the cases by pattern matching. However, it isn’t generally possible to tell the parse tree for a number from that for an identifier without re-parsing. Even when you can tell parse trees apart, the resulting code is ugly and inefficient, as it is repeating some of the parser’s work.
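
For instance, with an untagged representation both of these productions yield a single-leaf tree, so classifying the tree means re-inspecting the lexeme (a hypothetical sketch, reusing the types above; isDigit comes from Data.Char):

isNumberTree :: ParseTree -> Bool
isNumberTree (Leaf s _) = all isDigit s   -- redoing the lexer's work
isNumberTree _          = False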

We could approach the problem via staged execution, writing a meta-program that statically transforms the grammar into a program that provides us with the nice datatype constructors I suggested in the beginning. If you go that route, you might as well define an external DSL based on BNF or PEGs.

So, I assert that reflection is essential to this task; dynamic type generation would be helpful as well, though it would require dependent types and additional reflective functionality. However, maybe I've overlooked something and there is some other way to achieve the same goal. I'm sure someone will tell me - but remember: the library must not burden the user with redundant information or work, it must operate independently of the specifics of a given grammar, and it must keep semantic actions entirely separate.

In any case, I think there is considerable value in adding at least a measure of introspection, and preferably full reflection, to traditional functional languages, and interesting work to be done fleshing it out.

20 comments:

Damien Cassou said...

Here is how to define HUnit test suites in Haskell:

combTests :: Test
combTests = TestList
  [ TestLabel "testHasOnePair" testHasOnePair
  , TestLabel "testHasTwoPairs" testHasTwoPairs
  ]

As you can see, the test name is duplicated: once as a string for the error message and once as a function to execute the test.

Unknown said...

There are a few interesting things in Kiselyov and Shan's implicit configurations paper -- http://www.cs.rutgers.edu/~ccshan/prepose/prepose.pdf.

I think there is a Haskell implementation floating around for the reflection components discussed in the paper.

Mark Lee Smith said...

Good to see you again, Gilad; I was starting to worry :).

As for the article, I completely agree, but I am biased.

Gilad Bracha said...

Damien,

Thanks for the example illustrating the same issue.

Mark H.: Thanks for the pointer.

Mark LS: Always nice to hear from you.


c43ng said...

I guess that a sort of dependent-type system could help in cases like this. Just as you could define the type of lists of size 10, you could define the type of the "if" token, and statically type-check all the places where you use it.
OTOH, it will make the type checking very hard, at best (http://en.wikipedia.org/wiki/Dependent_type).

Gilad Bracha said...

Oddly, one comment that Blogger notified me of hasn't shown up here, so I'll post it by hand:

I'm sorry, but in my opinion, your account does not do justice to pure
functional languages, especially those with advanced type systems such
as Haskell + GHC extensions. As an example, take a look at my grammar-combinators
library. I believe that it clearly invalidates some of your
assertions.

First of all, the library features complete separation between
grammar and semantic actions, which you seem to claim is
impossible in a pure programming language.

Secondly, the need to define data types is limited to types for the
non-terminals (possibly empty, but typically represented by the AST
types) and a pattern functor representing the relations between the
non-terminals. Note that the Abstract Syntax Tree (contrary to the
Concrete Syntax Tree) is not something that you can derive from
the grammar, as it abstracts from the details of the syntax. There is
some duplication here, but most of it can be eliminated using (not yet
implemented) Template Haskell functions. In return, a lot more
correctness of user code can be guaranteed at compile time.

Finally, I want to point out that none of the arguments you provide
(including the HUnit example by the way, see e.g. here)
actually indicate any need for run-time reflection, but rather a
compile-time meta-programming facility such as Template
Haskell, which is readily available in the de facto standard
implementation (GHC) of maybe the most widely used pure functional
programming language (Haskell).



Posted by Dominique to Room 101 at 12/12/2010 12:17 PM

Gilad Bracha said...

And now I'll respond to Dominique.

First of all, thanks for the pointers to your work. In early 2007, I asked several well-known researchers in the Haskell/FP world about any work on separating grammar and semantic actions in parser combinators, and no one could point at anything. Some even asked why it mattered! So I'm glad that you understand that it is important, and are working in this area.

Substantively, I disagree with you. I did explicitly note that one could do this via meta-programming, and that didn't cut it from my perspective. So Template Haskell-based stuff does not undermine my claims in the least. Metaprogramming and reflection are different. Claiming that metaprogramming is a substitute for reflection is a position that only those without reflection are likely to embrace.

Unknown said...

Gilad,

Thanks for posting my previous comment; I'm not sure what went wrong, or whether it was my fault or blogger.com's.

First of all, I agree with you that parser combinator libraries in pure functional languages have previously not paid enough attention to the separation between grammar and semantics. However, my grammar-combinators work at least shows that it is possible with a modest annotation burden, even without compile-time meta-programming (though meta-programming does help to avoid certain boilerplate code).

Regarding the reflection vs. compile-time meta-programming question, my personal view is that both are non-ideal, because errors in meta-programs will be detected at run time (by the program user) or at the library user's compile time (by the application developer), whereas they should ideally be detected by the library author. I think the one programming language technology showing long-term promise here is dependently typed languages like Agda, Epigram or Coq, where arbitrary correctness properties of meta-programs can be proven by library authors (I'm very impressed by Brink, Holdermans and Löh's dependently typed grammars).

By the way, if you're interested in compile-time meta-programming in a dependently typed programming language, I think the badly-documented and experimental quoteGoal stuff in Agda is extremely novel and interesting.

Gilad Bracha said...

Hi Dominique,

I've always seen dependent types as too heavy. I'll take a look at the reference you provided and see how it works for grammars.

In any case, type safety isn't really very high on my list of priorities. The goal is to avoid redundancy and boilerplate and just express the grammar. I'll gladly trade type safety guarantees away if they get in the way of that goal.

單中杰 said...

What about the line of work on reflection started by Brian Smith?

Gilad Bracha said...

單中杰:
I'm not talking about Lisp/Scheme etc. Or even Erlang. I'm talking about pure functional programming languages in the academic tradition (like Haskell).

單中杰 said...

I'm not sure what "academic tradition" you have in mind that includes Haskell but not Lisp/Scheme, but one pure language with reflection is Blond.

pizzaeater said...

Have you looked at Bla?
http://strlen.com/bla-language

Gilad Bracha said...

I confess I never heard of Bla. It's a neat idea - though a quick perusal shows no sign of reflection, so perhaps it's a tad off-topic. It does seem like a nice and novel integration of OO and FP. Thanks for the pointer!

Pascal Costanza said...

The boundaries between functional and non-functional can be blurry. Consider a version of your example in ContextL/Common Lisp.

No classes were harmed in the production of this code. There are also no insurmountable amounts of side effects involved. The syntax can always be made to look neater by way of macros. It is relatively straightforward to imagine a context-oriented extension of a pure functional language that could express this example in a similar way, except maybe for static typing issues.

The gist of why this separation between syntax and semantics is possible, both in your version and in mine, is the restricted form of dynamic scoping you get with super calls (or call-next-method in Common Lisp). This allows intercepting methods/functions at well-defined points. In object-oriented programming, objects and the class hierarchies in which they are involved are the 'drivers' for this limited form of dynamic scoping; in context-oriented programming, it is the layer activation.

beroal said...

The problem you have found lies in inventing useless terms with broad meanings, like "separation of grammar and semantic action", "create datatypes dynamically", "generic datatype", "executable grammars" and so on. The problem is as imaginary as those words. Just get rid of them.

Unknown said...

Hi Gilad,

I've recently done some work on various mini-EDSLs in Haskell, using a variety of techniques (final-tagless, GADT, and quasi-quoted), and I ran into exactly these same problems. Currently I resort to dynamic recompilation at runtime via the plugins package.

I'm particularly dissatisfied with the Template Haskell solution, as all TH splices must be resolved at compile time.

In Haskell at least it may be possible to hack something together by using the Data class (I'm thinking of something like the impure interface to cmdargs), but I would consider that both incredibly ugly and unstable.

I do think that metaprogramming would be part of a viable solution, but only in a good multi-stage environment (MetaML comes closest to what I'm thinking of).

Gilad Bracha said...

Hi John,

Thanks for the comment - it's always nice when someone agrees with me :-) .

Staged execution has its uses, I'm sure. One can build up staged execution environments as tools. Reflection, on the other hand, has to be part of the system core. That is one reason why I see it as more fundamental.

單中杰 said...

Perhaps you say staged execution can be built as a tool because, for example, one can always print out some code to a file and run it with something like system(). But by the same token, reflection can also be built as a tool because, for example, one can always read (even the running program's own) code from a file and react accordingly - generate boilerplate code, invoke methods by name, etc. Would you please give an example of something that can be done only if reflection facilities are provided by the language system core?

Brian Slesinsky said...

As an example in a more practical setting, code generation is commonly used as a substitute for reflection in GWT, which lacks reflection due to the runtime overhead when generating JavaScript.

It's not a general substitute and tends to be limited to people writing advanced libraries. But it's sufficient to port most of Guice (which relies heavily on reflection) to GWT.