Saturday, January 12, 2019

Much Ado About Nothing

What sweet nothing does the title refer to? It could be about null, but in fact it will say nothing about that. The nothing in question is whitespace in program text. Specifically, whether whitespace should be significant in a programming language.

My instinct has always been that it should not. Sadly, there are always foolish souls who will not accept my instinct as definitive evidence, and so one must stoop to logical arguments instead.

Significant whitespace, by definition, places the burden of formatting on the programmer. In return, it can be leveraged to reduce syntactic noise such as semicolons and matching braces. The alleged benefit is that in practice, programmers often deal with both formatting and syntactic noise, so eliminating one of the two is a win.

However, this only holds in a world without civilized tooling, which in turn may explain the fondness for significant whitespace, as civilized tooling (and anything civilized, really) is scarce. Once you assume proper tooling support, a live pretty printer can deal with formatting as you type, so there is no reason for you to be troubled by formatting. So now you have a choice between two inconveniences. Either:

  • You use classical syntax, and programmers learn where to put the semicolons and braces and stop worrying about formatting; or
  • You make whitespace significant, clean up the syntax, and have programmers take care of the formatting.

At this point, you might say this is a matter of personal preference, one that can devolve into the kind of religious argument we all know and love. To tip the scales, pray consider the line of reasoning below. I don’t recall encountering it before, which is what motivated this post.

In the absence of significant whitespace, pretty printing (aka code formatting) is an orthogonal concern. We can choose whatever pretty printing style we like and implement a tool to enforce it. Such a pretty-printer/code-formatter can be freely composed with any code source we have - a programmer typing into an editor, an old repository, and most importantly, other tools that spit out code - whether they transpile into our language or generate code in some other way.
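
To make the composition concrete, here is a minimal sketch of a formatter for a hypothetical brace-and-semicolon language. The function name `format_braces` and the toy syntax are assumptions for illustration only; it ignores strings and comments, which any real pretty printer would have to handle. The point is that it needs no cooperation from whoever produced the text.

```python
def format_braces(source: str, indent: str = "    ") -> str:
    """Re-indent brace-and-semicolon code using nothing but brace depth.

    Toy formatter for a hypothetical C-like syntax (an assumption, not a
    real tool): braces and semicolons inside strings are not handled.
    """
    lines: list[str] = []
    current = ""
    depth = 0

    def flush() -> None:
        # Emit the pending statement at the current depth, whitespace-normalized.
        nonlocal current
        text = " ".join(current.split())
        if text:
            lines.append(indent * depth + text)
        current = ""

    for ch in source:
        if ch == "{":
            current = current.rstrip() + " {" if current.strip() else "{"
            flush()
            depth += 1
        elif ch == "}":
            flush()
            depth -= 1
            current = "}"
            flush()
        elif ch == ";":
            current += ";"
            flush()
        else:
            current += ch
    flush()
    return "\n".join(lines)


# Two very different "code sources": a generator that emits one flat line,
# and a human who indents erratically. Both pass through the same formatter.
generated = "if (x) {y = 1;if (z) {w = 2;}}"
typed = "if (x)\n{\n y = 1;\n      if (z) { w = 2; }\n}"
print(format_braces(generated))
print(format_braces(generated) == format_braces(typed))
```

Neither source had to know the house style; the layout is decided in one place, after the fact.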

Once whitespace is significant, all those code sources have to be cognizant of formatting.  The tool writer has to be worried about both syntax and formatting, whereas before only syntax was a concern.
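
A small sketch of what that extra worry looks like for a code generator. The helpers `emit_braces` and `emit_indented` are hypothetical names invented for this illustration; the brace target is a made-up C-like syntax, while the indented target is Python itself so the result can be checked with the built-in `compile`.

```python
import textwrap


def emit_braces(cond: str, body: str) -> str:
    # The fragment is valid wherever it is spliced in; layout is someone
    # else's problem (a formatter can fix it up later).
    return f"if ({cond}) {{ {body} }}"


def emit_indented(cond: str, body: str, step: str = "    ") -> str:
    # The body must be shifted one level relative to the header, so every
    # nested fragment has to be rewritten on its way in.
    return f"if {cond}:\n" + textwrap.indent(body, step)


# Brace target: plain string concatenation composes.
inner_b = emit_braces("z", "w = 2;")
outer_b = emit_braces("x", "y = 1; " + inner_b)

# Indentation target: composition only works because emit_indented
# re-indents the already generated inner fragment.
inner_i = emit_indented("z", "w = 2")
outer_i = emit_indented("x", "y = 1\n" + inner_i)
compile(outer_i, "<generated>", "exec")  # accepted by Python

# Naive concatenation, which was fine for braces, is rejected here.
naive = "if x:\n" + "y = 1\n" + inner_i
try:
    compile(naive, "<generated>", "exec")
    naive_ok = True
except IndentationError:
    naive_ok = False
print(outer_b)
print(outer_i)
print(naive_ok)
```

The re-indenting step is cheap in a toy like this, but every tool that emits such code has to remember to do it, at every splice point.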

You might argue that the whitespace is just another form of syntax; the problem is that it is not always context-free syntax. For example, using indentation to nest constructs is context sensitive, as the number of spaces/tabs (or backspaces/backtabs) depends on context.
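
One way to see the context dependence is to look at how Python, a language with significant indentation, tokenizes nested code. The sketch below uses the standard `tokenize` module; the helper name `indentation_events` is an assumption made for this example. The INDENT and DEDENT tokens are produced by comparing each line's leading whitespace with what came before, which is exactly the context a generator must reproduce.

```python
import io
import tokenize


def indentation_events(source: str) -> list[str]:
    """Return the names of the INDENT/DEDENT tokens emitted for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tokenize.tok_name[tok.type]
        for tok in tokens
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]


flat = "x = 1\ny = 2\n"
nested = "if a:\n    if b:\n        x = 1\ny = 2\n"

print(indentation_events(flat))
print(indentation_events(nested))
```

The final line `y = 2` is textually identical in both inputs, yet in the nested case it closes two blocks, because of the lines that precede it.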

In short, significant whitespace (or at least significant indentation) is a tax on tooling. Taxing tooling not only wastes the time and energy of tool builders - it discourages tooling altogether. And so, rather than foist significant whitespace on a language, think in terms of a broader system which includes tools. Provide a pretty printer with your language (like in Go).  Ideally, there's a version of the pretty printer that live edits your code as you type.

As a bonus,  all the endless discussions about formatting Go away, as the designers of Go have noted.  Sometimes the best way to address a problem is to define it away.

There. That was mercifully brief, right? Nothing to it.

11 comments:

  1. I wonder if the question is instead: "Should whitespace control semantics or should semantics control whitespace?"

    If it's the latter then each feature of the language can have its respective rules and be enforced by the parser and therefore be ignored or trigger an error/warning as warranted.

    This idea has been touched on in P4P.

  2. How about neither? I really think that separating content from presentation is the best policy. The post, however, focuses on a concrete argument rather than on the general question.

  3. As always, the answer is: it depends. Let's have a look at Haskell's significant whitespace as an example. The language is defined to have a semicolon inference stage between lexing and context-free parsing, i.e., significant whitespace is translated into braces and semicolons. The programmer and any tool are always free to use braces and semicolons explicitly, which renders whitespace insignificant immediately. That strikes me as the best of both approaches. In practice, I've had much less trouble with significant whitespace in Haskell than in Python, but your mileage may vary.

  4. Yes, it always depends on what you are trying to achieve, etc. And yes, as long as you can spit out code where whitespace does not matter, you don't hit the issue I brought up, so the Haskell approach should be fine. Good point.

  5. Yes, in Haskell you can even decide for any scope whether you want to use explicit braces and semicolons or significant whitespace. So you are free to choose as you wish. For example, there are people who prefer to use semicolons with the do notation and significant whitespace for let and where clauses.

  6. Formatting via a pretty-printer is still far from the norm (personally I don't like it — I like to exercise aesthetic judgement which is sometimes arduous to formalize).

    That being said, I think that's one of the things the significant-whitespace people actually like: that the language essentially enforces the formatting. So in one stroke you get clean-ish syntax and the abolition of braces/indentation debates.

    Your point that significant whitespace is a tax on tooling is fair. But isn't the problem that it's so hard to define mildly context-sensitive grammars in the first place? Of course, since easier definition of context-sensitive parsers is one of the big things I do in my PhD thesis, I would think so.

    To also give a counter-example to the claim that it makes tooling complex, consider Python. Significant whitespace can be handled entirely at the lexical level. The algorithm is simple and well-documented (https://docs.python.org/3/reference/lexical_analysis.html). The pain is minimal.

  7. As I've been at pains to make clear in other threads (on Twitter), this has nothing to do with any difficulty (or non-difficulty) with parsing. So while making it easier to parse context-sensitive grammars is a fine thing (a good parser combinator library should suffice, I believe), it's irrelevant. The tax is on tools that *generate* source code, which need to track this context.

    I prefer that the language implementation come with a tool that does the formatting (e.g., gofmt), which settles the debates, the aesthetics, and the tool tax. But the point here is not to convert people's religion on whitespace - life is too short for that. The point is simply to point out the existence of this "tax", which isn't widely recognized. It won't stop anyone from rationalizing their bias, and I didn't expect it to. It just provides a bit of information for the open-minded.

  8. Thanks for the clarification!

    If it's fine to have a pretty-printer take care of the formatting, by the same token can't we argue that having a code-generation library that handles the context for us essentially solves the issue?

    I'd also argue that a language toolchain should provide these tools, especially the parser since it's already a part of the compiler.

  9. I agree the language should provide these tools. Whether a library to assist code generation is a good option depends (as someone already noted) but I can see that working reasonably well in some cases.

  10. It's not explicit in your discussion, but another problem with significant whitespace is that the lack of tooling forces everyone on a team to adopt the same stylistic preferences WRT formatting.

    I know many people will think, "We have to agree on formatting preferences within the team anyway, so this is actually a good thing," but that's only because these people have not been exposed to a reasonable alternative.

    When I worked in Smalltalk, we had set up the environment so each developer could have her own formatting preferences. We modified the code browsers so that they would apply formatting when the code was displayed, so it didn't matter how the underlying source code was actually formatted, and there was no need for everyone to agree on shared formatting standards. Everyone could see the code the way they wanted.
