A place to be (re)educated in Newspeak

Thursday, September 03, 2009

Systemic Overload

Type-based method overloading is a (mis)feature of mainstream statically typed languages. I am very much against it - I’ve spoken about it more than once in the past. I decided I should put my views on the matter in writing, just to get it out of my system. I’ll start with the better known problems, and work my way up to more exotic issues.

Throughout this post, I’ll use the term overloading to denote type based overloading. Arity based overloading, as in e.g., Erlang, is pretty harmless.

When introducing a feature into a language, one has to consider the cost/benefit ratio. You might argue that overloading lets you separate the handling of different types into different methods, instead of doing the type dispatch inside one method, which is tiresome and costly. A classic example is mathematical operators - things like +.

This argument would have merit, if the overloading was dynamic, as in multi-methods. Since it isn’t, overloading doesn’t solve this kind of problem. Not that I’m advocating multi-methods here - they have their own problems - but at least they are based on accurate type information, whereas overloading is based on crude static approximations.

Consider this code (loosely based on an example from Dave Ungar’s OOPSLA 2003 keynote).

class VehicleUtilities {
  int numberOfAxles(Vehicle v) { return 2; } // a plausible default
  int numberOfAxles(Truck t)   { return 3; }
}

Vehicle v = new Truck();
VehicleUtilities u = new VehicleUtilities();
u.numberOfAxles(v); // returns 2

This simple example illustrates the dangerous disconnect between static and dynamic information engendered by overloading.
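For contrast, here is a sketch of the in-method type dispatch mentioned above; because it tests the run-time class rather than the static type of the reference, it returns 3 for the same call:

int numberOfAxles(Vehicle v) {
  // dispatch on the run-time class, not the static type of the reference
  if (v instanceof Truck) return 3;
  return 2; // a plausible default
}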

What exactly is the benefit of type based overloading? It saves you some trouble inventing names. That’s it. This may seem important, but the only case where it is actually needed is when writing constructors - and only because the language doesn’t allow you to give them distinct names.

Constructors are a bad idea, as I’ve already explained, so let’s assume we don’t have them. For methods, type based overloading provides a trivial rewrite and that is all. I don’t think it is that hard to give the different operations different names. Sometimes, the language syntax can make it easier for you (like keyword based method names in Smalltalk), but even in conventional syntax, it isn’t that difficult.
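To make the rewrite concrete, here is a small hypothetical sketch (the names are invented for illustration): distinct method names where one might have overloaded, and named factory methods standing in for overloaded constructors.

int axleCountForVehicle(Vehicle v) { return 2; }
int axleCountForTruck(Truck t) { return 3; }

class Point {
  final double x, y;
  private Point(double x, double y) { this.x = x; this.y = y; }
  static Point fromCartesian(double x, double y) { return new Point(x, y); }
  static Point fromPolar(double r, double theta) {
    return new Point(r * Math.cos(theta), r * Math.sin(theta));
  }
}

Note that overloaded constructors could not even express the Point case, since both variants would take two doubles; the named factories have no such problem.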

You pay for this convenience in myriad ways. The code above exemplifies one set of issues.

Another problem is the risk of ambiguity. In most overloading schemes, you can create situations where the compiler cannot decide which method to call, and so it declares the call illegal. Unfortunately, as the type hierarchy evolves, legal code can become illegal, or simply change its meaning.

This means that existing code breaks when you recompile, or does the wrong thing if you don’t.
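Here is a hypothetical sketch (the names are invented) of a call that is legal in one version of a class hierarchy and ambiguous in the next:

interface Swimmer {}
interface Flyer {}

class Duck implements Swimmer {} // version 1: a Duck only swims

class Naturalist {
  void describe(Swimmer s) { System.out.println("it swims"); }
  void describe(Flyer f) { System.out.println("it flies"); }
}

new Naturalist().describe(new Duck()); // legal: only describe(Swimmer) applies

// If a later version declares `class Duck implements Swimmer, Flyer {}`,
// both overloads apply, neither parameter type is more specific than the
// other, and this call - which used to compile - becomes a compile-time error.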

Overloading is open to abuse: it allows you to give different operations the same name. Altogether, you need style guides like Effective Java to warn you to use overloading as little as possible. Language constructs that require style guides to tell you not to use them are a bit suspect.

Ok, you probably know all this. So what contribution does this post make? Well, the systemic costs of overloading in terms of designing and engineering languages are less widely appreciated.

Overloading makes it hard to interoperate with other languages. It’s harder to call your methods from another language. Such a language may have a different type system and/or different overload rules. Or it may be dynamically typed.

You often find a dynamic language implementing multi-method dispatch to approximate the behavior of overloaded methods it needs to call. This is costly at run time, and is a burden on the language implementor.
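As a rough illustration - a sketch of the general technique, not any particular runtime's code, and one that ignores primitives, var args, and ambiguity - here is the kind of run-time re-resolution such a bridge has to perform, written with Java reflection for a single-argument call:

import java.lang.reflect.Method;

class DynamicCallSketch {
  // Pick the applicable method whose parameter type is most specific for the
  // run-time class of the argument - i.e., redo overload resolution dynamically.
  static Object invoke(Object receiver, String name, Object arg) throws Exception {
    Method best = null;
    for (Method m : receiver.getClass().getMethods()) {
      if (!m.getName().equals(name) || m.getParameterCount() != 1) continue;
      Class<?> p = m.getParameterTypes()[0];
      if (!p.isInstance(arg)) continue; // not applicable to this argument
      if (best == null || best.getParameterTypes()[0].isAssignableFrom(p)) {
        best = m; // p is at least as specific as the best so far
      }
    }
    if (best == null) throw new NoSuchMethodException(name);
    return best.invoke(receiver, arg);
  }
}

With the VehicleUtilities example above, invoke(new VehicleUtilities(), "numberOfAxles", new Truck()) picks the Truck overload and returns 3: the runtime repeats, on every call, the resolution that a statically typed caller got once at compile time.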

Scala supports overloading primarily so it can call Java; you might say overloading is a contagious disease, transmitted from one language to another through close contact.

In general, overloading adds complexity to the language; it tends to interact with all sorts of other features, making those features harder to learn, harder to use, and harder to implement. In particular, any change to the type system is almost certain to interact with type based overloading.

Here are some examples. Answers at the end of this post.

Exhibit 1: auto-boxing

void foo(Number n) { ... }
void foo(int i) { ... }

foo(new Integer(3)); // quick, what does this do?

Exhibit 2: the var args feature in Java

void foo(String s, Number... n) {...}
void foo(String s, int i, Integer... n) {...}
void foo(String s, int i, int... n) {...}
void foo(String s, Integer i, Integer... n) {...}
void foo(String s, Integer i, int... n) {...}

foo("What does this do?", 1, 2);



Exhibit 3: Generics

void foo(Collection c) {...}
void foo(Collection<String> c) {...}
void foo(Collection<Boolean> c) {...}
void foo(Collection<? extends String> c) {...}
void foo(Collection<? super String> c) {...}

Collection<String> cs;
foo(cs);

/* Which of these is legal? What would happen if we didn't use erasure? You have 3 seconds. */


Each of the above exhibits shows a specific type system extension which gets entangled with overloading.

You might say you don’t care; these are pretty sick examples, and the language designers sorted out the rules. What is their suffering to you? Well, these complications all have cost, and since resources are finite, they come at the expense of other, more beneficial things.

Tangent: People tend to think that the language design and implementation teams at large companies like Sun or Microsoft command vast resources. In reality, the resources are spread pretty thin considering the scope of the task at hand.

The more time is spent chasing down these issues, the less is spent focusing on doing stuff that actually buys you something. Products ship later and have more bugs and less desirable features, because people spent time worrying about gratuitous complications.

This is not hypothetical. I know the poor soul who wrote the spec for this could have spent his time doing something else, like daydreaming or adding closures. The compiler writers could have fixed more outstanding bugs, perhaps even reaching a state where there were no open bugs; I know they tried. The testers could have tested for other, more pressing issues and discovered more bugs before shipping. These are the wages of sin - and the sin is unjustified complexity.

Now, for the sake of balance, I should say that overloading, and language complexity in general, do have one advantage I haven’t yet mentioned. They open up great opportunities for training, support and consulting. You can even write some really cool books full of language puzzlers.

It’s worth noting that this kind of overloading is only a concern in languages with a mandatory type system. If you use optional typing (or just dynamic typing), you aren’t allowed to let the static types of the arguments change the meaning of the method invocation. This keeps you honest.

Will future language designs avoid the problems of overloading? I wish I were confident that was the case, but I realize overloading is an entrenched tradition, and thus not easily eradicated.

However, the overarching point I want to make is that the costs of complexity are insidious - unobvious in the short term but pervasive over the longer haul. KISS (Keep It Simple, S - where the binding of S is open to interpretation).


Answers:

  1. The issue here is whether we auto-unbox the Integer first to produce an int (and call foo(int)) or resolve the overload in favor of foo(Number) and not unbox at all. Java does the latter. The reason is to remain compatible with older versions.
  2. This is ambiguous. Except for the first declaration, no method is more specific than the others.
  3. They all have the same erasure, and so the example is illegal. If we did not use erasure, then foo(Collection<String>) would be the most specific method.