A place to be (re)educated in Newspeak

Saturday, January 13, 2007

Representation Independent Code

In most object oriented languages, replacing a field with a method requires you to track down the uses of that field and change them from field accesses to method invocations. The canonical example is a class of points. You decide to change your representation from cartesian to polar coordinates, and now all the places you wrote ‘x’ have to be rewritten as ‘x()’.
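To make the problem concrete, here is a minimal sketch in Python, used purely for illustration. The class names CartesianPoint and PolarPoint and the client function double_x are hypothetical, not taken from any real library. The client is written against the field-based class and stops working once x becomes an ordinary method.

```python
import math


class CartesianPoint:
    """Original representation: x and y are stored fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PolarPoint:
    """New representation: x is no longer stored, it is computed by a method."""

    def __init__(self, rho, theta):
        self.rho = rho
        self.theta = theta

    def x(self):
        return self.rho * math.cos(self.theta)


def double_x(p):
    # Client code written against the field-based class: a plain field access.
    return 2 * p.x


old_result = double_x(CartesianPoint(3.0, 4.0))

try:
    double_x(PolarPoint(5.0, 0.0))
    client_broke = False
except TypeError:
    # p.x is now a bound method, so 2 * p.x is not defined.
    # Every such call site would have to be rewritten as p.x().
    client_broke = True
```

The client fails even though, conceptually, a point still has an x. That is the dependence on representation the rest of this post is about.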

This example isn’t so bad, because the odds are you already had an x() method, and you probably had the sense to avoid making the x field public. But maybe you made it protected (perhaps your language is smart enough to disallow public fields, but simple-minded enough to force them to always be protected, like Smalltalk). If x is protected, you’ll need to find all the subclasses. Maybe you don’t have access to all of them, in which case you can never get rid of the field x.

Or maybe you made a stupid mistake and made x public, perhaps in the mad rush toward a release. Won’t happen to you? Take a look at Java’s System.out and ask yourself how it got to be there. Now go find all the uses of x and change them. Even if you can, it’s pretty tiresome.

The fact is, given the ability to publicize a field, programmers will do so. Once that’s happened, tracking down the uses may be impossible, and in any case is a huge amount of work.

It would be nice if you didn’t have to worry about this sort of thing: if everyone using your object went through a procedural interface, for example. Smalltalk makes all uses outside of an object do that - but uses within the object, in its class and subclasses, are exempt. As for mainstream languages like Java and C# - they allow you to declare public fields; it’s your funeral.

About 20 years ago, Dave Ungar and Randy Smith introduced Self, which fixed this problem. All communication is through method calls (or synchronous message sends, if you will) - even the object’s own code works exclusively by sending messages to itself and other objects. Fields (slots in Selfish) are defined declaratively, and automatically define access methods. The only way to get or set a field is by invoking a method. So if you get rid of the field and replace it with a method that computes the value instead, no source code anywhere can tell the difference. The code is representation independent. Self’s syntax makes it very easy and natural to send a message/call a method - there is no overhead compared to accessing a field in other languages.
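As a rough approximation of the idea (not Self, and not a guarantee), here is a sketch in Python, where a stored attribute and a computed property are read with the same syntax. The names CartesianPoint, PolarPoint and distance_from_origin are hypothetical. Note the assumption baked in: Python still has plain attributes, and the class bodies below touch their own storage directly, so the discipline is left to the programmer rather than enforced by the language as it is in Self.

```python
import math


class CartesianPoint:
    """x and y are stored."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PolarPoint:
    """rho and theta are stored; x and y are computed on demand."""

    def __init__(self, rho, theta):
        self.rho = rho
        self.theta = theta

    @property
    def x(self):
        return self.rho * math.cos(self.theta)

    @property
    def y(self):
        return self.rho * math.sin(self.theta)


def distance_from_origin(p):
    # Client code: reads p.x and p.y without knowing whether they are
    # stored or computed.
    return math.hypot(p.x, p.y)


a = distance_from_origin(CartesianPoint(3.0, 4.0))
b = distance_from_origin(PolarPoint(5.0, math.atan2(4.0, 3.0)))
```

The same client function works unchanged against both representations, which is the property Self provides for all code, including the object's own.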

In C# they have a thing called properties, which is similar. Except that C# also has fields, and so it requires careful attention by the programmer to ensure representation independence. In other words, it cannot be relied upon to happen. I don’t know why the designers of C# chose to support both fields and properties. I should ask my friends at Microsoft (yes, I have a few; I’m very non-judgmental). In complex languages, there are always all kinds of strange gotchas and constraints.

There are of course other ways that languages can undermine representation independence. In particular, the type system can support class types that make code dependent on which class you use, rather than on what interface is supported. I don’t want to dive into that right now.

The point of this post is to draw attention to the importance of representation independence. If you are using C# or something else with such a construct, I’d suggest you make the best of it and use properties or their equivalent religiously. And future languages should follow Self’s lead and ensure representation independence.


Trevor Fancher said...

You might be interested in http://iolanguage.com/

mgsloan said...

One problem with .NET's properties/fields is that they are pure syntax sugar - you can't replace a field in a lib with a property and expect dependent runtimes to work.

Unknown said...

Actually, properties aren't 'syntax sugar' at all in .NET, which is what causes the problem. If they were syntax sugar, .NET would know how to use them interchangeably without causing things to crash.
The reason fields are in the language is the way they decided to implement properties. The get method for an indexed property (collections, arrays, etc.) requires a value for each index. Unfortunately, there is no way, without breaking encapsulation, to fetch the legal values for that index. This means that it becomes very hard to write 'safe' software that does things with properties. For example, writing a generalized XML serializer/unserializer for classes (because .NET only does that with structures) is basically impossible to do without breaking encapsulation.

Gilad Bracha said...

Trevor: I know about Io, thanks.

Bot builder: Interesting. I assumed the sugar worked by always inserting method calls. Otherwise, the only benefit is not having to edit sources when you make the change - but you still have to recompile. This means that they haven't really solved the problem.

The good news is that this should be fixable in the .NET implementation.

Unknown said...

In the OO world in the early '90s, this used to be called "uniform reference". I believe the term is due to Bertrand Meyer, designer of the Eiffel OO programming language, who called it the "principle of uniform reference". See e.g. here

I believe it's mentioned in Meyer's book, Object-Oriented Software Construction.

Isaac Gouy said...

"Now go find all the uses of x and change them. Even if you can, it’s pretty tiresome."
Seems like we're assuming some particularly tiresome method of changing the references.

Last time this topic came up, I was surprised by responses like 'Why's that a problem - we use IntelliJ / Eclipse ?'

Gilad Bracha said...

Isaac: sure, you can refactor your code automatically using tools. But why should you have to bother, or remember? Why should you have to recompile (some VMs can address that, some can't/won't)?

And what about all the code in the rest of the universe? For example, what about all the users of System.out?

Eventually even such large scale distribution problems will be solved, I believe, but it will take a long time. That's a topic for another post.

Ultimately, what should matter about an object is its behavior, not its representation, and this is an important step in that direction.

Unknown said...

Scala seems to do this right. public 'val's get translated into methods in the rendered byte-code. Scala's gotten a lot of other things right as well.

Isaac Gouy said...

Gilad Bracha said ... what should matter about an object is its behavior, not its representation ...

Long ago I read Encapsulation and inheritance in object-oriented programming languages - the reason it struck a chord was that I had a visceral understanding of how much pain could be involved in changing representations.

Now that refactoring tools have made those changes so much easier, I think the old argument has been undermined and new arguments must be put forward.

David Pollak mentioned Scala. There are features of Scala which blur the distinction between field access and method calls:

- board.cells could be a field access or it could be a method call

- board.cells(i) could be a field access to an indexed variable or it could be a method call

Alex Buckley said...

I'm surprised that no-one has mentioned the main benefit of representation independence: that it enables hot swapping. If you can avoid a language feature like fields (which loses only minor amounts of expressibility), you reap vast benefits at the VM level.

Interestingly, bot builder and michael are both right about .NET properties. Properties are formal members of a type in the Common Type System and Common Language Specification (.NET's spec). But as bot builder says, they are defined as syntactic sugar for synthetic get/set methods (ECMA-335 Partition I S8.11.3). The compilation of property definitions and access into bytecode is not standardised. However, there is just enough metadata about a property to allow interop (Partition II S17), so michael is right too.

Pascal Costanza said...

CLOS's solution is interesting in that (a) it is very straightforward to have accessor methods automatically generated for you - you only need to use the appropriate "modifier" (or "annotation") in a field definition; (b) it is in fact syntactically more convenient to use method calls to access fields than to access them directly. (There is also no performance penalty involved with using accessors instead of direct slot accesses - but that takes too long to explain here... ;)

I guess you get the same benefits in Dylan and the various object systems for Scheme which are all more or less based on CLOS concepts.

Gilad Bracha said...

Hi Pascal.

The point I was making is that a language can guarantee representation independent code. Obviously, if one has to ask for an accessor, however easy that is, that leaves it up to the programmer. So there is no guarantee. I'm sure it's trivial to define a Lisp dialect that does guarantee this though.

Vanessa said...

Andreas added Self-like fields to Smalltalk in Tweak, which is built on top of Squeak (also used in Croquet). It's a restricted form of Self's implicit receiver: "foo" is compiled as "self foo" (and "foo :=" becomes "self foo:") if you declared foo to be a field, rather than an instance variable.

Some long-time Smalltalkers complain about this because it blurs the distinction between variable access and message sends. Anyway, Tweak is worth a look, even more so for its asynchronous messaging model.

Gilad Bracha said...

Yes, it's pretty easy to make a Smalltalk system do representation independent code, or any other language variation. Which is why it is so odd that so few people do such language experiments.

Of course, once you do RIC in Smalltalk, you need to get some notion of private methods, since you've lost the use of instance variables for encapsulation. Of course, that is a good thing regardless.