A place to be (re)educated in Newspeak

Monday, May 25, 2009

Original Sin

I’ve often said that Java’s original sin was not being a pure object oriented language - a language where everything is an object.

As one example, consider type char. When Java was introduced, the Unicode standard required 16 bits. This later changed, as 16 bits were inadequate to describe the world’s characters.

Tangent: Not really surprising if you think about it. For example, Apple’s internationalization support allocated 24 bits per character in the early 1990s. However, engineering shortcuts are endemic.

In the meantime, Java had committed to a 16 bit character type. Now, if characters were objects, their representation would be encapsulated, and nobody would very much affected how many bits are needed. A primitive type like char, however, advertises its representation to the world. Consequently, people dealing with unicode in Java have to deal with encoding code points themselves.

I was having this conversation for the umpteenth time last week. My interlocutor asked whether it would be possible for Java to be as efficient as it was without primitive types. The answer is yes, and prompted this post.

So how would we go about getting rid of primitive types without incurring a significant performance penalty?

Java has a mandatory static type system; it is compiled into a statically typed assembly language (Java byte codes, aka JVML). It supports final classes. I do not favor any of these features, but we will take them as a given. The only changes we will propose are those necessary to eradicate primitive types.

Assume that the we have a final class Int representing 32 bit integers. The compiler can translate occurrences of this type into type int. Hence, we can generate the same code for scalars as Java does today with no penalty whatsoever.

To make Int a suitable replacement for int, we would like the syntactic convenience of using operators:

Int twice(Int i) { return i + i;}

This is easy enough. We don’t want operator overloading of course. What we want is simply the ability to define methods whose names are operators. The language can support the same set of binary operators it does today, with the same fixed precedence. No need for special syntax and rules to define the precedence of operators etc. I leave that sort of thing to people who love complexity.

It is crucial that Ints behave as values. This requires that the == operator on Ints behaves the same as equality. There should be no way to detect if we’ve allocated one copy of the number 3 or a million of them. Since we can define == as a method, we can define it in Object to work the way it does for most objects, and override it with a method in Int and other value classes to work differently. We’ll want to take care of identityHash as well of course.

At this point, we can simply say that int stands for Int. Except for locking. What would it mean to synchronize on an instance of Int? To avoid this nastiness, we’ll just say that all instances of Int are locked when they are created. Hence no user code can ever synchronize on them.

In this hypothetical dialect, literal integers such as 3 are considered instances of Int. They, and all other integer results, can be stored in collections, passed to polymorphic code that requires type Object etc. The compiler sees to it that they are usually represented as the primitive integer type of the JVM. When they are used as objects, they get boxed into real instances of Int. When an object gets cast to an Int, it gets unboxed.

This is similar to what happens today, except that it is completely transparent. Because these objects have true value semantics for identity, you can never tell if a new instance was allocated by the boxing or not; indeed, you cannot tell if boxing or unboxing occur at all.

So far I haven’t described anything really new. A detailed proposal along the lines above existed years ago. It would allow adding new user defined value types like Complex as well.

A more restricted, but somewhat similar proposal was considered as part of JSR201 - the idea being that boxing would create value objects, without actually replacing the existing primitive types. It was rejected by the expert group. As I recall, only myself and Corky Cartwright supported it. There were concerns as to what would become of the existing wrapper classes (Integer and friends); those are used by reflection etc., and are hopelessly broken because they have public constructors that allocate new instances for all the world to see.

Tangent: I don’t know who came up with the idea of an Integer class that could have multiple distinct instances of 3. I assume it was another engineering shortcut (see above) by someone really clever.

To be clear - I am not suggesting that this discussion should be reopened. This is just a mental exercise.

There is only one place where this proposal introduces a performance penalty: polymorphic code on type Object that tests for identity. Now that identity testing is method, it will be slower. Not by all that much, and only for type Object.

Why? Because we preclude overriding == in non-value classes. There are several ways to arrange this. Either by fiat, or by rearranging the class hierarchy with types Value (for value types) and Reference (for regular objects) as the only subtypes of Object, making == final in those two types. Or some other variation

What has not been seriously discussed, to my knowledge, is what to do with arrays. As indicated above, scalar uses of primitive types don’t pose much of a problem.

However, what of types like int[]? Again, we can allocate these as arrays of 32 bit integers just as we do today. No memory penalty, no speed penalty when writing in ints or reading them out.

What makes things complicated is this: If int is a subtype of Object (as it is in the above) we’d expect int[] to be a subtype of Object[], because in Java we expect covariant subtyping among arrays.

Of course that isn’t type safe, and one could certainly argue that this our chance to correct that problem. But I won’t. Instead, assume we want to preserve Java’s covariant array subtyping.

The thing is, we do not want to box all the elements of an int[] when we assign it to an Object[].

The zeroth order solution is to leave the existing subtype relations among arrays of the predefined types unchanged. It’s ugly, but we are no worse off than we are today - and have a bunch of advantages because we are free of primitive types. But we can do better.

Suppose Object[] has methods getObjectElement and setObjectElement. These (respectively) retrieve and insert elements into the array.

We can equip int[] with versions of these methods that box/unbox elements that are being retrieved or inserted. We ensure that a[i] and a[i] = e are compiled differenty based upon the static type of a. If a is Object[], we use the getObjectElement and setObjectElement. If a is of type int[], we use methods that access the integer representation directly, say, getIntElement and setIntElement.

At this point, we can even introduce meaningful relations among array types like int[] and short[] using the same technique.

In essence, Object[] provides an interface that different array types may implement differently. We just provide a sugar for accessing this interface magically based on type. On a modern implementation like Hotspot, the overhead for this would be minimal.

Of course, if you make arrays invariant, the whole issue goes away. That just makes things too easy. Besides, I’ve come to the conclusion that Java made the right call on array covariance in the first place.

All in all - Java could have been purely object oriented with no significant performance hit. But it wasn’t, isn’t and likely won’t. Sic Transit Gloria Mundi.

50 comments:

Robert Konigsberg said...

Small not significant typo in first example: Init -> Int (in Int twice(Init i) { return i + i;})

Gilad Bracha said...

Fixed. Thanks!

Howard Lovatt said...

Pity you are no longer at Sun to push for this change :( I think that if you introduced a source keyword to Java, so you could say a file was in Java7 for example, then changes like this would be possible.

Also, I think you need to be able to declare a class as non-null, meaning that no instance of the class can be null. So that you can't say:

Int i = null;

Gilad Bracha said...

Howard,

One of the many reasons I left Sun was because it was increasingly impossible to make meaningful progress on the language, or anything else for that matter.

Neal Gafter and I argued for identifying the language version in Java source some years ago. The idea was vetoed, for reasons I could never follow.

Daniel Spiewak said...

Sounds a lot like Scala, actually. As a point of interest, Odersky and friends have recently come up with an interesting solution to the covariant arrays + unboxed primitives issue: generic type specialization. There've got a pre-print of a paper around somewhere, but it boils down to selectively creating specialized versions of the Array class for certain types. Their current solution does something similar by hand, but (as I understand it) there are some edge cases that are fixed by general type specialization.

Coming back 'round to your point about Java and pure object-oriented languages... Java also has some other features which break object-oriented purity (such as it may be defined). Statics are a biggie, and I would claim that arrays are another. It's a shame, because both of these "features" as they are implemented add unnecessary complexity to the language. Java *could* have very easily been a nice, clean language in terms of core constructs. Such a shame...

Gilad Bracha said...

Daniel,

Yes, Scala is an excellent example of what can be done on within the constraints of static typing. As for definitions of pure OO etc.:

I've railed against all thing static on this blog before - but you can have a perfectly OO language with static state (e.g., Self).

The traditional definition of pure OO is simply that everything is an object. I have tended to interpret this as implying reflection as well - since the computation itself must be an object.

However, one wants to go further and insist that only an object's behavior matters - so everything goes through an interface, as we have done in Newspeak. That precludes some of the techniques suggested in this post.

Mikhail said...

Sounds (almost) like C#. I actuallay missed in your post as well as in most other posts about new language features in Java a comparision with corresponding features in C#.

Daniel Spiewak said...

Actually, C# really isn't much better than Java in this respect. It still has fully primitive types and arrays. IIRC, its auto-boxing works better, but that's about it. Add to that the fact that C# has raw pointers and a fairly intrusive FFI, and I think you arrive at a language that is *worse* than Java when it comes to its core constructs. I'm considering things like delegates and reified generics as beyond the scope of this discussion.

Gilad Bracha said...

Mikhail,

What Daniel said, in spades.
C# is not a pure OO language by any stretch of the imagination. It is important to go beyond cursory similarities and understand the fundamentals.

In particular, C# autoboxing is not transparent, and so the primitive types are distinct from objects just as in Java.

Likewise, operator overloading in C# is very different from what I suggested in the post (which is similar to Scala, and like Scala's approach, inspired by Smalltalk).

Steven said...

I think a large part of why Java ended up this way is the time it was first released. There were few OO languages and to gain a foothold in the dev community, it had to make the dinosaurs using non OO languages happy as well

Gilad Bracha said...

Steve,

I agree: if Java had been more modern, it might not have succeeded. I'm sure this wasn't a deliberate design strategy though.

In any case, I'm not into second guessing James - it's just a design exercise - not a proposal for the future nor a critique of the past..

As such, this post shows that you can make a language that would appear almost indistinguishable from Java at first blush, while keeping its core design much more flexible.

abies said...

I'm afraid it is not that simple. While I agree that it is possible to have quite efficient encoding of primitive types as objects there is always a cost. Smalltalk is a good example here - most implementations are having small/large integer distinction in implementation, with small integers being disguised inside a non-rounded pointers. This suddenly means that you have quite different performance for integers crossing certain barrier - and bit worse performance for every integer out there.

I, for one, enjoy the fact that java has possibilities to be as fast as C++/Fortran for math crunching code. Even if it was not at very start and maybe is not now in every single aspect, possibility is there - and it would not be possible with 'funny' integer encoding.

As far as you example with 'char' is concerned... I don't think that making char as object would help in any way. After all, chars are represented as 32-bit in most cases anyway on the stack. What is leaking around is bit arithmetic for example. Even with object representation you would be suddenly surprised by having different bits cuts when converting between int and char. Even bigger issue would be serialization. While you can play a lot of games in memory, byte streams are cruel - every byte wasted is sticking out. You would suddenly end up with some strange markers/versions of streams, incompatibility between versions of java communicating over RMI etc... I don't think it is doable. Even with 'char' as object, I think we would just get 'longchar' one, with a bit of automatic conversion sugar in language (as we got a bit of sugar for char -> int now automatically).

When you think about 'primitives' as objects, always think about opengl JNI bindings from time before the NIO buffers. I don't think that any amount of Hotspot trickery would allow vertex arrays to be as fast with primitives being anything else than exactly 'native' versions of the types.

Daniel Spiewak said...

This has less to do with Hotspot trickery and more to do with the javac compiler itself being smart. As I said before, look at Scala. It doesn't really have primitive types at the language level; everything is an object. However, Scala benchmarks on mathematically-intensive code as being even *faster* than Java. (see: Alioth). Clearly, object primitives do not impose a serious performance hit when done correctly.

I also disagree with the statement that every extra bit counts. Modern computers have *gigabytes* of memory, negating any serious concern about raw space. At the CPU level, registers are 32bits on some CPUs, and 64bits on an ever-increasing number. There is plenty of room between 16 and 64 bits, and all of that gets padded out during computation.

I don't think there are any arguments that things like vectors and matrices are more efficient in their raw, primitive form. However, I think that has more to do with the way the JVM heap and GC works. JVM objects have a fair bit of overhead, and in something as complex as a matrix, that really adds up. However, that's a part of the JVM design that has nothing to do with whether or not Java has real primitives.

Gilad Bracha said...

Abbie,

If you read the post carefully, you'll see that no change in runtime representation of primitive types is suggested. Nor of arrays (only array classes).

Hence, your concerns are unfounded. They pertain mainly to numeric computation in a dynamically typed pure OO setting, which was not the topic.

abies said...

Sorry - indeed I have skipped the middle part bit too fast and my imagination filled out the blanks...

In case you don't want to have special representation of the primitives on jvm level, how do you suggest to handle nulls (as opposed to zeros)?

Second question - if you want to have only once instance of Integer for each int (and presumably Double for each double), how can you imagine it being cached?

One of the answers to both questions is to use 'tagged' representation of pointers, like Smalltalk implementions do - I kind of assumed it by default, thus my original comment.

Alex Buckley said...

The Javalobby link to your entry is "Would Java Be Better Off Without Primitives?". In that vein, I'd like to make two points:

1. We can all agree the answer to that question is yes, because a type system split between value types and reference types is injurious. There is no need to think in terms of "getting rid of primitive types", since that attracts a more hysterical reaction of an utterly pointless variety - primitive types (and checked exceptions, and raw types, etc) aren't being removed anytime soon.

2. Just because the language would have been easier to learn and use without primitive types, it does not follow conclusively that the language would have been better off without them. Would it have been adopted so enthusiastically in the late 90s had it disdained familiar C-style primitive types and insisted that 'everything is an Object'? I believe the answer is no. (Especially when people realized that boxing and unboxing occur, and that the reference types for integers et al are rather limited.) Two steps forward, one step back.

Osvaldo Doederlein said...

[Devil's Advocate mode...]

The example of a char class looks weak, because we could have just a better spec - IMHO the problem is not that char is primitive, the problem is that the JLS says that char is an IntegralType complete with a public numeric value range, subtytping (int>char), etc. There are other design options, like not having any char type at all, implementing java.lang.String and friends with short[] (that could be later changed to int[]); programs would use a String of length 1 to hold individual characters (e.g. like Sun's own JavaFX Script language does now).

The discussion about OO vs primitive types should really start with this challenge: name at least one pure OOPL that doesn't exhibit any sigificant performance penalty over languages with primitive scalars/arrays. And don't fool me with high-level application benchmarks; I want that top performance in math/array microbenchmarks, or real-world low-level algorithms like data compression, video encoding, network stacks, etc.

The fact is, in Java even with primitive types we still have significant performance limitations for many niche uses, because we don't have features like lightweight objects (value types / structs / tuples), by-value object fields and arrays, fine control over layout (e.g. alignment) and other typesystem delicacies. For some programs these features would allow faster Java code, for others it would allow MUCH faster interfacing with native code that mandates specific data layouts. If you're writing, say, a OpenGL binding library or a PureJava DBMS, the result is very ugly NIO code that manages complex data structures in byte buffers. A high-level OO view of that data cannot be offered without some cost (copying/marshalling data from buffers to objects and back).

Now if the argument is that a high-level OOPL shouldn't be optimized to write those low-level algorithms, then the problem is the whole strategy of the Java platform with its "Pure" mantra. The Java APIs themselves use native code only as a last resort; and for anything that's not in the JRE, say a JavaEE container, it's been many years that using a single byte of native code is considered heresy. But a JavaEE container needs many low-level components like XML parsers, protocol handlers, JMS connectors, database drivers, loads of resource management (several types of pools and caches), etc. One just doesn't implement that kind of stuff in a pure OOPL. I don't say that is impossible, but so far there are no such implementations to prove me wrong.

Gilad Bracha said...

Alex,

1. I have not made any proposal to change Java in any way. The post says that quite clearly. Alas, people don't read very carefully. I have no interest in fixing Java unless someone pays me to do so. The best thing to do is simply to move on to more advanced languages.

2. This came up in an earlier comment. I think that given the proposed changes, most people would not be able to tell the difference. It could be just as performant and just as successful as it actually was.

Gilad Bracha said...

Osvaldo,

Couldn't disagree with you more.

Wrt char - if you expose the representation, you are hosed. That is fundamental. The other issues you mention are easy to handle.

One could of course dispense with char altogether. The point was to change the language as little as needed (as the post says explicitly).

I agree there are niches where even Java is less efficient than C or Fortran. I said nothing about being more efficient than Java (though I don't necessarily concede the point either).

While I don't feel the least bit obliged to provide you with an existence proof of a pure OO language as efficient as Java, others have pointed out Scala.

As for "one simply doesn't implement that kind of stuff in a pure OO language" - speak for yourself.

Lars Bak's Resilient Smalltalk was used to write device drivers for embedded software for example. Not toys, not something inefficient - the real deal.

A language that lets you build nice abstractions can even make byte array manipulations pretty palatable.

So I'd suggest keeping a more open mind. Programming languages are a lot more interesting than the mainstream.

Gilad Bracha said...

Abies,

Null is easy. Pretend it's an object, compile as usual.

As for having a single instance for a given integer - the point is, you can't tell.

Howard Lovatt said...

Gilad,

You said: "Null is easy. Pretend it's an object, compile as usual." Can you enlighten me, I thought they would be hard. EG:

int[] ia = new int[1];
// ia points to a packed int array not an Object[]
ia[0] = null;
// Now I need ia to point to an Object[]

You could do something like this:

class IntArray {
int[] intValues;
object[] objValues;
}

and

int[] ia = new int[1];

gets translated to:

IntArray ia = new IntArray();
ia.intValues = new int[1];

And use intValues if there are no nulls, otherwise use objValues (and switch from intValues to objValues if a null is added). But it isn't ideal and every access needs to check if intValues is null, i.e.:

int i = ia[0];

Gets translated to:

if (ia.intValues == null) { throw ... }
int i = ia.intValues[0];

Therefore my current thinking is to have classes that are value types that null cannot be assigned to.

I ask because I am interested in making immutable types in my extended compiler (PEC) use primitives. Issues like the above seemed hard to me.

Thanks in advance for any hints,

Howard.

Gilad Bracha said...

Howard,

Since ia[1] =x is just a sugar for ia.setIntElement(1, x), (assuming ia is an int[]) the question is what does the unboxing from x to an int do - especially when x is not an int, or something compatible with int. Say, if x is a String.

Of course, this shouldn't happen. It would be a compile time error.

Which leads to the general conclusion that null is not a member of Value or any subclass, such as int. Which is what I think you were saying. However, null is an object - it is just unique.

Hence the unboxing routine doesn't pay the penalty of testing for these situations.

But null gets compiled into the same machine code it always does, which is what I was trying to say.

Feel free to let me know of other possible holes/oversights.

Daniel Spiewak said...

Actually, if you're designing a new language from scratch, I would recommend eliminating null altogether. You need a slightly more robust type system, but it's worth it all around.

Howard Lovatt said...

Thanks Gilad, no your solution is the same as I was thinking of doing (and the same as Scala).

Thanks for replying,

Howard.

Howard Lovatt said...

Daniel,

I am doing an extension of Java so I need to deal with null. But for a completely different language not having null may well be a good option.

Howard.

Osvaldo Doederlein said...

On char, my idea would be exactly NOT exposing it; char's binary representation could be opaque. It could just have an API allowing conversions in a more civilized way, e.g. with a method that queries the maximum integer value (implementation dependent) that can be converted into a char without loss of bits.

There are more problems with char. While for some uses you want 32-bit chars, some people go to the trouble of implementing things like XML parsers over raw byte[] strings, because it's much faster when Unicode is not an issue and the encoding fits in 8 bits. Then an OO design demands a hierarchy of char types of different capacities, or just to pick 32 bits and forget about efficiency. Even Java's 16-bit chars are a significant memory overhead for applications that make zero use of Unicode.

On the rest... I didn't know Resilient Smalltalk, but from the paper, it's certainly not an "efficient" language in terms of raw speed (it's even interpreted), seems more focused on other factors like low resource usage and safe access to resources like memory and I/O. Not very relevant to the discussion IMHO; when I say "performance" I mean "in the same ballpark of C". But, I know other examples of OO/managed languages achieving that performance for low-level systems programming: JikesRVM, Maxine, Squeak, Singularity, BulletTrain, etc. But they usually resort to either some bootstraping mechanism, or unsafe language/bytecode extensions, or a boatload of specific compiler tricks to produce lightweight types and low-level operations (even sun.misc.Unsafe has a lot of that). [The paper Demystifying Magic: High-level Low-level Programming is a great review of these techniques.] I just don't put such solutions in the same ballpark of an "intrinsically efficient" language. And these solutions have tradeoffs... in JikesRVM, for example, there are anotations to extend the Java language semantics, so you can declare e.g. an @Unboxed class that has no object header and controlled field layout. This is indistinguishable from having a 'struct' type in the Java typesystem, except that this type construction is not implemented by the language but rather by a special compiler (which means that it's not type-safe from Java's POV, e.g. javac won't complain if you synchronize on some @Unboxed object, although it has no monitor).

I try to keep an open mind, but I also think that there is a reason why some languages are mainsream and others are not; Java's great balance of high-level features and performance (for 1995 anyway) was critical to its success. If Sun had designed something like Ruby, I would probably be still hacking applications in C++. And the "existence proof" is not the only valid argument in such discussions, but it's certaily the only argument that is inquestionable and definitive - if somebody gave me that kind of proof, I would happily shut up and stop whining and trolling about the limitations of pure OOPLs.

P.S.: The only "pure" language design that impressed me performance-wise was Haskell with its GHC compiler. But that's a very different domain (functional language; zero dynamic features; massive dependency on closed-world compilation; and unfortunately, a hideously complex typesystem that killed its momentum and any chance of significant adoption). But perhaps there are some lessons to be learned/copied there.

A said...

@Daniel Spewak & others

Just curious, how would you design/use a language without null? (I'm sincerely interested by that issue for my project ooc-language)

I understand that I may be totally stuck in that C/Java self-education that I have, but I can't imagine not being limited by the absence of null (or rather, not being condamned to poorly reimplement it).

Practical use case: I need to parse different kind of tokens. I have one method per kind of token. These methods return a new instance of XXXToken if a token of type XXX has been read successfully, and null if there was no token of their type in the "buffer". (Ie. if StringTokenReader doesn't encounter a '"', then return null). How would I do that without null? Returning a special NoTokenRead object? (isn't that a specific kind of null) Throwing an exception? (ouch performance) Change the function to notify a listener if we've read a token, and make the function return void?

That use case may look pretty stupid, but I'd be really interested in other ways of doing it.

Daniel Spiewak said...

This might help clear things up: http://www.scala-lang.org/docu/files/api/scala/Option.html

In Java terms, you have an interface Option parameterized with type A. This interface has exactly two implementations: Some and None. None is a singleton representing the absence of a value. Some represents exactly one value of the parameterized type. So, any time you want to have a "nullable" value of type String, you need to declare it explicitly in the return type Option[String].

In order to do this effectively, the type system does need to support covariant type parameters (Java does not, but Scala does) and a bottom type (`Nothing` in Scala). It's also helpful to have sealed inheritance to prevent an unbounded hierarchy under Option.

This is a *very* common pattern in functional languages, which do not support null as a general rule. Haskell has the Maybe ADT, which instantiates to either `Just a` or `Nothing`. IIRC, ML uses `Option`, but it might be `Maybe`. Either way, it's the same idea.

The nice thing about this is you can leverage the type system to prevent NPEs at compile time, as well as leverage monadic operations to chain computations together (imagine C# or Groovy's ?: operator on steroids).

Amos Wenger said...

That is very nice indeed =) I'm thrilled to see such solutions exist.

I guess this could be implemented in a "high level low level language" by just introducing additional checks at compile time and still compile to null/non-null? The desired effect it to enforce nullable values/objects to be explicit, and enforce handling of the two cases by "user code", or is there something more to it?

(Everyone: sorry for this public display of my functional ignorance, it's 'Java brain damage' that you see here)

Daniel Spiewak said...

Well, conceptually speaking, *every* type in Java is nullable, which can be considered as everything being wrapped in Option by default. If you think about it though, this really negates a lot of the benefit of a dedicated Option type. Making `null` a real object (like `None`) would be nice, but only a marginal benefit over conventional null. I could make the argument that the transparency of Java's nullability is really what causes all of its NPE-related problems.

Gilad Bracha said...

Re: languages w/o null.

The Maybe solution is one possibility. An alternative is to have the type system enforce non-nullness.

So null is not a member of a reference type T. If you want to use null, you need to declare the type as:

T | Null

This is perhaps more natural for programmers in the mainstream tradition - but not completely natural. We considered such a scheme back in the early 90s for Strongtalk, but were too conservative to go with it.

Amos Wenger said...

I see. Java's transparency of null is indeed often a problem, since it's a convention in some APIs to pass null for optional parameters when they are not needed.

But then again, in my "high level low level" language, ooc, I have to deal with primitive types (to be fully interfaceable with C with no noticeable performance hit), and I can't think of a way to make an int/char/float `null` apart from wrapping it into an object: but that would really be a tough cost for the "Option" feature, in that case. (Even though the overhead of an object in ooc is no more than the size of a pointer on your platform).

Sometimes I wish I were in "pure high level programming language"-land, where you can abstract and not think about performance (but come on, Ruby?)

Another thing I didn't really get, Gilad, you say about "a single instance of Int per given integer", that the point is you can't tell. How would that be implemented? (Compiled, no interpreters/no VM)

Amos Wenger said...

@Gilad: seems like only a syntactic difference to me, e.g. it reads like "type T or Null", while Option[T] reads like "if it exists, it's a T". I don't know which is more intuitive, though.

Also, I don't see what you mean by Null (not) being "a member of type T". Do you mean that null in that case would be an instance of type T kept in a static member of its 'class'? (in the Java sense of 'static')

Gilad Bracha said...

Amos,

Re: you can't tell how many copies of an int there are.
The post discusses this: == is a method in this proposal, and it does a value comparison on the underlying integer bits (i.e., == is the same as equals).

Re: null is not a member of T. Today, null is a member of every reference type, so T v = null; is legal.
Suppose null is not a member of T, only of the type Null. T v = null; is illegal. You can write:

Null v1 = null; // not very useful
(T | Null) v2 = null;
(T | Null) v3 = new T();

Now, v3.foo() is illegal unless foo() is method of Null (such as ==). You cannot get an NPE.

It might be nice to have a construct like

ifNonNull {... assume v3 is T ...} else {..}

To safely unpack the value. Or general pattern matching.

Which brings me to your comment: is the difference vs. Maybe only syntactic? No. The Maybe approach requires you to wrap T values in a Maybe if a "null" result is a possibility. The other approach doesn't.

Amos Wenger said...

Ok, I think I get it. The "null is a member of a reference type" phrase sounded a bit curious to me.

The fact that the (T | Null) approach doesn't need wrapping into another object attracts me, ie. it would be "cheaper" than the Option version. And it would only require changes in the compiler, not in the data representation (correct me if I'm wrong).

Anyway, Thanks Daniel & Gilad for the lesson in 21st century programming languages =)

abies said...

No amount of syntax magic, renaming nulls to Options or T|null will allow you to fit 2^32 different int values AND null inside one 32 bit register. Please make a distinction between some language sugar, jvm level changes and between possibility of implementing it in most optimal way on common CPUs.

As far as == for 'Int' comparing the contents... This probably would mean that every object would have to implement something like identityEquals(...) method, which would default to old == for reference types and be overwritten to compare for contents for 'primitive' types. In other case, how would

boolean myEquals(Object o1, Object o2) {
return o1 == o2;
}

myEquals(1,2);

would work ?

Amos Wenger said...

abies, I had the same question, but I think our high-level fellows here don't see a problem wrapping every integer in an object. Example C implementation with structs:

struct Int {
int value;
}
typedef Int struct Int*;

Int i = malloc(sizeof(struct Int));
i->value = 23; // Simple assignment
function(i);

void function(Int arg) {
if(!arg) {
// throw NullPointerException
} else {
int unpacked = arg->value;
}
}

Yeah, that doubles memory usage. That was also my concern.

Gilad Bracha said...

Abies and Amos,

It sounds like there is still some misunderstanding.

The only case where objects need to be wrapped is where they are passed into a context that expects an object. This is similar to autoboxing today EXCEPT that it is semantically transparent.

As I've noted, null is not a member of value types, so there is no issue with the number of values representable in 32 bits.

The overhead of treating == as a method is restricted to the top level node of the type hierarchy, where we don't know if we are dealing with a value or a reference.

I hope this is finally clear.

Amos Wenger said...

Gilad, this is what I had understood =)

What abies and I are just saying is, you can't implement null without wrapping it into an object, so there is a performance penalty somewhere, even if you wrap it *only* when you have to pass it to a context expecting an object.

For me there's no debate that what you expose is the way to go (only wrap it when necessary, make it all transparent). I'm just (always) worried performance-wise.

Ooooor, we could pick up a value and call it null. I suggest 42. What do you suggest?

Gilad Bracha said...

Amos,

Glad you did get it - the comments can be misleading. Of course, null doesn't need to get boxed - it's already supported by the JVM. I don't know of an implementation that uses 42 - but it's impossible to tell :-)

Amos Wenger said...
This comment has been removed by the author.
Amos Wenger said...

> It's impossible to tell
At least the HotSpot JVM doesn't use it:

________________________________
public class NullTest {

public static void main(String[] args) {
System.out.println("42 == null? "+(new Integer(42) == null));
}

}

________________________________
42 == null? false

Gilad Bracha said...

Amos,

I hope you're joking. But I can't tell.

Amos Wenger said...

Well, at least the real world implementation of Amos is.

_______________________
Amos a = new Amos();
System.out.println("C'mon, is he joking? "+a.isJoking())

_______________________
C'mon, is he joking? true

The Grey Lens Man said...

Pretty sure Scala does the "right" thing across the board here (and of course a whole lot more in many areas where Java has fallen). Everything is an object, primitives are optimized, defining new classes with defined operator semantics, ...

i.e., val total = base + addon
where total, base, addon: Price

I view Scala not as a new language, but as an improved, enhanced version of Java washed free of original sin. Java done right as it were.

I wish more influencers such as yourself would assist in making it a palatable choice and eventually a prevalent choice.

Gilad Bracha said...

Grey Lens Man:

I wish I had the influence you seem to attribute to me. Then there would be vast numbers of Newspeak programmers :-)

I have been following Scala since before it was called Scala. I believe I've even, on occasion, had a modest influence on its evolution. I think it is superb work, and clearly the language of choice on the JVM.

Hence, I would not reduce it to "Java done right". It is a new language - and could not have been successful at the time Java was developed and introduced. Its type system is far beyond what was possible back then; and culturally, it wouldn't have stood a chance at the time, being too far ahead of the C crowd.

This post simply shows how little need be done to plain old Java to make it a pure OO language (for a pretty weak definition of pure OO).

I still have some issues with Scala BTW: reflection, constructors, overloading, traces of static, dynamicity. But it is still a tour de force of language design.

Zappini said...

I once asked you about supporting lightweight objects as presented in David Bacon's Kava. Your illuminating response was "there were problems with it".

How is your suggestion different from Bacon's Kava?

Gilad Bracha said...

Zappini:

Obviously, the answer is that there are no problems with it :-)

Seriously, I see very little relationship. David's proposal allowed the programmer to define a bit level representation.

In any case, there is a big difference between a design that stands in its own right, and a proposal for changing a widely used system like Java, where compatibility casts its ugly shadow. As I've said repeatedly, I am not proposing any changes to Java.

Buddy Casino said...

> I think [Scala] is superb work, and clearly the language of choice on the JVM.

I played with Scala a bit, and it seemed very complex. To me, Clojure seems to be simpler and more scalable (immutability, stm). What do you think about Clojure?

Gilad Bracha said...

I haven't looked very closely at Clojure. It's basically a functional Lisp, which is not a bad thing. I don't think it represents any important new ideas in PL design though.

I think you'd be hard pressed to beat Scala;s performance on the JVM though. The new collection libraries in Scala are purely functional BTW.

I agree Scala may not be for everyone - it is a sophisticated language.

I hope we'll get Newspeak to work well using the dynamic features in the Java 7 VMs. Just to give people some more choice.