Several people have asked me when Newspeak will be released. Well, I still don’t know, but at least now I know it will be released. Cadence has generously agreed to make Newspeak available as open source software under the Apache 2.0 license.
We will be publishing a draft spec for Newspeak soon; I say draft because I expect Newspeak to continue to evolve substantially for the next year at least, and because the initial spec will necessarily be incomplete.
It will be a while before we are ready to make a proper public release of Newspeak. In the meantime, I’ve gathered some information on my personal web site. We plan to set up an official Newspeak web site in the near future.
Tuesday, May 06, 2008
The Future of Newspeak
Saturday, April 26, 2008
java'scrypt
Everyone is talking about cloud computing these days; I should add some vapor to the mist.
I began thinking seriously about the topic a bit after April fools day 2004, when gmail was released. I started using gmail shortly therefater. After about 30 minutes, I was convinced that web clients had a long way to go.
It is true that gmail showed the world that you could do far more with javascript in a browser than I or most other people had realized.
Javascript and Ajax have come a long way since then. For an impressive demonstration of what’s possible, see the Lively Kernel, a realization of John Maloney’s Morphic GUI ideas of Self and Squeak, in Javascript. It only works in Safari 3.0 and some advanced Firefox builds, but that situation will improve in time.
Javascript’s performance remains an issue; this will also improve dramatically in the foreseeable future. However, with all the talk about computing in the cloud, I think some caveats are in order.
Many real applications will need to be used offline, and so will need to store state on the client. They are NOT going to be pure server side web applications. Google has finally acknowledged as much with the introduction of Google Gears. Microsoft’s LiveMesh naturally emphasizes this even more.
I say naturally, because Microsoft has an interest in the personal computer, while Google has an interest in taking us to an updated version of X terminals and 1960’s time sharing.
Of course, there are other technologies in this space, like Adobe Air. Macromedia (now part of Adobe) saw this coming earlier than most, going back to at least 2003 with its Central project.
With all these pieces in place, we may finally have the underlying elements necessary to build good applications delivered through the web browser.
Why has this taken so long? In theory, Java was supposed to deliver all this over ten years ago. Java started as a client technology, and applets were supposed to do exactly what advanced Ajax applications do today, only better.
Of course, the problem was that Java clients behaved very poorly; rather than fix the client technology, Sun captured the lucrative enterprise business. To understand why Sun behaved that way, why it had to behave that way, read The Innovator’s Dilemma.
Maybe I’ll expand on that theme in another post.
The result of Java’s failure on the client was that the entire vision of rich network-enabled client applications was put on hold for a decade. It is now once again coming to the fore, but there are other languages in that space now, Javascript chief among them.
I’m not advocating writing clients in Javascript directly. Javascript is the assembly language of the internet platform (and the browser is the OS). It’s flexible enough to support almost anything on top of it, and poorly structured enough that humans shouldn’t be writing sizable applications in it.
I am not in favor of the attempts to make Javascript 2 be another Java either. Javascript’s importance stems from the existing version’s position in the browser, and from its flexibility. It should be kept simple. Javascript's value increases the more uniformly it is implemented across browsers. A new, complex language will take a long time to reach that point.
People should program in a variety of languages that suit them, and have these compiled to Javascript and HTML (GWT is an example of that, though not one I really like). One shouldn't have to deal with the hodgepodge of Javascript and HTML (broadly defined to include XML, CSS etc.) that constitute web programming today.
I don’t expect native clients to go away any time soon though. They will provide a better experience than the browser in many ways for quite a long time. The advantage of writing in a clean, high level language and platform is that it can be compiled to run either on the “WebOS” or on a native OS.
What one wants to see in such a platform is the ability to take advantage of the network in a deep way - for software delivery and update and for data synchronization. Which brings me back to my theme of objects as software services (see the video), which I’ve been preaching publicly since 2005 ( see my OOPSLA 2005 presentation).
One should be able to write an application once, and deploy it both to the web browser and natively to any major platform. In all those scenarios, one should be able to update the software and data on the fly, run it on- and offl-ine and and monitor it over the network continuously. And all this needs to be done in a way that preserves user’s security. This last bit is still a big challenge.
The technology to do this is becoming available, even though everything takes much longer than one expects. The vision has its root in research in the 1980s, continued in the 1990s with Java (and others, now largely forgotten, like General Magic's Telescript) and will be probably be realized in the 2010s, per Alan Kay’s 30 year rule.
Sunday, March 23, 2008
Monkey Patching
Earlier this month I spoke at the International Computer Science Festival in Krakow. Krakow is a beautiful city with several universities, and it is becoming a high tech center, with branches of companies like IBM and Google. The CS festival draws well over a thousand participants; the whole thing is organized by students. While much of the program was in Polish, there were quite a few talks in English.
Among these was Chad Fowler’s talk on Ruby. Chad is a very good speaker, who did an excellent job of conveying the sense of working in a dynamic language like Ruby. Almost everything he said would apply to Smalltalk as well.
One of the points that came up was the custom, prevalent in Ruby and in Smalltalk, of extending existing classes with new methods in the service of some other application or library. Such methods are often referred to as extension methods in Smalltalk, and the practice is supported by tools such as the Monticello source control system.
As an example, I’ll use my parser combinator library, which I’ve described in previous posts. To define a production for an if statement, you might write:
if:: tokenFromSymbol: #if.
then:: tokenFromSymbol: #then.
ifStat:: if, expression, then, statement.
assuming that productions for expression and statement already exist. The purpose of the rules if and then is to produce a tokenizing parser that accepts the symbols #if and #then respectively. It might be nicer to just write:
ifStat:: #if, expression, #then, statement.
and have the system figure out that, in this context, you want to denote a parser for a symbol by the symbol directly, much as you would when writing the BNF.
One way of achieving this would be to actually go and add the necessary methods to the class Symbol, so that symbols could behave like parsers. I know otherwise intelligent people who are prepared to argue for this approach. As I said, Smalltalkers would call these additions extension methods, but I find the more informal term monkey patching conveys a better intuition.
Typically, one wants to deliver a set of such methods as a unit, to be installed when a certain class or library gets loaded. So these changes are often provided as a patch that is applied dynamically. Not a problem in Smalltalk or Ruby or Python (though I gathered from the Pythoners in Krakow that they, to their credit, frown on the practice).
Apparently, there is a need to explain why monkey patching is a really bad idea. For starters, the methods in one monkey’s patch might conflict with those in some other monkey’s patch. In our example, the sequencing operator for parsers conflicts with that for symbols.
A mere flesh wound, says our programming primate: I usually don’t get conflicts, so I’ll pretend they won’t happen. The thing is, as thing scale up, rare occurrences get more frequent, and the costs can be very high.
Another problem is API bloat. You can see this in Squeak images, where a lot of monkeying about has taken place over the years. Classes like Object and String are polluted with dozens of methods contributed by enterprising individuals who felt that their favorite convenience method is something the whole world needs to benefit from.
Even in your own code, one needs to exercise restraint lest your API becomes obese with convenience methods. Big APIs eat up memory for both people and machinery, reducing responsiveness as well as learnability.
Then there is the small matter of security. If you are free to patch the definition of a class like String (typically on the fly when their code gets loaded), what’s to stop malicious macaques from replacing critical methods with really damaging stuff?
The counter argument is that in many cases (though not in this example), the patch is designed to avoid the use of typecase/switch/instance-of constructs, which bring their own set of evils to the table.
Extractors are a new approach to pattern matching developed by Martin Odersky for Scala. They overcome the usual difficulty with pattern matching, which is that it violates encapsulation by exposing the implementation type of data, just like instance-of. It may be part of the answer here as well.
However, many monkey patches are motivated by a desire for syntactic sugar, as the example shows. Extractors won’t help here.
A variety of language constructs have been devised to deal with this and related situations. Class boxes and selector namespaces in Smalltalk dialects, context oriented programming in Lisp and Smalltalk, static extension methods in C# and even Haskell type classes are related. These mechanisms don’t all provide the same functionality of course. I confess that I find none of them attractive. Each comes at a price that is too high for what it provides.
For example, C# extension methods rely on mandatory typing. Furthermore, they would not address the example above, because we need the literal symbols we use in the grammar to behave like parsers when passed into the parser combinator library code, not just in the lexical scope of the grammar.
Haskell type classes are much better. They would work for this problem (and many others), but also rely crucially on mandatory typing.
Class boxes are dynamic, but again only effect the immediate lexical scope. The same is true of simple formulations of selector namespaces. Richer versions let you import the desired selectors elsewhere, but I find this gets rather baroque. I'm not sure how COP meshes with security; so far it seems too complex for me to consider.
I’ve contemplated a change to the Newspeak semantics that would accommodate the above example, but it hasn’t been implemented, and I have mixed feelings about it. If a literal like #if is interpreted as an invocation of a factory method on Symbol, then we can override Symbol so that it supports the parser combinators. This only effects symbols created in a given scope, but isn’t just syntactic sugar like the C# extension methods suggested above.
Of course, this can be horribly abused; one shudders to think what a band of baboons might make of the freedom to redefine the language’s literals. On the other hand, used judiciously, it is great for supporting domain specific languages.
So far, I have no firm conclusions about how to best address the problems monkey patching is trying to solve. I don’t deny that it is expedient and tempting. Much of the appeal of dynamic languages is of course the freedom to do such things. The contrast with a language like Java is instructive. Adding a method to String is pretty much impossible. One has to sacrifice one’s first-born to the gods of the JCP and wait seven years for them to decide whether to add the method or not. I’m not endorsing that model either: I know it only too well.
Regardless, given my flattering portrayals of primate practices, you may deduce that my main comment on monkey patching is “just say no”. The problems it induces far outweigh its benefits. If you feel tempted, think hard about design alternatives. One can do better.
Sunday, February 17, 2008
Cutting out Static
Most imperative languages have some notion of static variable. This is unfortunate, since static variables have many disadvantages. I have argued against static state for quite a few years (at least since the dawn of the millennium), and in Newspeak, I’m finally able to eradicate it entirely. Why is static state so bad, you ask?
Static variables are bad for security. See the E literature for extensive discussion on this topic. The key idea is that static state represents an ambient capability to do things to your system, that may be taken advantage of by evildoers.
Static variables are bad for distribution. Static state needs to either be replicated and sync’ed across all nodes of a distributed system, or kept on a central node accessible by all others, or some compromise between the former and the latter. This is all difficult/expensive/unreliable.
Static variables are bad for re-entrancy. Code that accesses such state is not re-entrant. It is all too easy to produce such code. Case in point: javac. Originally conceived as a batch compiler, javac had to undergo extensive reconstructive surgery to make it suitable for use in IDEs. A major problem was that one could not create multiple instances of the compiler to be used by different parts of an IDE, because javac had significant static state. In contrast, the code in a Newspeak module definition is always re-entrant, which makes it easy to deploy multiple versions of a module definition side-by-side, for example.
Static variables are bad for memory management. This state has to be handed specially by implementations, complicating garbage collection. The woeful tale of class unloading in Java revolves around this problem. Early JVMs lost application’s static state when trying to unload classes. Even though the rules for class unloading were already implicit in the specification, I had to add a section to the JLS to state them explicitly, so overzealous implementors wouldn’t throw away static application state that was not entirely obvious.
Static variables are bad for for startup time. They encourage excess initialization up front. Not to mention the complexities that static initialization engenders: it can deadlock, applications can see uninitialized state, and unless you have a really smart runtime, you find it hard to compile efficiently (because you need to test if things are initialized on every use).
Static variables are bad for for concurrency. Of course, any shared state is bad for concurrency, but static state is one more subtle time bomb that can catch you by surprise.
Given all these downsides, surely there must be some significant upside, something to trade off against the host of evils mentioned above? Well, not really. It’s just a mistake, hallowed by long tradition. Which is why Newspeak has dispensed with it.
It may seem like you need static state, somewhere to start things off, but you don’t. You start off by creating an object, and you keep your state in that object and in objects it references. In Newspeak, those objects are modules.
Newspeak isn’t the only language to eliminate static state. E has also done so, out of concern for security. And so has Scala, though its close cohabitation with Java means Scala’s purity is easily violated. The bottom line, though, should be clear. Static state will disappear from modern programming languages, and should be eliminated from modern programming practice.
Monday, December 31, 2007
More on Modules
My posts seem to raise more questions than they answer. This is as it should be, in accordance with the Computational Theologist Full Employment Act. In this post, I’ll try and answer some of the questions that arose from my last one.
How does one actually hook modules together and get something going? As I mentioned before, module definitions are top level classes - classes that are defined in a namespace, rather than in another class.
Defining a top level class makes its name available in the surrounding namespace. More precisely, it causes the compiler to define a getter method with the name of the class on the namespace; the method will return the class object.
Since a module definition is just a class, one needs to instantiate it, by calling a constructor - which is a class side method. Continuing with the example from the previous post:
MovieLister finderClass: ColonDelimitedMovieFinder
Now this isn’t quite realistic, since ColonDelimitedMovieFinder probably needs access to things like collections and files to do its job. So it’s probable that it takes at least one parameter itself. The typical situation is that a module definition takes a parameter representing the necessary parts of the platform libraries. It might look something like this:
ColonDelimitedMovieFinder usingLib: platform = (
|
OrderedCollection = platform Collections OrderedCollection.
FileStream = platform Streams FileStream.
|
)...
So we’d really create the application this way:
MovieLister finderClass: (ColonDelimitedMovieFinder usingLib: Platform new)
where Platform is a predefined module that provides access to the built-in libraries.
Bob Lee points out that if I change MovieLister so that it takes another parameter, I have to update all the creation sites for MovieLister, whereas using a good DIF I only declare what needs to be injected and where.
In many cases, I could address this issue by declaring a secondary constructor that feeds the second argument to the primary one.
Say we changed MovieLister because it too needed access to some platform library:
class MovieLister usingLib: platform finderClass: MovieFinder = ...
We might be able to define a secondary constructor
class MovieLister usingLib: platform finderClass: MovieFinder = (
...
): (
finderClass: MovieFinder = (
^usingLib: Platform new finderClass: MovieFinder
)
)
There are however two situations where this won’t work in Newspeak.
One is inheritance, because subclasses must call the primary constructor. I showed how to deal with that in one of my August 2007 posts - don’t change the primary constructor - change the superclass.
class MovieLister finderClass: MovieFinder = NewMovieLister usingLib: Platform new finderClass: MovieFinder
(...)
The other problematic case is for module definitions. In most cases, the solutions above won’t help; they won’t be able to provide a good default for the additional parameter, because they won’t have access to the surrounding scope. For this last situation I have no good answer yet. I will say that the public API of a top level module definition should be pretty stable, and the number of calls relatively few.
So overall, I think Bob makes an important point - DIFs give you a declarative way of specifying how objects are to be created. On the other had, it gets a bit complicated when different arguments are needed in different places, or if we don’t want to compute so many things up front at object creation time. Guice has mechanisms to help with that, but I find them a bit rich for my blood. In those cases, I really prefer to specify things naturally in my code.
Another advantage of abstracting freely over classes is that you can inherit from classes that are provided as parameters.
class MyCollections usingLib: platform = (
| ArrayList = platform ArrayList. |
)(
ExtendedArray List = ArrayList (...)
)
Now, depending what library you actually provide to MyCollections as an argument, you can obtain distinct subclasses (in fact, there’s an easier way to do this, but this post is once again getting too long). Correct me if I’m wrong, but I don’t think a DIF helps here.
You can also do class hierarchy inheritance: modify an entire library embedded within a module by subclassing it and changing only what’s needed. This is somewhat less modular (inheritance always causes problems) but the tradeoff is well worth it in my opinion.
I spoke about class hierarch inheritance at JAOO, and will likely speak about it again in one or more of my upcoming talks on Newspeak, at Google in Kirkland on January 8th, at FOOL on January 13th, or at Lang.Net 2008 in Redmond in late January.
I’m trying to make each of these talks somewhat different, but they will necessarily have some overlap. I hope that some of these talks will make it onto the net and make these ideas more accessible.
Sunday, December 16, 2007
Lethal Injection
Some months ago, I wrote a couple of posts about object construction and initialization. I made the claim that so-called dependency-injection frameworks (DIFs) are completely unnecessary in a language like Newspeak, and promised to expand on that point in a later post. Four months should definitely qualify as “later”, so here is the promised explanation.
I won’t explain DIFs in detail here - read Martin Fowler’s excellent overview if you need an introduction. The salient information about DIFs is that they are used to write code that does not have undue references to concrete classes embedded in it. These references are usually calls to constructors or static methods. These concrete references create undue intermodule dependencies.
The root of the problem DIFs address is that mainstream languages provide inadequate mechanisms to abstract over classes.
Terminology rant: DIFs should more properly be called dependee-injection frameworks. A dependency is a relationships between a dependent (better called depender) and a dependee. The dependencies are what we do not want in our code; we certainly don’t want to inject more of them. Instead, DIFs inject instances of the dependees, so the dependers don’t have to create them.
DIFs require you write your code in a specific way, where you avoid creating instances of dependees. Instead, you make sure that there is a way to provide the dependee instance (in DIF terminology, to inject it) from outside the object. You then tell the framework where and what to inject. The reason injection traffics in instances rather than the classes themselves is because there’s no good way to abstract over the classes.
Having recapped DIFs, lets move on to Newspeak. Newspeak modules are defined in namespaces. Namespaces are simply objects that are required to be deeply immutable; they are stateless.
Tangent: This ensures that there is no global or static state in Newspeak, which gives us many nice properties.
Namespaces are organized like Java packages, as an inversion of the internet domain namespace. Unlike Java packages, sub-namespaces can see their enclosing namespace.
A module is a top-level class, that is, a class defined directly within a namespace. Newspeak classes can nest arbitrarily, so a module can contain an entire class library or framework, which can in turn be subdivided into subsystems to any depth. Nested classes can freely access the lexical scope of their enclosing class.
Modules, like all classes, are reified as objects that support constructor methods. Recall that in Newspeak, a constructor invocation is indistinguishable from an ordinary method invocation. Objects are constructed by sending messages to (invoking virtual methods on) another object. That object may or may not be a class; it makes no difference. Hence all the usual abstraction mechanisms in the language apply to classes - in particular, parameterization.
Here is a trivial top level class, modeled after the motivating example for DIFs given in Fowler’s article:public class MovieLister = (
|
private movieDB = ColonDelimitedMovieFinder from:’movies.txt’.
|)
(
public moviesDirectedBy: directorName = (
^movieDB findAll select:[:m |
m director = directorName
].
)
The idea is that MovieLister supports one method, moviesDirectedBy:, which takes a string that contains the name of a director and returns a collection of movies directed by said director. The point of Fowler’s example is that there is an undesirable dependency on a class, ColonDelimitedMovieFinder, embedded in MovieLister. If we want to use a different database, we need to change the code.
However, this code won’t actually work in Newspeak. The reason is that the enclosing namespace is not visible inside a Newspeak module. Any external dependencies must be made explicit by passing them to the module as parameters to a constructor method. These parameters could be other modules, namespaces, or classes and instances of any kind.
In this specific case, ColonDelimitedMovieFinder cannot be referenced from MovieLister. If we try and create a MovieLister by writing: MovieLister new, creation will fail with a message not understood error on ColonDelimitedMovieFinder. We’d have to declare a constructor for MovieLister with the movie finder as a parameter:
public class MovieLister finderClass: MovieFinder = (
|
private movieDB = MovieFinder from:’movies.txt’.
|)
(
public moviesDirectedBy: directorName = (
^movieDB findAll select:[:m |
m director = directorName
].
)
At this point, we can immediately see that we can replace ColonDelimitedMovieFinder with any class that supports the same interface, which was the object of the entire exercise. Newspeak won’t let you create a module with concrete external dependencies, because that wouldn’t really be a module, would it?
In Newspeak code in a module doesn’t have any concrete external dependencies, and no dependees need to be injected. What’s more we can subclass or mix-in any class coming in as a parameter - something a DIF won’t handle.
What about a subsystem within a module? What if I don’t want it using the same name binding as the enclosing module? I can explicitly parameterize my subsystem, though that requires pre-planning.
I can also override any class binding in a subclass. Newspeak is message-based, so all names are late-bound. Hence any reference to the name of a class can be overridden in a subclass. Classes can be overridden by methods or slots or other classes in any combination. So even if you do not explicitly parameterize your code to allow for another class to be used to construct an object, you can still override the binding of the class name as necessary.
In summary, Newspeak is designed to support, even induce, loose coupling. That’s the point of message based programming languages. DIFs are an expedient technique to reduce code coupling in the sad world of mainstream development, but in a language like Newspeak, they are pointless.
Tuesday, September 25, 2007
Executable Grammars
Way back in January, I described a parser combinator library I’d built in Smalltalk. Since then, we’ve moved from Smalltalk to Newspeak, and refined the library in interesting ways.
The original parser combinator library has been a great success, but grammars built with it were still polluted by two solution-space artifacts. One is the need to use self sends, as in
id := self letter, [(self letter | [self digit]) star]
In Newspeak, self sends can be implicit, and so this problem goes away. We could write
id:: letter, [(letter | [digit]) star]
The other problem is the use of blocks. We’d much rather have written
id:: letter, (letter | digit) star
The solution is to have the framework pre-initialize all slots representing productions with a forward reference parser that acts as a stand-in for the real parser that will be computed later.
When the tree of parsers representing the grammar has been computed, it will contain stand-ins exactly in those places where forward references occurred due to mutual recursion. When such a parser gets invoked, it looks up the real parser (now stored in the corresponding slot) and forwards all parsing requests to it.
Our parsers are now entirely unpolluted by solution-space boilerplate, so I feel justified in calling them executable grammars. They really are executable specifications, that can be shared among all the tools that need access to a language’s grammar.
Below is a small but complete grammar in Newspeak:
class ExampleGrammar1 = ExecutableGrammar (
|
digit = self charBetween: $0 and: $9.
letter = (self charBetween: $a and: $z) | (self charBetween: $A and: $Z).
id = letter, (letter | digit) star.
identifier = tokenFor: id.
hat = tokenFromChar: $^.
expression = identifier.
returnStatement = hat, expression.
|
) ()
If you want to understand all the details, check out this paper; if you can, you might also look at my JAOO 2007 talk, which also speculates on how we can make things look even nicer, e.g.:
class ExampleGrammar3 = ExecutableGrammar (
|
digit = $0 - $9.
letter = ($a - $z) | ($A - $Z).
id = letter, (letter | digit) star.
identifier = tokenFor: id.
expression = identifier.
returnStatement = $^, expression.
|
)()