A place to be (re)educated in Newspeak

Friday, December 05, 2008

Unidentified Foreign Objects (UFOs)

I recently found out that Newspeak’s basic foreign function interface (FFI), called Aliens, is being made available in Squeak (though that will require new VMs with the required primitives). Thanks to John McIntosh for doing this.

I should also thank Eliot Miranda for most of the original work on aliens, and Vassili Bykov, Peter Ahe and Bill Maddox for the rest. Also thanks to Lars Bak, whose work on the Strongtalk FFI inspired the VM level view of aliens; and to Dave Ungar, who was the first to understand that objects were all you needed on the language side of an FFI. Lastly, this post benefited immensely from conversations with Vassili.

So I figured I’d write a little bit about Aliens from a high-level perspective. As usual, the ideas apply to programming languages in general.

In Smalltalk, there isn’t a standard FFI. Various dialects provide different solutions, with varying degrees of functionality, performance and ease of use. To be honest, they are usually a poor fit with the surrounding language and fairly awkward to use. This inhibits Smalltalk’s interoperability with the rest of the world. I’d argue that the absence of a good, standard FFI has cost the Smalltalk community dearly.

In Java, by contrast, native methods and JNI provide a standardized FFI. This mechanism is far from perfect, but at least there is a more or less standard solution.

What these and other systems have in common is support for a special construct (such as the native modifier for methods, or declarations like extern C, or the truly ugly ad hoc FFI syntax extensions used in various Smalltalks) for foreign functions.

Newspeak’s FFI was strongly influenced by the Strongtalk FFI; but unlike Strongtalk, Newspeak doesn’t have a special syntax for foreign calls. As Self showed many years ago, one doesn’t really need a special syntax for the FFI. The foreign functions, APIs, DLLs etc. can all be represented as objects. They just happen to be foreign objects.

The idea of a foreign object, which we call an alien, is at the foundation of the Newspeak FFI.

For starters, any decent language should be able to represent functions as values; and in an object-oriented language, these values are objects, accessed via a standard interface. Foreign functions are just a different implementation of that interface.

Another natural way to model a foreign function is as a method defined on a foreign object. For example, one can view an entire DLL as an object with a set of methods corresponding to the functions defined by the DLL. Better yet, we could represent an entire API as an object, independently of what DLLs actually defined it.
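To make this concrete outside Newspeak, here is a minimal sketch in Python, whose ctypes library takes a broadly similar view: a loaded library is an object, and each C function is an attribute of it that is called like any other function object. This is not the Alien API. It assumes CPython, and uses ctypes.pythonapi (the interpreter's own C API) only because that library is always available; the helper names native_version and first_word are made up for the illustration.

```python
import ctypes
import sys

# ctypes.pythonapi is a library object for the C API of the running
# CPython interpreter; its C functions are looked up as attributes.
runtime = ctypes.pythonapi

# The foreign function is itself an object. Its C signature is declared
# by setting attributes on it, not through special syntax.
foreign_version = runtime.Py_GetVersion
foreign_version.argtypes = []
foreign_version.restype = ctypes.c_char_p  # const char *Py_GetVersion(void)


def native_version() -> bytes:
    """An ordinary Python function with the same calling interface."""
    return sys.version.encode("utf-8")


def first_word(version_fn) -> str:
    """Call any zero-argument callable, foreign or native, the same way."""
    return version_fn().decode("utf-8").split()[0]


print(first_word(foreign_version))
print(first_word(native_version))
```

The caller of first_word cannot tell, and does not need to know, which of the two functions is implemented in C.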

Aliens can be defined for different foreign languages; for example, while Alien is used to interface with C, we also have a class called ObjectiveCAlien that can be used to interface with Objective-C, which is the native language on Mac OS X. C Aliens and Objective-C Aliens do not interfere with each other, and when/if we need to add Java Aliens or CLR Aliens we can do that as well.

The alien approach is also a good fit with security: one need not be concerned that code may bypass high level language safety guarantees by calling out to C; untrusted code can be prevented from doing that, simply by not providing any Alien library objects to it.
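The shape of that discipline can be sketched in a few lines of Python. Everything here is hypothetical: ForeignLibrary stands in for an alien library object, and run_plugin for a host that decides who receives it. Python does not actually enforce this (any code can import ctypes on its own), so the sketch only shows the pattern, which is sound in a language where a library object can be obtained solely by being handed one.

```python
class ForeignLibrary:
    """Hypothetical stand-in for an alien library object."""

    def call(self, name, *args):
        # A real library object would invoke foreign code here.
        return f"foreign call: {name}{args}"


def plugin(library):
    """Client code: it can reach foreign code only through `library`."""
    if library is None:
        return "no foreign access"
    return library.call("beep", 440)


def run_plugin(code, library=None):
    """Host: passes the library object along only if it chooses to."""
    return code(library)


trusted_result = run_plugin(plugin, ForeignLibrary())
untrusted_result = run_plugin(plugin)
print(trusted_result)
print(untrusted_result)
```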

Newspeak’s C Alien implementation is fast, but also dangerous. An alien is basically a blob of memory. The user of an Alien is responsible for interpreting and accessing that data correctly. There is no checking being done for you.
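A small Python sketch (again, not the Alien API) shows what "a blob of memory" means in practice. The same four bytes are written as a 32-bit float and then read back under two different interpretations; nothing prevents the second, wrong one.

```python
import struct

# Eight raw bytes with no type information attached.
blob = bytearray(8)

# Store 1.0 as a little-endian 32-bit IEEE 754 float at offset 0.
struct.pack_into("<f", blob, 0, 1.0)

# Reading with the matching format recovers the value.
as_float = struct.unpack_from("<f", blob, 0)[0]

# Reading the same bytes as an unsigned 32-bit integer also "works":
# the blob cannot tell the caller that this interpretation is wrong.
as_uint = struct.unpack_from("<I", blob, 0)[0]

print(as_float, hex(as_uint))
```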

Tangent: It's worth noting that the basic Alien layer may evolve further; for example, we aren't thrilled with the practice of subclassing Alien. It's not clear if the Alien class really needs to change, or just the pattern of using it.

On top of this foundation, safer and/or more convenient abstractions can be built. We have built objects that support not just methods corresponding to the functions of an API, but also methods that provide factories for the various datatypes used in the functions’ signatures, including those defined by macros. These objects wrap the basic alien API, and help with error-prone bookkeeping - converting between Newspeak types (e.g., Strings) and foreign types, freeing aliens after use etc.
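The kind of bookkeeping such a wrapper hides can be sketched in Python. Both classes are hypothetical: RawHeap is a toy stand-in for manually managed foreign memory, and StringApi for a higher-level API object that converts strings on the way in and out and always frees what it allocated.

```python
class RawHeap:
    """Toy stand-in for foreign memory that must be freed by hand."""

    def __init__(self):
        self.live = {}       # handle -> bytes currently allocated
        self._next_handle = 1

    def alloc(self, data: bytes) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.live[handle] = data
        return handle

    def read(self, handle: int) -> bytes:
        return self.live[handle]

    def free(self, handle: int) -> None:
        del self.live[handle]


class StringApi:
    """Higher-level wrapper: callers see only ordinary strings."""

    def __init__(self, heap: RawHeap):
        self._heap = heap

    def upper(self, text: str) -> str:
        # Convert to a NUL-terminated byte string, as C would expect.
        handle = self._heap.alloc(text.encode("utf-8") + b"\0")
        try:
            raw = self._heap.read(handle)
            return raw.rstrip(b"\0").upper().decode("utf-8")
        finally:
            # Free the foreign memory even if the call above fails.
            self._heap.free(handle)


heap = RawHeap()
api = StringApi(heap)
result = api.upper("alien")
print(result, len(heap.live))
```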

At the moment, both the declarations of low-level aliens and higher-level APIs are constructed manually, which is tedious and error-prone. We’ve been planning on a higher-level tool called CSlick, which would allow you to specify a set of .h files and the requisite DLLs, and obtain an object that supports the desired functions automatically.

As a first approximation, you could think of CSlick as a function:

CSlick: List -> List -> ForeignAPI

The signature above is deliberately curried, because you may actually want to be able to specify just the header files, and later bind different DLLs to provide the actual functionality, just as a .h file can be associated with different .c files.
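Since CSlick does not exist yet, the following Python sketch only illustrates the curried shape of that signature. The function cslick, the ForeignAPI record and the file names are all placeholders; a real tool would parse the headers in the first stage and load the libraries in the second.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ForeignAPI:
    """Placeholder result: records what the API was built from."""
    headers: Tuple[str, ...]
    libraries: Tuple[str, ...]


def cslick(headers: List[str]) -> Callable[[List[str]], ForeignAPI]:
    """Stage one: fix the header files, return a binder for libraries."""
    declared = tuple(headers)  # a real tool would parse declarations here

    def bind(libraries: List[str]) -> ForeignAPI:
        """Stage two: attach concrete libraries to the declarations."""
        return ForeignAPI(declared, tuple(libraries))

    return bind


# Specify the headers once, then bind different libraries later.
with_headers = cslick(["zlib.h"])
release_api = with_headers(["libz.so"])
debug_api = with_headers(["libz_debug.so"])
print(release_api)
print(debug_api)
```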

When this will happen is anyone’s guess right now; but Vassili has done this before (in the context of Lisp) and I’m sure he can do it again.

The resulting foreign API should incorporate the low level alien API, and, as much as possible, a higher level API as well.

The CSlick implementation will need to know how to parse C header files, and how to reflectively manufacture the low level code that actually invokes the C functions. Fortunately we have a strong parsing infrastructure, so that isn’t as daunting as it sounds.

When I’ve told people about CSlick, they often mention SWIG. However, I believe CSlick can be made substantially easier to use than SWIG. SWIG has to cope with multiple languages, each with a pre-existing story on how to do foreign calls. In contrast, we can integrate CSlick more tightly with the language. Ultimately, that should translate to a simpler model for the user.

The key takeaway is that objects are all you really need to interact with foreign programming languages. They are better than built-in language constructs in terms of ease of use, security, and multiple language support. As usual, less is more.

6 comments:

Unknown said...

This resonates with this thought regarding the appeal of the JVM. It is based on two axioms:
(a) Objects provide a straightforward way for one piece of code to communicate with another.
(b) In modern programs it is typical to use more than one language.

The immediate conclusion is that you want your machine language to support the notion of objects. This will ease interoperability across language boundaries.

This explains a lot of the popularity of the JVM: the machine code, that is: the bytecode, is object-aware so mixing different languages is easier than with languages that are compiled to C/assembly.

Unknown said...

It's interesting how Ruby allows the use of "Aliens". There is a gem called RubyInline (http://www.zenspider.com/ZSS/Products/RubyInline/), which allows C/C++ code to be inlined in a regular Ruby class. When that happens, the class gains a new method whose name is the same as that of the defined C function. It looks really nice.

Recently Charles Nutter made some enhancements to JRuby that allow similar constructs in that implementation (i.e. inlining Java code in Ruby classes).

James said...

Itay - shouldn't that be

(b) In post-modern programs it is typical to use more than one language.

sorry. couldn't resist.

Unknown said...

James,

I just went back to your paper the other week, so I guess that in some subliminal way that is *exactly* what I meant...

Maxim Fridental said...

How is this approach different from that used in Dolphin Smalltalk?

I also used to define classes there, one class per DLL. Class methods were mapped to DLL functions. To use them, I fetched an instance of the class (via the Singleton pattern) and called its methods.

And another topic. If you need a task where you can test how flexible and "composable" your Alien approach is, you may want to try WMI via COM.

WMI uses COM as a metalanguage, so for example a method of a WMI object is represented as a COM object. To call a WMI method, you instantiate the corresponding COM object and call its Invoke method.

So you have several levels of indirection there, and flattening them so that WMI alien objects look just like native Newspeak objects may be an interesting challenge.

Gilad Bracha said...

Maxim,

Like most Smalltalks, Dolphin has a special syntax for making foreign calls. Consequently, such calls are NOT subject to control via a capability system. A foreign call in Dolphin is a language construct with special syntax and semantics.