A place to be (re)educated in Newspeak

Sunday, June 05, 2011

Types are Anti-Modular

Last week I attended a workshop on language design. I made the off-the-cuff remark that types are actually anti-modular, and that comment resonated enough that I decided to tweet it. This prompted some questions, tweets being a less than perfect format for elaborate explanation of ideas (tweets are anti-communicative?). And so, I decided to expand on this in a blog post.

Saying that types are anti-modular doesn’t mean that types are bad (though it certainly isn’t a good thing). Types have pros and cons, and this is one of the cons. Anyway, I should explain what I mean and how I justify it.

The specific point I was discussing when I made this comment was the distinction between separate compilation and independent compilation. Separate compilation allows you to compile parts of a program separately from other parts. I would say it is a necessary, but not sufficient, condition for modularity.

In a typed language, you will find that the compiler needs some information about the types your compilation unit is using. Typically, some of these types originate outside the compilation unit. Even if your program is just: print(“Hello World”), the compiler may need to know that string literals have the type String, and that the argument type of print is String. The definition of String comes from outside the compilation unit. This is a trivial example, because it is common for String to be part of the language definition. However, substantial programs will tend to involve user-defined types at the boundaries of compilation units (or of module declarations, which may or may not be the same thing).
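A minimal TypeScript sketch of the same dependency (hypothetical code; TypeScript's `string` plays the role of String here):

```typescript
// Even this trivial unit cannot be typechecked in isolation: the
// checker must know that a string literal has type string and that
// greet's parameter accepts one. The definition of string itself
// comes from outside the compilation unit (the compiler's library
// declaration files), just as String does in the example above.
function greet(name: string): string {
  return "Hello " + name;
}

console.log(greet("World")); // prints "Hello World"
```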

A consequence of the above is that you need some extra information to actually compile. This could come in the form of interface/signature declaration(s) for any type(s) not defined within your compilation unit/module, or as binary file(s) representing the compiled representation of the code where the type(s) originated. Java class files are an example of the latter.

Whatever the form of the aforementioned type information, you depend on it to compile your code - you cannot compile without it. In some languages, this introduces ordering dependencies among compilation units. For example, if you have a Java package P1 that depends on another package P2, you cannot compile P1 before compiling P2. You either compile them simultaneously (giving up on even separate compilation) or you must compile P2 first so that you have class files for it around. The situation is better if the language supports separate signature declarations (like Modula-3 or ML) - but you still have to have those declarations around before you compile.

Semi-tangent: Of course, you can fake signature declarations with dummy package declarations. Java chose to avoid the conceptual overhead of separate signature declarations, on the assumption that, pragmatically, one could get by without them.

Contrast this with independent compilation, where you compile your module/compilation-unit independently of anything else. The code that describes the values (and types) that your module requires may not even exist yet. Obviously, independent compilation is more modular than separate compilation. How do you achieve this in the presence of types? The short answer is you don’t.

Wait: we are blessed, and live in a world where the gods have bestowed upon us such gifts as type inference. What if I don’t have to write my types at all, and have the compiler just figure out what types I need? The problem is that inference only works well within a module (if that). If you rely on inference at module boundaries, your module leaks implementation information. Why? Because if you infer types from your implementation, those types may vary when you modify your implementation. That is why, even in ML, signatures are declared explicitly.
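The leak is easy to demonstrate in any language that infers interface types. A hypothetical TypeScript sketch (TypeScript infers a function's return type when it is not annotated):

```typescript
// No return-type annotation: the interface other modules see is
// whatever the checker infers from this implementation.
function parsePort(s: string) {
  return parseInt(s, 10); // inferred return type: number
}

// An innocuous-looking change to the implementation:
//
//   function parsePort(s: string) {
//     const n = parseInt(s, 10);
//     return Number.isNaN(n) ? undefined : n;
//   }
//
// would change the inferred type to number | undefined. Clients
// compiled against the old inferred signature may now fail to
// typecheck: the interface varied with the implementation, which
// is exactly why ML signatures are written down explicitly.

console.log(parsePort("8080")); // prints 8080
```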

Wait, wait: Surely optional types avoid this problem? Not exactly. With an optional type system you can compile independently - but you cannot typecheck independently. Indeed, this is the point: there is no such thing as modular typechecking. If you want typechecking across modules, you need to use some of the same types across modules. You can either replicate the types or place them in some specific module(s). The former clearly isn’t very modular. The latter makes some modules dependent on declarations defined elsewhere, which means they cannot be typechecked independently. In the common case where types are mandatory, modules cannot be compiled independently.

Now, there is an argument to be made that modules have dependencies regardless, and that we cannot reason about them without being aware of these dependencies. Ergo, the types do not change things fundamentally. All true. Even in a dynamic language we have some notion of type or signature in our heads. Formalizing that notion can be helpful in some ways, but it has downsides. One such downside is that formalizing the types reduces our ability to manage things in a perfectly modular way. You cannot typecheck modules in isolation (except in trivial cases) because types capture cross-module knowledge.

One often hears the claim that types are in fact valuable (or even essential) to modularity because they can describe the interface(s) between modules. Therein lies the problem: a type cannot serve this purpose unless it is available in more than one module. Types are inherently non-local - they describe data that flows across module boundaries. The absence of types won’t buy you modularity on its own, though. Far from it. But types and typechecking act as an inhibitor - a countervailing force to modularity.

68 comments:

Edward Z. Yang said...

Benjamin Pierce has also commented on this subject, since the anti-modularity of types really becomes evident when you're dealing with dependent types. The talk was (facetiously) named "Types considered harmful." http://www.cis.upenn.edu/~bcpierce/papers/harmful-mfps.pdf

James Iry said...

The usual definition of modularity involves a separation of interface and implementation. Changes to an interface may require changes in other implementations, while changes to an implementation never should.

Static types are about interface and not about implementation (at least, once you get past C). So saying types are anti-modular is just to recognize the role they play as interface descriptions.

That said, it seems like this article is focused too narrowly on simple types and ignores quantified types.

Existential types are an explicit accounting of modularity. And thus lumping all types under the anti-modular umbrella is missing an important aspect of static typing.

Similarly, parametricity theory says that parametric polymorphism creates a wall between a parametrically polymorphic function and the values it works with. Languages like Java aren't parametric in this sense, but Haskell is. In Haskell a function with type forall a. a -> something cannot examine the values it is given. And that too is a kind of modularity.

Casper Bang said...

I've often thought that, as software grows in complexity and size, we're bound to end up with a pragmatic hybrid paradigm where we're not confined by types (defined and shared up-front), which can be a major hassle, especially with Java's checked exceptions thrown in.

Perhaps the analogy is broken, but I don't think it's unlike the issues in theoretical physics, where different models apply to different scales (quantum mechanics vs. general relativity, or even modern M-theory's multi-theories).

What I cannot figure out is at what level types make the most sense: within an isolated module or between modules. I think it's the former, though. Various dependency injection frameworks already live as "type brokers" at this abstraction level.

Brian said...

Dynamic typing doesn't restore the modularity - it simply delays checking for violations. Say module X depends upon module Y having a function foo - this is a statement which is independent of static vs. dynamic checking. This means that modularity is lost - you can't have a module Y which doesn't have a function foo. If module Y doesn't have a function foo, the program is going to fail; the only questions are when and how.

What I don't get is why detecting the error early is a disadvantage. It's well known that the earlier a bug is detected, the cheaper it is to fix. And yet, all the arguments in favor of dynamic typing are about delaying the detection of bugs - implicitly increasing their cost.
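The point above can be sketched in TypeScript, using `any` to simulate a dynamically checked module boundary (the modules here are hypothetical):

```typescript
// Module Y (hypothetical) forgot to define foo. X's dependency on
// Y.foo exists regardless of the checking discipline; only the
// moment of detection differs. With a static interface for Y, the
// call below would be rejected at compile time; through `any`
// (standing in for dynamic typing) it fails only when it runs.
const moduleY: any = {};

function runX(y: any): string {
  try {
    y.foo();
    return "ok";
  } catch (e) {
    return "missing foo detected at run time";
  }
}

console.log(runX(moduleY)); // prints "missing foo detected at run time"
```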

Josef said...

"There is no such thing as modular typechecking"

You're correct in practice but wrong in theory. All type systems in practical use today fail modularity in the sense you outline. But that doesn't mean all type systems fail at modularity. Indeed, there are type systems which enjoy the "principal typings" property, and they allow independent compilation. Examples of such type systems include intersection types.

Contrast this with the weaker property of "principal types", which is standard for polymorphic type systems. These do indeed only support separate compilation, not independent compilation.

I recommend "What Are Principal Typings and What Are They Good For?" by Trevor Jim and "The Essence of Principal Typings" by Joe Wells as further reading.

Sandro Magi said...

Type inference for first-class messaging disproves your hypothesis that type systems are anti-modular. The above can fully infer message dependencies, and these dependencies can, in principle, be lifted to module boundaries automatically and checked at link time.

Really all you're looking for is that dependencies be inferred and checked lazily, rather than eagerly as in most languages. Static typing does not conflict with this goal, just *certain types* of static typing.

David Teller said...

With Opa, one of the most important design choices was to make the type system structural by default. So, if you, as a developer, know the definition of, say, lists, you can perform pattern-matching on lists without depending upon the definition of the type.


However, experience shows that this is not how developers reason. In almost all cases, Opa developers fall back to nominal typing, because this is a good documentation practice, because this makes error messages much nicer and because this seems to be how developers reason in the first place.

The only common exception is error handling, in which the pattern of "handle the cases I know how to handle, propagate everything else without having to know its type" is common - and closely matches exception-handling.

Gilad Bracha said...

Great. One of the advantages of blogging is that I get so many people telling me I'm wrong, forcing me to sharpen or narrow my arguments.

I've seen Benjamin's talk.

Sandro: I'd be interested in how a type inference scheme can avoid leaking information about the implementation from which it is derived.

Josef: Thanks for the pointers. How practical do you think that stuff is, by the way?

Brian: I did not say dynamic typing will ensure modularity - in fact I very explicitly said that it would not.

David: Perhaps I should have expanded on structural types. When I said that you might have to replicate type definitions in multiple places - that covers structural typing as well. And, yes, besides that, structural typing might have other problems - I agree with those you suggest - but that was not my focus here.

Gilad Bracha said...

James: Perhaps I am focused too narrowly. It's more about type checking than any particular type construct.

But yes, checking interfaces between modules seems to require either replication (say, structural typing), leaking implementation information (inference) or a dependency that precludes independent type checking (though someone argued that's not the case; we'll see).

This isn't all that surprising. Typechecking relates to combining modules rather than building them independently. There is a tension there and it should be more widely recognized.

Sam Tobin-Hochstadt said...

This argument applies to any form of static information about a module, not just to types. For example, if you want to (a) import identifiers from other modules and (b) prohibit unbound variables, that's "anti-modular" as well. Newspeak handles this by not doing (a).

A second example is macros. If you want macros to be exported from modules, then you can't have independent compilation in the sense you want.

Lex Spoon said...

Another way to express the point is that not only do modules depend on each other, but also interfaces depend on modules.

The real interface between two modules, the interface that the programmers think about, is richer than what any pragmatic programming language can express. It involves more than just the class and method names, and it involves more than just their type signatures. It also involves something about how those classes and methods should behave.

That interface involves types, and, as excellently expressed in this article, the types are defined in modules. Thus, interfaces depend on modules.

The dependencies are real, and they need to be managed. However, getting rid of types doesn't get rid of the dependencies! If the ServletContext module exports a getUrl method, then anything depending on ServletContext also depends on the URL module. This is true whether or not the programming language lets you express the dependency.

Lex Spoon said...

I should add, I agree that types in the language will add *compilation* dependencies. Thus you have to recompile a lot more often.

Gilad Bracha said...

Sam: Yes, classic imports are anti-modular. I've posted about this on this blog in the past.

Lex: all true. And ironically, by allowing the dependencies to be expressed in the type system, the pragmatics of managing your modules independently become more difficult.

vas said...

A module by definition is something connectable. In software, connections are by definition communication. Communication requires a language understood by both parties.

How can one define an interface language without some notion of type? How can something that is essential also be anti?

Gilad Bracha said...

Sandro: the link is broken.

Gilad Bracha said...

vas: it isn't essential. As Lex pointed out, no practical language can capture the dependencies in any case. You can build high quality, reliable software without a type system.

Eugene said...

In principle you can infer expected types for everything imported from other modules and check that these expectations hold at run-time, when you load all modules. I believe this will be precisely as "modular" as in dynamically typed languages.

In general, to use a module you need some information about it. In dynamic languages you can get away with (informal) documentation. With static types you need at least signatures to compile your code. But why focus on compilation? In dynamic languages you need unit tests, whereas with static types there are at least some guarantees given by the compiler. Certainly you'll need the whole imported module to run unit tests.
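The infer-and-check-at-load-time scheme above can be sketched as follows (a toy "linker" checking only member names; a fuller system would check the inferred types as well):

```typescript
// Each module records which members it expects of its imports
// (here just names, inferred - hypothetically - from the
// importer's code), and a linker verifies those expectations
// against the actual module object at load time.
const expectedOfY = ["connect", "send"];

// Returns the expectations the loaded module fails to meet.
function checkLink(mod: Record<string, unknown>, expected: string[]): string[] {
  return expected.filter((name) => typeof mod[name] !== "function");
}

console.log(checkLink({ connect: () => {}, send: () => {} }, expectedOfY)); // []
console.log(checkLink({ connect: () => {} }, expectedOfY)); // ["send"]
```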

Macneil Shonle said...

Modules require an interface. When the interface is stable [allowing for the implementation to vary], then you have modularity.

That is the Parnas definition of modularity, and it has been elaborated on in depth by Baldwin and Clark.

What you are talking about is closer to Filman's notion of "obliviousness," which isn't about modularity either.

Gilad Bracha said...

Macneil: No. I am pretty much aligned with Baldwin & Clark. I don't really believe in obliviousness. One of the great fallacies of static typing is that it is required in order to have interfaces. It isn't; one of the nice things about Baldwin & Clark is that they discuss modularity in domains very different from programming, where there is no analog of static typing.

Gilad Bracha said...

Eugene:

I might focus on compilation because I don't want to wait for builds, or recompile a lot. But I actually focus on typechecking.

You need unit tests regardless of typing, precisely because the type system's guarantees are far too weak.

But your most interesting suggestion is to infer types for external dependencies and test them at link time. Check against what? A nominal type that may still have to be replicated? Or an inferred structural type that leaks implementation data? Either way, my point stands.

Also, link time is often run time, in which case it is dynamic typechecking. It also implies a significant degree of structural checking, which has other issues.
But these are separate from the main argument.

akocademir said...

I agree with your point but think it's more of a nuisance than a major obstacle. Having a modular design seems more important to me. Overall I find the benefit of making things more explicit outweighs its slight disadvantages.

Macneil Shonle said...

Indeed, Parnas spoke about modularity with regards to assembly programs. It is a sad fallacy that people believe modularity is some feature that a programming language has, rather than a feature of a program's design.

Gilad Bracha said...

Macneil: Most languages make it unnecessarily difficult to build truly modular systems. Support for modular software design and construction is an important feature of a programming language.

John Cowan said...

Consider what would happen if you changed all talk of types in your post to talk of names instead. We would have to say that names are anti-modular, because when module A invokes a function (variable, method, class) in module B, it will have to be rewritten (not merely recompiled) if module B changes any of its exported names. At best, we may have an "import x.a as b" declaration which renames on import, but most languages don't even provide that.

The reason we don't talk of the anti-modularity of names is that the whole distinction between separate and independent compilation is founded on the behavior that linkers have provided over the past 50 years: that is, they unify definitions and references of names in different modules, but nothing else. If linkers didn't exist, there would be no separate compilation. If linkers understood types, we wouldn't see types as anti-modular.

Gilad Bracha said...

John: Maybe you're saying that execution cannot be modular, because it requires connecting all the parts, and so, to some degree does typechecking. So expecting it to be done modularly is unreasonable/unfair.

Alex said...

It often seems to me that the kinds of static type-checking dependencies you talk about are really a performance optimization.

Consider a C++ class versus a Java class that implements some interface. C++ imposes a compile-time dependency and an ordering on compilation because it wants to know the exact memory offset of a member function definition, so it can make the call without a runtime lookup. This also allows inlining where desired. At the other end of the spectrum, any class that satisfies an interface in Java can be called at runtime, because all methods are virtual. This is less performant, but the problem can be mitigated by a good JIT compiler.

Programming languages are getting much better at this. Correct me if I'm wrong, but I think .NET lets you swap out one assembly with another one satisfying the same interface without recompiling. Coming from the other direction, dynamic languages like python are getting some interesting performance gains from JIT compilation, and you can certainly swap out implementations, though there the problem is simpler since there is really only one user-defined type: a dynamic object.

Unknown said...

I fail to see how inferred structural types are any different from no types at all wrt modularity.

The only difference I can see is when integration errors are discovered, link-time or run-time.

Gilad Bracha said...

John: w/structural types you either replicate the declaration across modules or infer them, leaking information. Either way, you have a modularity problem. Quite apart from their other issues that have come up in this thread.

Sandro Magi said...

Gilad, I just forgot the http. Corrected link.

As for leaking information, it leaks only as much information as needed to implement the interface required of the parameters. This information is all latent in a dynamic program anyway, so I'm not sure what your objection on this basis would be.

Gilad Bracha said...

Sandro: I think this may work but you are giving up on a certain degree of typechecking. Let me try and explain what I mean inside this little blogger box.

Ideally, even dynamically typed code has a documented interface. Otherwise, you find the programmer of client code acting as a type inferencer - looking at the implementation and deriving a(n approximate) structural type. Of course, without a type system there is no checking of the validity of this documentation (but there never is full validation, as there are semantic constraints not covered by the types).

You are suggesting one can infer an interface locally, and check its use at link time. However, what we check is the inferred interface, which varies with the implementation. I believe you still need a documented interface that is stable - without a stable interface there is no modularity. We never typecheck the use or validity of that documented interface.

So yes, you get a type system as modular as a dynamic one by giving up on typechecking the stable cross module interface. That lack of typechecking keeps you modular as it were.

It is an interesting point in the design space; as a practical matter, it may be quite worthwhile.

Gilad Bracha said...

One more thought: what if one combines the idea of typechecking modules in isolation, based on inferring their dependencies, with declaring (only once!) intentional, nominal types in modules and crosschecking these against the inferred types at link time?

I don't know of any such system, but that would be modular. Maybe we'll try that when we build the typechecker for Newspeak.

Sean McDirmid said...
This comment has been removed by the author.
Sean McDirmid said...

In reply to Gilad's reply to John Cowan: execution modularity of course exists - it is called a thread, process, or actor; sometimes enforced by a software architecture of some sort, and never really by a type system.

The importance of a type system, these days, comes mostly from tooling rather than promoting modularity or enforcing encapsulation boundaries: nice auto-complete doesn't work well without it. I'm pushing this a bit further in a new language designed for tablets, where typing (the keyboard kind) is inconvenient and you would rather present the programmer a nice menu of contextually relevant choices where context is represented by static types. Now such a type system needs to be more descriptive (the ability to express context) than prescriptive (the ability to soundly detect bad program constructions). It should direct the programmer toward meaningful compositions (via the menu system) perhaps by encoding fuzzy/statistical relationships.

Gilad Bracha said...

Hi Sean: There's nothing modular about a thread. A process or an actor is a different matter. Anyway, this post is not about types in general - just about the specific tension with modularity. As you are no doubt painfully aware, I can expound on types a great deal more :-)

Robin Hoode said...

Alas, the discipline of type theory is an old and, hence, unwavering one. Perhaps one day your detractors will understand, Gilad, how much more useful modules are than types. In the meantime, I anxiously await your new book "Introduction to module theory".

Sean McDirmid said...

A thread encapsulates a stream of executing instructions even if it doesn't encapsulate their address space like a process does. We talk about composing (coordinating threads) via certain concurrency primitives, which pretty much constitutes gluing modules together. Threads are not very good execution modules, but that is beside the point.

Most programming constructs/artifacts have some modular aspect to them: DLLs, source code files, projects, packages, namespaces, objects, classes, functions, values, predicates, design documents, CPU and GPU cores, processes, ...to name a few. The fact that the term "module" is used to describe any of these without qualification leads to lots of confusion.

James Iry said...

Actually, Gilad and Sean, things like linear types, region types, effect types, etc can be used to prove isolation properties of threads. Which I think again goes to saying that "types are anti-modular" is either too broad a statement or too narrowly focused on one aspect of the modularity of simple types.

Gilad Bracha said...

James: Of course, a short phrase like this is a broad brush.

Types let us state theorems - and my point is really that proving these theorems (via type checking) requires knowledge that goes outside a single module. One can restate some of this knowledge within the module - but then this knowledge is being stated multiple times.

In some cases you could use types to help build a modular system - but the types themselves tend to be non-modular in every real system I know.

But as I noted above - there may be a way around this, though whether it is practical or not remains to be seen.

There is a student who is supposed to look into this in the near future. The discussion has been useful to me (if not to my type-theory loving friends).

John Cowan said...

Let me mention three concrete cases of implementations that provide modularity and separate compilation where there is substantial compile-time information that dependent modules need to know. In GNU Ada and Sun javac, the compile-time information is types; in Chicken Scheme, it's Scheme macros.

GNU Ada requires compile-time and run-time information to be written in separate files, called interface and implementation files. If module A depends on module B, then the source code of both A's and B's interface must be present in order to compile A's implementation, but B's implementation need not even exist. Implementations can be compiled in any order, even when cross-dependencies are present, and produce conventional object modules. Compiling an interface just checks the syntax and the static semantics, producing nothing.

In Java, compile-time and run-time information is commingled in a single source file. In Sun javac (but not, say, gjc), if module A depends on module B, then A cannot be compiled unless B has already been compiled or is to be compiled in this invocation of javac. (Earlier versions of javac would compile B behind the scenes if B's object code was out of date or nonexistent.) Compiling produces classfiles containing information normally not present in object files. If A and B are cross-dependent, they must be jointly compiled by a single invocation of javac.

Chicken Scheme has two compilation modes. One produces a non-conventional object representation called an import library that contains just the macros in the source. The other produces conventional object files (by way of generated C and gcc). If A depends on B, then an object file cannot be produced for A unless B's import library is present.

Which of these, if any, count as "independent compilation"?

Gilad Bracha said...

John: What you describe for Ada sounds a lot like good old Modula. This doesn't cut it, because you cannot compile a module without the requisite interface modules. As for javac - I know it only too well, and of course it does not provide independent compilation. I'm not familiar with Chicken Scheme, but it doesn't sound like a counterexample.

Dave Herman said...

Mitch Wand likes to talk about the "conspiracy theory" of modules: modules need to "conspire" to get work done, even though they're defined separately. This means modules are defined with invariants, contracts, etc. These are essentially the assumptions a module author makes about the behavior of the other modules it will interact with. These assumptions may not always be stated, but they're still there.

In a statically typed language, some of these invariants can be expressed and checked without running the code. In order for the invariants to be checked, the tool needs to know what they are. With inference they can be figured out automatically, but you felt that this leaks implementation details (I don't know exactly what you meant, but I do agree that inference incurs costs on language design). Otherwise you have to write them down somewhere; this is true.

But is the same not true of a dynamically typed language? If you need to document invariants or contracts which modules need to agree on, these have to be written down somewhere. So generally this shows up in the API documentation. You just don't have a tool that checks that the code and documentation are consistent with one another.

Of course, you can always document things that can't be expressed in an automated type system, and so the same cost may remain in a static system. But there's no reason that couldn't be done literate-programming style, where the API docs are written in the same place where the type definitions are written.

I don't mean any of this as an argument about static vs. dynamic typing; I love dynamically typed languages and use them often. In particular, I think there are times where it's not as valuable to state, document or check any invariants about code. In those situations, maybe I find your argument a little more compelling.

John Zabroski said...

Gilad,

Thank you for writing your thoughts on this hot button.

A while back on Lambda the Ultimate, Rich Hickey was talking about the Expression Problem coined by Phil Wadler, and Martin Odersky chimed in to say that Rich misunderstood Phil's original note on the Expression Problem, and that Phil's challenge only applied to statically typed languages. I rebutted by referencing a Scheme paper on the Extensibility Problem, suggesting the Extensibility Problem was simply the same problem by another name. Martin did not like this suggestion, and left it as "proof by authority" that his take was right and mine was wrong.

Since I am merely a practitioner and not a researcher like you, can I get your take? It wasn't just me and Rich on LtU who were surprised by Martin's comments - many seemed to have interpreted Phil's challenge as something that applies to any language. It seems to me that if "Types Are Anti-Modular", then Phil's Expression Problem is a contradiction in terms. At the very least, Phil's wording "without recompiling code, and while retaining static type safety (e.g., no casts)" seems to fit your criteria for neither separate nor independent compilation.

In general, I agree types are anti-modular. The particular modularity I was struck by was that a global type system over a distributed system implies there are no "partition barriers" in the system. This automatically limits the design options of a distributed system by disallowing partition barriers. Put another way, if there are three axes for designing scalable distributed systems, then a global static type system only allows one axis. The only way to potentially get it back would be to encode the lost information into the type system, which produces a more complicated theory of distributed systems. The more complicated the theory, the more susceptible the proof is to erroneous reasoning.

A particular example would be any non-deterministic bootstrap process, such as a self-organizing network.

On the other hand, it should be possible to use *type erasure* to get some of the benefits of type systems while not distributing the types. What are your thoughts on this?

John Zabroski said...

Sandro,

That "Type Inference for first-class messaging" paper is very good, but it does not say anything about modeling the storage for these types.

A lot of mathematical formalisms have these sort of practical issues. For example, denotational semantics as outlined by McCarthy and Strachey did not address parameterizing storage models, and so a semantics would only work for a particular storage model. If you wanted to change the physical representation of your system, that required re-doing a bunch of work to explain what a cache is and how the cache behaves and what that means for the messages in the system.

John Zabroski said...

Sean and Gilad,

The difference between modern OS threads and actors is that an actor keeps state together with control flow. Oddly enough, database systems have a phrase to describe this kind of modularity: "Shared Nothing" Architecture.

migmit said...

So... it seems that you're equating "modularity" with "independent compilation". Doubtful as it is, another question is more interesting: are there any practical examples of "independent compilation"? Seems like the only choice we have is between "non-independent compilation with types" and "no types and no real compilation at all".

migmit said...
This comment has been removed by the author.
Sean McDirmid said...

migmit,

You can parameterize the types in your module to obtain real independent compilation. Jiazzi was one of my old research projects that did this for Java. To make that scalable is challenging, but not impossible.

However, it does not seem to add much value: either types aren't the real problem with separate compilation, or separate compilation is not a real problem (C# chugs away just fine with no real support for separate compilation).
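To illustrate the parameterization idea (a hypothetical TypeScript rendering, not Jiazzi's actual Java-based mechanism): a module that receives the shapes of its imports as parameters compiles against those local declarations alone, and concrete imports are supplied only at link time.

```typescript
// The shape this "unit" expects from its environment -- declared locally,
// so no external type definition is consulted during compilation.
interface LoggerShape {
  log(message: string): void;
}

// The unit itself: everything it needs arrives as a parameter.
function makeGreeter(logger: LoggerShape) {
  return {
    greet(name: string): void {
      logger.log(`Hello, ${name}!`);
    },
  };
}

// The linking step: a concrete import is supplied from outside.
const messages: string[] = [];
const greeter = makeGreeter({ log: (m) => messages.push(m) });
greeter.greet("world");
console.log(messages[0]); // "Hello, world!"
```

The unit's body never names a concrete logger type, which is the sense in which its compilation is independent of its eventual environment.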

migmit said...

> You can parameterize the types in your module to obtain real independent compilation.

Ehm, really?

But shouldn't the type parameters at least implement some specific interface?

And interfaces are types, which brings us to square one.

Sean McDirmid said...

migmit,

It is not strictly true that you need a global namespace somewhere to bootstrap a module system. Modules can always directly describe the shapes of their imports and exports, while module linking can occur via tangible reference (basically, you refer to the module by its complete definition). See the OOPSLA '01 Jiazzi paper for a discussion of that.

For usability and compatibility reasons, you do want a global namespace somewhere; i.e., modules are referred to by name in a global namespace while they share signatures to economize shape description. Global namespaces are also very convenient for helping people to find and reuse things (but make the namespace truly global, as in the web).

Gilad Bracha said...

John,

Your comments are most apropos.

My understanding of the expression problem as Phil formulated it is that it is a problem of typed languages, as Martin said. I find that quite in line with what I have stated in this post.

I'm not sure I see how erasure helps here - we still have to specify the types which has the same problems of redundancy/information leakage/external dependency.

What might be practical, in my view, is, as I indicated, to typecheck locally based on inferred dependencies and then check intention when linking globally.

migmit said...

> It is not strictly true that you need a global namespace somewhere to boot strap a module system.

I'm kinda confused. When did I mention namespaces?

I read the Jiazzi paper; not impressed. It seems a lot like C headers.

John Zabroski said...

Gilad,

I am not sure what you mean by checking intention when linking globally, but my best guess based on intuition is that you would need to have an inconsistency tolerant linker.

I am not saying this as an ad-hominem attack on your idea, e.g. "Is inconsistency tolerance really a path you want to go down?". I have just thought about this sort of stuff for a very long time, and focused on my personal dream of building a computer network that can function at the scale of a hojillion nodes. When I think about global design trade-offs, I am literally wondering about things like propagation delay when the speed of light is your bottleneck. What does the structure of an efficient software system look like at that scale?

Postscript:
Modularity is a complex topic. Surveying the Hacker News, Reddit and Lambda the Ultimate meta-discussion about this blog post reveals many different viewpoints on the topic. I could add several more. For example, the algebraic semiotics of Joe Goguen and his classic example of a "modular" description of a sign post that says "Stop".

I think there are enough "good" theories. What needs to be outlined are the trade-offs, and those trade-offs then need to be used to judge how we can automate human social processes in niche human organizations - like your pet peeve of static build definition for the Organization of Software Engineers.

Jon Rimmer said...

I don't have the theoretical background to contribute much to this discussion, but it seems to me that saying types are a countervailing force to modularity is like saying the land is a countervailing force to the sea. There's a certain attraction to it on a surface level, but on a deeper level it misrepresents the relationship between the two as one that is somehow antagonistic, when in fact it is one of existential interdependence.

A module is a sort of partial enclosure. And the 'partial' aspect, implying a degree of flow or exchange between modules, is just as important to the concept as that of enclosure. Modularity is not synonymous with independence, but rather with conscious and managed dependence. Therefore, in situations where they are used, it can be said that types are pro-modular, as they facilitate the managed dependence that is essential for modularity to even exist.

Discussions of static-vs-dynamic typing, independent compilation, separate interfaces, etc., are all very important. But in discussing the practical details of these approaches, we shouldn't lose sight of modularity being both independence and dependence (sort of a yin-yang thing, if you will). With this in mind, it seems unfair to denigrate typing as a whole, and will just lead us down an incorrect path of viewing greater independence as more perfect modularity, when it is not so.

Sean McDirmid said...

migmit,

Units are more like ML functors that support recursive assembly. Or, if you prefer the C header analogy, they are parameterized, statically type checked, and support true separate compilation.

Research on separate compilation is pretty well fleshed out; it's not a problem anymore. However, not much research on modular compilation has been adopted by industry, and new languages continue to lack it. To me, this means that separate compilation wasn't a real problem needing to be solved.

On closer reading, Gilad presents independent compilation as a new concept distinct from separate compilation, where you compile something without even knowing the signatures of what you are depending on. Of course, that is impossible to do in a statically typed language: you need the context of your unit to be encoded in order to compile it.

migmit said...

> where you compile something without even knowing the signatures of what you are depending on. Of course, that is impossible to do in a statically typed language

Exactly. And my two statements are: 1) it's impossible in any type-safe language ("independent compilation" is impossible in statically typed languages, and "compilation" is, in fact, impossible in dynamically typed ones), and 2) it has absolutely nothing to do with modularity, the difference being - forgive me for this analogy - the same as between freedom and anarchy.

Sandro Magi said...

It's not impossible; you simply need type inference to lift the dependencies inferred from the implementation to the module boundary. See the link I provided earlier on first-class messages. There will obviously be some limitations, and Gilad is concerned this will leak implementation details, but it doesn't seem like this has really been tried before, so it's not a foregone conclusion, I'd say.
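A hand-written sketch of what such an inferring compiler might lift to the boundary (spelled out here as a TypeScript generic constraint, since TypeScript itself does not infer parameter types): the inferred signature records exactly the operations the body uses, which is also exactly where the leakage concern comes from.

```typescript
// The constraint on T mirrors precisely the operations the body performs.
// Change the body, and an inferring compiler would change this boundary
// with it -- implementation detail leaking into the interface.
function describe<T extends { name: string; size(): number }>(item: T): string {
  return `${item.name}: ${item.size()}`;
}

// Any value with at least the required shape is accepted.
const result = describe({ name: "box", size: () => 3, color: "red" });
console.log(result); // "box: 3"
```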

Faré said...

Not only does module composition consume information about the code inside the modules, the entire NATURE and PURPOSE of module composition is to consume information about the code inside the modules. Therefore, any such information is "anti-modular". It's a feature, not a bug!

The Observer said...

To be completely modular, a program must take in arbitrary input and put out arbitrary output, with no regard for what either the input or the output is. The problem is that such a program can't do anything useful. In order to perform useful operations, one must have some knowledge of what the input and output will be. Type systems simply formalize this notion and make its declaration explicit so that it can be checked by a compiler. This relieves the burden of having to check and debug types manually, but adds the burden of having to structure your code properly and provide extra information before compilation. Does this restrict the freedom of the programmer? Yes, but it does so in order to limit errors. It's a tradeoff.

Uncompetative said...

I was under the impression that Barbara Liskov's CLU had modules with separate compilation. I have a copy of the CLU Reference manual which seems to indicate that this is supported on page 17.

Digging around online throws up this .pdf version:

http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-225.pdf

I've really no idea whether this is THE solution, but I have long favoured its approach (especially its use of type-orientation rather than object orientation).

David V. McKay said...

Broad, negatively flavored expressions such as "[x] is Anti-[y]" or "[x] Considered Harmful" make excellent headlines - because they are inflammatory and command attention; but only just long enough for readers to be convinced the headline is incorrect.

I will give the author the benefit of the doubt in assuming he was not attempting to broadly dismiss the utility of Static Type Systems in constructing practical, maintainable software solutions with useful degrees of modularity. Rather, I will assume this is an academic argument that Static Type Systems strongly inhibit our ability to construct units of software that can inter-operate at runtime without common references to the same type definitions. The author uses this assertion to argue, petitio principii, that Dynamic Type Systems are more useful due to their late-bound composability. ...Because this is "more modular" and "more modular is better" because "it is more modular."

Now I will argue that dependent compilation is in fact a benefit of Static Type Systems, because it limits the possible behaviours of the system to strictly those behaviours that can be predicted, analysed, and verified at design time -- automatically, and with ample opportunity to correct defects before they affect users in production. By contrast, the late-bound composability of dynamic type systems cannot make such guarantees via automatic verification; to ensure these systems do not fail in deployment, the engineers must manually devise a comprehensive suite of unit tests - but to achieve full comprehensiveness, the programmers must either have access to all modules that might be integrated at runtime, or must make strong assumptions about the competence of other programmers, in that third-party modules must conform to the documented interfaces. This entails documentation that is clear, up to date, and furthermore correctly understood and implemented by the third party.

In short, Dynamic Type Systems achieve this "freer composability" at the expense of shifting the onus of verification to humans. Regardless, the total information required to construct and compose all modules is equivalent whether the composition be late-bound or not.


I appreciate the reflection of these earlier posts:
Dave Herman 6/11/11 4:21 PM
Jon Rimmer 6/25/11 6:13 AM
The Observer 7/06/11 7:51 AM

shelby said...

For independent compilation (and design) of typed interfaces, modules can declare their interfaces independently; then a glue module coerces their interfaces to interoperate.

Imagine a client that integrates a test server. The interface of the test server is the external interface of the client.

Or in other words, the boundary between intra-module modularity and inter-module modularity is arbitrary and thus doesn't exist.
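A minimal sketch of the glue-module idea in TypeScript (all names hypothetical): each side declares its interface independently, and the glue module adapts one to the other, so neither module ever refers to the other's type definitions.

```typescript
// Declared by the client module, in its own terms:
interface ClientStore {
  fetch(key: string): string | undefined;
}

// Declared by the server module, in its own terms:
class ServerDb {
  private data = new Map<string, string>([["greeting", "hello"]]);
  lookup(k: string): string | null {
    return this.data.get(k) ?? null;
  }
}

// The glue module: the only place both interfaces are visible.
// It coerces the server's interface to the client's.
function glue(db: ServerDb): ClientStore {
  return { fetch: (key) => db.lookup(key) ?? undefined };
}

const store = glue(new ServerDb());
console.log(store.fetch("greeting")); // "hello"
```

Only the glue module needs recompiling when either interface changes; the client and server compile against their own declarations.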

All type systems leak implementation to the extent that the semantic invariants are not checked by the compiler. Checking more invariants (stronger typing) is an attempt to leak less implementation, to catch more errors at compile-time.

P.S. the failure of an arbitrary barrier to the free market is expected by Coase's Theorem. Coase's Theorem is a special case of the second law of thermodynamics, and the second law states that the universe trends toward maximally independent (more precisely, orthogonal) actors (i.e. disorder/entropy).

Philip Wadler said...

Gilad,

Modulo mottoes, everything you say is true.

The same word may carry many meanings, so let's clarify terminology:

S-modular: supports separate compilation. Every module has an interface and a source. Compiling a module requires access to the interface, but not the source, of every module the compiled module depends upon.

I-modular: supports independent compilation. Compiling a module requires access only to the source of the module being compiled.

Your motto becomes 'Types are anti-I-modular', which is unarguable.

Implicit in your argument is the claim that I-modularity is to be valued over S-modularity. I would claim that the additional checking offered by S-modularity is more important to separation of concerns than the independence offered by I-modularity. What is your argument against my claim?

(Apologies if you answered this point in the comments thread---it was too lengthy for me to read all of it.)

Yours, -- P

Lex Spoon said...

The point is about S-modularity, and thus more troubling.

In a large program, in a language with user-defined types, the types used in an interface are themselves defined in other interfaces. To compile a module, you need not just the interfaces of all the modules it depends on, but also the interfaces of all the types those interfaces use, transitively.

Thus, (nominal) types add a significant number of extra interfaces the compiler has to look at in order to compile one module.
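The transitive pull can be sketched in TypeScript (a hypothetical three-module layout flattened into one file): compiling moduleA's `invoice` requires `Order` from moduleB's interface, and `Order` mentions `Customer`, so moduleC's interface must be consulted as well.

```typescript
// -- moduleC's interface --
interface Customer {
  id: number;
}

// -- moduleB's interface: mentions Customer --
interface Order {
  customer: Customer;
  total: number;
}

// -- moduleA: mentions Order, and therefore (transitively) Customer.
// A compiler checking this body needs both interfaces above in hand.
function invoice(order: Order): string {
  return `customer ${order.customer.id} owes ${order.total}`;
}

console.log(invoice({ customer: { id: 7 }, total: 42 })); // "customer 7 owes 42"
```

In a real multi-file program the chain keeps going: every interface reachable from a module's boundary types must be available before that one module can be compiled.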

Gilad Bracha said...

Hi Phil,

Thank you for your comment, which is very much to the point. The high bit is that there are not only benefits, but also costs associated with types and typechecking.
The absence of I-Modularity is one of them.

Depending on one's goals, context, etc., such costs may be considered very worthwhile or not. I value I-modularity, full reflection, and the freedom to express my ideas in code more than the benefits of typechecking. There isn't much in the way of science to measure this, so the debate rages on religiously, colored by personal experience.

In practice, I continue to advocate a pragmatic engineering approach where typechecking is one of many possible static analyses (pluggable types) and not a basic requirement every legal program must fulfill. I believe it gives a better engineering tradeoff.

But you knew all that. And I agree, the thread is too long.

David V. McKay said...

"I value I-modularity, full reflection, and freedom to express my ideas in code more than the benefits of typechecking."

That's crazy! Just because I-modularity allows modules to be *compiled* without verification against one another doesn't mean you are free to express ideas more freely: any data that can be passed from module to module requires that both modules understand the structure of the data - whether that structure is defined (verifiably) in a third module or is merely defined (unreliably) in the heads of the programmers, there is still a dependency that must be respected to achieve a correct program. Not even full reflection can properly recover the *meaning* of that exchanged data without either a programmer or a type somehow embedding the functional constraints of that data - again, forcing at least some kind of dependency.

The "free composability" or "free expression of ideas" supposedly afforded by I-modularity is an illusion!

John Zabroski said...

Phil,

Thanks for the clarification! Now your original note on the Expression Problem is much clearer, and explains the confusion many (including me) had!

The Expression Problem deals with the fact that I-modularity is impossible in typed languages, so S-modularity is the best we can do when extending both functions and/or data.

I buy that.

This then sets up a delightful question: What about I-modular for untyped languages and extending both functions and/or data? If Gilad's approach with many forms of static analysis is a good one, then this question should help explain why it is a good one.

Gilad Bracha said...

John,

Good point. The expression problem is really a type problem. It isn't nearly as acute if you aren't worried about the typechecking. I guess I should write a post about that sometime. But I have to get on a plane soon - so don't expect prompt follow up :-)

Faré said...

Even without types, syntactic extensions (such as macros or readtables in Lisp, a dynamically typed language) constitute essential meta-level information about modules that requires this pulling in of all transitive dependencies to compile a given module.

Static I-modularity is only a special case of S-modularity where your language is too poor to express rich meta-level information.

Note that if you're interested in best effort for the common case, you can pre-compile with incorrect information about the meta-level context, then recompile the parts where you got it wrong when the hypotheses are invalidated. Or why not be completely lazy at compiling? A JIT is the ultimate provider of dynamic I-modularity. No need to statically compile at all!