Wednesday, September 13, 2006

Disadvantages of user-extensible languages

A lot of languages are moving toward giving users as much extensibility as possible. Lisp is the obvious forerunner: with its flexible macro facility, it's possible to create nearly every imaginable extension to the language. Look, for example, at the loop facility in Common Lisp: it's a 'language inside a language' (sometimes called a 'DSL', a domain-specific language). In Lisp the 'price' for this extensibility is 's-exprs', a very simple syntax which is in fact like entering a program as a syntax tree with lots of parentheses.

Newer languages have tried to combine comparable extensibility with a rich syntax (for example Nemerle); others have tried other ways without using macros. There are lots of ways to give some extensibility: Smalltalk, for example, uses 'blocks' (a kind of closure with an additional non-local return) to create all the well-known control constructs (like if-then-else, while loops etc.), others use extensive operator definition and overloading (like Haskell) or use reflection to provide some degree of user extensibility (like Java).
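
To make the Smalltalk approach a bit more concrete, here is a minimal sketch in Python of control constructs written as ordinary functions that take closures. This is only an illustration under some assumptions: the names if_then_else, while_do and countdown are made up for this example, and Python closures have no non-local return, so the sketch only approximates what Smalltalk blocks can do.

```python
# Sketch: control constructs as plain functions that take closures,
# loosely modelled on Smalltalk's ifTrue:ifFalse: and whileTrue:.
# All names here are hypothetical and only serve the illustration.

def if_then_else(condition, then_block, else_block):
    """Run exactly one of the two blocks and return its result."""
    return then_block() if condition else else_block()


def while_do(condition_block, body_block):
    """Call body_block again and again while condition_block() is true."""
    while condition_block():
        body_block()


def countdown(start):
    """Collect start, start-1, ..., 1 using only the constructs above."""
    state = {"n": start}  # mutable cell, because lambdas cannot assign
    seen = []

    def step():
        seen.append(state["n"])
        state["n"] -= 1

    while_do(lambda: state["n"] > 0, step)
    return seen
```

A call like if_then_else(x > 0, lambda: "positive", lambda: "not positive") evaluates only the chosen block, which is the property that lets such a construct live in a library instead of in the language core.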

At first sight, user extensibility seems like a good idea. But like everything, it's a double-edged sword, and I'll try to show some of the disadvantages here.

Of course there are also advantages, and it's always a question of the domain you're using a language for: if you use a language in a big team you have other requirements than if you're working alone or in a small group (up to 3 people). If you write some small 'one-time-use' tool, you also have other requirements than when writing an application which has to be used and maintained for years. And if you're doing prototyping or experimenting with new language features and abstractions you also have other requirements than if you're working on yet another business application.

So what are the big disadvantages?

  • Language design is hard, so writing extensions to an existing language is hard too.

    Many people think that the main difficulty of designing languages is building a syntax. In fact that's the easiest part. The main problem is to get a set of language features which don't conflict, which all solve different problems (no overlap) and which are easy to use and understand. With language extension, creating the syntax is easy, but integrating the new language features into a common framework is as difficult as usual. So don't expect that an extensible language makes language design easier. But of course it's easy to create bad or useless extensions. With a 'from scratch' language, where you have to build a whole compiler first, this leads to a certain selection of programmers who are able to do it. But with an easily extensible language, this kind of selection won't happen, and thus most extensions will be poorly thought out, useless or even detrimental to the language.


  • Language-extensions can (and will) conflict if you have to use 3rd party code

    If a language is designed by a team of programmers who communicate with each other (or even by a single programmer), conflicts won't happen easily, because the designers have a complete overview of the language. But with user extensions this overview is missing, because extensions will be created independently. This can easily lead to conflicts if you want to use 3rd party software in your code which uses language extensions that are similar to yours but slightly different. Even if the language provides a certain 'isolation' feature to prevent conflicts, you still have the problem that two parts of the code use similar-looking but semantically different language extensions.


  • Extensible languages provide the language designers with an excuse to 'under-design' them

    As an application programmer you want to have a complete language, not a 'language construction kit'. But with extensible languages it's tempting for the language designer to leave the language 'simple on purpose' to allow the users to create the extensions they want. But creating those extensions is still hard, especially if they are created by independent programmers. So an incomplete language is of no use to an application developer, and a language which is complete by design is more likely to be really of a piece.

  • Language extension creates another 'thinking level': a programmer always has to consider writing a language extension to solve a certain problem

    If a language supports language extensions, it's always very tempting to write a new abstraction to solve a problem. While this is (in theory) a good idea, in fact it often leads to increased development times and less maintainable code. Why? Because creating new abstractions is much harder than just solving a certain special problem. Many programmers underestimate the difficulties and try it nonetheless - often with really bad results. If your abstraction isn't thought out well enough, it's not reusable and it's harder to understand than just a single solution to a problem. Sure, sometimes it really works fine, but often you only know that at the end, when it's too late. So while this is something somebody with lots of time can try, it can be a death blow to a project with time constraints.

  • Language extensions have to be learned like 'built-in' features of the language. So having many extensions makes a language hard to learn. And language extensions have to be remembered or learned if you have to maintain code.

    Sure, you also have to learn normal libs and frameworks. And we all know that learning a framework can be quite hard. But if language extensions come into play it gets even harder, because real language extensions can change the semantics of otherwise well-known constructs. This can obfuscate the real meaning of an operation and leads to code that is harder to read. And if you read code you always have to consider the active language extensions and 'parse' them in your head too. Also, to use an extension you have to learn it first. Just take a look at Common Lisp's loop facility: while it's relatively easy to read and understand, it's quite hard to learn all its possibilities and features. If, on the other hand, you have some code which does the same as a language extension, this code is often directly readable, even if it's maybe a bit longer.

    With built-in language features this isn't a problem, because those features are limited and have to be learned once (or maybe some more later if the language gets some official extensions), but with a user-extensible language the number of extensions is unbounded and can increase quite rapidly if multiple teams use their own extensions.
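
Two of the points above, similar-looking but semantically different extensions and changed semantics of otherwise well-known constructs, can be illustrated with a small Python sketch based on operator overloading. Everything in it is hypothetical: Piped and Maybe stand for two extensions written independently by different teams, and both happen to give the `|` operator a new meaning.

```python
# Sketch: two independently written "extensions" reuse the same operator
# with different semantics. Both classes are invented for this example.

class Piped:
    """Hypothetical extension A: `x | f` means 'apply f to x'."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        return Piped(func(self.value))


class Maybe:
    """Hypothetical extension B: `a | b` means 'a, or b if a is empty'."""

    def __init__(self, value=None):
        self.value = value

    def __or__(self, other):
        return self if self.value is not None else other


def double(n):
    return n * 2


# Three expressions with the same shape, `left | right`:
piped_result = (Piped(21) | double).value      # function application
fallback_result = (Maybe() | Maybe(7)).value   # fallback value
builtin_result = 5 | 2                         # built-in bitwise or
```

The reader of such code has to know the types on both sides and the extension behind each of them before the meaning of a single `|` becomes clear, even though no name clash occurs and each extension works fine on its own.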

If you're a 'lone wolf' programmer, none of those disadvantages apply to you. But if you have to work in (maybe big) teams or have to use lots of 3rd party code, you will get some of those problems if you use an extensible language. I suspect (but cannot prove) that the rigidity of the Java language is the prime reason why there is so much 3rd party code right now - sure, the code sometimes looks ugly, but on the other hand Java really prescribes how to do something, which leads to software pieces that fit together better. With better extensibility this effect would fade, because then there would be more ways to solve a certain problem and all those solutions wouldn't fit together.

And what's the way out now? Do we really have to live with those boring, inflexible languages forever (at least for 'productive languages'; languages for prototyping, academic purposes etc. are a different breed)?

I think the way is to provide languages with as much as is needed to solve the problems the language is designed for. Make the language 'as rich as necessary'. Ada is an example which shows the advantages of this approach. Java also has its parts where this approach is clearly visible - and works quite well. But of course a language designed with this principle in mind has to be updated more often to satisfy the needs of the programmers. The difference is that those updates and extensions are well thought out and can be created with the whole language in mind by people who know their job. This is more likely to lead to sensible extensions, even if it depends on the language designer or the community which way they want to go.