Core vs Library
written by Walter Bright
April 13, 2008
When thinking about the design of a programming language, the inevitable question arises: what types should be part of the core language, and which should be put in a library? One would think that ideally the language would be so powerful that all types can be specified in the library. But I’m not so sure about that.
Let’s list the advantages of a type being a core language feature:
- Customized syntactic forms
- Type’s primitive operations may not be expressible in the language
- Gains important optimizations based on the compiler ‘understanding’ the type
- Consistent, reliable behavior
- Better error messages
And the advantages of a type being a library feature:
- It can be developed independently of the compiler
- It simplifies the development of the compiler
- The user can customize it
- It can be added on later without having to modify the compiler
- Its implementation can be easily inspected by the user
- It can drive improvements to the language to better support user defined types
- There can be vastly more library types than could ever be possible in the core language
For fun, let’s imagine that the integer type should be implemented as a library feature. What are the consequences of this? First off, we won’t have any integer literals, so we’ll need a function to create them. To create an integer literal for the number 567:
Int("567")
(Of course, in a way, such syntax only transfers the problem of core vs. library from integers to strings.)
A language could support user-definable tokens, but no successful production-quality language has done so. Next, there may not be any arithmetic operators. If the language does not allow users to define infix operators, you’ll be reduced to representing the expression x+5*y as:
Add(x, Mul(Int("5"), y))
It’s starting to look rather unpleasant. The Add() and Mul() functions would also have to be written in a foreign language, like assembler or C, to get any reasonable performance.
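To make the thought experiment concrete, here is a minimal C++ sketch of what such a library type might look like. The names Int, Add, and Mul come from the examples above; the raw() accessor, the Example() function, and the internal representation are assumptions made for illustration. Note that the sketch quietly stores its value in a built-in 64-bit integer, which rather underlines the point that the primitive operations have to come from somewhere.

```cpp
#include <cstdint>
#include <string>

// Hypothetical library integer. Int, Add, and Mul follow the article's
// examples; everything else here is an assumption for illustration.
class Int {
public:
    // Stands in for an integer literal: the digits are parsed at run time.
    // Assumes the string holds decimal digits only (no sign, no checks).
    explicit Int(const std::string& digits) : value_(0) {
        for (char c : digits) {
            value_ = value_ * 10 + (c - '0');
        }
    }

    // Escape hatch so the result can be inspected in a demo.
    std::int64_t raw() const { return value_; }

private:
    explicit Int(std::int64_t v) : value_(v) {}
    std::int64_t value_;

    friend Int Add(const Int& a, const Int& b);
    friend Int Mul(const Int& a, const Int& b);
};

// Without user-defined infix operators, arithmetic is spelled as calls.
Int Add(const Int& a, const Int& b) { return Int(a.value_ + b.value_); }
Int Mul(const Int& a, const Int& b) { return Int(a.value_ * b.value_); }

// The expression x+5*y, written against the library type.
Int Example(const Int& x, const Int& y) {
    return Add(x, Mul(Int("5"), y));
}
```

Every use of Int("5") pays for a string parse unless the compiler can see through the constructor, which is exactly the kind of knowledge a core type gets for free.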
What more do we give up with integers as library types? We toss out the window pretty much all the optimizations the compiler can do on integers. After all, the compiler knows a lot about integers and arithmetic (at least the programmer writing the compiler did). A typical compiler will optimize things like:
5+1 => 6
x*2 => x<<1
(x+2)+4 => x+6
(x+2)+foo() => (foo()+x)+2
(This last one is computable in fewer registers.)
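The rewrites above can be spelled out as pairs of C++ functions that compute the same value, with the second of each pair being the form an optimizer would aim for. This is only a sketch: the function names and the stand-in Foo() are hypothetical, and unsigned arithmetic is used so that wraparound is well defined for every input.

```cpp
#include <cstdint>

// 5+1 => 6: constant folding, checked here at compile time.
static_assert(5 + 1 == 6, "constant folding example");

// x*2 => x<<1: strength reduction of a multiply to a shift.
std::uint32_t DoubleNaive(std::uint32_t x)   { return x * 2; }
std::uint32_t DoubleShifted(std::uint32_t x) { return x << 1; }

// (x+2)+4 => x+6: the two constants are combined into one.
std::uint32_t AddNaive(std::uint32_t x)  { return (x + 2) + 4; }
std::uint32_t AddFolded(std::uint32_t x) { return x + 6; }

// Hypothetical stand-in for the article's foo().
std::uint32_t Foo() { return 7; }

// (x+2)+foo() => (foo()+x)+2: reordered so the call is evaluated first.
std::uint32_t SumNaive(std::uint32_t x)        { return (x + 2) + Foo(); }
std::uint32_t SumReassociated(std::uint32_t x) { return (Foo() + x) + 2; }
```

A compiler can only apply these rewrites because it knows that integer addition is associative and commutative and that multiplying by two equals shifting left by one. For an opaque library type with Add() and Mul() functions, it knows none of that.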
There are a lot of those patterns. Then there are loop induction variable optimizations, where integer indices are replaced with pointers. In the code generator, there is a lot of effort expended to efficiently map integers onto machine operations and registers, for example:
a = x / 10
b = x % 10
can be done with one divide instruction rather than two.
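For comparison, the C and C++ standard libraries do offer a combined operation, std::div, which returns the quotient and remainder together. The sketch below shows both spellings; the Digits struct and the function names are hypothetical. Whether either version actually compiles down to a single divide instruction depends on the compiler and the target, so treat that as an expectation rather than a guarantee.

```cpp
#include <cstdlib>

// Hypothetical holder for the two results.
struct Digits {
    int quotient;
    int remainder;
};

// Written the way the article shows it: two operators, same operands.
// A compiler that understands integers can share one divide between them.
Digits SplitWithOperators(int x) {
    Digits d;
    d.quotient = x / 10;
    d.remainder = x % 10;
    return d;
}

// The library spelling: one call that returns both results.
Digits SplitWithDiv(int x) {
    std::div_t qr = std::div(x, 10);
    return Digits{qr.quot, qr.rem};
}
```

With a core integer type, the programmer can write the natural form and leave the pairing to the code generator. With a library type, someone has to know that the combined call exists and remember to use it.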
It’s very hard to see how this could be pushed into a library type. Even if it could be done, the library would end up as complicated as the compiler, or more so, making it hard to see the win. I don’t know of any usable language that doesn’t make integers a core language type, with the notable exception of bash.
On the other hand, there’s the complex data type, which consists of two floating point values: a real part and an imaginary part. Complex made its debut in FORTRAN, was added to C in C99, and is even in the D programming language. But advancing compiler technology has whittled away its core advantages one by one, and it’s getting increasingly hard to justify it as a core type.
For example, compilers have traditionally treated a struct as a monolithic block of data. For a user defined complex type, this means it doesn’t get put in registers. Newer compilers will look inside the struct and see whether its fields can be put in registers, register pairs, or the CPU floating point registers. Not only will the user defined complex type then work efficiently, but other user defined structs of paired values will benefit equally.
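A user defined complex type of the kind described here can be sketched in a few lines of C++. The name Complex and its field names are assumptions for illustration, and this is not how any particular library defines its complex type. The struct is just two doubles, passed and returned by value, which is the shape a compiler that looks inside structs can keep in floating point registers.

```cpp
// Hypothetical user-defined complex number: a plain pair of doubles.
struct Complex {
    double re;  // real part
    double im;  // imaginary part
};

// Component-wise addition.
inline Complex operator+(Complex a, Complex b) {
    return Complex{a.re + b.re, a.im + b.im};
}

// (a.re + a.im*i) * (b.re + b.im*i), expanded using i*i == -1.
inline Complex operator*(Complex a, Complex b) {
    return Complex{a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re};
}
```

Unlike the integer case, nothing here is beyond the language: the primitive operations are ordinary floating point arithmetic, and infix operators give the type the same syntax a core type would have.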
In a future column I’ll examine strings, arrays, and associative arrays as core types and see how they stack up.
Acknowledgements
Thanks to Andrei Alexandrescu and David Held for their valuable contributions to this article.