User Defined Literals in the D Programming Language
April 6, 2011
written by Walter Bright
Programming languages define many kinds of literals. The most common ones are:
- string literals, like "hello"
- integer literals, like 1234
- floating point literals, like 1.06e+3
- character literals, like 'a'
The C programming language adds some more, like:
- hexadecimal integers, like 0xDEADBEEF
- octal integers, like 0677
While programming languages allow the user to define his own types, functions, and variable names, it's pretty rare to allow defining one's own literals. It sure would be nice to be able to do so to go with defining types.
Why not? One answer comes from how programming language compilers are designed. The compiler operates as a series of passes:
- lexing
- parsing
- semantic analysis
- optimization
- code generation
Literals are recognized in the lexing phase, while user defined things are recognized in the semantic analysis phase. Having the semantic phase feed back into the lexing phase tends to make for a mess of both the language and the compilers for it. Most language designers eschew doing that with the fervor of an English professor reviewing one of my essays.
But, darn it, it sure would be nice to have user defined literals. Hmmmm.
Let's harken back to C's octal integers, i.e. things like 0677. The leading zero makes it octal rather than decimal. Who the heck uses octal, and why is it in C? It turns out that many old machines (before the dawn of man) were programmed in octal rather than hexadecimal. The rise of the microcomputer pretty much killed off octal in favor of hexadecimal. A vestige of octal remains in the file permissions on POSIX operating systems.
It's pretty much all that's left of octal.
It's rare enough that having the leading 0 meaning octal often comes as a nasty surprise to modern programmers. Hence there's pressure to remove those literals. The D programming language certainly feels that pressure.[1]
But, I like octal notation. I have a soft spot for it, it feels nice and comfortable. It's like a favorite shirt that unfortunately has too many holes in it to wear in public anymore, and frankly needs to go in the rag bin to wipe oil up from my leaky hotrod. I still like octal, though, and the thought of writing Linux file permissions
creat("filename", 0677);
as
creat("filename", ((6<<6)|(7<<3)|7));
leaves me cold.
What can we do about it?
Let's start with a D function to turn an octal string into a number:
import std.exception : enforce;

auto octal(string s)
{
    uint result = 0;
    foreach (octalDigit; s)
    {
        enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}
(The enforce is error checking for valid octal digits and overflows.) We can then write file permissions as:
creat("filename", octal("677"));
But because the octal value is computed at runtime rather than compile time, this just irks me like a bug in my soup. D has a perfectly marvy feature where functions can be executed at compile time rather than run time. Let's see if this can be pressed into service.
We could try:
enum mode = octal("677");
creat("filename", mode);
and that'll work at compile time. (D enums are manifest constants.) But of course that is hardly a workable user defined literal.
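To make the compile-time claim concrete, here is a minimal self-contained sketch. It repeats the function from above (with the check split in two and error messages added, which are my additions, not part of the original) and uses static assert, which only accepts values the compiler can compute itself. If the file compiles, the enum was evaluated at compile time.

```d
import std.exception : enforce;

// Same algorithm as above: fold each octal digit into the result.
// The error messages are illustrative additions.
uint octal(string s)
{
    uint result = 0;
    foreach (octalDigit; s)
    {
        enforce(octalDigit >= '0' && octalDigit <= '7', "invalid octal digit");
        enforce(result < (1u << 29), "octal value does not fit in 32 bits");
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}

// Initializing an enum forces the call to run inside the compiler.
enum mode = octal("677");

// static assert needs compile-time constants, so these lines compiling
// at all shows that mode never waits for run time.
static assert(mode == ((6 << 6) | (7 << 3) | 7));
static assert(mode == 447);

void main()
{
    // The very same function still works as an ordinary run-time call.
    assert(octal("677") == mode);
}
```

The same function serves both worlds; only the context of the call decides whether it runs in the compiler or in the program.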
Another way to force a function to be run at compile time is to wrap it in a template,
auto octalImpl(string s)
{
    ... same implementation as above ...
}

template octal(string s)
{
    enum octal = octalImpl(s);
}
D templates can use the ‘eponymous name trick’ where if there is only one member of the template and it matches the name of the template, the template gets replaced by its member.
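A side-by-side sketch may help show what the trick buys. The name octalVerbose is hypothetical and exists only for contrast, and the octalImpl here is a trimmed stand-in that uses assert instead of enforce to keep the example short.

```d
// Trimmed stand-in for octalImpl (no overflow check, for brevity).
uint octalImpl(string s)
{
    uint result = 0;
    foreach (c; s)
    {
        assert(c >= '0' && c <= '7', "invalid octal digit");
        result = (result << 3) | (c - '0');
    }
    return result;
}

// Without the trick: the member's name differs from the template's,
// so every use has to spell out the member.
template octalVerbose(string s)
{
    enum value = octalImpl(s);
}

// With the trick: the member shares the template's name,
// so the instantiation itself stands for the value.
template octal(string s)
{
    enum octal = octalImpl(s);
}

static assert(octalVerbose!("677").value == 447);
static assert(octal!("677") == 447);

void main() {}
```

Both templates compute the same constant; the eponymous one just reads like a value instead of a lookup.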
It is then used like:
creat("filename", octal!"677");
(When a template takes a single argument that is one token, it can be instantiated with the name!arg syntax, without parentheses.) This is not looking half bad. But we can make it even better:
creat("filename", octal!677);
Wait, what? Isn't 677 a decimal literal? Yes. The trick is to overload the octal template to take an integer literal, then take the number apart digit by digit and rebuild it as octal:
auto octalImpl(uint i)
{
    uint result = 0;
    int n;
    while (i)
    {
        auto octalDigit = i % 10;
        i /= 10;
        enforce(octalDigit < 8 && result < (1u << 29));
        result |= octalDigit << n;
        n += 3;
    }
    return result;
}

template octal(uint i)
{
    enum octal = octalImpl(i);
}
This all happens at compile time, which can be verified by looking at the output for
int main()
{
    creat("filename", octal!677);
    return 0;
}
which is:
__Dmain:
        push    01BFh           // octal 677 in hexadecimal
        mov     EAX,offset FLAT:_DATA
        push    EAX
        call    near ptr _creat
        xor     EAX,EAX
        add     ESP,8
        ret
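For readers who would rather not read disassembly, a static assert gives the same evidence from inside the language. The sketch below assumes both overloads from this article live side by side in one file; the string version of octalImpl is filled in from the earlier listing.

```d
import std.exception : enforce;

// String overload: consume octal digits left to right.
uint octalImpl(string s)
{
    uint result = 0;
    foreach (octalDigit; s)
    {
        enforce(octalDigit >= '0' && octalDigit <= '7' && result < (1u << 29));
        result = (result << 3) | (octalDigit - '0');
    }
    return result;
}

// Integer overload: peel decimal digits off the right-hand end and
// place each one three bits further left than the previous one.
uint octalImpl(uint i)
{
    uint result = 0;
    int n;
    while (i)
    {
        auto octalDigit = i % 10;
        i /= 10;
        enforce(octalDigit < 8 && result < (1u << 29));
        result |= octalDigit << n;
        n += 3;
    }
    return result;
}

template octal(string s) { enum octal = octalImpl(s); }
template octal(uint i)   { enum octal = octalImpl(i); }

// 677 splits into 7, 7, 6, giving 7 | (7 << 3) | (6 << 6).
static assert(octal!677 == 447);
static assert(octal!677 == 0x1BF);        // the constant pushed in the listing above
static assert(octal!"677" == octal!677);  // both spellings agree

void main() {}
```

Since static assert is checked while compiling, a successful build means the values were already known to the compiler.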
The complete code, conceived and implemented by Adam D. Ruppe, is available in the D standard library as std.conv.octal.
The implementation is a fair bit more involved than above, but for good reason; the idea stays the same. The complete library implementation detects and minds the usual integral suffixes and automatically switches to 64-bit representation when the input string is too large — just as you'd expect from a well-behaved literal. In fact, the code is not unlike the code handling C-style octal literals inside the compiler. That this can all be done in ‘user space’ is, I think, quite remarkable.
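To give a feel for the automatic widening, here is a rough sketch of how a template might pick its result type from the value. It is not the library's code: the names parseOctal and octalLit are hypothetical, suffix handling is left out entirely, and values above long.max are simply not dealt with.

```d
// Hypothetical helper: parse into the widest unsigned type first.
ulong parseOctal(string s)
{
    ulong result = 0;
    foreach (c; s)
    {
        assert(c >= '0' && c <= '7', "invalid octal digit");
        assert(result < (1UL << 61), "octal value does not fit in 64 bits");
        result = (result << 3) | (c - '0');
    }
    return result;
}

// Hypothetical template: choose int when the value fits, long otherwise.
template octalLit(string s)
{
    enum ulong value = parseOctal(s);
    static if (value <= int.max)
        enum octalLit = cast(int) value;
    else
        enum octalLit = cast(long) value;
}

static assert(is(typeof(octalLit!"677") == int));
static assert(octalLit!"677" == 447);

// Eleven sevens is 2^33 - 1, too big for int, so the type widens.
static assert(is(typeof(octalLit!"77777777777") == long));
static assert(octalLit!"77777777777" == 8_589_934_591L);

void main() {}
```

The point is only that static if lets ordinary user code make the kind of type decision a compiler makes for built-in literals.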
Conclusion
While this isn't technically a user defined literal, it comes surprisingly close to one: flexible notation and compile-time evaluation, all with user-defined code, not code hardwired in the compiler.
The key to user-defined literals is compile-time evaluation of complex code (in this case, code that computes octal values from decimal values or strings). Putting ‘octal’ in the standard library brings progress — it allows us to gracefully remove an obsolete and troublesome feature like octal literals from the language, and opens the door to all sorts of user defined literals customized for user defined types. The feature is compelling enough that we have recently decided to effectively phase out built-in C-style octal literals from the D reference compiler [2].
References
Acknowledgements
Thanks to Andrei Alexandrescu, David Held, Eric Niebler, and Brad Roberts for reviewing a draft of this.