Purity

Sep 21, 2008

Pure functions have some interesting and useful properties. A pure function is one that only depends on its parameters, and its only output is its return value. In the D programming language, the requirements specifically are:

Global variables (or any variables that persist outside of the scope of the function) cannot be written to.
Such variables cannot be read from, either, unless they are invariant. (Invariant means immutable.)
Pure functions can only call other pure functions.
Parameters to a pure function can be mutable, but calls cannot be cached or executed asynchronously if the arguments contain any references to mutable data.
A pure function can throw an exception (purity does not imply nothrow).

An example of a pure function in D is:

pure int foo(int x, const int[] a)
{
   int sum = 0;
   foreach (s; a)
       sum += s;    // mutable non-static local variables are allowed
   return x * sum;
}

An impure function:

int bar(int x, int[] a)
{  static int sum = 0;
   foreach (s; a)   // reference to mutable array contents
       sum += s;    // read/write to mutable persistent variable
   return x * sum;  // read of mutable persistent variable
}

These properties can be checked at compile time if the function is typed as being pure. A common question is that if the compiler can check to see if a function is pure, why annotate the function? Why not just let the compiler check for purity, and if so, then accrue the advantages of purity? The answer is that purity is a property the programmer needs to know about. If purity was inferred rather than specified, a seemingly innocuous change in one function can make it impure, and every function that calls it impure, etc. There will be no reasonable way for the programmer who expects a function to be pure, to see if it actually is pure. If he does check (which might be arbitrarilly difficult to do) and it is not pure, he has no reasonable way to determine why the compiler thought it was impure.

So now that we have pure functions, what are they good for?

The first is self-documentation. A person trying to understand a code base, once they see that a function is pure, they know it only depends on its arguments, has no side effects, and there’s no monkey business going on inside it. This greatly reduces the cognitive load of dealing with it. A big chunk of functions can be marked as pure, and just this benefit alone is enough to justify supporting it.

Pure functions do not require synchronization for use by multiple threads, because their data is all either thread local or immutable.

Common subexpression elimination is an important compiler optimization, and with pure functions this can be extended to cover them, so:

int x = foo(3) + bar[foo(3)];

need only execute foo(3) once.

User level code can take this further by noting that the result of a pure function depends only on the bits passed to it on the stack (as the transitivity of invariants guarantees the constancy of anything they may refer to). Those bits can therefore be used as a key to access memoized results of the function call, rather than calling the function again.

Pure functions can be executed asynchronously. This means that not only can the function be executed lazily, it can also be farmed out to another core (this will become increasingly important as more cores become commonplace).

They can be hot swapped (meaning replaced at runtime), because they do not rely on any global initialization or termination state.

Functional programming capability is an exciting addition to imperative programming languages.

Articles

Purity