A blog about in-depth analysis and understanding of programming, with a focus on C++.

Thursday, January 11, 2007

Functional Programming in C++ 3

I'm skipping part 2, because I'm having some difficulty coming up with a really good example of why you need functional programming. Oh, sometimes it's nice, but I've yet to see a point where it's necessary.

We've actually had a slight introduction to functional programming, or some aspects of it before. The article involving Boost::Bind, for example. At the end, I mention the ability to compose bound functions to create new ones. And Boost::Bind has the ability to convert the parameters of a functor from one to another.

Of course, that's not nearly good enough for real functional programming.

In terms of getting this done in C++, there are several options. Boost (the fount from which all good things flow), of course, has a library called Lambda. This allows you to generate functions of the form:
cout << (1 + _1)

This creates a function object inline. This function object takes one parameter, adds one to it, and prints it to standard out.

However, Boost::Lambda has a number of limitations. Chief among them, having only 3 arguments, which can be quite limiting. Also, it replicates Boost::Bind (and is incompatible with Boost::Bind), but replicates it without the powerful Boost:MemFn functionality, so you can't use member functions in Boost::Lambda.

In short, Boost::Lambda doesn't take things to the limit, doesn't give us all of functional programming. So, how do we do it?

Within Boost lies the greatness that is Spirit. Spirit is a parser framework; you build parsers with it in a style that seems very similar to Extended Backus–Naur Form. But that's not why we're talking about Spirit. For within the Spirit slumbers a beast of great power: Spirit::Phoenix.

Phoenix is functional programming. It is Boost::Bind combined with Boost::Lambda, mixed with stuff that you never expected to see in C++. Behold:

for_each(c.begin(), c.end(),
if_(arg1 % 2 == 1)
cout << arg1 << ' '

Yes. That does exactly what it looks like. Yes, that is an 'if' statement. Yes, that 'if' statement is in a place where no 'if' statement ever belongs. And yes, that entire expression generates a function object that is passed to for_each and does what it says it does.

So, let's take things slow.

Remember back to our discussion of Boost::Bind. It used entities named '_1', '_2', etc to represent the first argument, second argument, etc. Phoenix uses 'arg1', 'arg2', etc. Personally, I prefer Phoenix's style, since it uses something close to a word, and I'm pretty wordy in my naming.

Let's start with a basic example:
cout << arg1;

This statement does not execute 'cout <<'. It merely creates a functional object of one parameter that outputs that parameter.

Note that the functional object created does not know nor care what the type of that object is. As long as the operator << has been overloaded for whatever argument you pass in (and that the overload takes the appropriate type, in this case an output stream object), then you may call this function.

This goes back to our discussion in part 1, where we said that a function 'f(x) = log(x)' could only take positive values of X or 0, because the function log(x) is undefined on negative numbers. This is the same thing.

Note that there is one thing we cannot do, just as we couldn't do with Boost::Bind: catch this function and store it directly. Oh, 'for_each' can do it, but that's only because 'for_each' is templated on the type of that function object. It can store it because it doesn't need to type out what that type is.

The type of 'cout << arg1' is not for the feint of heart. And that doesn't even get into some of the functional programming statement stuff.

Then again:

boost::function<void (int)> integerOutputer = cout << arg1;

Maybe we can ;)

Thanks to Boost::Function, we can capture one of these lazy functions and store it. The only limitation is that, because C++ is strongly typed, we can only capture it in a strongly typed object. So we loose a lot of the type genericness that we would normally have with lazy functions.

A minor limitation, but one we can accept. Especially since most of the time you're wanting to capture it (for say, a registered callback), you can easily tie down a specific argument list.

Another thing we can do is get local variables involved:

int iSomeValue;
for_each(someVector.begin(), someVector.end(), cout << (variable(iSomeValue) + arg1));

This will create a reference to iSomeValue, store it in the functional object, and use it on each envocation. If you have a list, and wish to take the summation of that list:

int iSomeValue;
for_each(someVector.begin(), someVector.end(), variable(iSomeValue) += arg1);

Much easier, and much more intuitive, than making a functor call.

It's important to note that 'variable(iSomeValue)' itself is, much like 'arg1', a functional object. So, you could execute, 'variable(iSomeValue)()', if you like really convoluted ways of getting the value of something.

Though I haven't discussed Boost::Signal, it is one of the chief places where you're going to want to use functional programming, aside from std::algorithms. There is a big difference, though: std::algorithms will immediately use your functional object and then destroy it. Signals, pretty much by definition, will call it whenever it feels like.

That's bad when you use 'variable(iSomeValue)'. That's because this variable will not exist by the time the Signal calls it. Remember: the variable() function stores a reference to the variable. And likewise remember that references to stack variables can be hugely dangerous if someone stores them and uses them outside of your scope.

Just FYI.

To get a bit more formal about things, what 'cout << arg1' is, in Phoenix parlance, is a "lazy function". This means that it is only evaluated when it's arguments are called. However, it is important to note that anything that is a Phoenix lazy function can also have certain things done to it. Like this:
(cout << arg1)(arg2, arg1)

This creates a new lazy function. This new lazy function takes two arguments. But the original one only took one.

Fortunately, Phoenix doesn't care. What this will do is create a lazy function can can only be evaluated to produce an answer if it is given two (or more) parameters. Because of how this function was constructed, the first argument will be discarded, and the second will be sent to standard out.

This becomes very interesting when you start using non-lazy functions as lazy function. AKA, when you start doing Boost::Bind-like stuff:

This returns a lazy function. It takes the arguments that the function, functor, or member function took. However, that may not be what you want, especially if that is a member function.

Just like with Boost::Bind, the first argument to a member function is the 'this' pointer. If 'this' isn't going to be an argument you expect the caller to use, you need to add it yourself:
bind(&TheClass::SomeFunc2)(*this, arg1, arg2)

Notice that it takes a reference to the class rather than a pointer. This means that using a smart pointer is impossible, so you need to make sure that your code makes it impossible for this code to be called beyond the lifetime of this object. Fortunately, Phoenix version 2 will allow the use of references, pointers, or smart pointers.

Because Phoenix is a lot more powerful than Boost::Bind, it can do some of the things that we only wished Bind could do:

bind(&TheClass::SomeFunc2)(*this, arg1, arg2 + bind(&TheClass::SomeFunc3)(*this, arg3));

Which creates a functor that takes 3 arguments and does functional composition. Even modifying the return value of the call to SomeFunc3 before passing it to SomeFunc2. You can do all kinds of nifty stuff with this.

The farthest limit of Phoenix however, comes from so-called 'lazy statements'. I scared you with this at the beginning:

for_each(c.begin(), c.end(),
if_(arg1 % 2 == 1)
cout << arg1 << ' '

The 'if_' construct is a lazy statement that evaluates it's contents (within the square brackets) only if the condition is true. Just like a regular 'if' statement.

Except that, since it's functional programming, it creates a function object that does this. And 'if_' is just the tip of the iceberg. Just about every C++ flow control construct exists as a lazy statement (except goto. But we don't use that anyway). Blocks of code can be multi-line, through the use of comma-separated elements:

if_(arg1 % 2 == 1)
cout << arg1 << ' ',
cout << (arg1 + 5) << endl //Note: no comma on the last line.

This is incredibly powerful.

However, it's important not to go thinking you can just write anything as a lazy statement. For performance reasons (or potential ones; these constructs have not been deeply profiled), it's generally a good idea to use lazy statements for rapid development, then factor them out into real functions that get called. The thing I like about them is that you can use std::for_each() and make it look like a regular for() statement.

This article is getting long, so I'll punt on closures. They're not the most important or interesting part of Phoenix anyway.

In any case, what you've seen here is merely the beginning. Through Boost::Function and Phoenix, you can have serious functional programming. You can create unnamed functions, compose functions, pull functions from standard C++ functions, and even build functions in a pseudo-C++ language.