A blog about in-depth analysis and understanding of programming, with a focus on C++.

Wednesday, November 15, 2006

Making Algorithms Work for You

I'm not a fan of the STL algorithms. Here's one of the biggest reasons why.

Let's say you've got a class. And you've got a std::vector. Now, you want to iterate over that vector and, say, call a member function of that class. This is what you do under STL:
class CallerFunctor
{
private:
TheClass *m_pTheClass;

public:
CallerFunctor(TheClass *pTheClass) : m_pTheClass(pTheClass) {;}
void operator() (const TheObject *pCurrObject)
{
m_pTheClass->TheFunction(pCurrObject);
}
}

void TheClass::OperateOnList(const std::vector &theList)
{
std::for_each(theList.begin(), theList.end(), CallerFunctor(this));
}


This is, in my eyes, incredibly verbose and obtuse code. You have this function object whose sole purpose is to call an object. Writing dozens and dozens of these functors is not a valid method for doing this. I prefer code that makes sense, and doesn't have 2 apparent levels of indirection in order to determine what's going on.

The biggest problem is that looking at TheClass::OperateOnList alone isn't enough to know what happens. All you know is that some functor is going to be called on every element; you have no idea what this functor will do without looking at it. And it is not apparent that the functor is going to call the ThisClass member function that it eventually will.

I could have named CallerFunctor better, perhaps. But the underlying problem is still there: the structure of the code alone is insufficient to let you know what's going to happen. I prefer more obvious code; code that seems correct by inspection.

What we'd really like to do is either remove the need for the functor entirely, or simply transform it into something generic that lets the reader know what the generic functor will do without having to track it down.

Enter Boost::Bind and Boost::MemFn. Boost::Bind allows you to take a "callable" (function, functor, or member function. Things that can be called) and mess with its parameters. But Boost::MemFn is just what the doctor ordered: it allows you transform a member function into a functor. Basically, it does what CallerFunction does, but generically.

So, let's see the original code written in this:
void TheClass::OperateOnList(const std::vector &theList)
{
std::for_each(theList.begin(), theList.end(),
boost::bind(&TheClass::TheFunction, this, _1));
}

Wait, didn't I say I was going to use boost::mem_fn? So what's with the boost::bind?

Well, there's two issues. First, boost::mem_fn is actually used by boost::bind. The above boost::bind statement is the equivalent of:
boost::bind(boost::mem_fn(&TheClass::TheFunction), this, _1);

Which brings us to problem #2. The construct, "boost::mem_fn(&TheClass::TheFunction)" returns a functor that takes two parameters, not one.

The first parameter for anything wrapped in a boost::mem_fn object is the 'this' pointer. That is, a pointer to the object type specified for the member function (note: boost::mem_fn can also take data member pointers and convert them into 1-argument functions that return const references to the member). Makes sense; after all, you can't use a member pointer without the class name.

The second parameter to the boost::mem_fn object is, of course, the parameter to be passed to the actual member function. If the member function took several parameters, then it would receive all of them.

The reason why we need boost::bind is because the std::for_each algorithm expects the function object to take only one parameter. We therefore needed a way to turn boost::mem_fn's 2 argument functor into a 1 argument functor, one that passes a particular 'this' to all invocations of the functor.

Hence the use of boost::bind. The job of boost::bind is to turn callables into functors that take fewer arguments than the original callable. So, let's dissect the statement.

The first argument passed to boost::bind is the callable. Simple enough. In our case, it was a boost::mem_fn functor that takes 2 parameters.

Following that argument is a list of other arguments. These arguments are all what the user expects to pass in to the callable. Since this is all done through template programming hackery (which thankfully we are not exposed to), if you don't pass enough arguments, the compiler will know and complain appropriately. It'll also do type checking for you when you try to use that functor, so that's all good too.

You may have noticed this identifier: '_1'. I never defined it (nor a lot of other things), but it's purpose is crucial. When, in a boost::bind argument list, you use _1, you are telling boost::bind to create a function argument that takes 1 argument. More specifically, you are telling the boost::bind object that the first parameter you give to the functor will be passed in this slot to the embedded callable.

Confused? Here's an example, without our member function cruft:
int Adder(int x, int y, int z)
{
return x + y + z;
}

void main()
{
boost::bind func1(Adder, 1, 2, 3);
boost::bind func2(Adder, _1, _2, _3);
boost::bind func3(Adder, _1, 5, 10);
boost::bind func4(Adder, _1, _1, _1);
boost::bind func5(Adder, _2, _3, _1);
}

Important note: This is not real boost::bind C++. The type of a boost::bind function is not readily store-able outside of a template parameter. However, let's pretend that this works as it looks like it would if boost::bind were a C++ type. That said, it is possible to use Boost::Function if the signature of the method can be predetermined.

Well, we create 5 boost::bind function objects. What do they do?

The workings of func1 are quite simple. func1 is a functor that takes zero parameters and returns 6. Congratulations; we just made a really complicated way of writing the identifier 6 ;)

Those underscores in the func2 constructor aren't mistakes; they're vital. The numbers on them refer to the positions of arguments in the resultant functor. Look at it from the perspective of something you understand:
int CallAdder(int a, int b, int c)
{
return Adder(a, b, c);
}

That is literally what we did with func2. It implements CallAdder as a functor; it takes three parameters and passes them in to Adder, returning whatever Adder does.

Well, func3 should be fairly simple. It replaces some of Adder's parameters with constant values, but still allows for one parameter. So, it returns a functor that takes one parameter, and adds 15 to it.

It is func4 where confusion can start to set in, but it's really quite simple. It creates a functor that takes 1 parameter (because it only ever uses one argument specifier), and uses that argument for all 3 parameters passed to Adder. Therefore, the result is a functor that multiplies its parameter by 3.

func5 shows that you can re-order parameters. It is the equivalent of:
int CallAdder(int a, int b, int c)
{
return Adder(b, c, a);
}

Now, since addition is commutative, this has no effect on the outcome.

Back to the original example:
boost::bind(boost::mem_fn(&TheClass::TheFunction), this, _1);

This creates a functor of 1 parameter. That parameter is passed as the second argument to the given callable, which is a functor of 2 parameters. The value of 'this' is passed as the first argument to the inner callable.

Which is exactly what we need, of course.

Now, I am aware that std::mem_fn exists, which technically allows what I originally asked for. But Boost::Bind and Boost::MemFn take this concept to the limit, which is what we're all about ;)

However, there's more here than just the niftiness of this use of C++. There are innumerable uses of Boost::Bind and Boost::MemFn outside of STL algorithms. But even within that narrow use, there are useful things that you can do which would be impossible with just std::mem_fn.

Our initial example made an interesting assumption: that there existed a function TheClass::TheFunction that took one argument of exactly the type the list consisted of. What if it didn't?

Now, we can't do anything if TheFunction doesn't take a parameter of the list's type. So one of the parameters must be of that type. However, we could have a TheFunction like:
void TheClass::TheFunction(int iTheFactor,
const TheObject *pTheObject, bool bStoreObj);

This member function is not designed to work with a standard algorithm. But we might want it to.

We'd have to use an explicit functor again if we were limited to std::mem_fn. Boost::Bind, however, was made for this stuff:
  std::for_each(theList.begin(), theList.end(),
boost::bind(&TheClass::TheFunction, this, 15, _1, false));

We know by now what this does. More importantly, it makes those standard algorithms much more useful. They don't make your code ugly anymore, and you aren't restricted to making special functions for them.

Now, there is a class of function object that can't be encapsulated in just this. For example, if you have a function that doesn't take the listed object type, but takes something similar to it, what you need is the ability to compose functions. I guess that's too much for boost::bind.

Or is it? Attend:
  std::for_each(theList.begin(), theList.end(),
boost::bind(&TheClass::TheFunction, this, 15,
boost::bind(MutatorFunc, _1), false));

Now, granted, you still have to write the mutator function. But this does exactly what we need. It calls the MutatorFunc with the argument (it could have been a member of TheClass), then passes this as the second argument to TheClass::TheFunction.

We still needed to write a one-off function: MutatorFunc. So we reintroduced that need due to the complexity of the issue. However, I'm willing to ignore that minor bit of hypocracy, simply because we solve so many other problems without the need of the mutator. And the mutator doesn't make anything worse than what we had before boost::bind; plus, the mutator can just be a bare function rather than a class that needs a 'this' pointer. So the bloat improved a little, in the worst case.

That's good enough for me ;)

No comments: