A blog about in-depth analysis and understanding of programming, with a focus on C++.

Tuesday, November 21, 2006

Pimp my Code

C++, as a language, has a number of problems. Perhaps the biggest is the fact that, before you can really talk about an object, function, or other construct, you need to #include the header that declares it. Java and C# don't impose the same requirement, and few more modern languages do.

Of course, C++ is compile-time linked too, unlike most other modern languages.

In any case, this poses a real problem. Standard programming practice teaches encapsulation: keeping an object's privates out of the light of day. The problem comes from something like this:

class BigBoy
{
public:
BigBoy();
virtual ~BigBoy();

void DrawBigBoy();

private:
boost::shared_ptr<Render::MeshClass> m_Mesh;
boost::shared_ptr<Anim::AnimSystem> m_AnimationSystem;
std::vector<boost::shared_ptr<Audio::Sounds> > m_SoundList;
std::vector<boost::shared_ptr<Coll::CollisionMesh> > m_CollList;
boost::shared_ptr<Physics::PhysicalObject> m_PhysicsObj;
};

BigBoy could be an entity class from a game system. It stores references to fundamental objects from 5 systems (and I could reasonably add more. It doesn't have AI yet).

If you needed to #include "BigBoy.h", you'd need to include headers from each of those 5 systems first.

The first attempt to solve this problem is to provide forward declarations of those classes in this file. That would work, but only because boost::shared_ptr can work with incomplete types. But, then again, now people using BigBoy.h need to #include <boost/shared_ptr.hpp>. If you're a heavy Boost user, that may be your standard mode of operation. But if you're not?

You'd also need to #include<vector>, which once again you may not need in your code. Most std::vector implementations aren't exactly short. Maybe one of those could have been a boost::array (fixed-length arrays that work like std::vector's) or a std::list; this just increases the number of different things you need to include.
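
For concreteness, here is roughly what that forward-declared BigBoy.h would look like. This is a sketch, not a drop-in header; the namespace and class names are taken from the example above.

#include <vector>                  // still unavoidable: the std::vector members are stored by value
#include <boost/shared_ptr.hpp>    // still unavoidable: shared_ptr members
// (whether these two appear here or in every file that includes BigBoy.h,
//  every user ends up pulling them in)

namespace Render  { class MeshClass; }
namespace Anim    { class AnimSystem; }      // forward declarations instead of the five
namespace Audio   { class Sounds; }          // systems' own headers -- legal only because
namespace Coll    { class CollisionMesh; }   // boost::shared_ptr tolerates incomplete types
namespace Physics { class PhysicalObject; }

class BigBoy { /* ...exactly as before... */ };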

What's worse is that if you change what BigBoy uses internally, you have to go to every file that uses BigBoy and change it. Kinda annoying, yes?

Imagine you have a thousand files using BigBoy (and note: this is not unreasonable for large game projects). Or worse, imagine you're writing the Unreal Engine or something, and you could have dozens or hundreds of users, each with hundreds of files that would need to be changed, just because you wanted to use a std::list or a boost::array.

What do we do? We pimp the code:

class BigBoyImpl;

class BigBoy
{
public:
BigBoy();
virtual ~BigBoy();

void DrawBigBoy();

private:
boost::shared_ptr<BigBoyImpl> m_Impl;
};

OK, what just happened?

Well, ignoring what BigBoyImpl is for the moment, I didn't really solve the problem so much as I reduced it. Since boost::shared_ptr works on incomplete types, this is compilable C++. However, I still require that people include boost::shared_ptr.

Tough. Most people using Pimpl (I'll get to what that is in a second) would use a naked BigBoyImpl *. I don't, because there are times (a constructor throwing an exception, for instance) when a class's destructor will not be invoked, but the destructors for its member objects will be. Using boost::shared_ptr guarantees the implementation gets cleaned up in that case. Plus, it provides for the possibility of having multiple BigBoy objects sharing the same BigBoyImpl object; copy constructors can work.

At the very least, I'm always going to use boost::shared_ptr; there's never going to be a sudden switch to some more feature-rich pointer object (and even if there were, I'd have had to make the same change in the naked-pointer version). The point is that we no longer feel pressure to avoid changing any facet of BigBoy's implementation.

What we did was take BigBoy and make it a nothing class. It no longer does anything itself; it is as empty as boost::shared_ptr. It exists to manage its contents and provide an appropriate interface to them.

Which, if you think about it, is what you want an interface class to do: provide an interface.

This allows us to avoid the pain of including a thousand and one headers when we aren't using their contents. We get faster compile times (good, especially since heavy use of Boost can slow them down), less compiler fragility, and overall better stuff.

Pimpl is short for "pointer to implementation" (you'll also see it glossed as "private implementation"). And Pimpl is what we just did: we took the implementation details out of the header, put them into a new class, and left only a pointer behind.

So, for the implementer, instead of a single .h/.cpp pair, we now have two: BigBoy.h and BigBoy.cpp, plus BigBoyImpl.h and BigBoyImpl.cpp. Except that nobody sees BigBoyImpl.h or .cpp; those live inside your system, and nobody else needs to know they exist.

If the 4 file approach bothers you, you can even put the BigBoyImpl class definition in BigBoy.cpp.
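
To make the 2 file approach concrete, here is a sketch of what BigBoy.cpp might look like. The header paths and the Draw() call are illustrative assumptions, not part of the original example.

// BigBoy.cpp -- nobody outside this file ever sees BigBoyImpl
#include "BigBoy.h"
#include <vector>
#include "Render/MeshClass.h"   // the heavy system headers now live here,
#include "Anim/AnimSystem.h"    // not in BigBoy.h
// ...and so on for Audio, Coll, and Physics...

class BigBoyImpl
{
public:
    boost::shared_ptr<Render::MeshClass> m_Mesh;
    boost::shared_ptr<Anim::AnimSystem> m_AnimationSystem;
    // ...the rest of the original members...
};

BigBoy::BigBoy() : m_Impl(new BigBoyImpl()) {}
BigBoy::~BigBoy() {}

void BigBoy::DrawBigBoy()
{
    m_Impl->m_Mesh->Draw();   // forward the work to the implementation
}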

Sounds solid: what are the downsides?

OOP, for one. OOP's more difficult to work with when you have 2 parallel object hierarchies. Plus, it's much harder for a user to derive anything from the class without needing access to the internals. If you used the 4 file approach, those internals are still available (and the user just signed a contract stating that their code can be broken by changes to the internals). Those internals can still use public/protected/private to hide greater details of the implementation.

If you used the 2 file approach, there's not much you can do.

BTW, it is entirely possible to do some forms of OOP, particularly if it is just for the user, simply by deriving from the interface class. If one is only really adding functionality to the class (in the BigBoy example, creating a BigBoyWithAI), and all crosstalk can be done with the public interface, everything is fine; it works like regular inheritance.
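
A minimal sketch of that kind of derivation, assuming only the public interface from BigBoy.h (UpdateAI() and its body are made up for illustration):

#include "BigBoy.h"

class BigBoyWithAI : public BigBoy
{
public:
    void UpdateAI()
    {
        // decide what to do, then drive the base class purely through
        // its public interface -- e.g. DrawBigBoy() -- never its internals
        DrawBigBoy();
    }
};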

Another possible downside is performance. There's a level of indirection now, and allocating the interface means doing two 'new' operations instead of one. In BigBoy's case, allocating a BigBoy even in the original version could have provoked a number of calls to 'new', so adding one more is not so bad. However, if it were a simpler class with no other dynamic allocation going on, and creating/destroying these objects were commonplace, then there could be a real performance penalty.

Plus, the level of indirection means that inlining doesn't work across the interface. I wouldn't be too concerned about this, as modern compilers (or, at least, Microsoft's free ones) can do whole-program optimization and inline across translation units at link time. So as long as you have a good compiler watching your back, you're fine.

One other upside to Pimping your code: any OOP you use internally is hidden. People are completely unable to dynamic_cast the object or figure out what its real type is. This can be a big benefit, particularly if your internals need to frequently talk to one another.

Are You or Are You Not?

OOP: Object Oriented Programming. Basically, it's all about polymorphism, inheritance, and virtual methods overriding base class methods.

But there's a problem. Like all new tools, there is a certain class of people who believe that this tool should be used everywhere. These people idolize the tool, believing that it can do no wrong and should be used wherever possible, and even in some places where it shouldn't be possible.

The key to taking C++ to the Limit is not just knowing when to use something; it's knowing when not to.

So, we're going to talk about how to properly use OOP. Or, more accurately, why it shouldn't be the default state.

First, a clarification. When I say "use OOP", I mean deriving a class from a base class for the purpose of using polymorphism to cause something to happen. Merely using classes (or structs) is not OOP; it doesn't become OOP until you have a class hierarchy and you're taking advantage of polymorphism. A class hierarchy alone isn't enough; that's merely a way of sharing interfaces. It is when you have a class hierarchy and you're using polymorphism to treat a derived class instance as though it were a base class instance that you're truly doing OOP.

Basically, the question when you're designing an object with relationships to other objects is this: is it "is a" or "has a"?

"Is a" means exactly what it says: Object B is an Object A. That is, it is reasonable to take a pointer to Object B and cast it into a pointer to Object A. If this is the kind of relationship you want between them, then you're using OOP.

"Has a" is likewise rather self-explanatory: Object B has an Object A. That is, it stores internally a pointer to Object A. Here, you're not using OOP.

However, even with this distinction in relationships, we need to go deeper to get at what the ramifications of either choice are.

Using "Is a" nets you some advantages.

  • It means that the public interface of Object A (the parent) is immediately usable by users of Object B. Object B also gets to use the protected interface of Object A. Basic inheritance.

  • It gives you polymorphism: the ability to make code that used Object A now use Object B, while redefining some of Object A's behavior, all without rewriting that code.


However, "Is a" also comes with limitations.
  • Basic inheritance may not really be what you want. Object A may have some public interface that Object B wants, but it may not want all of it. So users of Object B will have extra functions that they can use.

  • Object B has a 1:1 relationship with Object A. That is, there can't be two Object A instances that Object B has a relationship with.

  • If there are implementation details in Object A that are of no value to Object B, then Object B instances will have a lot of dead weight.



Notice something about those limitations? They all come into play only when you want to acknowledge that Object B exists! That is, when you're using the derived class explicitly, whether through a cast (static_cast, dynamic_cast, or a C-style cast) or something else. The limitations matter when the derived class needs to be used as itself frequently. As long as the code is treating Object B as though it were Object A, and only talking to it through Object A's interface, everything pretty much works.

The take-home point is this: only use "Is a", only use OOP, when your specific intent is to access the object through the base class interface all (or, at least, most) of the time. If you honestly need to look at an Object B as an Object B in a significant portion of the applicable code, it's probably not a good idea to make Object B an actual Object A.

In short, only use OOP for code that explicitly exhibits polymorphism.

Why? Well, if you don't, what will happen is that you'll create some object hierarchy, and then you'll want to work around one of those limitations. So you'll do something ugly, whether it's creating a fat interface (bubbling virtual methods up to the common base class even when they only matter for one derived class) or some other coding pathology.
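
As a hypothetical illustration of that fat-interface pathology (all names here are made up):

class Entity
{
public:
    virtual ~Entity() {}
    virtual void Draw() = 0;
    virtual void Update(float dt) {}   // only a moving entity cares, but it got bubbled up
};

class Rock : public Entity             // never moves, yet drags Update() around anyway
{
public:
    virtual void Draw() { /* ... */ }
};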

Wednesday, November 15, 2006

Making Algorithms Work for You

I'm not a fan of the STL algorithms. Here's one of the biggest reasons why.

Let's say you've got a class. And you've got a std::vector. Now, you want to iterate over that vector and, say, call a member function of that class. This is what you do under STL:
#include <algorithm>   // std::for_each
#include <vector>

class CallerFunctor
{
private:
TheClass *m_pTheClass;

public:
CallerFunctor(TheClass *pTheClass) : m_pTheClass(pTheClass) {}

void operator() (const TheObject *pCurrObject)
{
m_pTheClass->TheFunction(pCurrObject);
}
};

void TheClass::OperateOnList(const std::vector<const TheObject *> &theList)
{
std::for_each(theList.begin(), theList.end(), CallerFunctor(this));
}


This is, in my eyes, incredibly verbose and obtuse code. You have this function object whose sole purpose is to call a member function on an object. Writing dozens and dozens of these functors is not a reasonable way of working. I prefer code that makes sense, and that doesn't have two apparent levels of indirection you must follow in order to determine what's going on.

The biggest problem is that looking at TheClass::OperateOnList alone isn't enough to know what happens. All you know is that some functor is going to be called on every element; you have no idea what this functor will do without looking at it. And it is not apparent that the functor is going to call the TheClass member function that it eventually will.

I could have named CallerFunctor better, perhaps. But the underlying problem is still there: the structure of the code alone is insufficient to let you know what's going to happen. I prefer more obvious code; code that seems correct by inspection.

What we'd really like to do is either remove the need for the functor entirely, or simply transform it into something generic that lets the reader know what the generic functor will do without having to track it down.

Enter Boost::Bind and Boost::MemFn. boost::bind allows you to take a "callable" (a function, functor, or member function; anything that can be called) and mess with its parameters. But boost::mem_fn is just what the doctor ordered: it allows you to transform a member function into a functor. Basically, it does what CallerFunctor does, but generically.

So, let's see the original code written in this:
#include <boost/bind.hpp>

void TheClass::OperateOnList(const std::vector<const TheObject *> &theList)
{
std::for_each(theList.begin(), theList.end(),
boost::bind(&TheClass::TheFunction, this, _1));
}

Wait, didn't I say I was going to use boost::mem_fn? So what's with the boost::bind?

Well, there's two issues. First, boost::mem_fn is actually used by boost::bind. The above boost::bind statement is the equivalent of:
boost::bind(boost::mem_fn(&TheClass::TheFunction), this, _1);

Which brings us to problem #2. The construct boost::mem_fn(&TheClass::TheFunction) returns a functor that takes two parameters, not one.

The first parameter for anything wrapped by boost::mem_fn is the 'this' pointer: a pointer to the class the member function belongs to. (Note: boost::mem_fn can also take pointers to data members and convert them into one-argument functors that return references to the member.) Makes sense; after all, you can't call a member function without an object to call it on.

The remaining parameters to the boost::mem_fn functor are, of course, the parameters to be passed to the actual member function. In our case there's just one; if the member function took several parameters, the functor would take all of them after the object pointer.
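
To see that concretely, here is a sketch of calling the boost::mem_fn functor directly (pTheClass and pTheObject are assumed to be a TheClass* and a const TheObject* respectively):

// the object pointer comes first, then the member function's own parameter
boost::mem_fn(&TheClass::TheFunction)(pTheClass, pTheObject);

// ...which is equivalent to:
pTheClass->TheFunction(pTheObject);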

The reason why we need boost::bind is because the std::for_each algorithm expects the function object to take only one parameter. We therefore needed a way to turn boost::mem_fn's 2 argument functor into a 1 argument functor, one that passes a particular 'this' to all invocations of the functor.

Hence the use of boost::bind. The job of boost::bind is to take a callable and produce a functor with a different argument list: usually fewer arguments than the original, though it can also reorder or repeat them. So, let's dissect the statement.

The first argument passed to boost::bind is the callable. Simple enough. In our case, it was a boost::mem_fn functor that takes 2 parameters.

Following that argument is a list of other arguments. These are the arguments that will be passed in to the callable. Since this is all done through template programming hackery (which thankfully we are not exposed to), if you don't pass enough arguments, the compiler will know and complain appropriately. It'll also do type checking for you when you try to use the resulting functor, so that's all good too.

You may have noticed this identifier: '_1'. I never defined it (Boost does), but its purpose is crucial. When you use _1 in a boost::bind argument list, you are inserting a placeholder: the first argument passed to the resulting functor will be forwarded into that slot of the embedded callable.

Confused? Here's an example, without our member function cruft:
int Adder(int x, int y, int z)
{
return x + y + z;
}

int main()
{
boost::bind func1(Adder, 1, 2, 3);
boost::bind func2(Adder, _1, _2, _3);
boost::bind func3(Adder, _1, 5, 10);
boost::bind func4(Adder, _1, _1, _1);
boost::bind func5(Adder, _2, _3, _1);
}

Important note: This is not real boost::bind C++. The type of the functor that boost::bind returns is unspecified, and not readily storable outside of a template parameter. For the moment, let's pretend it works the way it looks like it should. That said, it is possible to store the result in a Boost::Function if the signature can be pinned down ahead of time, as sketched below.
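
For reference, here is a sketch of the same five functors stored in boost::function objects; this is real, compilable Boost, though the exact spellings here are mine rather than anything from the text above.

#include <boost/bind.hpp>
#include <boost/function.hpp>

int Adder(int x, int y, int z) { return x + y + z; }

int main()
{
    boost::function<int ()>              func1 = boost::bind(Adder, 1, 2, 3);
    boost::function<int (int, int, int)> func2 = boost::bind(Adder, _1, _2, _3);
    boost::function<int (int)>           func3 = boost::bind(Adder, _1, 5, 10);
    boost::function<int (int)>           func4 = boost::bind(Adder, _1, _1, _1);
    boost::function<int (int, int, int)> func5 = boost::bind(Adder, _2, _3, _1);

    func1();          // 6
    func3(7);         // 22
    func4(2);         // 6
    func5(1, 2, 3);   // Adder(2, 3, 1) == 6
    return 0;
}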

Well, we create 5 boost::bind function objects. What do they do?

The workings of func1 are quite simple. func1 is a functor that takes zero parameters and returns 6. Congratulations; we just made a really complicated way of writing the number 6 ;)

Those underscores in the func2 constructor aren't mistakes; they're vital. The numbers on them refer to the positions of arguments in the resultant functor. Look at it from the perspective of something you understand:
int CallAdder(int a, int b, int c)
{
return Adder(a, b, c);
}

That is literally what we did with func2. It implements CallAdder as a functor; it takes three parameters and passes them in to Adder, returning whatever Adder does.

Well, func3 should be fairly simple. It replaces some of Adder's parameters with constant values, but still allows for one parameter. So func3 is a functor that takes one parameter and adds 15 to it.

It is func4 where confusion can start to set in, but it's really quite simple. It creates a functor that takes 1 parameter (because it only ever uses one argument specifier), and uses that argument for all 3 parameters passed to Adder. Therefore, the result is a functor that multiplies its parameter by 3.

func5 shows that you can re-order parameters. It is the equivalent of:
int CallAdder(int a, int b, int c)
{
return Adder(b, c, a);
}

Now, since addition is commutative, this has no effect on the outcome.

Back to the original example:
boost::bind(boost::mem_fn(&TheClass::TheFunction), this, _1);

This creates a functor of 1 parameter. That parameter is passed as the second argument to the given callable, which is a functor of 2 parameters. The value of 'this' is passed as the first argument to the inner callable.

Which is exactly what we need, of course.

Now, I am aware that std::mem_fun exists, which technically allows what I originally asked for. But Boost::Bind and Boost::MemFn take this concept to the limit, which is what we're all about ;)
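
For comparison, here is a sketch of the simple case done with just the C++98 library adaptors; it assumes, as before, that TheFunction takes exactly one argument of the element type.

#include <algorithm>
#include <functional>
#include <vector>

void TheClass::OperateOnList(const std::vector<const TheObject *> &theList)
{
    // bind1st fixes 'this' as the first argument of the mem_fun adaptor
    std::for_each(theList.begin(), theList.end(),
        std::bind1st(std::mem_fun(&TheClass::TheFunction), this));
}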

However, there's more here than just the niftiness of this use of C++. There are innumerable uses of Boost::Bind and Boost::MemFn outside of STL algorithms. But even within that narrow use, there are useful things that you can do which would be impossible with just std::mem_fun.

Our initial example made an interesting assumption: that there existed a function TheClass::TheFunction that took one argument of exactly the type the list consisted of. What if it didn't?

Now, we can't do anything if TheFunction doesn't take a parameter of the list's type. So one of the parameters must be of that type. However, we could have a TheFunction like:
void TheClass::TheFunction(int iTheFactor,
const TheObject *pTheObject, bool bStoreObj);

This member function is not designed to work with a standard algorithm. But we might want it to.

We'd have to use an explicit functor again if we were limited to std::mem_fun. Boost::Bind, however, was made for this stuff:
  std::for_each(theList.begin(), theList.end(),
boost::bind(&TheClass::TheFunction, this, 15, _1, false));

We know by now what this does. More importantly, it makes those standard algorithms much more useful. They don't make your code ugly anymore, and you aren't restricted to making special functions for them.

Now, there is a class of function object that can't be encapsulated in just this. For example, if you have a function that doesn't take the listed object type, but takes something similar to it, what you need is the ability to compose functions. I guess that's too much for boost::bind.

Or is it? Attend:
  std::for_each(theList.begin(), theList.end(),
boost::bind(&TheClass::TheFunction, this, 15,
boost::bind(MutatorFunc, _1), false));

Now, granted, you still have to write the mutator function. But this does exactly what we need. For each element, it calls MutatorFunc with that element (MutatorFunc could even have been a member of TheClass), then passes the result as the second parameter to TheClass::TheFunction.

We still needed to write a one-off function, MutatorFunc, so we reintroduced that need due to the complexity of the issue. However, I'm willing to ignore that minor bit of hypocrisy, simply because we solve so many other problems without needing a mutator at all. And the mutator doesn't make anything worse than what we had before boost::bind; plus, the mutator can be a bare function rather than a class that needs a 'this' pointer. So even in the worst case, the bloat shrank a little.

That's good enough for me ;)

Pointers to Intelligence

Memory management. That's what those Java and C# programmers hang over our heads day in and day out. You can't kill a system on garbage collection, but you can on memory leaks.

Ignoring the inaccuracy of that last statement, completely manual memory management does cause its share of problems in C++. Which is why we're going to show you the tools to achieve near-perfect, provable protection against leaks. We're taking memory management to the limit!

And the path to that limit begins, along with so many other good things in C++, not in the C++ Standard Library (or, not yet), but in the Boost libraries. As far as I'm concerned, if you don't have Boost on your hard drive, you're not a C++ programmer. It is the rest of the C++ Standard Library; it's that vital.

For the purposes of memory management, Boost gives us 2 classes of vital importance. In the Boost::smart_ptr library, I present to you boost::shared_ptr and boost::weak_ptr.

Boost's shared_ptr class is a fairly simple construct. It's a templated class that takes a pointer to an object in its constructor. It has overloaded the pointer dereference operators (* and ->), so that you can use the shared_ptr as though it were a real pointer to that type. And when the shared_ptr object goes out of scope, the shared_ptr class thoughtfully deletes the object that was passed into its constructor.

Fairly dry, right? Well, I did miss one feature. You can freely copy this shared_ptr wherever you wish (it has a copy constructor). And, so long as any one remains in existence, so too will the object it points to.

Just think about that for a second. This sounds suspiciously like garbage collection, yes? Of course it does; that's what it is. It functions exactly like a reference counted garbage collector. With one exception: it always happens immediately when the last reference is removed. And personally, I consider that a feature.
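
Here's a minimal sketch of that lifetime behavior; SomeObjectType stands in for any class, and DoSomething() is a hypothetical member:

#include <boost/shared_ptr.hpp>

void Example()
{
    boost::shared_ptr<SomeObjectType> sp1(new SomeObjectType());
    {
        boost::shared_ptr<SomeObjectType> sp2 = sp1;   // reference count: 2
    }                                                  // sp2 gone; count back to 1
    sp1->DoSomething();                                // the object is still alive
}                                                      // count hits 0; object deleted here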

So, what's the catch? Well, there's two issues, one of them shared with all reference counted deallocators: circular references.

If object A has a shared pointer to B, and object B has a shared pointer to A, and nobody else has any shared pointers to either, then neither one will ever be destroyed; their destructors will simply never run.

We'll come back to that in a second, because there's a second issue, one that will bite you sooner or later (probably sooner).

Go back to that paragraph where I described what a shared_ptr does. What do you suppose this code would do:
SomeObjectType *pSomeObj = new SomeObjectType();
boost::shared_ptr<SomeObjectType> sp1(pSomeObj);
boost::shared_ptr<SomeObjectType> sp2(pSomeObj); // a second, independent owner: double delete awaits

Well, this is going to be bad. See, both of these shared_ptr objects think that they own that object, and at some point, both of them will try to delete it. One of them is going to succeed, and the other is going to cause the program to explode, behave erratically, or other such things.

Basically, you can only ever use the pointer constructor of shared_ptr once per object. After that, you can only copy shared_ptr's from one place to another. You can create new instances, but only from old instances. For any one object, there can only ever be one invocation of the pointer-taking shared_ptr constructor.

On the surface, this doesn't sound so bad. But it can cause real problems, especially if you're not aware of it.

The biggest problem comes from wanting a shared_ptr when you're in the class's constructor. Maybe you want this class to register itself with some external (global) object somewhere, so that others may query it by name. Or other such things. In order to do that, you need to give it a shared_ptr.

However, most users use shared_ptr's like this:
shared_ptr<SomeObjType> pThePtr(new SomeObjType);

If SomeObjType's constructor created a shared_ptr from 'this', we'd be in trouble. Which means that the only way to deal with this is through a paradigm like this:
new SomeObjType(name); //trusting it to register itself.
shared_ptr<SomeObjType> pThePtr = GlobalRegistrar.GetObject(name);

This is fool-proof... except that you have to remember to do it. And you lose a bit of performance, in that the registrar object has to do a name search (a quick check for the last-added eliminates this performance loss, of course). If you forget to do this, strange bugs can happen; the best you can hope for is that the program will crash. Worst-case, it keeps running, but becomes unstable.

The following is also a trap:
SomeObjectType *pTheObj = otherObject.GetSomeObject();
shared_ptr<SomeObjectType> pThePtr(pTheObj);

This is an error, as that pointer may have previously been wrapped in a shared_ptr. Therefore, the error isn't in the second statement, but in the first.

The moment you wrap a pointer in a shared_ptr, you have entered into a binding contract, on penalty of incredibly difficult-to-find bugs, that states that you will never pass a bare pointer to this object to anything. Ever. I don't care how tired you get of writing, "shared_ptr<SomeObjectType>"; wrap it in a typedef if you have to (SomeObjectTypePtr, for example). You must never do that.
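
A sketch of that typedef in practice (the factory and consumer functions here are hypothetical):

typedef boost::shared_ptr<SomeObjectType> SomeObjectTypePtr;

SomeObjectTypePtr MakeSomeObject();            // factories hand out the smart pointer...
void UseSomeObject(SomeObjectTypePtr pObj);    // ...and every interface accepts it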

Shared pointers are viral, and like any good virus, they must infect everything or nothing. As Yoda said, "Do. Or do not. There is no try." Which is why the best time to wrap an object in a shared_ptr is immediately upon creation.

The other problem with shared_ptr's is that of circular references. This one can actually be solved. The key to solving it lies both in Boost and in your own attitude.

This problem usually comes up for one reason: you've forgotten what it means to hold a shared_ptr. If an object stores a shared_ptr to some other object, you are making a strong statement about the relationship between those objects (shared_ptr's are often called "strong pointers" for this reason). You are saying, "Object A cannot function in any way, shape, or form without Object B. Object B is an intrinsic part of A, and A would be absolutely, totally meaningless without B."

When phrased that way, you start to realize that you may be overusing shared_ptr's. For example, if you're writing a game AI, it has a reference to a target. Would it be so bad if that target entity just happened to vanish off the face of the code at some point? Probably not; the AI doesn't need a target in order to function (presumably). If it was about to fire and the target's gone (say, killed by something else), then the AI simply needs to be aware of that and move on to something else.

Boost, like many garbage-collected systems, has a way to express this concept: a "weak pointer", boost::weak_ptr. A weak_ptr is a companion to shared_ptr. But while shared_ptr implements the dereference operators, with a weak_ptr you need to call a function to get a shared_ptr back out of it. This is for a very good reason: unlike a shared_ptr, a weak_ptr may have nothing to give you.

See, a weak_ptr refers to the shared pointer (or copies thereof) that it was given at construction time. But it does not store a copy of that shared_ptr; it simply knows (via some mechanism best left inside the Boost library) whether any such shared_ptr still exists and how to produce one. However, consider the following:
weak_ptr<SomeObjectType> aWeakPtr;
{
shared_ptr<SomeObjectType> pThePtr(new SomeObjectType);
aWeakPtr = weak_ptr<SomeObjectType>(pThePtr);
}
shared_ptr<SomeObjectType> pNewPtr = aWeakPtr.lock();
pNewPtr->CallFunc();

This code is broken, guaranteed. The weak_ptr::lock() function retrieves a shared_ptr to the object held by the weak_ptr. However, if every strong reference to that object has disappeared since the weak_ptr was created, weak_ptr::lock() returns an empty shared_ptr. And dereferencing an empty shared_ptr is an error (Boost asserts in debug builds; in a release build it's undefined behavior), so that call to CallFunc() is headed straight for a crash.

This is good. This is exactly what we want, as it allows us to reserve shared_ptr use for when we really mean it, and just pass around weak_ptr's for when we don't. This should cover 99% of all circular reference cases.
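
The safe pattern, then, looks like this sketch: lock the weak_ptr, check the result, and only then use it.

boost::shared_ptr<SomeObjectType> pTarget = aWeakPtr.lock();
if (pTarget)               // the object is still alive; we now hold a strong reference
{
    pTarget->CallFunc();
}
// else: the object is gone; shrug and move on to something else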

For the rest... restructure your code. Most code is pretty easy to structure in a tree-like fashion, where each level down represents lower-level code. In general, siblings in this tree should use weak_ptr's with one another, while parents should hold shared_ptr's (or use containment, where appropriate) to their children. And children should not need to know about their parents at all; that's what makes them children in a code sense.
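
A sketch of that tree-shaped ownership, with names invented for illustration:

#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>

class Child
{
private:
    boost::weak_ptr<Child> m_Sibling;   // siblings refer to each other weakly;
                                        // note: no pointer back up to the parent at all
};

class Parent
{
private:
    std::vector<boost::shared_ptr<Child> > m_Children;   // the parent owns its children
};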

FYI: the reference count inside Boost::smart_ptr is thread-safe on most platforms, but that's as far as the thread safety goes: a single shared_ptr or weak_ptr object must not be written from multiple threads at once, and nothing protects the object being pointed to. There are about a thousand ways for multithreading to screw this whole reference-counting thing up, so either keep each shared_ptr confined to one thread or guard access with your own synchronization.

Saturday, November 04, 2006

Functional Programming in C++ 1

What is functional programming?

The answer to that question is strange. If you've graduated from high school, odds are you've seen functional programming... in math class. As such, I'm going to use that as an analogy to explain what functional programming is.

In Math, you learned that a function looks like this:

f(x) = x + 4

The function f takes a single parameter 'x', and returns the value of x plus 4. Algebra tells us that the function f can legally take any value of x: any number, real or complex. And the result of the function f is likewise any number, real or complex.

Here's another example of a math function:

f(x) = 1 / x

Algebra tells us that f can take any number, real or complex, except 0; 1/0 is undefined, so the function cannot take that as an input. Likewise, the function f can return any number, real or complex, except 0.

Another example that's even more limited:

f(x) = log(x)

f can only take positive real numbers (we'll ignore complex numbers, as the log of a complex number is... weird; and log(0) is undefined too). And it can return any real number at all, positive or negative.

Something a bit more complex:

f(x, y) = x / y

This time, f takes two parameters. The parameter x can be any number, but y must not be 0. And the function can return any number as well.

And now for something a bit weird:

f() = 3 + 4

This is a function that takes no parameters. And it returns only one number: 7.

So, what was the point of that? Well, that was functional programming.

Not convinced that this could be considered programming of any kind? Well, let's proceed.

Let's say we have these functions:

f(x) = x * 5
g(x, y) = x + (4 * y)

Two functions now. I'll skip the input/output analysis and move on to composition. Let's say you want to do this:

h(x, y) = g(f(y), x)

Well, determining what the function h is is pretty simple:

h(x, y) = (y * 5) + (4 * x)

See that? We just did functional programming. We took two functions, composed them together, and created a third function.

You can do more with functional composition. You can lose parameters:

h(x) = f(g(x, x))

Add new ones:

h(x, y, z) = g(x, y) + f(z)

And so on.

By now, you're probably kinda annoyed with this math lesson. What does this have to do with programming? Well, it's simple.

Forget the fact that what I'm about to write is not legal C/C++.

float f(float x)
{
return x * 5;
}
float g(float x, float y)
{
return x + (y * 4);
}

float h(float x, float y) = g(f(y), x);

That last line is totally bogus C++. But... what if it wasn't? What would it mean if it were real and legal?

Well, actually, it wouldn't be too amazing, because you can do:

float h(float x, float y)
{
return g(f(y), x);
}


But, what you can't do is:

int main()
{
float h(float x, float y) = g(f(y), x);

h(4, 5);
}


You cannot dynamically define functions. And that is the fundamental difference between functional programming and procedural programming.

In functional programming, functions can be thrown around, created, destroyed, composed, etc. They're no different from any other construct; they're just like an integer, a float, or an object.

So, back to that question: what would it mean for C++ if you could do this? Forget the fact that we defined this all as math so far; it's now just C++ function calls. In fact, even the math operators are function calls: operator+, operator*, and friends call functions when given objects as operands.
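
As a small peek ahead at where this is going, here is a sketch of that h(x, y) = g(f(y), x) composition built at run time with Boost.Bind and Boost.Function; the follow-up articles will dig into how and why this works.

#include <boost/bind.hpp>
#include <boost/function.hpp>

float f(float x)          { return x * 5; }
float g(float x, float y) { return x + (y * 4); }

int main()
{
    // h(x, y) = g(f(y), x), created as an ordinary value at run time
    boost::function<float (float, float)> h = boost::bind(g, boost::bind(f, _2), _1);

    h(4, 5);   // g(f(5), 4) == 25 + 16 == 41
    return 0;
}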

Now that you understand what functional programming is, next time we'll discuss what you could do with it if it worked in C++. In the third article, we'll discuss, well, turning this from an academic "what if" into the truth: that C++ is very capable of all of this. Once you use the right library, of course.

Friday, November 03, 2006

C++ and Taking it to the Limit

So, what's the point of this Blog?

I'm a C++ programmer, working in the programming industry. Programmers are a pretty intelligent lot. Programming is a meritocracy; if you can't hack it, you just don't get in.

And yet, you cannot find more conservatism among programmers than you do among C++ programmers. Nobody wants to touch templates at all. Or, if they do, it's only in the most basic, simplistic way: the STL containers (vector, list) and nothing more.

You don't get that with other languages. You don't see C# programmers categorically avoiding delegates and event handlers. You don't see Java programmers flee at the very mention of serialization or other such things. Lua programmers don't shiver in fear of coroutines, metatables, or other language features. They look on these as just tools, to be used where appropriate and left alone where not.

Why do you get this attitude with C++? Well, it's simple really. You hear horror stories (and I've experienced some of them) of some programmer having to deal with some incredibly obtuse piece of C++ code. You know the kind; the code that uses templates far more than is needed. The kind where following the code at all is virtually impossible, even in a debugger. And heaven forbid that you have to actually debug through any of it.

It has long been said that C lets you shoot yourself in the foot, but C++ gives you the ability to take off your whole leg. So the conservatism is understandable.

But more than anything, this attitude is anathema to programming. C++ is capable of so much, yet so many C++ programmers use it as though it were merely C with language support for OOP.

There are a lot of alternatives to C++ these days. C#, Java, even scripting languages like Python and Lua are considered alternatives to C++ in some circles. The primary advantage of C++ over those languages is raw speed. Yet so many programmers look at that advantage and believe it just doesn't matter, that the ease of use of other languages outweighs it. And that if you truly need performant code, you should use straight C, because it is more standardized (in terms of compiled binaries).

I understand that point of view, but I disagree with it. C++ can get most of that ease-of-use, but not if you pretend that it's just C with objects.

For example, did you know that you can write functional programming in C++? I don't mean something kinda like functional programming; I mean just about the whole thing. Taking functions and applying functions to them to get new composite functions. You can get closures, but you'll sacrifice the look of your code. You need to download a header library to do all of this, but that's all.

You can write a parser in C++, directly translating an Enhanced Backus-Naur Form context-free grammar into C++ instructions. You can bind functions to a scripting language without any code generation; purely through the C++ template processor. You can write C++ code that is virtually guaranteed never to leak memory; who needs slow nonsense like garbage collection in the face of that?

But in order to do any of this, you have to do something that far too many conservative programmers are unwilling to do: use all of the language's features. No longer will we fear templates. No longer will we fear exception handling. No longer.

The purpose of this Blog is to teach you how to take C++ to the very limit; to the point where it stops looking like C with objects and starts showing what C++ can really do.

Undoubtedly, some of these features will frighten you. But remember back to when pointers frightened you? That didn't stop you from becoming comfortable with them (and if it did... well, did you come here by mistake?). If you don't want to be a programmer, don't code. If you're ready to take C++ to the Limit, then screw your courage to the sticking place and follow me.

Navigating This Blog

This Blog is basically divided into a number of topics, under the "Label" section. They're mostly self-explanatory, except for one thing: Limit Science.

This section lists all articles that are about advanced programming techniques. Most articles will be about these, and thus be in this section, but not all (not this one, for example).