A blog about in-depth analysis and understanding of programming, with a focus on C++.

Tuesday, November 21, 2006

Pimp my Code

C++, as a language, has a number of problems. Perhaps the biggest is the fact that, before you can really talk about an object, function, or other construct, you need to #include it. Java and C# don't impose the same requirement, and few more modern languages do.

Of course, C++ is compile-time linked too, unlike most other modern languages.

In any case, this requirement causes a real problem. Standard programming practice teaches encapsulation: keeping an object's privates out of the light of day. The problem comes from something like this:

class BigBoy
{
public:
    BigBoy();
    virtual ~BigBoy();

    void DrawBigBoy();

private:
    boost::shared_ptr<Render::MeshClass> m_Mesh;
    boost::shared_ptr<Anim::AnimSystem> m_AnimationSystem;
    std::vector<boost::shared_ptr<Audio::Sounds> > m_SoundList;
    std::vector<boost::shared_ptr<Coll::CollisionMesh> > m_CollList;
    boost::shared_ptr<Physics::PhysicalObject> m_PhysicsObj;
};

BigBoy could be an entity class from a game system. It stores references to fundamental objects from 5 systems (and I could reasonably add more; it doesn't have AI yet).

If you wanted to #include "BigBoy.h", you'd need to include headers from each of those 5 systems first.

The first attempt to solve this problem is to provide forward declarations of those classes in BigBoy.h itself. That works, but only because boost::shared_ptr can work with incomplete types. Then again, people using BigBoy.h now need to #include <boost/shared_ptr.hpp>. If you're a heavy Boost user, that may be your standard mode of operation. But if you're not?
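Concretely, the forward-declaration attempt looks something like this. (A sketch, not real engine code: I've trimmed it to two of the five systems, and I use std::shared_ptr, the standardized descendant of boost::shared_ptr, since both can be declared with an incomplete type. The section marked "BigBoy.cpp" stands in for the implementation file, which is the only place that needs the complete class definitions.)

```cpp
#include <memory>

// --- BigBoy.h (sketch) ---
// Forward declarations: enough to declare smart-pointer members,
// so users of this header never see the Render/Anim headers.
namespace Render { class MeshClass; }
namespace Anim   { class AnimSystem; }

class BigBoy
{
public:
    BigBoy();
    virtual ~BigBoy();

    void DrawBigBoy();

private:
    std::shared_ptr<Render::MeshClass> m_Mesh;          // incomplete type here: fine
    std::shared_ptr<Anim::AnimSystem>  m_AnimationSystem;
};

// --- BigBoy.cpp (sketch) ---
// Only the implementation file defines the real classes.
int g_meshesAlive = 0;   // instrumentation for this example only

namespace Render {
    class MeshClass
    {
    public:
        MeshClass()  { ++g_meshesAlive; }
        ~MeshClass() { --g_meshesAlive; }
    };
}
namespace Anim { class AnimSystem { }; }

BigBoy::BigBoy()
    : m_Mesh(new Render::MeshClass),
      m_AnimationSystem(new Anim::AnimSystem) {}
BigBoy::~BigBoy() {}
void BigBoy::DrawBigBoy() {}
```

Users of the header compile against the forward declarations only; the mesh object is created and destroyed entirely behind the scenes.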

You'd also need to #include <vector>, which once again you may not need in your own code, and most std::vector implementations aren't exactly short. Maybe one of those members could have been a boost::array (a fixed-size array with a std::vector-like interface) or a std::list; that just increases the number of different things you need to include.

What's worse is that if you change what BigBoy uses internally, you have to go to every file that uses BigBoy and change it. Kinda annoying, yes?

Imagine you have a thousand files using BigBoy (and note: this is not unreasonable for large game projects). Or worse, imagine you're writing the Unreal Engine or something, and you could have dozens or hundreds of users, each with hundreds of files that would need to be changed, just because you wanted to use a std::list or a boost::array.

What do we do? We pimp the code:

class BigBoyImpl;

class BigBoy
{
public:
    BigBoy();
    virtual ~BigBoy();

    void DrawBigBoy();

private:
    boost::shared_ptr<BigBoyImpl> m_Impl;
};

OK, what just happened?

Well, ignoring what BigBoyImpl is for the moment, I didn't really solve the problem so much as reduce it. Since boost::shared_ptr works with incomplete types, this is compilable C++. However, I still require that people include boost::shared_ptr.

Tough. Most people using Pimpl (I'll get to what that is in a second) would use a naked BigBoyImpl *. I don't, because there are times (a constructor throwing an exception) when a class's destructor will not be invoked, but the destructors of its already-constructed member objects will be. Using boost::shared_ptr guarantees the impl is cleaned up. Plus, it provides for the possibility of having multiple BigBoy objects sharing the same BigBoyImpl object; copy constructors can work.
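That exception-safety point can be demonstrated directly. In this sketch (hypothetical names, with std::shared_ptr standing in for boost::shared_ptr), the constructor throws after the impl has been allocated; the class's own destructor never runs, but the shared_ptr member's does:

```cpp
#include <memory>
#include <stdexcept>

int g_implAlive = 0;   // tracks live Impl objects (instrumentation only)

struct Impl
{
    Impl()  { ++g_implAlive; }
    ~Impl() { --g_implAlive; }
};

struct Throws
{
    Throws() : m_Impl(new Impl)
    {
        // Something later in the constructor fails...
        throw std::runtime_error("construction failed");
        // ~Throws() will NOT run. But m_Impl was already fully
        // constructed, so ~shared_ptr<Impl> runs during unwinding
        // and deletes the Impl for us.
    }
    std::shared_ptr<Impl> m_Impl;
};
```

With a naked Impl * member, the same throw would leak the Impl, because no member destructor exists to delete it.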

At the very least, I'm always going to use boost::shared_ptr; it's not the sort of dependency that's going to suddenly change to some more feature-rich pointer type (and if it did, I'd have faced the same change with the naked-pointer version). More importantly, there is no longer any pressure against changing a facet of the implementation of BigBoy.

What we did was take BigBoy and make it a nothing class. It no longer does anything itself; it is nearly as empty as boost::shared_ptr. It exists to manage its contents and provide an appropriate interface to them.

Which, if you think about it, is what you want an interface class to do: provide an interface.

This allows us to avoid the pain of including a thousand and one headers when we aren't using their contents. We get faster compile times (good, especially since heavy use of Boost can slow them down), less compiler fragility, and overall better stuff.

Pimpl stands for "pointer to implementation" (the idiom also goes by "private implementation"). And Pimpl is what we just did: we took the implementation details out of the header and put them into a new class.

So, for the implementer, instead of a single .h/.cpp pair, we have two: BigBoy.h/BigBoy.cpp and BigBoyImpl.h/BigBoyImpl.cpp. Except that nobody sees BigBoyImpl.h or .cpp; those live inside your system, and nobody else needs to know they exist.

If the 4 file approach bothers you, you can even put the BigBoyImpl class definition in BigBoy.cpp.
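Here is what that 2 file approach might look like end to end. (A sketch: the Describe() method is hypothetical, added just so the example has observable behavior, and std::shared_ptr stands in for boost::shared_ptr.)

```cpp
#include <memory>
#include <string>

// --- BigBoy.h (sketch) ---
// The header mentions BigBoyImpl only by name.
class BigBoyImpl;

class BigBoy
{
public:
    BigBoy();
    ~BigBoy();

    std::string Describe() const;   // hypothetical, for illustration

private:
    std::shared_ptr<BigBoyImpl> m_Impl;
};

// --- BigBoy.cpp (sketch) ---
// The 2 file approach: BigBoyImpl is defined right here,
// invisible to anyone who only sees the header.
class BigBoyImpl
{
public:
    std::string Describe() const { return "a BigBoy with 5 subsystems"; }
};

BigBoy::BigBoy() : m_Impl(new BigBoyImpl) {}
BigBoy::~BigBoy() {}

// Each public member function simply forwards to the impl.
std::string BigBoy::Describe() const { return m_Impl->Describe(); }
```

Change BigBoyImpl however you like; only BigBoy.cpp recompiles.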

Sounds solid: what are the downsides?

OOP, for one. OOP is more difficult to work with when you have 2 parallel object hierarchies. Plus, it's much harder for a user to derive anything from the class without needing access to the internals. If you used the 4 file approach, those internals are still available (and a user who includes BigBoyImpl.h has effectively signed a contract stating that their code can be broken by changes to the internals). Those internals can still use public/protected/private to hide the finer details of the implementation.

If you used the 2 file approach, there's not much you can do.

BTW, it is entirely possible to do some forms of OOP, particularly if it is just for the user's benefit, simply by deriving from the interface class. If one is only adding functionality to the class (in the BigBoy example, creating a BigBoyWithAI), and all crosstalk can go through the public interface, everything is fine; it works like regular inheritance.
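A sketch of that kind of user-side derivation (hypothetical names, methods returning strings only so the example is observable, std::shared_ptr standing in for boost::shared_ptr, and everything in one file for brevity):

```cpp
#include <memory>
#include <string>

// Minimal pimpled base class.
class BigBoyImpl
{
public:
    std::string Draw() const { return "mesh"; }
};

class BigBoy
{
public:
    BigBoy() : m_Impl(new BigBoyImpl) {}
    virtual ~BigBoy() {}

    std::string DrawBigBoy() const { return m_Impl->Draw(); }

private:
    std::shared_ptr<BigBoyImpl> m_Impl;   // internals stay hidden from derivers
};

// A user-side derivation that only *adds* behavior on top of the
// public interface. It never needs to see inside BigBoyImpl.
class BigBoyWithAI : public BigBoy
{
public:
    std::string Think() const { return "thinking about " + DrawBigBoy(); }
};
```

BigBoyWithAI works like any ordinary derived class; the pimpled internals of the base are simply none of its business.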

Another possible downside is performance, since there's a level of indirection now. Allocating the interface means doing two 'new' operations instead of one. In BigBoy's case, allocating a BigBoy even in the original version could have provoked a number of calls to 'new', so adding one more is not so bad. However, for a simpler class with no other dynamic allocation going on, where creating and destroying objects is commonplace, there could be a real performance penalty.
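To make the "two news" point concrete, here's a self-contained sketch (hypothetical Widget/WidgetImpl names, std::shared_ptr standing in for boost::shared_ptr) that counts heap allocations by replacing the global operator new:

```cpp
#include <cstdlib>
#include <memory>
#include <new>

// Count every heap allocation (instrumentation for this example only).
int g_newCount = 0;

void* operator new(std::size_t n)
{
    ++g_newCount;
    if (void* p = std::malloc(n))
        return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }

struct WidgetImpl { int value; };       // a tiny, hypothetical impl

class Widget                            // the pimpled version of a tiny class
{
public:
    Widget() : m_Impl(new WidgetImpl) {}
private:
    std::shared_ptr<WidgetImpl> m_Impl; // extra allocations per Widget
};
```

A plain, non-pimpled version of this class would cost a single allocation per object; the pimpled one costs more (the object, the impl, and shared_ptr's control block), which is exactly the overhead that only matters when a class is tiny and churned constantly.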

Plus, the level of indirection means that inlining across the interface doesn't work. I wouldn't be too concerned about this, as modern compilers (even Microsoft's free ones) can do link-time, whole-program inlining and optimization. So as long as you have a good compiler watching your back, you're fine.

One other upside to Pimping your code: your internal class hierarchy is hidden. Outside users are completely unable to dynamic_cast their way to the real type of the object. This can be a big benefit, particularly if your internals need to frequently talk to one another.
