A blog about in-depth analysis and understanding of programming, with a focus on C++.

Wednesday, September 12, 2007

One RAII to Rule Them All

RAII: Resource Acquisition Is Initialization.

Perhaps the most useless acronym ever.

Why? Because it only tells you half of what it is. Perhaps even less.

The part of RAII that is in the name is the concept of wrapping all resources in objects. Specifically, the constructor of objects.

What the name misses is the basic construct of RAII.

Consider the following code:
char *LoadAndProcessData(const char *strFilename)
{
FILE *pFile = fopen(strFilename, "rb");
if(!pFile)
return NULL;

char *pFileData = new char[50];

fread(pFileData, 50, pFile);
fclose(pFile);

ProcessData(pFileData);

return pFileData;
}

The points of failure in that code are legion. 'fopen' could fail, returning NULL. 'fread' could fail, returning NULL. Even 'new' could fail, either returning NULL or doing everybody's favorite: throwing an exception.

The first point, the function handles. Presumably, the calling function will look at the return value, check for NULL, and if it sees NULL, then it will do something useful.

The second point, however, gives rise to an unhandled error. If 'fread' fails, the function 'ProcessData' will get uninitialized (or partially initialized) memory. It may fail, throw an exception, or silently succeed but have bad data. None of these are good answers.

If 'new' fails, throwing an exception means that we fail. Granted, 'new' will only throw if we are actually out of memory, which isn't a terribly recoverable error. But if it does throw, we will immediately leave the function, without calling 'fclose'. That's bad.

File handles from a process tend to be cleaned up OK by the OS, mainly because so many programs are bad about doing it themselves. But it could have been a global resource, like a piece of memory to be shared between two processes (say, the clipboard). Throwing an exception would keep us from properly filling that memory out. Or closing the handle correctly, resulting in a leak. And so on.

It is precisely here where C-in-C++ programmers give up on exceptions entirely. After all, if 'new' didn't throw, we wouldn't have a problem.

However, there is a much more useful solution to this, one that doesn't turn its back on useful language features. The reasoning behind wanting to preserve exceptions has been discussed. What we will focus on is the technique that allows us to avoid abandoning exceptions. As well as giving us so much in return.

The reason that the acronym RAII is wrongheaded is that it isn't really the resource acquisition that's the issue; it's controlling when the resource is released. After all, 'fopen' does it's job; it creates a file handle well enough. The problem has always been that 'fclose' isn't guaranteed to be called.

The C++ specification dictates that, when an exception is caught, the stack will be unwound. All stack variables will have their destructor called, and it will be called in the proper order. Since this coincides with the exact moment we want to destroy the file handle, it makes sense to put the file handle on the stack.

RAII principles are as follows:
  1. Constructors are required to do actual work. Any and all things that could be deemed resource acquisition will only be done in object constructors.
  2. Any and all resource releasing will be done in object destructors.
  3. RAII-style objects must have appropriate copy constructors and copy assignment operators, if the object is copyable. If it is not, then it should forcibly be made non-copyable (by deriving from boost::non_copyable, for example).
  4. If there is a resource that cannot be created directly by a constructor, it must be created and immediately stored in an object with a destructor that will release the resource properly.
A few simple rules create a fundamentally different coding style.

What qualifies as "resource acquisition"? At a bare minimum, a resource is something you need to have cleaned up. Hence the backwards acronym; we define resources by wanting to get rid of them, not by how we acquire them.

Immediately, it becomes obvious that the value returned by operator 'new' qualifies. After all, memory is most definitely something we need to have cleaned up. So as per rule #4, the results of 'new' under RAII programming style must be store din something that will guaranteably call 'delete' when it is finished. We have already seen such an object: boost::shared_ptr. Shared pointers are reference counted, which doesn't qualify in terms of the guarantee, since circular references will prevent the resource from being collected. But they're close enough. There are also std::auto_ptr's and a few others.

So dynamically allocated memory is a resource. Files are resources. The GUI system windows, drawing tools, etc, are resources. And so on.

However, RAII is actually quite viral. Unless your entire codebase operates under RAII principles (or the non-RAII portions exist in isolated chunks at the bottom of the stack. Leaf-code), exceptions cannot be entirely safe. For our simple example of file openning, we can easily make it perfectly safe by creating a File object that acts as a RAII wrapper over the C FILE handle. The constructor opens the file, throwing an exception on failure, and the destructor closes it. For the simple case, it works just fine.

However, if you use that object in a non-RAII fashion (allocate it dynamically and store a pointer on the stack), you lose the guarantee. Which means that your entire codebase must hold everything allocated in a RAII fashion to be perfectly safe in throwing exceptions.

There is an implicit 5th rule as well. Because constructors are the only place where resource acquisition is to happen, they also must throw exceptions if the acquisition fails. That is the only way to signal the non-creation of an object. Generally, releasing a resource can't provoke a failure, so it is not much of an issue.

This is generally where the C-in-C++ programmers who stuck around decide to leave, as it requires a new coding style.

What gets lost are the benefits. After all, if you're being forced to use shared_ptr everywhere, memory leaks will be a lot less likely. Circular references may happen, but that is what weak_ptr is for. Your code will generally be very difficult to break from a resource perspective.

Also, exceptions are a very useful piece of functionality. They can be used as a complex piece of flow control (but only sparingly, and only when it really matters). And, as mentioned earlier, they have a vital role in reporting certain kinds of errors.

In general, it is a more rigid programming style, requiring, particularly for shared_ptr relationships, a degree of forethought in the design phase. But it can save a lot of time dealing with errors, memory leaks, and other concerns.

No comments: