String handling is one of the most error-prone aspects of programming in C and C++. Errors in dealing with strings account for most of the buffer overruns that result in security problems. In many languages, a string is an elementary type, and several of the issues that cause problems in C and C++, such as buffer overruns and problems with illegal pointers, don't occur as easily in these other languages. Perhaps if C had been written with a string type, we might have fewer problems with strings.
Let's examine strings and take a look at three C library calls that can compromise the security of your code. Don't despair: I'll also introduce you to the Standard Template Library (STL) and explain how it can help you avoid some of these security vulnerabilities. As I pointed out last time, this column assumes that the reader has a basic familiarity with programming in C.
What's a String?
A string is a series of characters ending with a null (‘\0’) character that lets the program know where to terminate the string. A Unicode string is a series of wide characters (WCHAR) that also terminates with a null character. At the lower levels (e.g., kernel level) of Windows 2000 (Win2K) and Windows NT, a UNICODE_STRING type often represents strings. This structure maintains information about the length of the string and the maximum size of the buffer. Dealing with kernel-level code is beyond the scope of this article, but you should be aware that this approach represents another way of string handling. Almost without exception, the C library calls, which deal with single-byte characters, have equivalents to properly deal with Unicode strings, and the same pitfalls apply to both single-byte and Unicode strings. Let's begin by examining some of the available library calls, starting with strcpy().
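To make the role of the terminator concrete, here is a minimal sketch that compares the number of characters strlen() reports with the storage a string literal actually occupies, for both single-byte and wide strings. The function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Characters before the terminating null in a single-byte string. */
size_t narrow_chars(void)
{
    return strlen("hello");                      /* the null is not counted */
}

/* Bytes the same literal occupies, terminating null included. */
size_t narrow_storage(void)
{
    return sizeof("hello");                      /* five characters plus '\0' */
}

/* Wide strings follow the same rule, measured in wchar_t units. */
size_t wide_chars(void)
{
    return wcslen(L"hello");
}

/* Storage of the wide literal in wchar_t units, terminating null included. */
size_t wide_storage_units(void)
{
    return sizeof(L"hello") / sizeof(wchar_t);
}
```

The one-unit difference between length and storage is the byte (or wide character) that every buffer in the rest of this column has to reserve.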
Fun with strcpy()
The first library call, strcpy(), is defined as
char* strcpy(char* dest, const char* src);
A quick look at how C and C++ implement this function, and a little thought about which parameters aren't passed into it, gives a good view of the problems that can occur. What happens if src or dest is null? Ker-boom: the function dereferences a null pointer and typically throws an unhandled exception. What if the string that src points to is longer than the dest buffer can hold? You'll write past the end of the dest buffer, and if dest is a fixed-size buffer declared on the stack, like buf in the following example, you'll overwrite the stack:
//this is the wrong way
void foo(char* inp)
{
    char buf[25];
    strcpy(buf, inp);
    //do more processing of buf
}
Buffer overruns are the sort of thing an attacker loves to find in your code. Once inp fills up buf, it starts overwriting the stack and can usually cause your program to execute whatever code the attacker wants. Consider a related problem: What if inp isn’t null-terminated? Now, our not-very-bright strcpy() function takes everything in inp and stuffs it into buf, past the end of buf, and keeps going until it triggers an exception handler. For these reasons, many programmers ban strcpy() from their applications.
Fortunately, you can improve the situation and still use strcpy():
void bar(char* inp)
{
    char buf[25];
    //first check to see if inp is illegal - if you don't do this, the strlen
    //call below blows up
    if(inp == NULL)
    {
        assert(false);
        printf("Cannot process a null pointer!!!\n");
        return;
    }
    //use <, not <= - that way you have room for a termination character
    if(strlen(inp) < sizeof(buf))
    {
        strcpy(buf, inp);
        //do more processing
    }
    else
    {
        printf("Hey! That string is too long!\n");
    }
}
The first thing you need to do is determine whether inp is a legal string; if not, you need to raise an assert (if you're in a debug build) to let the programmer know that a problem exists in the calling function. You've just eliminated one gotcha. Next, check to see whether the string is too long for your buffer, and complain if it is. Since strlen() also blows up when passed a null pointer, you'll want to check for that condition before checking the string length. Note the use of the sizeof() operator, which helps keep you from making mistakes if you later decide to change the size of buf, because the operator automatically takes any such change into account. As a last point, the comparison uses <, not <=, so the longest string you accept is one character shorter than the buffer, which leaves room for the null character.
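One caveat about sizeof() deserves a quick sketch: it reports the buffer size only when you apply it to the array itself. Once the buffer has been handed to another function, all that function sees is a pointer, and sizeof() reports the size of the pointer. The helper names below are hypothetical.

```c
#include <stddef.h>

/* sizeof applied to the array itself yields the full buffer size. */
size_t size_seen_in_scope(void)
{
    char buf[25];
    return sizeof(buf);
}

/* Inside a callee the parameter is a pointer, so sizeof measures the
   pointer, not the caller's buffer. */
size_t size_seen_by_callee(char *buf)
{
    return sizeof(buf);
}

/* The usual remedy: pass the size along with the pointer. */
size_t size_passed_explicitly(char *buf, size_t buf_size)
{
    (void)buf;          /* unused here; a real function would write to it */
    return buf_size;
}
```

This is why a helper that copies into a caller's buffer should take the buffer size as an explicit parameter rather than try to compute it.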
So, what can go wrong? The most likely problem you'll encounter is that inp really isn't null-terminated, and as a result, the strlen() call will blow up. Another problem is that the inp pointer might not be valid: checking for NULL is nice, but the pointer might still be illegal. For example, the pointer might point into kernel space, point too low into user space (<64KB), or be complete junk. If you do too much checking, your code will run slowly, so you have to make some compromises. Still, as Steve Maguire points out in his book Writing Solid Code (ISBN: 1556155514), if you're running around dereferencing null pointers, execution speed is the least of your worries.
So, what does this code do right? If you get any obvious errors or enter a string that's too long, you’ll fail gracefully, note the error, and return execution to the caller. A more complete example would return unique errors to the caller, but I've simplified the code in this example.
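As a rough illustration of what returning unique errors might look like, here is a sketch of the same checks packaged as a function that reports why it refused a copy. The enum, its values, and the function name are assumptions made for this example; they don't come from any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes, invented for this sketch. */
enum copy_result { COPY_OK = 0, COPY_BAD_ARG, COPY_TOO_LONG };

/* Copies src into dest only if it fits, terminator included. */
enum copy_result checked_copy(char *dest, size_t dest_size, const char *src)
{
    /* reject unusable arguments before strlen() can blow up */
    if (dest == NULL || src == NULL || dest_size == 0)
        return COPY_BAD_ARG;

    /* < rather than <= keeps one byte free for the terminating null */
    if (strlen(src) < dest_size)
    {
        strcpy(dest, src);
        return COPY_OK;
    }
    return COPY_TOO_LONG;
}
```

A caller can then tell a programming error (a null pointer) apart from bad input (a string that's too long) and react differently to each. Like bar(), this version still trusts src to be null-terminated.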
Is strncpy() Better?
The second library call, strncpy(), is defined as
char *strncpy( char *dest, const char *src, size_t count );
On the surface, this one looks better than strcpy()—at least it wants to know how many characters you’d like to stuff into the buffer. However, when you take a closer look, you see that it still has problems. For example, strncpy() still doesn't address the problem of dest or src being null, and if you lie to it about the character count, things can get ugly fast. Let’s look at some code to illustrate its usage:
void baz(const char* inp)
{
    char buf[25];
    //always check the validity of your inputs
    if(inp == NULL)
    {
        assert(false);
        printf("Yuck! You're passing a null pointer!\n");
        return;
    }
    strncpy(buf, inp, sizeof(buf)-1);
    //you always have to remember to null terminate
    buf[sizeof(buf)-1] = '\0';
    //do more processing
    return;
}
On the face of things, this function looks better. You don’t have to determine whether inp is too long, and you won’t overwrite the buffer—you'll just write one less byte than the buffer can hold. It also has the advantage of dealing properly with the case where inp isn’t null-terminated. Some people will argue that you should always use this function and never use strcpy(), but strncpy() has a few catches.
First, you have an additional step of ensuring that your buffer is null-terminated. Many programmers don’t read the fine print and forget this important step. If inp is too long, the function won't write the string's terminating null character in the buffer. If the function does write the terminating null character, you've just wasted an instruction.
Second, you need to consider the return of this function—all it gives you is a pointer to the destination string, and it doesn't reserve a value for an error. Imagine you've decided that if inp is longer than what can fit into the buffer, inp is junk and you should return an error (which I recommend in most cases). Using strncpy(), you can't easily determine this error, although I've seen various tricks that work, such as
//do this first
buf[sizeof(buf)-1] = '\0';
//tell strncpy that it can write into the whole buffer
strncpy(buf, inp, sizeof(buf));
//if the string was too long, this will be overwritten
if(buf[sizeof(buf)-1] != '\0')
{
    printf("Inp string too long!\n");
    return;
}
With this modification, you can armor the string handler against everything except some fairly unusual pointer errors. If you think this function seems like a lot of work to get a few characters safely into a buffer, you’re absolutely right.
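If you'd rather not repeat that dance at every call site, the sentinel trick can be folded into a small helper. The following is a sketch only: the name bounded_copy and its return convention (0 for success, -1 for failure) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns 0 on success, or -1 if an argument is unusable or src does not
   fit in dest with its terminator. Hypothetical helper. */
int bounded_copy(char *dest, size_t dest_size, const char *src)
{
    if (dest == NULL || src == NULL || dest_size == 0)
        return -1;

    /* plant a sentinel null in the last byte first */
    dest[dest_size - 1] = '\0';

    /* let strncpy use the whole buffer: it pads short strings with nulls
       and writes no terminator when src has dest_size or more characters */
    strncpy(dest, src, dest_size);

    if (dest[dest_size - 1] != '\0')
    {
        /* sentinel overwritten, so src was too long; leave dest terminated */
        dest[dest_size - 1] = '\0';
        return -1;
    }
    return 0;
}
```

Unlike the inline fragment, this version re-terminates the buffer before reporting failure, so a caller that ignores the return value is left with a truncated string rather than an unterminated one.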
_snprintf() to the Rescue
The third library call, _snprintf(), makes a lot of the code we've been examining easier to write and less error-prone. _snprintf() is defined as
int _snprintf( char *buffer, size_t count, const char *format [, argument] ... );
_snprintf() is also more versatile than the other two library calls, and you can do a lot of otherwise tricky string handling here. For example,
void foobar(const char* inp)
{
    char buf[25];
    //check for illegal inputs
    if(inp == NULL)
    {
        assert(false);
        printf("Yuck! You're passing a null pointer!\n");
        return;
    }
    if(_snprintf(buf, sizeof(buf)-1, "%s", inp) < 0)
    {
        printf("Input string too long!\n");
        return;
    }
    else
    {
        //always null terminate
        buf[sizeof(buf)-1] = '\0';
    }
    //do more processing
    return;
}
Note that you still have to determine whether inp is a valid pointer. You also have to pass sizeof(buf)-1, not the entire size of buf, as the count: when inp is exactly as long as the count, _snprintf() fills those bytes without writing a null character, so the last byte of buf must stay free for the terminator you add yourself. I find this code a lot easier to read and understand, a fact that other programmers who have to work on your code will appreciate.
However, none of this is free. _snprintf() is more versatile (e.g., you can use it to convert Unicode to and from single-byte), but it comes with more overhead. If performance is extremely critical, such as in an embedded system, you might not want to use _snprintf(). Another problem with this library call is that it isn't ANSI standard; as a result, the implementation varies between Windows-based and UNIX-based platforms. Not all UNIX (or Linux) systems offer this function, and those that do implement it in different ways. Some implementations return the number of bytes that you need in your buffer when the output doesn't fit, and some implementations always null-terminate. If portability is a concern, verify how every OS you'll support implements the call, and consider wrapping it. When you wrap a function, you create a function that behaves the same to the outside world but hides the differences between OSs. For example,
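//requires <stdarg.h> and <stdio.h>
int My_snprintf(char *buffer, size_t count, const char *format, ...)
{
    va_list args;
    int ret;
    va_start(args, format);
#ifdef WIN32
    //do things the Windows way
    ret = _vsnprintf(buffer, count, format, args);
#else
    //do things the UNIX way
    ret = vsnprintf(buffer, count, format, args);
#endif
    va_end(args);
    return ret;
}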
When you compile this code under NT or UNIX, it works as it should, and the rest of the application doesn't have to include the #ifdef stuff everywhere it needs to do the same thing. In this case, we'd create a My_snprintf(), which is harder than wrapping an ordinary function because of the variable number of arguments to _snprintf().
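A wrapper can also smooth over the differences in return value and termination, so that callers see a single behavior. The sketch below is one possible approach, not a definitive implementation. The name safe_snprintf and its return convention are assumptions; it relies on the va_list variants _vsnprintf() (Microsoft) and vsnprintf() (C99), and on the compiler-defined _WIN32 macro.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into buffer, always null-terminates, and
   returns the number of characters stored, or -1 if the output was
   truncated or the arguments were unusable. */
int safe_snprintf(char *buffer, size_t count, const char *format, ...)
{
    va_list args;
    int ret;

    if (buffer == NULL || format == NULL || count == 0)
        return -1;

    va_start(args, format);
#ifdef _WIN32
    /* _vsnprintf: negative on truncation, may leave buffer unterminated */
    ret = _vsnprintf(buffer, count - 1, format, args);
#else
    /* C99 vsnprintf: returns the length the full output would need */
    ret = vsnprintf(buffer, count, format, args);
#endif
    va_end(args);

    buffer[count - 1] = '\0';      /* harmless if already terminated */

    if (ret < 0 || (size_t)ret >= count)
        return -1;                 /* did not fit */
    return ret;
}
```

Callers pass the full sizeof(buf) and check for -1, and the platform-specific bookkeeping stays in one place.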
STL and String
As it turns out, C++ and the STL are a great help because under the new ANSI C++ specification, string is a standard data type and many common jobs related to strings have well-implemented methods that help. Let’s look at the code:
void barbaz(const char* inp)
{
    std::string str;
    //check for illegal inputs
    if(inp == NULL)
    {
        assert(false);
        printf("Yuck! You're passing a null pointer!\n");
        return;
    }
    //this is easy!
    str = inp;
    //now check to see if it was too long, or had nothing in it
    if(str.length() > 25 || str.empty())
    {
        printf("Input string invalid\n");
        return;
    }
    //do more processing
}
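This code doesn't address the case where inp isn't terminated, because the assignment reads until it finds a null character. As long as inp either is terminated or points to at least 26 readable bytes, you can bound the read by replacing the assignment with the following lines, which look for the terminator within the first 26 bytes (memchr() is declared in <cstring>):
const char* end = (const char*)memchr(inp, '\0', 26);
str.assign(inp, end ? (size_t)(end - inp) : 26);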
How your code handles long strings depends on whether you want to enforce a 25-character limit or just selected this limit because it was convenient.
I’ve shown you some of the perils of string handling, and the compromises you encounter using three C calls and a portion of the STL. Improper string handling frequently results in security problems, and I hope this information will help you avoid letting your code become part of an attack on someone’s computer.