Thursday, March 3, 2011

A Review of Error Messages


Error messages, if they're posted at all, should convey helpful information and advice--not only for the user, but also for tech support and maintenance programmers. Here are a few things to think about when coding your error-handling routines and designing your error messages.

Error Message Basics

Error messages are displayed by programs in response to unusual or exceptional conditions that can't be rectified by the program itself. A well-written program should post very few error messages; whenever possible, it should cope with the problem gracefully and continue without bothering the customer. By this yardstick, of course, most programs are poorly written.

For the purposes of this discussion, there are two classes of poorly written programs. First, there is the program that can't remedy things on its own, or that needs so much hand-holding that it bothers its customers unnecessarily. Second, and the focus of this discussion, is the kind of program that encounters some real problem, but confuses or offends the customer by providing an inadequate error message.

Of course, the best error message is no error message at all. In the case where something has gone awry, a program should do everything within its power to remedy the situation at hand. For example, a program should never post a dialog saying that a file cannot be found unless the program has actually bothered to look for it. At a minimum, a program (that is to say, a programmer) should search all local hard drives for the missing file. If the program finds the file in an inappropriate place, the program should either update its own records to point to the file, or make a copy of the file in an appropriate place. There should be no need to disturb the customer in either case.
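
As a rough illustration of "bothering to look", here is a minimal Python sketch that searches a set of directories for a file by name before anyone is told that it is missing. The function name find_missing_file and the idea of passing in a list of search roots are assumptions made for this example; a real program would decide for itself which drives or folders are worth searching, and how long it is willing to spend.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_missing_file(name: str, search_roots: Iterable[Path]) -> Optional[Path]:
    """Return the first file called `name` found under any root, or None."""
    for root in search_roots:
        # os.walk ignores directories it cannot read (and roots that do not
        # exist) rather than raising, so the search carries on past them.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                return Path(dirpath) / name
    return None
```

If the search succeeds, the program can update its own records or copy the file into place; only when it returns None is there anything to tell the customer.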

If your program has to post an error message, don't waste the customer's time either before or after the error condition is detected. For example, an installation program should not begin copying files unless it is certain that the files will fit onto the destination disk. A simple set of calculations can determine whether there is adequate disk space, but most programs don't even bother with this basic check. Just as bad, installation programs frequently refuse to proceed for lack of space even when the new files are going to overwrite already-existing ones, so that little or no additional space is actually needed.
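
The "simple set of calculations" might look something like the following Python sketch. It adds up the sizes of the files to be copied, gives credit for any same-named files that will be overwritten, and compares the result with the free space reported by shutil.disk_usage. The names bytes_needed and has_room are made up for this example, and the arithmetic is an approximation: it ignores allocation-unit rounding, and it assumes the old file's space is released when the new one replaces it.

```python
import shutil
from pathlib import Path
from typing import Optional, Sequence


def bytes_needed(sources: Sequence[Path], dest_dir: Path) -> int:
    """Net extra bytes required, crediting files that will be overwritten."""
    needed = 0
    for src in sources:
        needed += src.stat().st_size
        existing = dest_dir / src.name
        if existing.is_file():
            # This file is about to be replaced, so its space comes back.
            needed -= existing.stat().st_size
    return max(needed, 0)


def has_room(sources: Sequence[Path], dest_dir: Path,
             free_bytes: Optional[int] = None) -> bool:
    """True if the copy should fit. Pass free_bytes to override the real disk."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(dest_dir).free
    return bytes_needed(sources, dest_dir) <= free_bytes
```

Running has_room before the first file is copied lets the program report a shortfall, in bytes, while the customer can still do something about it.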

Don't depend on the operating system to handle things properly. Amazingly, after almost twenty years in the field, the DOS COPY and XCOPY commands don't bother to check for disk space before the copy starts; instead, they begin copying blindly and hope that the destination disk doesn't fill up before the operation is complete. Windows is no better; like DOS, it stupidly fails to check for sufficient disk space before performing a file copy. Worse, if you are copying a set of files, Windows will stop the process on the first error, will refuse to continue, and will forget your selection.
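
By way of contrast with a copy that stops at the first error, here is a small Python sketch of a copy loop that keeps going, remembers what failed and why, and hands the whole picture back to the caller. The function name copy_all and its return shape are assumptions for this example, not any operating system's API.

```python
import shutil
from pathlib import Path
from typing import List, Sequence, Tuple


def copy_all(sources: Sequence[Path], dest_dir: Path
             ) -> Tuple[List[Path], List[Tuple[Path, OSError]]]:
    """Copy every file it can; return (copied, failed-with-reason)."""
    copied: List[Path] = []
    failed: List[Tuple[Path, OSError]] = []
    for src in sources:
        try:
            shutil.copy2(src, dest_dir)
            copied.append(src)
        except OSError as exc:
            # Keep the reason with the file so it can be reported or retried.
            failed.append((src, exc))
    return copied, failed
```

Because the failed list still holds the original selection, the program can offer to retry just those files once the customer has fixed the problem.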

When you write code, anticipate the error conditions and code around them. Try to fulfill the user's goal to the greatest degree possible, and don't view error conditions as catastrophic unless they are. Remember the program's state at the time that the error occurred, and permit the user to restore that state easily. Always write functions that return status codes, and return a unique error code for each error condition. At the point the status code is returned, there is typically quite a bit of information available that you can relay to people who are going to need to identify and fix the problem. On the other hand, remember that your program's internal errors are not the customer's concern, so don't overload or intimidate the customer. Make it clear that some information is for the customer to act upon, but that other information is there only to help the person that is helping her.
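
To make the status-code advice concrete, here is a minimal Python sketch of a function that returns a distinct code for each way it can fail, together with the detail that is available at the point of failure. The Status values, their numbers, and the load_settings function are all invented for illustration.

```python
from enum import IntEnum
from typing import Tuple


class Status(IntEnum):
    OK = 0
    FILE_NOT_FOUND = 101    # nothing exists at the given path
    FILE_UNREADABLE = 102   # exists, but the OS refused to open or read it
    FILE_EMPTY = 103        # opened fine, but contains no data


def load_settings(path: str) -> Tuple[Status, str]:
    """Return (status, detail). The detail is for support, not the customer."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return Status.FILE_NOT_FOUND, f"path={path!r}"
    except OSError as exc:
        return Status.FILE_UNREADABLE, f"path={path!r} errno={exc.errno}"
    if not data:
        return Status.FILE_EMPTY, f"path={path!r}"
    return Status.OK, f"path={path!r} bytes={len(data)}"
```

The caller can show the customer a plain-language message chosen by the status, and keep the detail string for the log or for the "information for support" part of the dialog.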

What Does a Good Error Message Look Like?

A well-constructed error message

* should identify the program that is posting the error message
* should alert the customer to the specific problem
* should provide some specific indication as to how the problem may be solved
* should suggest where the customer may obtain further help
* should provide extra information to the person who is helping the customer
* should not suggest an action that will fail to solve the problem and thus waste the customer's time
* should not contain information that is unhelpful, redundant, incomplete, or inaccurate
* should provide an identifying code to distinguish it from other, similar messages
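
One way to keep these requirements in view is to give each of them a slot in a small data structure, so that a message cannot be built without them. The Python sketch below is only an illustration: the ErrorReport class, its field names, and the layout produced by render are assumptions for this example, and the sample values in the usage note are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorReport:
    program: str          # which program is posting the message
    title: str            # the type of error, for the title bar
    problem: str          # the specific problem, in the customer's terms
    steps: List[str]      # things the customer can actually do, likeliest first
    help_contact: str     # where to get further help
    code: str             # identifying code that distinguishes this message
    support_detail: str = ""   # extra information for whoever is helping

    def render(self) -> str:
        lines = [f"{self.program} - {self.title}", "", self.problem, ""]
        lines.append("What you can try:")
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"  {number}. {step}")
        lines += ["", f"If the problem continues, contact {self.help_contact}."]
        support = f"Information for support: {self.code}"
        if self.support_detail:
            support += f" ({self.support_detail})"
        lines.append(support)
        return "\n".join(lines)
```

A call such as ErrorReport("Example App", "Save Error", "The report could not be saved because drive D: is full.", ["Delete files you no longer need from drive D:.", "Save the report to a different drive."], "the help desk", "E203", "save_report: write failed").render() produces a message with every item on the list accounted for.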

A Good Example

One of the best error messages I have ever seen went something like this:

This was an error message from an applicant tracking system (called "Applicant Tracking System") that was designed for a personnel agency by an independent consultant in 1988. The message looked almost, but not quite exactly, as I've rendered it above. A significant difference is that the original message did not have a Windows look and feel, because this message came from a DOS program. I mention this because the author provided this detailed message even in the days of the 640K memory limit. The customers of this system were not experienced with computers, but even if they had been experts, the message would have been helpful.

Let's look at this error message and compare it with the list of requirements above:

* This error message clearly identifies the program from which it is coming. The title bar gets extra points because it identifies the type of error.
* The message says that the program has lost communication with the printer. The message does not say that the program "is unable to print", nor does it say "LPT1: Error", nor some equally vague text relayed from the operating system. Most operating systems provide notoriously terse—and usually poor—error messages. This message is in terms that the customer can understand.
* The message scores top marks for giving the customer constructive steps that are within his power to perform, regardless of his skill level or experience. The program does not offer a vague guess as to what the problem might be. The steps are ordered from simplest to most complicated, and they're also ordered in terms of probability. Part of this is due to luck—the most common problems are not always the easiest to solve.
* The program does not offer a foolish suggestion to the customer that is likely to waste his time. ("Try restarting the application", or worse, "Try re-installing the application".)
* The error message is carefully worded. Each item in the message is worth checking. Nothing is restated pointlessly. There is no attempt to blame another application for the problem. The message is accurate and helpful.
* Best of all, there is specific tech support information right in the message, for the customer, the technician, and the developer. If there is a defect in the code, the error message suggests clearly to the programmer where in the program the error can be found, and the type of error involved. As an added plus, there's the name of—and an invitation from—a real person. Apart from the pleasant feeling that the customer gets from dealing with a person, rather than a corporation, the programmer's name suggests pride in the work.

Ten Rotten Error Messages

Now, by contrast, here are some examples of the very worst kinds of error messages. You'll see that my examples are all from Microsoft software. Microsoft is not the only company that releases software with lousy user interfaces, but it certainly seems to have perfected the art of the irritating error dialog.

Duh. This message states something that is entirely obvious, and fails to state anything at all that is helpful. There is nothing here to remedy the customer's problem or to help him through it. There is no information that would help even an imaginative tech support person to work through some possible solutions with the customer. The developer responsible for maintaining this code--typically not the person who wrote the original program--is not offered even a hint of what the problem is, or the error code returned by the called function. If more than one error condition posts this dialog, there's no way to tell which one caused the problem.

I have no comment on this message.

I have no comment on this message either. Although somehow this looks a little less severe than the last one.



You know more than you're saying, don't you? And by the way, restarting Outlook will help how, exactly?



Which applications? How will it be incompatible? Why didn't you fix the problem? Thank God it doesn't seem to be incompatible with non-existent applications.



"May" again. Is a component busy or missing, or is it neither? If a component is involved, which component? Is it busy? Or is it missing? And what is a component anyway? A file? If so, could we have the file name please?



Really? Really? Which action? Which action? What should I do to fix the problem? What should I do to fix the problem?



Nope, I don't. I want you to find it.



Still won't look for it, eh? In fact, I've forgotten the context in which I got this message, and so I've forgotten which application is involved. However, I do remember that it was unclear to me even at the time which application needed to be reinstalled.



Why Are Error Messages So Poor, and How Can They Be Improved?

Our systems for teaching programming almost never discuss error messages, or even error handling. How many programming books emphasize the importance of checking return codes from operating system or library functions, and handling errors gracefully? How many source code examples show even minimal error checking or commenting? How many programming books discuss even the most basic user interface issues, such as how to construct a useful error message?

Let's start with what is displayed to the world outside your program. Error messages are often less than helpful or useful because they're written by people who have an intimate knowledge of the program. Those people often fail to recognize that the program will be run by other people, who don't have that knowledge. Thus it is important that you consider the customer's plight carefully when writing error-handling routines; that you involve someone other than yourself with the design and testing of the program; and that you provide each and every error message to someone else for review. The reviewer should not be an expert in the program. Your messages should be detailed and polite. They should not offend or frustrate the customer.

Write and test your program so that it will have to display as few error messages as possible. If your programming language provides debug-build validity checking like the C assert macro, use it; if you have to hand-roll validity checking yourself, do it. Walk through code in the debugger. Include features in the release version of the program, such as log files or verbose modes, to help with troubleshooting. Each condition in the program that has a chance of failure should return a distinct error code, and should display this code as part of the error message. The error code will not only help to narrow a problem down, but is also a good internationalization strategy; error codes will form a useful cross-check when the program is translated. Comment each status code as thoroughly as you can to make life easier for the maintenance programmer and for documenters, and use the header to help define a table of error status codes for technical support. Make sure that there is a mechanism to identify missing files, registry entries, and the like. Create error handling classes and functions to supply consistent, well-formatted error messages--and reuse them consistently. Use code review and walkthroughs with other developers and quality assurance to make sure that your program is readable, consistent, maintainable, and free of defects. Provide testers with tools or a test program that will allow them to view all of the error messages displayed by your program.
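
Several of these suggestions fit together naturally: a single table of commented error codes, one function that formats every message the same way, a log entry for troubleshooting, and a dump of the whole table for testers and technical support. The Python sketch below shows one possible arrangement. The Err codes, the CATALOG table, the "exampleapp" logger name, and the function names are all hypothetical.

```python
import logging
from enum import IntEnum
from typing import Dict, List


class Err(IntEnum):
    CONFIG_MISSING = 201   # settings file not found at startup
    CONFIG_CORRUPT = 202   # settings file found but could not be parsed
    DISK_FULL = 203        # a write failed for lack of space


# One message template per code; translators change the text, never the code.
CATALOG: Dict[Err, str] = {
    Err.CONFIG_MISSING: "The settings file {path} could not be found.",
    Err.CONFIG_CORRUPT: "The settings file {path} is damaged at line {line}.",
    Err.DISK_FULL: "Drive {drive} is full; {needed_mb} MB more is needed.",
}

# Validity check: every code must have a message.
assert set(CATALOG) == set(Err), "error code without a catalog entry"

log = logging.getLogger("exampleapp")


def format_error(code: Err, **details: object) -> str:
    """Build the customer-facing text, always prefixed with the code."""
    return f"[E{int(code)}] " + CATALOG[code].format(**details)


def report(code: Err, **details: object) -> str:
    """Format the message and record it in the log for troubleshooting."""
    message = format_error(code, **details)
    log.error("%s details=%r", message, details)
    return message


def dump_catalog() -> List[str]:
    """Every message the program can display, for testers and tech support."""
    return [f"E{int(code)} {code.name}: {text}"
            for code, text in sorted(CATALOG.items())]
```

Because every message passes through format_error, the code always appears in the same place, and dump_catalog gives a reviewer who is not an expert in the program the complete list to read through.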

Façade programming is a useful construction strategy. As the program is being constructed, write skeletons of each function. Until you have the internals of the function coded, simply have the function do nothing and return a positive return code. Define the return codes as symbols—constants or enumerated values. Later, when you begin to flesh out the function (and as you check return values at each stage), define distinct symbolic codes for each type of error.
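
Here is a small Python sketch of the idea, caught partway through construction. Two of the three functions are still skeletons that do nothing and return the positive code; the third has been fleshed out and has gained its own distinct symbolic error codes. The order-processing scenario, the Result values, and MAX_ITEMS are invented for this example.

```python
from enum import IntEnum
from typing import Sequence


class Result(IntEnum):
    OK = 1
    # Added when validate_order() was fleshed out:
    ORDER_EMPTY = -1
    ORDER_TOO_LARGE = -2


MAX_ITEMS = 100  # hypothetical limit for this example


def validate_order(items: Sequence[str]) -> Result:
    """Fleshed out: one distinct code per way of failing."""
    if not items:
        return Result.ORDER_EMPTY
    if len(items) > MAX_ITEMS:
        return Result.ORDER_TOO_LARGE
    return Result.OK


def reserve_stock(items: Sequence[str]) -> Result:
    """Skeleton: internals not written yet, so report success."""
    return Result.OK


def charge_customer(items: Sequence[str]) -> Result:
    """Skeleton: internals not written yet, so report success."""
    return Result.OK


def place_order(items: Sequence[str]) -> Result:
    """The caller checks every return value from day one."""
    for step in (validate_order, reserve_stock, charge_customer):
        result = step(items)
        if result is not Result.OK:
            return result
    return Result.OK
```

The point of the arrangement is that place_order already checks each result, so when reserve_stock and charge_customer acquire real failure modes, the new codes flow through without the calling code having to change.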

Programming is, of course, more complicated than ever. There are more technologies, more languages, and more different disciplines to master this week than there were last week. Developers are pressured to design too little, and to code too quickly. Each step of the development process is squeezed so that products can be released as quickly as possible. However, neither programmers nor managers should kid themselves; other parts of the company are not likely to take responsibility for a program that is sent to testing (or worse, to customers) laden with obvious defects and opaque error messages. Developers and development managers must therefore learn to include design and debugging time in planning estimates, and must argue effectively for more time and more help, especially in areas that don't require coding, such as user interaction design.

It's rational to assume that help won't arrive immediately, so walk a mile in the customer's shoes and program defensively. When you're constructing an error message, the important thing to remember is that your message must convey useful information. Useful information saves time. Remember that the message will be read not only by the customer. The message must also be interpreted by the tech support person who handles the call, the quality assurance analyst who helps to track down the problem, and the maintenance programmer who is charged with fixing the problem in the code. Each person in this process represents a cost to your company. What's more, while the error-handling routine need be written only once, the support path is typically followed many times--tens, or hundreds, or thousands of times. Form alliances with technical support, testing, and documentation; ask questions, do the math, and put dollar amounts on what it costs to solve (or sandbag) a problem after the product has been released. Don't forget future lost sales in your calculations. If senior management at your company wants to rush the product to market without leaving you time to code proper error handling, remind management politely of the cost of such a policy.