Is it plugged in?

I love coding. It is fun and exciting. Even though I have been doing it for quite a while, I still get a kick every time a program runs as it should. Along with the rush, there is a bonus surprise when a piece of code runs at it should on the first try. It is surprising because, more often than not, there will be some debugging to do before a program runs correctly. As one gets more experienced with a specific language and programming in general, the number of initial bugs dwindles. On the other hand, and somewhat ironically, their severity or subtlety, or both, increases. Debugging is less fun than programming. Mainly because it can be frustrating, time-consuming, and sometimes downright hellish. But it does not need to feel like an exercise in futility.

Do what I mean, not what I say!

There are two types of errors that can cause bugs: syntax errors and semantic errors. Syntax errors are usually the easiest to find and fix. Syntax errors occur when the rules of a computer language are violated. For example, a keyword is misspelled, or a symbol is not in the right place. Modern compilers and interpreters are very helpful in pinpointing the location of those errors. Even C++ template preprocessors, infamous for spewing impenetrable error messages, have become much more precise and helpful.

Examples of syntax errors include omitting a closing brace or not ending a statement with a semicolon in a language like C, C++, or Java. Or not indenting the code inside a function in Python. Or misspelling a keyword, for example, deter instead of defer in Go.

Semantic errors are a different story. A semantic error happens when the program’s meaning, or actual logic, is other than what the programmer had in mind. The syntax is technically correct because the program runs, but the behavior is incorrect. For example, the program may produce the wrong result or crash with a runtime error.

Some semantic errors are caused by poorly constructed proper syntax. Take a look at the following line:

int i = 0;
while (i++ < 10);
    print i;

This is a very contrived example, but more subtle variations are classic. The syntax is correct, but the code will not do what the programmer intends. Instead of printing 0 through 9, the program will print 10. Furthermore, the indenting will help bias the mind into thinking the loop is correctly coded. That semicolon at the end of the while statement might as well be invisible. There are traps like this in just about any language.

Other semantic errors include incorrect pointer usage in languages like C/C++, not handling exceptions, or silently trapping them. In general, unexpected or unforeseen events can be sources of semantic errors, for example, using uninitialized variables, unexpected changes to variables, unexpected side effects, or unaccounted-for external influences.

Of course, incorrect program behavior may be due to a design flaw or a misunderstood requirement. In this case, the syntax and the semantics are correct, but the design itself is incorrect. This type of error is sometimes easier to flag down. However, sometimes the wrong behavior may not surface until a particular set of circumstances is true. This kind of problem can often be caught by well-crafted acceptance criteria. But that is a subject for a different blog post.

Squashing the bugs

When one hears the term bug, it refers to a semantic error in the vast majority of cases. They range from easy to spot or incredibly hard to find. Regardless of how hard they can be, some basic principles are always applicable.

Start with the basics

As the title of this post implies, a great starting point is verifying that the basics are still valid. If you try to turn on the TV and it remains dark, a basic test is to check if it is plugged in. If it is, is the outlet working? It may be worth plugging in something else to verify. Other essential tests would be to check that there is power in the house and that the circuit breaker is not tripped. These tests tend to give a general insight into where the problem could be.

For a program, the basics could include making sure you are compiling the correct units. That the program starts executing. That the main elements of the program are working. A great help for this is using logs or printing statements that leave a crumb trail of the general flow of the program. Other things to look for are whether the correct inputs are being supplied or whether there is enough memory to run the code. Checking the versions of libraries and tools is also important if the development environment or language has that type of ecosystem (for example, Java, node, or Python).

Check your assumptions

This principle, somewhat related to the prior one, can also be stated as “keep an open mind” or “beware of your biases.” Sometimes it is hard to see, but we all develop biases or assumptions about everything. When developing code, we have a built context of what we have implemented, and it may be tough to step outside of that mind frame. This is one reason to have peer code reviews and separate QA teams.

Once the basics have been verified, step outside your mindset and look at the system with fresh eyes. Are your test inputs crafted so that they avoid triggering possible bad behaviors? Is the code lax on error checking because you assume a particular situation “could never happen?” Are you assuming a global variable must have a specific value at a particular point in the execution? Or that a pointer is still valid, or is pointing to the correct value or object?

Recognizing your own assumptions or biases is an acquired skill. It takes practice. An excellent way to help elucidate them is to write unit tests since they tend to force you to think of what can break the code.

Divide and conquer

The purpose of verifying the basics and checking your assumptions is to separate areas of the program or system that are working correctly from those that are not. The distinction is not always clearly demarcated. When systems are complex, it may be hard to tease out the troublesome areas. The key is being consistent. Start from the general and work your way to the specific. The method is similar to the binary search algorithm. If you have a sorted list of values, you can split it and find which half is likely to have the value, then split that half and check again, and so on. When looking for a bug, we want to find seams in the code that can divide the areas that work reliably from those that don’t, then focus on the faulty ones and find again which subparts work and which don’t. The idea is to continue doing that until the problem is found.

Be scientific

You may have noticed that I favor a systematic approach to troubleshooting. In my experience, this is key. For example, when running experiments to understand what the program is doing, change one thing at a time. Stop, think about what is happening, develop a hypothesis, and run a test to verify it. Takes notes. Repeat. Otherwise, it is easy to miss something or get lost and waste lots of time going down blind alleys and backtracking.

I have seen developers use a shotgun approach, making changes everywhere and seeing what happens. In the end, if they are lucky to fix the problem, they have no idea what the cause was. Doing that invites disaster down the road either because they may have introduced other (possibly subtler) errors or because they only masked the bad behavior.

It comes with the territory

Bugs happen. We cannot escape it. Computers are extremely literal, while we are fussy and prone to figurative thinking. Inevitably, we will be tripped by this chasm. Many times. Over and over. Introducing bugs and then finding and eliminating them is part and parcel of the software development trade. Thus we might as well develop the skills necessary to deal with them. Even though the hunt can be frustrating, finding the root of a problem and fixing it can also be a source of excitement and a confidence boost. Similar to the rush we feel when a program works correctly on the first run.

Eventual Consistency

Ruminations on the Software Development Process