Staying clean

There are lots of articles on the web about writing clean code. Searching for “writing clean code” in Google produces more than 300 million hits. Even when limiting the search to last year, there were dozens of articles listed. It is safe to say that this is a popular topic. But what is clean code?

Clean code is, arguably, readable code. Robert Martin spends a chapter of his book “Clean Code: A Handbook of Agile Software Craftsmanship” surveying the opinions of influential (or maybe even seminal) developers on the matter. To describe clean code, these experienced programmers use nouns like readability, clarity, elegance, focused purpose, brevity, efficiency, and expressiveness (Martin, 2009). Out of all those, readability is the word most often used.

Whenever one says readable code, it is in the sense of humans reading it. However, what the heck is readable code? Well, readable code is clean code. Yeah, I know, not funny. The thing is, defining what makes code readable or clean is not trivial. It is a bit like describing what art is. You know it when you see it. Or, more likely, when you don’t see it. In fact, Martin expands this discussion into a whole book. I strongly suggest reading it. A great companion is Martin Fowler’s Refactoring book (Fowler, 2000), which shows ways to identify “bad” code and how to tackle it.

I have been a victim of messy code written by me. I doubt there are any developers who have not come back to a piece of their own code three or six months after writing it and had absolutely no idea what it does or how it does it. This is a sign of poor code. Clean code should be explicit, self-explanatory, and almost boring in its simplicity.

“Don’t be clever”

Over the years, I have developed several personal rules from principles I’ve read or battles I’ve fought. One of them is not to be clever. I avoid that like the plague. The fireup.pro team puts it nicely:

Don’t try to be too clever! Code can’t be a complex riddle that only its author can solve. Some say that clean code is the one that doesn’t stand out, and it actually should be a bit boring

(Team, 2022)

It feels cool to play acrobatics with a language. However, readability trumps it all. Why? Because when someone else comes back to read the code, the clever bit of logic will become an obstacle. Any time wasted trying to understand code affects the bottom line. And, in the world we live in, the bottom line reigns supreme.

As an example of code that is clever, let’s look at the quintessential strcpy function from the C library, taken from the Android source code (Google, 2005):

char *
strcpy(char *to, const char *from)
{
	char *save = to;
	for (; (*to = *from) != '\0'; ++from, ++to);
	return(save);
}

This function has nice names and is very simple. However, it is not very readable. Sure, one can spend a couple of minutes and eventually discern how the language features are exploited. Still, few could glance at it and divine what it is doing unless they are familiar with it. This is a contrived example since efficiency is paramount in this case, and the function is well-known and understood. But it serves as an example of how clever code can reduce readability.

And here is another version picked from a StackOverflow thread:

char* strcpy(char *a, char *b)
{
   while ( *b++ = *a++ ) {}
   return b;
}

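Ironically, this function is easier to read despite using single-character variable names and relying on the value of an assignment expression as the loop condition. (It also reverses the signature, copying from a to b, and returns the pointer after it has been advanced past the end of the copy rather than the start of the destination, but that is a different story.)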

My point is not to criticize the implementation of strcpy. That has been done ad nauseam. What I want to highlight is that cleverness leads to poor readability. With today’s advanced compilers and interpreters, efficiency is rarely improved by clever implementation.

Yet, stay concise

There is, though, a delicate balance between cleverness and succinctness. Consider, for example, list comprehensions in Python. Are they readable? To a person new to the language, a list comprehension may seem weird, but to someone who knows the language, it is as expressive as an explicit loop but much briefer. Below is a snippet from some code I wrote a couple of days ago:

def map_header_text(incoming_sto: PurchaseOrder) -> Optional[List[StoHeaderText]]:
    if not incoming_sto.purchaseOrderHeader.text:
        return None

    return [
        StoHeaderText(
            languageName=text.textLanguageCode,
            longTextId=text.textIdentifier,
            longTextIdDescription=text.textDetails,
        )
        for text in incoming_sto.purchaseOrderHeader.text
    ]

The function returns a list of objects of type StoHeaderText that map certain values from a list of similar objects. This function uses a list comprehension. Now compare with the loop version:

def map_header_text(incoming_sto: PurchaseOrder) -> Optional[List[StoHeaderText]]:
    if not incoming_sto.purchaseOrderHeader.text:
        return None

    headers = []
    for text in incoming_sto.purchaseOrderHeader.text:
        header = StoHeaderText(
            languageName=text.textLanguageCode,
            longTextId=text.textIdentifier,
            longTextIdDescription=text.textDetails,
        )
        headers.append(header)

    return headers

The difference is slight. A few more lines to create the list, generate each object, and append it to the list. Which one is more readable? I’d argue that the extra lines in the second version do not provide additional information. They are boilerplate, while the first version is just as readable but more concise. Could the first version be characterized as clever? I’d say no because it uses a language construct precisely as it was designed to be used. But I will leave it up to you to decide.

All this is to say that writing clean code is not easy. It takes effort and a lot of thinking. Just deciding on how to name things can be a challenge. Especially as the code base grows and you have an explosion of entities. Other aspects of clean code, like coupling, cohesion, expressiveness, etc., also require careful consideration.

How to Stay Clean

Because of how hard it is, we will inevitably add blemishes even when we, as developers, intend to write clean code. It is human nature to be inconsistent. Or we may take a shortcut here and there, thinking we will return and fix it later. Yet later rarely comes. This gradual deterioration of the code as it grows is usually called rot. Code can rot to the point that it becomes an impenetrable, brittle mess. And the only way to avoid it is to be proactive in grooming it, picking out the nits as frequently as possible.

This is where code refactoring comes in. As Fowler puts it: “Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written.” (Fowler, 2000) Thus, the key to clean code is relentless refactoring.

Note, though, that a crucial aspect of refactoring is that the code does not alter its external behavior. From the outside, the software’s functionality remains identical, even if the innards are thoroughly shuffled. But how to guarantee that the program behaves the same after altering it? By testing it! This means that before doing any refactoring, we need to have a suite of tests that let us characterize the system’s behavior. These can be automated tests like unit tests, integration tests, end-to-end tests, or manual tests, where a user verifies the expected functionality.
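As a minimal sketch of that idea in Python, consider the snippet below. The function names and the pricing rule are hypothetical, invented only for illustration. The cases record whatever the original function does today, and the same cases are then used to check that the refactored version behaves identically.

```python
def total_price_legacy(items):
    # Original, harder-to-read version (hypothetical example).
    t = 0
    for i in range(len(items)):
        if items[i][1] > 0:
            t = t + items[i][0] * items[i][1]
    if t > 100:
        t = t - t * 0.1
    return t


def total_price(items):
    # Refactored version: same external behavior, clearer structure.
    subtotal = sum(price * qty for price, qty in items if qty > 0)
    discount = 0.1 if subtotal > 100 else 0.0
    return subtotal - subtotal * discount


# Characterization cases: lists of (price, quantity) pairs chosen to
# exercise the empty, zero-quantity, discounted, and boundary paths.
CASES = [
    [],
    [(10, 1)],
    [(10, 0), (5, 2)],
    [(60, 2)],
    [(50, 2)],
]


def behaves_the_same(old, new, cases):
    """Return True when both implementations agree on every case."""
    return all(old(case) == new(case) for case in cases)
```

Calling behaves_the_same(total_price_legacy, total_price, CASES) before deleting the old function gives some confidence that the refactoring changed the structure and nothing else. It only covers the listed cases, so the quality of the check depends on how well they were chosen.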

The implication here is that since software rot is inevitable, implementing tests is as important as implementing the software itself. The tests are the scaffolding enabling the developers to aggressively groom the code to remove any rot. With a battery of tests, developers can be confident that any regressions caused by refactoring will be discovered quickly. Having a suite of tests is liberating. I know. I’ve been there.

Nonetheless, this is one of the many aspects of my career where I need to improve. Although I understand the importance of writing tests, I find it tedious and difficult. In fact, most developers I know share the same feeling. What is worse, spending time writing tests can, at times, be a hard sell to management. Yet, tests are essential for the long-term success of any project. It behooves me to champion them and make an effort to always write tests for the code I produce.

References

Fowler, M. (2000). Refactoring: improving the design of existing code. Addison-Wesley.

Libc/string/strcpy.c – platform/bionic – git at google. Google Git. (2005). Retrieved January 23, 2023, from https://android.googlesource.com/platform/bionic/+/ics-mr0/libc/string/strcpy.c

Martin, R. C. (2009). Clean code: A Handbook of Agile Software Craftsmanship. Prentice Hall.

Team, F. (2022, August 10). Clean code – how to code like a pro? fireup.pro. Retrieved January 23, 2023, from https://fireup.pro/blog/how-to-keep-your-code-clean

Is it plugged in?

I love coding. It is fun and exciting. Even though I have been doing it for quite a while, I still get a kick every time a program runs as it should. Along with the rush, there is a bonus surprise when a piece of code runs as it should on the first try. It is surprising because, more often than not, there will be some debugging to do before a program runs correctly. As one gets more experienced with a specific language and programming in general, the number of initial bugs dwindles. On the other hand, and somewhat ironically, their severity or subtlety, or both, increases. Debugging is less fun than programming, mainly because it can be frustrating, time-consuming, and sometimes downright hellish. But it does not need to feel like an exercise in futility.

Do what I mean, not what I say!

There are two types of errors that can cause bugs: syntax errors and semantic errors. Syntax errors are usually the easiest to find and fix. Syntax errors occur when the rules of a computer language are violated. For example, a keyword is misspelled, or a symbol is not in the right place. Modern compilers and interpreters are very helpful in pinpointing the location of those errors. Even C++ compilers, infamous for spewing impenetrable error messages when templates are involved, have become much more precise and helpful.

Examples of syntax errors include omitting a closing brace or not ending a statement with a semicolon in a language like C, C++, or Java. Or not indenting the code inside a function in Python. Or misspelling a keyword, for example, deter instead of defer in Go.

Semantic errors are a different story. A semantic error happens when the program’s meaning, or actual logic, is other than what the programmer had in mind. The syntax is technically correct because the program runs, but the behavior is incorrect. For example, the program may produce the wrong result or crash with a runtime error.

Some semantic errors are caused by code that is syntactically valid but poorly constructed. Take a look at the following lines:

int i = 0;
while (i++ < 10);
    printf("%d\n", i);

This is a very contrived example, but more subtle variations are classic. The syntax is correct, but the code will not do what the programmer intends. Instead of printing a value on every pass through the loop, the program prints once, after the loop has ended, and the value it prints is 11: the stray semicolon is the entire loop body, and the final, failing comparison still increments i. Furthermore, the indentation biases the mind into thinking the loop is correctly coded. That semicolon at the end of the while statement might as well be invisible. There are traps like this in just about any language.
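Python has its own version of this kind of trap. The sketch below is only an illustration, and the function names are made up. A mutable default argument is perfectly valid syntax, yet the list is created once, when the function is defined, and is then shared by every call that omits the argument.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is built once and reused across calls.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # A fresh list is created on each call that omits `tags`.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


first = add_tag_buggy("a")
second = add_tag_buggy("b")  # same list object as `first`
```

After these two calls, first and second are the same list, and it contains both tags. Nothing in the syntax hints at that, which is what makes it a semantic error rather than a syntax error.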

Other semantic errors include incorrect pointer usage in languages like C/C++, not handling exceptions, or silently trapping them. In general, unexpected or unforeseen events can be sources of semantic errors, for example, using uninitialized variables, unexpected changes to variables, unexpected side effects, or unaccounted-for external influences.
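To make the point about silently trapped exceptions concrete, here is a small Python sketch. The port-parsing functions are hypothetical examples, not taken from any real code base. The first version hides every failure behind a plausible-looking value, while the second handles only the failure it expects and reports it with context.

```python
def parse_port_silent(text):
    # Swallows every error: a typo in the input quietly becomes 0.
    try:
        return int(text)
    except Exception:
        return 0


def parse_port(text):
    # Handles only the expected failure and reports it with context.
    try:
        port = int(text)
    except ValueError as exc:
        raise ValueError(f"invalid port: {text!r}") from exc
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

With the silent version, an input such as "80a" yields 0, and the bug surfaces much later, far from its cause. With the second version, the program stops at the point where the bad input arrives.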

Of course, incorrect program behavior may be due to a design flaw or a misunderstood requirement. In this case, the syntax and the semantics are correct, but the design itself is incorrect. This type of error is sometimes easier to flag down. However, sometimes the wrong behavior may not surface until a particular set of circumstances is true. This kind of problem can often be caught by well-crafted acceptance criteria. But that is a subject for a different blog post.

Squashing the bugs

When one hears the term bug, it refers to a semantic error in the vast majority of cases. Bugs range from easy to spot to incredibly hard to find. Regardless of how hard they can be, some basic principles are always applicable.

Start with the basics

As the title of this post implies, a great starting point is verifying that the basics are still valid. If you try to turn on the TV and it remains dark, a basic test is to check if it is plugged in. If it is, is the outlet working? It may be worth plugging in something else to verify. Other essential tests would be to check that there is power in the house and that the circuit breaker is not tripped. These tests tend to give a general insight into where the problem could be.

For a program, the basics could include making sure you are compiling the correct units, that the program starts executing, and that its main elements are working. A great help for this is using logs or print statements that leave a breadcrumb trail of the general flow of the program. Other things to look for are whether the correct inputs are being supplied or whether there is enough memory to run the code. Checking the versions of libraries and tools is also important if the development environment or language has that type of ecosystem (for example, Java, Node.js, or Python).
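A breadcrumb trail of that kind might look like the following Python sketch, which uses the standard logging module. The pipeline, its function names, and the stand-in data are assumptions made up for the example.

```python
import logging
import sys

log = logging.getLogger("pipeline")


def load(path):
    log.info("load: reading %s", path)
    return [1, 2, 3]  # stand-in for real input (hypothetical data)


def transform(values):
    log.info("transform: %d values in", len(values))
    return [v * 2 for v in values]


def run(path):
    # Record which interpreter is actually running the code.
    log.info("python %s", sys.version.split()[0])
    result = transform(load(path))
    log.info("run: done, %d values out", len(result))
    return result
```

Calling logging.basicConfig(level=logging.INFO) before run("input.csv") prints the trail to the console. If the last crumb is the "load" line, the problem is somewhere after loading, which already narrows the search. For installed packages, importlib.metadata.version("package-name") reports the version that is actually in use.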

Check your assumptions

This principle, somewhat related to the prior one, can also be stated as “keep an open mind” or “beware of your biases.” Sometimes it is hard to see, but we all develop biases or assumptions about everything. When developing code, we build a mental context of what we have implemented, and it may be tough to step outside of that frame of mind. This is one reason to have peer code reviews and separate QA teams.

Once the basics have been verified, step outside your mindset and look at the system with fresh eyes. Are your test inputs crafted so that they avoid triggering possible bad behaviors? Is the code lax on error checking because you assume a particular situation “could never happen?” Are you assuming a global variable must have a specific value at a particular point in the execution? Or that a pointer is still valid, or is pointing to the correct value or object?

Recognizing your own assumptions or biases is an acquired skill. It takes practice. An excellent way to help elucidate them is to write unit tests since they tend to force you to think of what can break the code.
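Here is a small sketch of how a unit test can drag a hidden assumption into the open. The averaging functions are hypothetical, and the tests use Python's standard unittest module.

```python
import unittest


def average(values):
    # Hidden assumption: `values` is never empty.
    return sum(values) / len(values)


def average_checked(values):
    # The assumption is now explicit and reported clearly.
    if not values:
        raise ValueError("average_checked() needs at least one value")
    return sum(values) / len(values)


class AverageTests(unittest.TestCase):
    def test_typical_input(self):
        self.assertEqual(average_checked([2, 4]), 3.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            average_checked([])
```

Writing the second test is what forces the question "what happens with an empty list?" The original average answers it with a ZeroDivisionError raised far from the real cause. The tests can be run with python -m unittest.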

Divide and conquer

The purpose of verifying the basics and checking your assumptions is to separate areas of the program or system that are working correctly from those that are not. The distinction is not always clearly demarcated. When systems are complex, it may be hard to tease out the troublesome areas. The key is being consistent. Start from the general and work your way to the specific. The method is similar to the binary search algorithm. If you have a sorted list of values, you can split it and find which half is likely to have the value, then split that half and check again, and so on. When looking for a bug, we want to find seams in the code that can divide the areas that work reliably from those that don’t, then focus on the faulty ones and find again which subparts work and which don’t. The idea is to continue doing that until the problem is found.
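The halving strategy can be sketched in a few lines of Python. This is a simplified model, not a general debugging tool: it assumes the program can be treated as an ordered list of steps, that a check passes for every prefix before the faulty step and fails from that step onward, and that exactly such a step exists. The step names in the usage note are made up.

```python
def first_failing(steps, passes_up_to):
    """Return the index of the first step at which the check fails.

    `passes_up_to(prefix)` must return True while the prefix contains
    only healthy steps and False once it includes the faulty one.
    """
    lo, hi = 0, len(steps) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if passes_up_to(steps[: mid + 1]):
            lo = mid + 1  # everything up to mid works; look later
        else:
            hi = mid      # the fault is at mid or earlier
    return lo
```

For example, with steps = ["parse", "validate", "enrich", "price", "export"] and a check that fails as soon as "price" is included, first_failing returns 3 after only two checks instead of five. The same idea underlies git bisect, which halves a range of commits instead of a list of steps.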

Be scientific

You may have noticed that I favor a systematic approach to troubleshooting. In my experience, this is key. For example, when running experiments to understand what the program is doing, change one thing at a time. Stop, think about what is happening, develop a hypothesis, and run a test to verify it. Take notes. Repeat. Otherwise, it is easy to miss something or get lost and waste lots of time going down blind alleys and backtracking.

I have seen developers use a shotgun approach, making changes everywhere and seeing what happens. In the end, if they are lucky enough to fix the problem, they have no idea what the cause was. Doing that invites disaster down the road, either because they may have introduced other (possibly subtler) errors or because they only masked the bad behavior.

It comes with the territory

Bugs happen. We cannot escape them. Computers are extremely literal, while we are fuzzy and prone to figurative thinking. Inevitably, we will be tripped up by this chasm. Many times. Over and over. Introducing bugs and then finding and eliminating them is part and parcel of the software development trade. Thus we might as well develop the skills necessary to deal with them. Even though the hunt can be frustrating, finding the root of a problem and fixing it can also be a source of excitement and a confidence boost, similar to the rush we feel when a program works correctly on the first run.

Say it!

Today, most software systems are so complex that no single person can fully comprehend them. Whether we like it or not, the norm is to have teams of software developers. No one can disappear into their corner and emerge with a complete system implemented a few weeks later. Work is divided into small units, such as stories, with narrowly defined scope and deliverables. Often, these units are interdependent, so the work of one developer is usually intermingled with that of other team members.

It doesn’t take a giant leap to see that coordination is essential. Many times, developers will work on the same code base and even on the same file. Hence, it is easy to trample over someone else’s code if one is not careful. Version control systems assist to some extent in avoiding this situation, but that’s not enough.


The role of communication

Tools can help, but communication is one of the critical elements of successful teamwork. Communication needs to happen early and often. That’s the only way to ensure everyone is aware of what is happening at all times. This is a common trait of teams with critical missions. We can see this in the military, team sports, and even video games. There should be constant updates and lots of back and forth about what every member is doing, what their status is, and what problems they may be facing.

In the realm of software engineering, agile methodologies incorporate frequent communication; for example, daily standup meetings are a common practice. Nonetheless, communication remains important whether a team is following agile or not.

The fear of failure

It isn’t hard to grasp the importance of communication when things are going well. However, when problems arise, things become dicey. Suppose we find ourselves stuck, or maybe we have no idea how to solve a problem, or don’t even understand what we are dealing with. In that case, the tendency is to shy away from telling others about it. Some say they prefer to research until they know enough to ask intelligent questions. Or they will keep trying to solve the problem until they can present what they have tried. In my experience (including when I find myself in the same situation), these are just excuses to delay having to “confess.” There is a fear of appearing inept or ignorant. While understandable, that is a serious mistake.

I know this because I used to do that. Whenever I faced an engineering problem, I would double down and try to find the solution and fix it on my own. I would not tell anyone because I did not want to seem “less able.” It sounds like a good idea, right? You go to your corner and work furiously on something for hours, days, and even weeks. Then you emerge from the cave with the solved problem, very proud of your accomplishment, very aware of how hard it was to produce your masterpiece. You tell your boss or teammates about your adventure, expecting recognition and praise.

Unfortunately, what is most likely to happen is that your workmates and, worse, your boss will wonder why it took so long for you to produce your deliverable. Why? Are they jerks who think they could do better? No, not really. The problem is that they were not aware of your battle. To everyone, you were working on your task without hitches. They never heard of any issues or setbacks. So, logically, everything you did was within your expertise. You just took too long to finish, probably even blocking others or impacting deadlines.

Say it. Say it early. Say it often.

Here’s the thing. As we saw earlier, software systems are so complex, and span so many areas of knowledge and technologies, that no one can master them all. The hero, the rockstar developer who knows it all, is more of a myth.

Over time, I learned that being candid about what I don’t know and what is giving me trouble actually works in my favor. First, people with more knowledge are usually happy to share it. Bouncing ideas and questions off others can be faster and more satisfying than spending hours googling or running into blind alleys.

Second, letting others know the challenges you are facing gives them an idea of what you are working on and, more importantly, what you are doing to solve the problem. They will understand that you are busy, even if progress is slow. Chances are that some will have valuable ideas and suggestions or know the solution outright.

Third, the boss or manager will know what is going on. Managers do not like to be in the dark and do not like surprises. If you communicate with them often, they will learn the status of the project: its progress, risks, and obstacles that may not have been considered during planning. Armed with this knowledge, they can re-prioritize, drop, or reschedule tasks.

My suggestion is then to communicate. Be proactive. If no one is asking how’s it going, take the initiative and volunteer your status report. Be frank. Report successes and failures. When faced with challenges, be quick to report them as well as what you are doing or plan to do to overcome them.

(All images are properly licensed.)

It’s Always an Iterative Process

My name is David Mora, and I am a student at the School of Engineering and Computer Science at OSU. I have also been a software engineer by trade for about 25 years. So, yes, I am old compared to the general student population at the University.

People often ask me why I am trying to get a degree in a field where I already have an established career. I am not entirely sure, but there are several reasons. For one, when I was young, right after high school, I went to college to study electrical engineering but dropped out when I was a junior. I had better things to do, apparently. Sigh. This is one big regret and part of why I am now back in school. Another reason is that I learned most of what I know about software development by reading or experimentation, but I could sometimes tell I lacked fundamentals. Finally, I love learning. Of course, university courses are more demanding than watching YouTube videos or reading tutorials, but the instruction is often much more solid.

At any rate, experience has taught me that learning is an iterative process. New technologies or fields can be confusing or even overwhelming at first, but gradually it all falls into place, and things become clear. Same with ideas, work, expertise, etc. Thus the name of my blog. Eventual consistency, in distributed computing, refers to the idea that when information is received and replicated by several nodes in a network, some nodes could be outdated for a while but, eventually, all of them converge [1]. Thus the network tolerates a state of temporary “confusion” or conflict because it knows that it takes time for it, as a whole, to digest updates. The same happens, at least in my brain; often, new information causes temporary confusion or fuzziness, but I have learned to embrace it and let it simmer until it is digested.
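For the curious, the convergence idea can be shown with a toy model in Python. This is a deliberately simplified sketch under my own assumptions, not how any particular database works: each replica holds a (version, value) pair, replicas sit in a ring, and in every round each one copies its left neighbor's pair if that pair has a higher version.

```python
def sync_round(replicas):
    """One round: every replica pulls from its left neighbor in a ring.

    Each replica is a (version, value) tuple; the higher version wins.
    """
    n = len(replicas)
    return [max(replicas[i], replicas[(i - 1) % n]) for i in range(n)]


def rounds_to_converge(replicas):
    """Count sync rounds until all replicas hold the same pair."""
    rounds = 0
    while len(set(replicas)) > 1:
        replicas = sync_round(replicas)
        rounds += 1
    return rounds, replicas
```

Starting from [(1, "old"), (2, "new"), (1, "old"), (1, "old")], the update reaches one more replica per round, so the four replicas disagree for a while and agree after three rounds. In between, different replicas give different answers, which is exactly the temporary confusion the network tolerates.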

Likewise, software engineering is an iterative process. There are lots of uncertainties, conflicts, and surprises. The key to success, in my opinion, is to embrace that chaos and work on constantly improving it, keeping track of the changes required and the knowledge acquired until it converges into order. That is part of what makes engineering fun: problem-solving.

And this is going to be my focus in this capstone blog: to highlight, every week, areas where our project has encountered difficulties (there are always difficulties), and how we overcame them, what we learned, or how we worked around them.

References

  1. Eventual consistency. (2022, September 19). In Wikipedia. https://en.wikipedia.org/wiki/Eventual_consistency