Software Engineering

by Greg Alexander

Abstract

This is meant as a treatise on the software engineering process, common mistakes, and solutions to common problems. You will get more out of this if you accept that the author is smarter than you. He's made these mistakes already. You don't have to. Not again!

The Process

There is a basic process to every software project, small or big. Probably many non-technical endeavours involve similar processes. If you are involved in one of those endeavours, kudos. I'm not writing this for you; you've probably figured this out by now. Somehow the fact that programs don't actually exist has led programmers to ignore basic engineering facts.

So without further ado.

Stage 1 - Grand Design

All projects start out as a twinkle in someone's eye. The goal of stage 1 is to come up with an idea of how a computer could do something useful. This stage can be performed in the shower, or it can involve committees and charts. But the goal here is not cleanliness or completeness or any of the tedious ideals that drive the code itself. At this stage you should not write any code (except maybe a feasibility study of some sort if you're that kind of guy). Your ideal product will be a presentation of some sort that will make your intended user base go "gee whiz! that would be cool!"

This is entirely a mission-focused stage. Your goals should be somewhat timeless. Don't tie yourself to protocols or operating systems or whatever. You do not want to make the next great UDP/IP-enabled x86-platform nVidia-accelerated game. You want to make a great game in which the lead character slaughters fairies and can join with other fairy-haters around the world using some sort of appropriate technology. Leave implementation details to later stages. No need to uselessly tie yourself to any particularities until AFTER you've elicited a "gee whiz" response from someone. It would just distract you from the creative process.

Obviously these sorts of rules don't apply for computer science research projects. Many of my personal projects turn out that way, where the fundamental technology I'm using in the implementation is the entire goal of the project. This is great for learning, but unless you are actually performing research into the technology itself, there's really no reason to tie yourself to it at this point.

XXX - add an example grand design here

Stage 2 - Implementation Decisions

Now that you've got a goal, you need to pick from the tools available to you. You need to make choices about hardware/OS platform, programming language, protocol compatibility, and programming techniques.

At this stage you might start laying out code or drawing detailed charts. When I'm programming in C, I often write out the .h files that contain my data structures and some basic access functions. I'll start to worry about how data is being moved around. C++ programmers may lay out classes with a rough description of how they relate to each other, and a bunch of empty placeholder functions. What network functionality do I need? What filesystem functionality do I need?
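To make that concrete, here is a rough sketch of what such a stage 2 layout might look like for the fairy game from stage 1. Every name in it (struct world, world_init, world_load, net_join and so on) is made up for illustration and is not taken from any real project. The data structures and the trivial access functions are settled; the functions that would do real work are empty placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stage 2 sketch for the fairy game. Every name here is hypothetical. */

struct fairy {
    int x, y;             /* position on the map */
    int alive;            /* nonzero until slaughtered */
    struct fairy *next;   /* fairies hang off a singly linked list */
};

struct player {
    char name[32];
    int x, y;
    int kills;
};

struct world {
    struct player hero;
    struct fairy *fairies;   /* head of the fairy list */
    size_t nfairies;
};

/* Basic access functions: small enough to settle now. */
void world_init(struct world *w)
{
    assert(w != NULL);
    memset(w, 0, sizeof *w);
    w->fairies = NULL;
}

size_t world_fairy_count(const struct world *w)
{
    assert(w != NULL);
    return w->nfairies;
}

/* Placeholders: the signatures are decided, the do-something code waits
   for stage 3. Each returns -1 to mean "not implemented yet". */
int world_load(struct world *w, const char *path)
{
    (void)w;
    (void)path;
    return -1;
}

int net_join(struct world *w, const char *host)
{
    (void)w;
    (void)host;
    return -1;
}
```

The placeholders already answer the two questions above in outline: world_load is where the filesystem gets touched and net_join is where the network does. How they do it is a stage 3 problem.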

You should never be writing actual do-something code at stage 2 though. At the end of stage 2 you should be able to envision, at least in your head, how the entire program will operate when it is finished. Code you write before you have that vision will likely not integrate well into the whole. There is a very subtle art to being able to tell when you have completed stage 2 and are ready to move on. Too soon and you risk writing a large volume of totally unusable code. Too late and you waste time second-guessing yourself.

This is the stage where the expensive mistakes are made. This is the stage where a latecomer to the digital revolution might wind up writing a ten million line banking program in COBOL in the late 1980s. This also tends to be the stage where really great programmers shine. It gives the artists that blank canvas feeling. This stage is a favorite because you aren't generally encumbered by previous mistakes.

XXX - example of maybe 30 lines of C code

Stage 3 - Core Implementation

This is the stage where code that actually does anything starts to appear en masse. Since all of the decisions have been made, this is almost entirely a matter of converting your human language ideals for how the program ought to work into computer code. Most of us learn rather young that we can emit stage 3 code at between 500 and 2000 lines per day, often with only shallow bugs. If you are having a hard time, you probably didn't finish stage 2.

Since it's essentially a translation task, the majority of your time may be spent spelling out idioms. For example, in most of my programs about 10% of the stage 3 work is, in one way or another, accessing linked lists. Lots of people think that OOP will save you from this implementation nightmare. That's hogwash. Remember, you can spit this code out at up to 2000 lines a day without difficulty. The average productivity for an American programmer is about 2 lines a day. Clearly, this high-volume stage is simply not a significant part of the process, even though it might feel like it at the time. You might save 10% of 10% of the development process by using C++ if all you're using it for is simpler idioms (i.e., STL). Is that 1% even worth it when you consider the tradeoffs? Hopefully you decided this for yourself back in stage 2.
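For the record, this is the sort of idiom I mean. It is a minimal sketch of a singly linked list in C, with hypothetical names (struct node, list_push, list_sum, list_free), not code from any particular program. Nothing in it requires thought, which is exactly why it comes out at high volume.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Prepend a value. Returns the new head, or NULL if allocation fails
   (in which case the old list is untouched and still owned by the caller). */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* The traversal idiom itself: walk the chain until it runs out. */
int list_sum(const struct node *head)
{
    int sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}

/* Free every node. Save the next pointer before freeing the current one. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```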

XXX - example with maybe 1000-2000 lines of C code.

Stage 4 - User feedback

Have you ever heard that the last 10% of the work takes 90% of the time? That's because of stage 4. Here's where you run your program and find out that your careful consideration of every contingency doesn't even come close to measuring up to the most rudimentary requirements.

User-suggested changes mostly revolve around bugs or features. The user perception here is a huge variable that the successful contractor learns to control. Your user needs to know which changes are trivial and which are complex before he can finalize his requirements. Poor communication skills on either side can cause many useless iterations of back-and-forth "that's not what I wanted at all." This can be killer if your user is impatient or has inappropriate expectations.

If the customer is not terribly technically skilled, odds are this stage is the first sign the customer has that you are an incompetent hack. This is a major problem, as it results in stages 2 and 3 of many important projects being implemented completely by incompetent hacks. This is the key factor supporting off-shoring of software development. Bizarrely, most Microsoft products have been at stage 4 for years, yet customers still seem to believe the stage 1 vision.

How much work you face as a stage 4 programmer almost entirely depends on how wise the stage 2 programmers were. If they made a framework that anticipates all of the user changes, adding new features is as easy as pie. It also depends, of course, on your knowledge of the code. If you are not familiar with the entire stage 2 design then you will not be able to tell the simple changes from the hard ones.

XXX - example irate customer email full of details, and then the source code change which implements the details but does not pass them on in a way future failures^H^H^H^H^H^H^Hreimplementers will ever be able to understand

The Rule

The process as I've described it so far will get you to version 1.0, but what about version 2.0? Odds are by the time you are done with version 1.0, you have a list of important changes that didn't make it. Maybe they were difficult to implement, or maybe you just didn't have time to even look at them. Maybe they'll require a complete revisit to stage 2. Now you have some decisions to make.

To help you make these decisions, I can offer one rule: If you revisit a stage, you will have to revisit every following stage.

Let me repeat that for the raw beginner: If you revisit stage 2, you will need to revisit not only stage 3, but also stage 4.

One more time: You cannot change the architecture of the program without needlessly redoing the 10% of the work that took 90% of the time.

What generally happens is eventually some programmer decides that all of the programmers to precede him were incompetent. He is probably right, but perhaps his lust for the blank canvas simply got the better of him. So he decides to make a major stage 2 change. Perhaps he switches the language/platform. Perhaps he just reimplements, from scratch, all of the core data types of the program without concern for the minute details of the originals.

It doesn't matter. Inevitably the cocky fool makes a stage 2 change after the program has already made it through all four stages once before. From there, you need to proceed to stage 3 - reimplement everything to use your framework. This is easy and can be performed either from scratch or by translating idioms from the original code.

When a programmer sees a huge popular project and says "that's only 3 days' work", he's talking about the stage 2-3 part. And odds are he's right. There's almost no project that a competent programmer can't take from idea (completed stage 1) to functioning prototype (completed stage 3) in less than a month.

But the user doesn't want a functioning prototype. He doesn't want a system that can be used to sell stuff to the highest bidder using the internet, he wants eBay. eBay isn't just some thousands of lines of code, it is years of operating practice codified into hundreds of thousands of lines of code. When you are brought into a project to make a stage 4 change, odds are dozens of people have come in and made stage 4 changes before you. If you reinvent stages 2 and 3, you will have to reinvent or at the very least retest every single stage 4 change that came before you.

In real life, you would be lucky to even be able to get a log of every stage 4 change that has been completed. Odds are most of them were never documented and would look like an accident or a coincidence if you were to look at the code.

Even the most technically competent programmer is vulnerable to this problem. You may be ready to do stages 2 and 3 single-handedly overnight and wow management. You may be ready to fix every single bug in the next night and thus complete stage 4. But you aren't ready to deal with the real cost of stage 4: user testing.

When you needlessly force a complete revisiting of every stage 4 issue forever and ever (and that never goes away), you are forcing the customer to retest the program. This means the customer will access functionality that hasn't worked in years and tell you to fix it. This means the customer will whine that all of his quirks and shortcuts (one man's bug is another's feature) don't work anymore. This means the customer will watch over your shoulder as you reimplement this stuff and remind you of all of the version 2.0 changes that they already abandoned but now feel would be worthwhile "while you're editing that file." This means the customer will see your mockup before you put the art in and say "that is so incredibly ugly!" and you won't be able to convince them otherwise even after you reintegrate the original art.

These costs are not just frustrating for the programmer, but they take Real Time. I'm not talking about mythical man months, I'm talking about calendar months disappearing as you wait to hear back from the only guy in the company who uses the control-uparrow shortcut. I'm talking about waiting two months for the graphic artist to come up with the design, and then another two months for management to hire another one. This is Expensive.

Consider the simple stage 4 change of an off-by-one bugfix necessitated by two different programmers using two different indexing techniques. Odds are if this program has existed for very long, most of the off-by-ones where these programmers interacted have already been solved. So you're presented with a choice: go back to stage 2 and redesign the code to use a single uniform indexing technique or track down the remaining place where the two indexing techniques interfere with each other and add the missing "-1" to the code.

If you go back to stage 2, you eliminate future off-by-ones in this code. Bzzt! Wrong! The code is already written to (mostly) handle the off-by-ones it introduces. If you change the indexing, you have to visit absolutely every single piece of code that uses it and determine how to correct it. If you start from scratch, you'll mess up the finely-tuned functionality. If you try to revisit the code, you'll miss some and foul up others. Go ahead, try it. Then go polish your resume.
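Here is a small sketch of what the stage 4 fix looks like. The names (struct table, row_at, record_name) are hypothetical, and I am assuming that one programmer indexed rows from 0 and the other numbered records from 1. The whole fix is the "- 1" at the one boundary where the two conventions meet; neither convention is touched.

```c
#include <assert.h>
#include <stddef.h>

struct table {
    const char *const *rows;
    size_t nrows;
};

/* Programmer A's convention: 0-based index, NULL when out of range. */
const char *row_at(const struct table *t, size_t index)
{
    assert(t != NULL);
    return index < t->nrows ? t->rows[index] : NULL;
}

/* Programmer B's convention: records are numbered from 1.
   Record 0 does not exist, so reject it before converting. */
const char *record_name(const struct table *t, size_t recno)
{
    if (recno == 0)
        return NULL;
    return row_at(t, recno - 1);   /* the missing "-1" */
}
```

Every existing caller of row_at and every existing caller of record_name keeps working exactly as it did yesterday, which is the point.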

Because here's the point. If I am the senior programmer and I catch you needlessly and stubbornly revisiting stage 2 over and over again, I will get you fired. If I am the junior programmer and I catch you needlessly and stubbornly revisiting stage 2 over and over again, I will quit and you will get fired for being late and over budget.

I don't generally go over budget on a project, and it's because I am a competent stage 4 programmer. If I only took on stage 2 projects, I wouldn't be very valuable.

The Application

So now you know the rule: how do you apply it?

It's pretty obvious that the answer is to consider all feature requests to be stage 4 issues only to the greatest extent possible. You may only revisit stage 2 if your change is extraordinarily isolated. You may say "or if ...". You would be wrong. There are no "or ifs." If you revisit stage 2 for anything but an extraordinarily isolated change, you will have to revisit stages 3 and 4 and I will get you fired.

The devil is telling you to reimplement stages 2 and 3 so that you have a framework in which it is easy to implement your stage 4 change. The angel on your other shoulder is telling you to internalize the existing stages 2 and 3 solutions and then use that knowledge to implement your stage 4 change.

Here's a list of some of the factors working against you as a stage 4 programmer:

Remember: you will overcome. And you have one tool at your command that wasn't available in the beginning stages, and you must use it: reading. The old code will mostly survive to haunt the programmer who replaces you. Accept this. You will not rewrite it all. Crikey! Are you listening?

Take a little break, repeat after me: I will not revisit stage 2 on a completed program. I will not revisit stage 2 on a completed program. I will not revisit stage 2 on a completed program. I will not revisit stage 2 on a completed program. ...

So now that you know that 99.99% of the code that was there will always be there, you realize that learning this code will be tremendously valuable to you. Since deleting it is impossible, the value you gain from learning the code will be with you until you stop working with it. So go learn it. Then figure out how to make the change in less than a hundred lines of code.

You heard me -- only change a hundred lines of code. You may add thousands of lines of freshly-architected code for a new feature, but you must see how to implement this code into the existing framework without changing more than a hundred lines of old code.

If you figure out a way to solve the problem but it involves a major revisit? Read more. Maybe if your eyes have fogged over, fold up a paper airplane and try to get it to fly all the way out your office window. Go jerk off. But do NOT go back to stage 2. That is not the answer you're looking for, remember?

Here is the exercise that taught me this artform.

When I was a wee undergrad I spent a summer as a research assistant for a professor with a rather ambitious compiler-related project. He co-authored some seminal Scheme papers, so it goes without saying I had to maintain Scheme code. I was replacing an Asian grad student who got hired by Industry, so it goes without saying that she had written about twenty thousand lines of Scheme code using entirely un-Scheme-like idioms (for loops, set!, etc.) that mostly worked.

Once I started digging in the code I was immediately torn between respect and loathing. Clearly she was not sufficiently equipped in terms of experience with Scheme or experience with stage 2 programming to have possibly completed this project. And indeed, there were a handful of bugs that were total nonstarters for her. But I couldn't dismiss her that easily, because I know that if I had not had any stage 2 design experience, I would have made the same mistakes and I would not have had the diligence to make it even come this close to working -- I'd have gone back to stage 2 several times trying to get it right. But she stuck with it, small change after small change, until it almost approximated a functioning program. I couldn't believe her tenacity.

At the time I considered the project to be very ambitious, so rewriting it from scratch wasn't the first thing on my mind (for once). As the grad student was leaving she showed me some infelicity in the program that she absolutely couldn't work around. She had done an excellent job of characterizing the problem, though, so when she explained it to me, without much thought I came up with a simple solution (reversing a linked list at some stage in recursion). She thought I was brilliant for seeing what she couldn't, but what I learned was that this code is not unusable. It might be a little hacky, but I could totally avoid revisiting stage 2 through a series of ten-line hacks.

So my work process for the rest of the summer fell into a pattern. The professor would come up with some change, I would say "that is impossible," he would explain to me the mathematical ideal of how it would operate, I would grudgingly admit that it is in fact quite simple, then I would spend a day implementing it. The interesting part is that day. I would get in and spend my morning reading the existing code. Then I would fold up some paper airplanes until time for lunch. When I came back from lunch, I'd usually have a pretty good idea how the change would look, so I'd look at the code trying to see exactly how it would fit in, and it wouldn't. So I'd go back to playing with my hair until it came to me, then I'd do the ten lines and go home. The real learning experience here for me was how spending several hours daydreaming allowed me, IN EVERY SINGLE CASE, to find a very trivial change to the existing program that introduced, often, radical new functionality that was never planned for.

So it was that when I began my first "real job" I spent most of my first week manually rendering DOOM scenes on my whiteboard (I worked for a compiler company). It's not that I had some crazy preconception about what sort of shenanigans go on in industry, it's that I had found a technique for working with large pre-existing programs that worked. People think programmers need "off time" to be creative, but we forget that the most creativity is needed when you are trying to modify a poorly-designed system.

This is somewhat reminiscent of Worse is Better -- is there no place for excellence in stage 2 design? And my response is no -- of course better is better. But if the only way to achieve stage 2 excellence is to keep on revisiting stage 2 over and over again, well, your better just became worse.

Not that there isn't a role for rewriting the same piece of code over and over until you get it right at all levels. That's a wonderful exercise when you need something to do while you're on vacation or in between jobs (perhaps because I got you fired). But it's just not practical to put the customer through the whole process over again just because you're too impatient to read some code written by stupid people.

Failure

Sometimes a program is in such an awful state that you have no choice but to go back to stage 2 to satisfy user requirements. Perhaps there are ten thousand outstanding bugs, ten million lines of COBOL code, and yet an extremely simple stage 1 design. I suppose it must happen, though I've personally never seen it.

If this happens, remember: they've spent the entire time between the first stage 3 prototype and today working on stage 4 changes, and that burden isn't going to get smaller. So when you deliver an estimate to management, knowing it will take you two days of stage 2 work, and maybe a week of stage 3 work, your estimate must reflect all of the stage 4 work that has gone before. The only effective way to do this is to consider the amount of time it took in stage 4 work before, and double it, because the customer isn't going to accept something that is only as good as what he has now.

What this means is if you work at a bank and there is a ten million line COBOL program you are responsible for that has been tweaked continuously for 30 years, your quote for replacing it will be, at a minimum, 60 years. So I recommend you learn COBOL.
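The arithmetic is simple enough to write down. This is only a sketch of the rule of thumb above: the function name is made up, and adding the stage 2 and stage 3 time on top of the doubled stage 4 time is my assumption about how the pieces combine. Using the figures from the text (two days of stage 2, a week of stage 3, 30 years of tweaking at 365 days a year), the new work is a rounding error.

```c
#include <assert.h>

/* Rewrite estimate in calendar days: the new stage 2 and stage 3 work,
   plus double the stage 4 time the old program has already absorbed. */
long rewrite_estimate_days(long stage2_days, long stage3_days,
                           long prior_stage4_days)
{
    assert(stage2_days >= 0 && stage3_days >= 0 && prior_stage4_days >= 0);
    return stage2_days + stage3_days + 2 * prior_stage4_days;
}
```

Calling rewrite_estimate_days(2, 7, 30 * 365) gives 21909 days, of which 21900 are the 60 years of stage 4 and 9 are the part the rewriter was excited about.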

A lot of times the large complex program really is trying to solve a simple problem. But if business practices have grown up around it, you still can't change it. You will have to start a new company if you want to use new technology.

What this means for OOPheads

The entire OOP revolution has focused on reducing stage 3 and 4 implementation time through the use of excessive cleverness in stage 2 and the ensuing reuse of stage 3 units.

Look at the STL. The idea is to use extremely clever stage 2 design to leverage one stage 3 implementation of your functionality into something reusable in all future stage 3 and stage 4 work. I.e., at stage 2 the library designer wants to design one interface that will work for all uses of all datatypes (like the iterator). Then at stage 3 the library author wants to write one implementation of linked list traversal or binary searching and then no one will ever have to write that code again.
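The same idea exists outside C++. As an illustration in C (this is my analogy, not something the STL itself provides), the standard library's bsearch is one binary search written once for every datatype, with the type-specific part pushed into a comparison callback. The wrapper find_int and the comparator cmp_int below are hypothetical names for the sake of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Type-specific part: how to order two ints. Written as two comparisons
   rather than a subtraction so it cannot overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Index of key in a sorted array of distinct ints, or -1 if absent.
   The search itself is the one generic implementation in <stdlib.h>. */
long find_int(const int *sorted, size_t n, int key)
{
    assert(sorted != NULL);
    const int *hit = bsearch(&key, sorted, n, sizeof *sorted, cmp_int);
    return hit != NULL ? (long)(hit - sorted) : -1;
}
```

Nobody has to write the halving loop again. Note what that buys you: a little stage 3 typing.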

But if you've been paying attention to this essay, you now know that 90% of the work is stage 4 work. What we need is a way to rearchitect a program, for a new database methodology or a new operating system or a new programming language, without losing all of the stage 4 details that have been accumulated over the decades. What we need is a way to rewrite the 10 million line COBOL program in a way that doesn't force us to rethink every single time-tested heuristic.

C++ will not help you there.

C++ just gives you more and more incentive to cling to existing stage 2 and stage 3 implementations, without doing anything to make it easier to abandon them. If your change is simple enough that you really can just change the underlying data format and not change any of the code that accesses it, odds are your change is simple enough to be treated that way in any language.
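As a sketch of that last point in plain C, with made-up names (struct account, account_set, account_dollars, account_cents): suppose a balance was once stored as separate dollars and cents fields and is now stored as a single count of cents. If callers only ever went through the access functions, the storage change touches three function bodies and nothing else, and no language feature was needed to get there. The sketch assumes non-negative amounts only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record. In this imagined history the struct used to hold
   `long dollars; long cents;` and was later collapsed to one field. */
struct account {
    long total_cents;
};

void account_set(struct account *a, long dollars, long cents)
{
    assert(a != NULL);
    assert(dollars >= 0 && cents >= 0 && cents < 100);
    a->total_cents = dollars * 100 + cents;
}

/* Callers still ask for dollars and cents, as they always did. */
long account_dollars(const struct account *a)
{
    assert(a != NULL);
    return a->total_cents / 100;
}

long account_cents(const struct account *a)
{
    assert(a != NULL);
    return a->total_cents % 100;
}
```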