Saturday, February 21, 2009

Building in Quality

Today I'll try to be terse. Why? I don't really know.

Computers have only two things: data and code. Whatever is one, is not the other.

Code is more than just the primary programming language. It is any type of instruction, explicit or otherwise, that tells the computer to do something. Sometimes it's not obvious. Configuration files, for example, are an implicit programming language for loading bits of embedded data into a running program. The syntax wraps the data and tells the program how to load it.
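
To make that concrete, here is a minimal Java sketch (the keys are made up, purely for illustration) of a configuration "language" at work: the key=value syntax wraps the embedded data, and the loading code interprets it.

    import java.io.StringReader;
    import java.util.Properties;

    public class ConfigDemo {
        public static void main(String[] args) throws Exception {
            // The key=value syntax is an implicit micro-language that wraps
            // the embedded data and tells the program how to load it.
            // (These keys are hypothetical.)
            String config = "pool.size=8\npool.timeoutMs=2500\n";

            Properties props = new Properties();
            props.load(new StringReader(config));

            int poolSize = Integer.parseInt(props.getProperty("pool.size"));
            int timeoutMs = Integer.parseInt(props.getProperty("pool.timeoutMs"));
            System.out.println("pool size = " + poolSize + ", timeout = " + timeoutMs + "ms");
        }
    }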

Shell scripts, document layouts and micro-languages are more examples. Including design, development, testing, packaging and deployment, there is a huge variety of coding types involved in even a trivial system.

Quality is misunderstood. Perfect is the highest quality; from there it goes down, past bad. Quality comes from materials and manufacturing. Quality comes from syntax and semantics. Bugs lower quality, but so do bad graphic design and poor usability.

Increasing effort increases quality. Always. With humans, it is more obvious. If you take your time and do it well, it will most often be better. If you go back over it a bunch of times, like 'editing', that helps too. With robots it's subtler. The work is done once, and it is either right or a defect. Since nothing in the world is consistently perfect, the robotic work varies somewhat. Quality control weeds out the imperfect items. Better QA means more items are dumped, so quality goes up, but so does the cost. Increasing effort (weeding out more) increases quality.

Bigger pieces have smaller errors in between. If you build a watch with 600 pieces, there will be more errors than if you build one with 3. Overall, the quality of the three-piece watches will be better. Assembler programs, built from a set of smaller abstractions, have larger bugs than Java programs: corrupt stacks, bad pointers and memory-management leaks are common. As the abstraction gets larger, the size of the bug gets intrinsically smaller.

Abstract programming is trying to build bigger pieces out of the original pieces supplied by the language. The "abstraction" is a bigger, self-contained piece. A bunch of them will increase the quality of the system.
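
As a rough illustration (the Money type here is hypothetical, not from any library), this is the kind of "bigger piece" meant above: it wraps the language's primitives once, so the rest of the system never repeats the low-level rounding and formatting details.

    // A small, self-contained abstraction built from the language's own pieces.
    public final class Money {
        private final long cents;   // stored as cents to avoid floating-point drift

        private Money(long cents) { this.cents = cents; }

        public static Money ofCents(long cents) { return new Money(cents); }

        public Money plus(Money other) { return new Money(this.cents + other.cents); }

        public Money percent(int pct) { return new Money((this.cents * pct) / 100); }

        @Override
        public String toString() { return String.format("%d.%02d", cents / 100, Math.abs(cents % 100)); }

        public static void main(String[] args) {
            Money subtotal = Money.ofCents(1999).plus(Money.ofCents(500));
            Money tax = subtotal.percent(13);
            System.out.println(subtotal.plus(tax));   // one well-tested piece, reused everywhere
        }
    }

Every place that handles money now leans on this one piece, so its flaws surface quickly and get fixed once.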

Brute force programming, pounding out all of the domain-level code as non-abstractly as possible, will be tied to the intrinsic level of quality associated with the language. With way more code, it will also have way more bugs, and be way more inconsistent. More code, in smaller pieces, is always bad.

The more you use something, the more its flaws become obvious. If you write some code that is used only once in the system, its bugs will be harder to find. If it is used hundreds of times, they are easier to find.

If code is leveraged, it is implicitly tested. We know strcat works correctly in C, because it's used in millions (billions?) of lines of code.

Twice as much code is more than twice as much work. It doesn't matter what type of code it is; the inter-dependencies make the work increase in a non-linear way. Duplicated code, even when it spans different types of code, is still duplicated. In fact, duplication across different types of code is worse.

The significant bugs in a system come from the gaps between pieces, not from the abstract pieces themselves. If you've decomposed your system into fifty consistent 'things' that need to be written, the things themselves are well-defined (and easily provable). It's the space between them that has most of the bugs (and all of the really bad bugs). Techniques that heavily focus on inner-piece quality (for example, unit-testing) focus on the easier of the two problems. The significant errors come from what is in between the units, and only show up during integration. Testing at higher levels is more effective.
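
A tiny, contrived Java sketch of that kind of gap (the names and numbers are invented): each piece is correct on its own and easy to unit-test, but the bug lives in the unstated assumption between them, and only an integration-level test would catch it.

    public class GapBugDemo {

        // Unit 1: reads a timeout from configuration. Correct in isolation:
        // it returns 30, meaning thirty SECONDS.
        static int readTimeout() {
            return 30;
        }

        // Unit 2: waits for a reply. Also correct in isolation, but it
        // expects the timeout in MILLISECONDS.
        static boolean waitForReply(long timeoutMillis) throws InterruptedException {
            Thread.sleep(Math.min(timeoutMillis, 100));   // stand-in for real work
            return timeoutMillis >= 1000;                 // anything under a second "times out"
        }

        public static void main(String[] args) throws InterruptedException {
            // The gap: 30 (seconds) is silently treated as 30 milliseconds.
            boolean ok = waitForReply(readTimeout());
            System.out.println(ok ? "reply received" : "timed out far too early");
        }
    }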

A QA department that always receives good quality code will never get tested itself. It will eventually let a bad bug through. It will be a waste of money. Testing, after all, is a less rigorous form of code. Manual tests are still programming, even if the 'machine' executing them is a person, and the execution is more hopeful than deterministic. You can't tell whether testing is working if it produces no results. Testing that produces no bugs is a waste (or defective).

All programs have bugs and interface problems. All programs will always have bugs and interface problems. Short of mathematics, nothing in this world can be perfect (by definition). Plan on it. Increased quality is good, but better support is way better.

Increasing the size of the pieces is the only reliable way to decrease the number and size of the bugs (increase the quality). Programs are usually too big to benefit from 'editing' (code reviews or recoding twice with test files).

Some of the pieces should grow, and some should not. If you've expressed the domain problems in terms of underlying pieces, some of those pieces form a consistent level of abstraction, and thus are fixed. Others form knowledge chunks, and as they grow larger, the upper layer can be simplified (and thus indirectly made higher quality).

The cost of programming a line of code in a system is similar to pricing a financial bond. It's the initial time it took to create, plus all of the times that the line needs to be revisited over the years. If the line has other dependencies, they must be revisited too. Thus the actual cost can only be estimated as a series of code-visits throughout some extended period of time. The series ends when the line is finally deleted. You do not know the full cost (accurately) until the code is no longer in service, although you can estimate the flows over a period of time.
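
A back-of-the-envelope sketch of that pricing, with every number invented: the "price" of a line is its creation cost plus the discounted cost of each expected future visit, until the line is finally deleted.

    public class LineCost {
        public static void main(String[] args) {
            double creationCost = 1.0;                           // effort to write the line
            double[] expectedVisits = {0.5, 0.5, 0.25, 0.25};    // expected revisit effort, per year
            double discountRate = 0.05;                          // future visits are discounted back to today, like bond cash flows

            double total = creationCost;
            for (int year = 1; year <= expectedVisits.length; year++) {
                total += expectedVisits[year - 1] / Math.pow(1 + discountRate, year);
            }
            System.out.printf("estimated lifetime cost: %.2f units of effort%n", total);
        }
    }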

The more you visit the code, the more expensive it is. It's cheapest if you could just create it, then get it right into production. No fuss, no muss.

Fixing the quality of the code doesn't fix the quality of the design. Just because there are fewer bugs doesn't mean the program sucks any less. A good program with a couple of bugs is better than a badly designed one.

Users know what their job is, not what they want from a computer. If they had the answers, they'd be software developers. Software developers are the experts at turning vague notions into data and explicit functionality. The users should lay out a rough direction, but it's up to the developers to make it consistent and usable.

Listening -- too much -- to the "stakeholders" is the same as "design by committee", and will guarantee that the system is poor. An expert software developer is someone who can take (with a grain of salt) all sides, massage them, and produce a consistent design that really solves the problems (or at least increasingly automates them).

Pushing all of the choices back to the users is only a way of mitigating risk. If they told you to make it red, then you can't be blamed for making it red, can you? If you stop caring about blame, then you might endeavor to find out 'why' they want it to be red.

Shorter development iterations increase cost but mitigate risk. If you develop the pieces a bit at a time, constantly checking that you've not gone off the rails, then when you do, there is less work to unroll. Less time is lost, but the cost is much higher. Short iterations are a form of insurance (and depending on your ability to absorb risk, they are more or less vital).

The system is 'rapidly changing' only from the developer's perspective. Most users have been doing their job, mostly the same way, for years. More or less, they don't change that much, that often. Most of the changes experienced by software developers, and there are always a lot, come from not really understanding the domain to begin with. Developers rush to judgment, and often make huge mistakes, compounded by a stubborn unwillingness to admit it.

The ability to implement rapid changes inspires frequent changes. It's not good. If you make it easy to change things, people will make more random choices. If they don't need to think hard about it, they won't. People will always follow the easiest path. Mainframes are the most successful software platform, because they are the slowest and hardest to change.

There is more artificial complexity in most computer systems than there is real complexity (domain or technical). Mostly, in the current state of the industry, we've been responsible for shooting ourselves in the foot an incredibly huge number of times. One could easily guess that there is some equivalent system that is no more than one tenth of the size of most existing systems. That is, 90% of most systems comes from the ever-increasing mountains of artificial complexity that we keep adding to our implementations. Mostly it's unnecessary. And its origin may be so deep in the technical foundation that it is impossible to remove or fix (but it's still artificial).

Solving technical problems is easier than solving domain ones. That's why areas like OpenSource focus their efforts on technical problems (it's also more fun).

New technical solutions require prototyping. New domain ones do not, but they may be a good reason to increase the size of some of the underlying pieces. Code is just code, unless you've never tried that technical approach before; if you have, it's easy (and estimable).

And finally: most of the quality issues that a regular programmer can solve come from consistency, effort and self-discipline. Increasing quality for most code comes directly from just "editing" one's own work. The more you edit, the nicer it will be. Once you've gotten past that, abstraction is the next big leap in quality. Simplified, six-normal-form designs have intrinsically better quality built right into their construction. They are harder to build, and slower, but they are cheaper overall. Quality at an industry level can be improved, but it will take a new generation of higher-level abstractions to get there: languages and tools that hide "things" like threads and data-types, for example; programming paradigms that don't depend on non-isomorphic mappings to reality, but offer syntax closer to the real domain-specific problem descriptions; a new way of thinking about the problems, one that reflects our understanding of the sheer size and scale of coding.
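
One small, present-day hint of what "hiding threads" can look like (just an illustration using java.util.concurrent, not the new generation of abstractions being asked for): the pool below owns the threads, queuing and scheduling, so the calling code only describes the work.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class HiddenThreads {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);

            // Two halves of the same sum, submitted as work; no explicit
            // Thread, join or lock appears anywhere in this code.
            Future<Long> partA = pool.submit(new Callable<Long>() {
                public Long call() { return sumOfSquares(1, 500000); }
            });
            Future<Long> partB = pool.submit(new Callable<Long>() {
                public Long call() { return sumOfSquares(500001, 1000000); }
            });

            System.out.println(partA.get() + partB.get());
            pool.shutdown();
        }

        static long sumOfSquares(int from, int to) {
            long total = 0;
            for (int n = from; n <= to; n++) total += (long) n * n;
            return total;
        }
    }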

OK, I do know why. But it would take too much space to explain it, and way too long to edit it into something with enough quality to be readable :-)

4 comments:

  1. It sounds like you blame individual developers and their tools for quality problems. I don't think that is valid, any more than it is valid to blame a worker in a manufacturing plant for systemic poor quality.

    When things go wrong, the cause is almost always inherent in the system. The problem is one of process and process-improvement, not individuals or tools.

  2. Hi Steve,

    Thanks for the comments. I was about to deny that I blame the developers, but the truth is that an awful lot of the WTF circumstances come directly from the programmers themselves. I really think that if the tools were way better, the problems with the developers would lessen. The tools are key.

    A messy Assembler programmer is always going to be worse than a messy Java programmer. The former can create workable programs that are significantly more complex to understand.

    The converse is also true: a gifted Java programmer can accomplish way more than a gifted Assembler one; it's just a practical matter of having to do way less work (at the same level of abstraction).

    If you use higher level tools (Java vs. Assembler), you need less knowledge, can create more, and the bugs will actually be smaller.

    What we really need is to go up a few more levels.

    A good process, by itself, won't get you any better quality than the tools (and the people) will allow. Unruly people wielding bad tools can only accomplish so much. Process can polish the system, but you have to start first with the tools, then add trained people, and finally direct them to utilize their environment.

    Paul.

  3. I guess what I'm saying is that your diagnosis is correct, but the prescription of "train people better" and "wait for better tools" is not helpful, because it is not actionable by you or me.

    You can train people all you like - they will still make mistakes. You can give them more powerful tools, but if the tool still makes it easy to make mistakes, then they will - just different ones.

    I would prefer to "mistake-proof the process" and "continually improve". These things pro-actively *cause* new, better tools to be created, and they minimize the need for so much training (think old-style user manual vs the non-existent-because-it's-so-usable website manual).

    The diagnosis is the same, but the "process" twist on the thinking leads to a way forward.

  4. Hi Steve,

    There are definitely things we can do to improve quality with our current tool set. We just need to take advantage of what we know.

    For instance, in code, bigger pieces are better. Separating out the technical problems from the domain ones helps. Splitting the domain problems into two layers (see Code Normal Forms), one thin and one very thick, really (really!) helps. The language abstraction may still be low, but that doesn't mean we have to live with it.

    For the process, embedding 'editing' stages always helps. Before I start new work, I always schedule a couple of weeks for cleanup first. It's a great way to procrastinate on the coding, while doing some nice refactoring. I often keep a list of things to clean up from the last release.

    In testing, the hardest bugs to find and fix fall between the units. In system testing, that means that tests that look at the interaction between components are more likely to be successful, although they are harder to write. Testing is always too limited, so deploying resources to catch the worst bugs first is more effective.

    Understanding the basis of quality helps us in working backwards to find useful ways to improve the outcomes.

    Paul.

