Monday, October 29, 2007

Repeated Expressions

I'll apologize in advance. This blog entry gets a little weird, but it is the type of conceptual weirdness that comes directly from the underlying problems.

You may have to squint your eyes a bit to get a really fuzzy, abstract view of reality in order to understand the basis of the argument, but I think you'll find it is worth the effort.

Jumping right in, we start with the notion that users create piles of information that they beat on with their software tools.

That's not too hard to handle, but if you dig a bit deeper into the idea of how to "express" the tools themselves, things can get interesting.

In a very abstract sense, a software design specification for a system defines a bunch of functionality operating on a set of data. Generally it is very imprecise, but it is a way of actually "expressing" a solution in one or more problem domains. The specification, which consists of various design documents with a specific syntax, represents a "language" capable of defining the system's capabilities. While it doesn't run directly, and it is generally very sloppy, it does define the absolute boundaries for the problem.

The code itself, once it is created, is another way to express the solution to the same problem. It is far more specific and needs to be more correct, although rarely does it actually need to be perfect. Different versions of the system, as they progress over time, are different ways of expressing a similar solution. Generally, with each iteration the depth of the solution improves, but all of the solutions implemented were acceptable.

Testing -- even though in a philosophical way it is the reverse -- bounds the same problem domain, so it too is yet another way of expressing the same solution. In pencil drawing courses, they often do exercises with "negative space" by drawing the shapes where the subject is not located. Specifying testing is similar; you bound the problem by the things that the system should not do. Testing occurs at many different levels, often providing a great degree of overlap in the expressiveness of the solution. Generally layer upon layer of tests are applied, regardless of how effectively they work.

In an ideal world, for a perfectly defined system the specifications, code and tests are all just different, but complete ways of expressing the same solution.

Given that hard-to-fathom perspective, we essentially solve the same problem at least three times during development, in three different ways. This gives rise to various speculations about which work we really don't need.

For instance, Jack Reeves felt that the code was the design, not the specification. That perspective could lead to spending less effort on the initial design and correcting the problems only at the coding level.

Newer practices such as test-driven development (TDD) imply that the testing could be used as the design. You lay out the functionality by matching it back to the tests. These are interesting ideas indirectly arising from the repetitive nature of development.
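To make that repetition concrete, here is a tiny sketch in Python (the function, its spec and its numbers are all invented for illustration) of the same solution expressed three times: once as a fragment of specification, once as code, and once as a test. In a TDD reading, the third expression would simply be written first and used to drive the second.

```python
# Expression one: the specification (imprecise, but it bounds the problem).
# "An invoice total is the sum of its line item amounts, rounded to two decimal places."

# Expression two: the code (more specific, and it needs to be more correct).
def invoice_total(line_items):
    """Sum the line item amounts and round to two decimal places."""
    return round(sum(line_items), 2)

# Expression three: the tests (bounding the problem from the other side,
# like drawing the negative space around the subject).
def test_invoice_total():
    assert invoice_total([10.00, 2.50]) == 12.50
    assert invoice_total([]) == 0          # an empty invoice is still a valid invoice

if __name__ == "__main__":
    test_invoice_total()
    print("all three expressions describe the same solution")
```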

One easily senses that if we could express the design only twice instead of three times, we would be saving ourselves labor. If we only expressed it once, that would be amazing. It would seem as if we might be able to save ourselves up to two-thirds of the work.

It is an intriguing idea, but highly unlikely for anything over a medium-sized project.

Some individuals can get away with solidifying their designs initially entirely within their consciousness. We know this. There are many programmers out there who, for small to medium systems, don't need to write the design down on paper. Interesting, but we need to remember that such an instantiation is just another valid form of expressing the solution, even if the original programmer cannot necessarily share that internal knowledge and understanding.

Even the small systems need at least two distinct versions to exist.

For large projects, the work needs to be distributed over a big group of people, so it needs to be documented in a way that is sharable. Without that, all of the developers will go their own way, which will always create a mess. As well, all of the testers will make their own assumptions, which again will leave huge gaps in the quality of the testing.

Without a common picture, chaos will rule and the project will fail.

Even though it sounds like a waste, specifying the same design in three different ways actually helps to ensure that the final output is more likely to be correct. In the same way that you could enhance the quality of data entry by having three typists input the same file, iterating out the problem as a design, code and tests gives three distinct answers, which can all be compared to find the most likely correct one.

Even though we probably can't reduce the number of times we express the solution, understanding that we are repeating ourselves can lead to more effective ways of handling it.

Ok, so it is not all that weird, but it does put a different perspective on the development tasks.

In the end, we are left with a pile of data, some functionality that needs to be applied to that data, and a number of different ways in which we have to express this in order to ensure that the final work meets its initial constraints. We put this all together, expressing it in various ways, until in the end it becomes a concrete representation of our near-ideal solution to the specific set of given problems.

Call it what you will, it all comes down to the various ways in which we express our solutions to our problems.

5 comments:

  1. OK, pretty abstract.

    But relevant.

    Here's a thought, though. Redundancy is not the same thing as waste. Redundancy is a way to get reliability and stability.

    So by expressing something three ways, you're less likely to miss important details. Let's say that each of the three passes covers fractions of what's needed: P1, P2, and P3. Let's further assume that these are randomly and evenly distributed, and independent. (Bogus assumptions, to be sure). We'd have the probability of missing some point be (1-P1)*(1-P2)*(1-P3). Three chances to not miss something important.
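    (Putting made-up numbers on that: if each pass independently covers 70% of what's needed, so P1 = P2 = P3 = 0.7, the chance of every pass missing a given point is (1-0.7)*(1-0.7)*(1-0.7) = 0.027, or about 2.7%, versus 30% for any single pass alone.)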

    I don't mean this to argue against agile methods, or even against reducing the costs associated with redundancy.

    But in addition to recognizing the costs of redundancy, we should recognize its value, and use it where appropriate, and to the degree needed.

  2. Hi Bob,

    Thanks for the comments, I keep thinking that there is 'something' really important buried under all of this. In a sort of vague haze, I think that if you know you're going to express something over and over again, multiple times, then knowing that should change the way you actually express it.

    For instance, do I really need to fully specify all of the error messages in the design if I'm going to instruct the testers to catch any spelling or grammatical errors and to comment on the relevance of the error text? In that case, while coding, could I also just put in base strings for the messages and replace them during the 'testing' stage, or in this case the 'refinement' stage?

    I know the testers must 'express' what the right error messages should be, so can I take advantage of that earlier and not focus on them? Or am I just getting too abstract again :-)

  3. You specify the error messages in your design?

    I think what belongs there is the requirement that error messages be understandable to their intended audiences -- typically, user, support, operations, and developers.

    It's not something you just leave to the testers to comment on, however. Coming up with good error text is an important task that can really impact the usability, testability, supportability, and yes, developability of a product.

    But it's a task, not unlike documentation.

    At the design stage, you won't know all of the error cases anyway. Some will be inherent in the design, and need to be called out as part of the design, while others come out of the implementation, and the need to defend against the unforeseen.

    And you'll need to do it over for every language you support!

    On the other hand, error handling and an error reporting strategy are something you can and should design in from the beginning.

    You need consistency in presentation and behavior. Protection from getting into bad states, data loss, or making a bad situation worse.

    You need to design in ways to get consistent error handling -- a base set of facilities, etc. Including localization/internationalization if appropriate.

    This comes back to the abstract concept of repeated expressions. You will be repeatedly expressing aspects of your error processing. What's needed is to properly abstract it. What's relevant to the design -- to enable the developers to produce a structurally-sound application?

    And what's the decor, needed for the end user to feel comfortable, but having little to do with construction?

    So I'd say finalizing the error text is a task done during development, probably visited more than once, often with the documentation team AND the testing team lending a critical eye.

    Especially if you're documenting error messages.

    While I'm wandering on and off the topic, let me comment on some error message principles I see violated all the time.

    * Error messages should be traceable to a specific event in the code, and not be lumped under a single "error code" or even a common error message. (Consistent text is OK, but you need something to differentiate the different cases, even if it's just a code).

    If you don't do this, you're likely to be stuck investigating irreproducible problems that also happen to be unidentifiable...

    * Error messages should indicate to the user what he can do about it. Even if it's on the order of "reformat your hard drive and reinstall everything", or "call support". Leaving the user wondering what he should do next is bad.

    * Further, error messages should, where possible, empower the user to take the appropriate action himself.

    * NEVER make a user look up an error code. Error codes should exist only to be unique. Otherwise, you need real error text. 32-bit (or 16-bit!) error return values have been obsolete for a quarter century!

    * Don't make error codes cryptic numbers; make them readable, memorable identifiers (see the sketch below).
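    A minimal sketch of that last point (Python, with all the names invented purely for illustration): each distinct failure point gets its own readable identifier, and the text is looked up by that identifier so it can be refined or translated later without losing traceability.

    ```python
    # Hypothetical error facility: one readable identifier per distinct event in the code.
    ERRORS = {
        "CONFIG_FILE_MISSING": "The configuration file '{path}' was not found. "
                               "Create one from the template, or call support.",
        "CONFIG_FILE_UNREADABLE": "The configuration file '{path}' exists but could not be read. "
                                  "Check its permissions.",
    }

    def report_error(error_id, **details):
        """Look up the message by its identifier, so the wording can be edited
        (or localized) without touching the call sites."""
        return error_id + ": " + ERRORS[error_id].format(**details)

    # Two different failure points, two different identifiers, even though
    # the user-facing wording is related.
    print(report_error("CONFIG_FILE_MISSING", path="/etc/app.conf"))
    ```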

    I'd better stop before I get on to error handling!

    To get back to the topic again, design can help in achieving the above guidelines. But the actual errors, I would definitely regard as an implementation activity, pursuant to that design.

    And definitely empower your testing team to critique not just the end details, but the design -- every step of the way. Is it testable? Will it be usable? Is the model understandable? Will the errors be confusing, no matter what text you use?

  4. Admittedly, using error messages was a misleading example :-)

    They have ended up in the design, at times, but in one case it was because the interface design was being completed by another company. They wanted to explicitly specify all of the behavior, which included how the user interacts with many of the common errors.

    Another time, the long lag for language translation (>10 languages) meant that all of the system text needed to be completed months before the product was scheduled to finish. As part of the design, all of the screens were assembled for documentation and all of the visible text was sent to an editor, then to the translators. Because it was done in parallel, the programmers had to stick tightly to their designs. It was an interesting way to work, but it did seem awkward. But it does shorten the time for full international releases...

    I agree that it is important to design error handling, not only for the user, but also for the entire software support handling. A 'wholistic' approach. Error handling is the one common feature that all programs always have and it's also the one feature that most programmers leave toward the very end to jam into place.

    My rambling last comment was more about empowering the testers to have a larger role in their work and spending less time up front agonizing about the details that will need to be fixed later anyway. Not to really allow the testers to "change" the design, but to try to let them "refine" it in a series of later test stages. I've played around with the idea of fully independent testers. It is one of the things I added to the methodology I am "slowly" working on.

    It is often easier to see design inconsistencies at the end than in the beginning. We rarely exploit testers to their full value; they could offer more assistance. Can we mix these two things to save ourselves time?

  5. Yes!

    Not only testers, but also documenters, and support.

    The people who have to test the product have an important contribution. I have seen many systems that were unduly difficult to test because they had so much opaque state that you could never tell what state you were putting the program into, nor predict what state it would go to next. How do you test a long-lived server application when you have no way to drive it to a known consistent state, nor any way to evaluate whether a particular action had a particular result? All you get is some composite of some large set of operations at the end...
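    A tiny, invented illustration of the kind of hooks that help (Python, with hypothetical names): one call to drive the service to a known state, and one to observe whether a particular action had the expected result.

    ```python
    class OrderService:
        """A hypothetical long-lived service written so that tests can control and observe it."""

        def __init__(self):
            self.orders = {}

        def place_order(self, order_id, amount):
            self.orders[order_id] = amount

        # Testability hooks.
        def reset(self):
            """Drive the service back to a known, consistent state."""
            self.orders.clear()

        def snapshot(self):
            """Expose enough state to check whether an action had a particular result."""
            return dict(self.orders)

    def test_place_order():
        service = OrderService()
        service.reset()                              # known starting state
        service.place_order("A-1", 25.0)
        assert service.snapshot() == {"A-1": 25.0}   # observable effect, not just a composite at the end
    ```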

    Designing for testability is very close to designing for monitorability, an important characteristic for enterprise and high-reliability applications.

    As for documentation and support -- these are the people who have to explain the product. They'll let you know if your underlying metaphor doesn't communicate what you think it does, or if the model is unclear, inconsistent, or just hard to explain to users with a different mind-set.

    I have long argued for bringing these people in early, when you're still trying to work out WHAT to build.

    And customers, too, to the extent possible. That's a big part of the XP model ("customer on-site"), but usually you have to use some proxy for your customers -- someone who well understands the customer and his needs. This may be someone in sales, or support, or a former customer you've hired, or simply a developer who tries to minimize the disconnect between developer and user.

    Circumstances vary, but the principle is to involve advocates for the user's interests as much as you can.

