Monday, November 5, 2007

The Art of Encapsulation

In all software projects we quickly find ourselves awash in an endless sea of annoying little details.

Complexity run amok is the leading cause of project drownings; controlling it is vital. If you become swamped, it usually gets worse before you can manage to get it under control. It starts as a downward spiral, picking up speed as it grows. If it is not explicitly brought under control, it swallows the entire project.

The strongest concept we have to control complexity is "encapsulation". The strength behind this idea is the ability to isolate chunks of complexity and essentially make them disappear. To be fair, they are not really gone, just buried in a black box somewhere, safely encapsulated away from everything else. In this way we can scale down the problem from a monstrous wave about to capsize us into a manageable series of smaller waves that can be easily surmounted.

Most of us have been up against our own personal threshold for keeping track of the project details, beyond which things become too complex to understand. We know that increasing the size of the team can push up the ability to handle bigger problems, but it also increases the amount of overhead. Big teams require bigger management, with each new member being less effective than the last. Without a reasonable means to partition the overall complexity, the management cross-talk for any significantly sized project would consume the majority of the resources. Progress quickly sinks below the waves. Just adding more bodies is not the answer; we need something stronger.

Something that is really encapsulated hides the details from the outside world. By definition, it has become simpler. All of the little details are on the inside, and none of them need to be known on the outside for it to be used effectively. In this way, that part of the problem has been solved and is ready to go. You are free to concentrate on the other unresolved issues, hopefully removing them one by one until the project comes down to just a set of controlled work that can be completed. At each stage, the complexity needs to be found, dealt with and contained.

The complexities of design, implementation and operation are very different from each other. I've seen a few large projects that have managed to contain the implementation complexity, only to lose it all by allowing the operational complexity to get out of hand. Each part in the process requires its own understanding to control its own special complexity. Each part has its own way of encapsulating the details.

Successful projects get there because at every step, the effort and details are well understood. It is not that they are simpler, just that they are compartmentalized enough that changes do not cause unexpected complexity growth. If you are, for example, adding some management capabilities to an existing system, the parts should be encapsulated enough that the new code does not cause a significant number of problems with the original code, but it should also be tied closely enough to it to leverage its capabilities and its overall consistency. Extending the system should be minimal work that builds on the existing code base. This is actually easy to do if the components and layers in the system are properly encapsulated; it is a guaranteed side effect of a good architecture. It is also less work and yields higher quality.
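
To make the "management capabilities" case concrete, here is a minimal Java sketch. The names `JobQueue` and `QueueMonitor` are hypothetical and chosen only for illustration; the point is that the new monitoring code leans on the existing public methods and never sees how the jobs are stored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// The "original code" (hypothetical): how jobs are stored is a private detail.
class JobQueue {
    private final Deque<String> jobs = new ArrayDeque<>();

    void submit(String job) {
        jobs.addLast(job);
    }

    // Empty Optional when there is nothing left to run.
    Optional<String> next() {
        return Optional.ofNullable(jobs.pollFirst());
    }

    int pending() {
        return jobs.size();
    }
}

// The "new management capability" (hypothetical): built only on the public
// methods above, so swapping the Deque for something else would not touch it.
class QueueMonitor {
    private final JobQueue queue;
    private final int limit;

    QueueMonitor(JobQueue queue, int limit) {
        this.queue = queue;
        this.limit = limit;
    }

    boolean overloaded() {
        return queue.pending() > limit;
    }

    String status() {
        return queue.pending() + " pending (limit " + limit + ")";
    }
}
```

If `JobQueue` later moved its jobs into a database or a priority heap, `QueueMonitor` would presumably need no changes, which is the kind of cheap extension described above.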

Given that, it is distressing how often you see architectures, libraries, frameworks and SDKs that should encapsulate the details, but do not. There is some cultural aspect to Computer Science where we feel we cannot take away the choices of other programmers. As such, people often build development tools to encapsulate some programming aspects, but then they leave lots of the details free to percolate upwards, violating the whole encapsulation notion. Badly done encapsulation is not encapsulation. If you can see the details, then you haven't hidden them, have you?

The best underlying libraries and tools are ones that provide a simplified abstraction over some complicated domain, hiding all of the underlying details as they go. If we wanted to get down and dirty we wouldn't use the library; we would do it directly. If we choose to use the library, we don't want to have to understand it and the underlying details too. That's twice as much work, when in the end, we don't care.

If you are going to encapsulate something, then you really should encapsulate it. For every stupid little detail that you let escape, the programmers above you should be complaining heavily. If you hide some of the details, but the people above still need to understand them in order to call things properly, then it wasn't done correctly. We cannot build on an infrastructure that is unstable because we haven't learned how it works. If we have to understand it to use it, it is always faster in the long run just to do it ourselves. So ultimately building on a bad foundation is just wasting our time.
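
As a rough illustration of a detail "escaping", consider the following Java sketch. Both classes are hypothetical and exist only for this example. The first hands its internal map to callers, so they have to know how it behaves in order to call things properly; the second keeps the map to itself.

```java
import java.util.HashMap;
import java.util.Map;

// Leaky (hypothetical): the internal map escapes, so every caller must know
// that unseen words map to null and that editing the map edits the counter.
class LeakyWordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    Map<String, Integer> getCounts() {
        return counts; // hands out the internal representation
    }
}

// Encapsulated (hypothetical): callers see only add() and count().
class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0); // unseen words are simply 0
    }
}

class WordCounterDemo {
    public static void main(String[] args) {
        LeakyWordCounter leaky = new LeakyWordCounter();
        leaky.add("wave");
        // The caller has to remember the null case before unboxing.
        Integer raw = leaky.getCounts().get("boat");
        System.out.println(raw == null ? 0 : raw);

        WordCounter counter = new WordCounter();
        counter.add("wave");
        System.out.println(counter.count("boat")); // no special case needed
    }
}
```

The leaky version is not less code, but each caller now carries a copy of the null check and a dependency on the map, which is the extra understanding the paragraph above complains about.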

It is possible to properly encapsulate the details. It is something we need to do if we want to build better software. There is no need to watch your projects slip below the churning seas. It is avoidable.

6 comments:

  1. Great post. You have helped shape my thoughts on this, which is rare. Thanks.

    My particular view on this was that we should encapsulate at about 85%, allowing 15% to "leak out". Make the simple things simple, but still allow the complex (unforeseen) to be possible.

    A noble goal perhaps, but if I am trying to abstract something in a way that enables order-of-magnitude increases in productivity, then I should take the plunge and hide the details completely.

  2. Hi Steve,

    Thanks for the comment. One way to see a software development project is as a struggle against ever-increasing complexity. Encapsulation, if complete, removes some small amount of complexity from the scene. Partial encapsulation doesn't; you're getting less value from your work.

    With limited resources -- the staple of software development -- we really need to get the biggest bang for our buck wherever possible.

  3. Hi, Paul,

    Being a twenty-year professional, you don't need to be told the difference between encapsulation and information-hiding, but I'd argue that the encapsulation you describe is not an art.

    It's not an art because the fundamentals of encapsulation theory tell us important truths about our software, and these truths are not "mere" artistic aesthetics.

    You may (or may not) be interested in reading a short article on encapsulation theory here:
    http://www.edmundkirwan.com/encap/intro.html

    It's as far from art as you can travel.

    Ed Kirwan

  4. Hi Ed,

    Oddly enough, when I checked my intuitive definitions of encapsulation and information-hiding, they differed from what Wikipedia says. To me, encapsulation is 'more' than just hiding the information and making it hard to access. It is burying all of the complexity around that information as well; a much broader concept.

    In a little section in The Programmer's Paradox, I speculated that there is a canonical form for code, in the same way there is a 4th or 5th normal form for a database schema. I did a quick sweep through your encapsulation theory writings (I'll go back later when I get a chance), and it seems like the idea of an equipoised system is similar.

    Not that I am surprised; there is an underlying mathematical structure beneath code that can be normalized with respect to any number of criteria. What one needs to consider though, is that our human-centric version of simplicity and elegance is not the same as a true computational one. We like things slightly messed up.

    I agree with you that we can use algorithmic or heuristic means to structure our code; we've certainly been able to do this with our data for a long time now. However, given that I've so rarely seen schemas that are fully normalized, I suspect that the same principles for code would be equally ignored.

    The 'art' in encapsulation is more about what people actually want than what is really possible. Programmers love their freedom, even if it wrecks their projects. I blame Knuth, he started it :-)

  5. Regarding the innate drive to provide programmers with flexibility when we should be encapsulating....

    The best tool for balancing this (when both need to be accommodated) is layered abstractions.

    It takes a real effort, though, to make the separation clear.

    So you can have a high-level API, written in terms of a low-level API, that encapsulates the standard, consistent way of working, and strongly encourage people to stay within that.

    You can have a lower-level API that gives access to see what's really going on, perhaps to add new capabilities, or provide diagnostics, or to handle the unforeseen.

    I would argue that without layering, encapsulation doesn't scale. Basically, layering is the recursive application of encapsulation to clean up the mess when you get a bunch of encapsulated things interacting in complex ways.

    I've never been happy with the term "information hiding". I think it's a biased term with the wrong connotations. The real goal isn't to hide information, but to relax dependencies and achieve independence. Hiding information is just a way of enforcing that -- a means to an end that can be met via other ways as well.

    If you can be sure that other modules are independent of some internal characteristic of your module, you can then make changes that affect that characteristic. That's the essential value that encapsulation brings.

    But there's another dimension to the problem -- HOW we choose to encapsulate. Where we draw the boundaries, and how we organize the interactions across the boundaries -- i.e. our APIs -- becomes a major issue. You can have good encapsulation, but lousy, hard to use, confusing APIs.

    Here, the goal is simplicity -- easy to describe, understand, remember. A major aid to reaching the goal is consistency.

    I can't tell you how many times I've invoked Java's File.listFiles() method to get a list of files in a directory, only to have my code blow up because the path was missing or unreadable, and so it returned null instead of an empty array of files. This anomalous case makes a very simple API a bit more complicated and a bit more prone to usage errors.

    Taken together, the question becomes -- what to encapsulate, and how.

    To return to information hiding again, in your reply to Steve Campbell, you say "Encapsulation if complete, removes some small amount of complexity from the scene. Partial encapsulation doesn't; you're getting less value from your work".

    It's really not about whether encapsulation is complete or not, it's about what independence you can easily verify, that enables the value. If you can only weakly verify that A does not depend on B, you still have a risk that A may indeed depend on B in some undetected way. The stronger your ability to verify independence, the lower the risk, the greater the freedom to evolve, and the greater the value.

    Complete encapsulation with information hiding is one means to this end. Tools can provide another. Even usage patterns, documentation, and discipline can contribute.

    With layered encapsulation, one thing that helps is if you have to get to the lower level via a single explicit call (or a very small number of them). Verifying that the higher layer has not been breached can then be a matter of checking for this specific call.

    That's more effective if the call is part of a common mechanism across your entire encapsulation methodology, so you KNOW to look for it. A guarantee of independence doesn't have a lot of value if you remain unaware of it.

  6. Hi Bob,

    I'm definitely in agreement with your views on layered encapsulation, although I tend to separate it from encapsulation and stuff it into architecture.

    In the end, we slice and dice our huge sets of instructions into smaller pieces to hopefully make it easier to maintain and extend our work. If all we end up doing is just making it needlessly more complicated, then we are working against our own interests. Any of the higher concepts like encapsulation, polymorphism, decoupling or inheritance are only useful if they actually simplify our lives.

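
The File.listFiles() complaint and the layering idea from comment 5 can be sketched together in a few lines of Java. The class name `Directories` is hypothetical. As documented, `File.listFiles()` returns null when the path does not denote a directory or an I/O error occurs, while an empty directory yields an empty array. The sketch below is one possible higher layer that smooths that over, not a recommendation for every case.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical higher-level helper layered over the low-level File API.
final class Directories {
    private Directories() {
    }

    // Always returns a list, never null. A missing path, a plain file, and an
    // I/O error are all reported as "no entries" by this layer.
    static List<File> list(File dir) {
        File[] entries = dir.listFiles(); // low-level call; may return null
        if (entries == null) {
            return Collections.<File>emptyList();
        }
        return Arrays.asList(entries);
    }
}
```

Callers can loop over `Directories.list(dir)` without a null check. The trade-off is that this layer deliberately hides the difference between "empty" and "could not be read"; code that needs that distinction would drop down to `File.listFiles()` directly, and that single call is then the one place to look when checking whether the higher layer has been bypassed.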
