Wednesday, May 14, 2008

Hard Code'n

I think that at this stage in our industry, it is important to differentiate between several key, yet very different parts of the software development process. Specifically, I see a huge difference between "software development", which includes design, development and deployment of software, and "programming", which is focused on completing a set of instructions in a computer language to implement some functionality. One is the all-encompassing act of creating software, including every aspect from beginning to end, while the other is a very specific subset of the process that focuses on writing code to implement some set of algorithms.

In many ways I see this division as similar to accounting vs. bookkeeping. Bookkeeping is an important part of accounting, but it doesn't necessarily have to be handled by a fully-trained accountant; in fact, most bookkeepers I know are not accredited accountants. Accounting includes far more than bookkeeping, but bookkeeping is an essential part of it. There is even a "higher" side of management accounting, which still deals with the science, yet only at a very high management level.


CLEVER DEFINITIONS

If I am going to split one out from the other, I need to carefully define them or risk the wrath of the net (or even worse, silence). I see programmers as taking descriptions of functionality and making them into code. Software developers, on the other hand, analyse specific user domain problems and then design and implement solutions to aid the users in building up their ever-increasing piles of data. In that way, programming is just a tiny part of the overall software development. It happens somewhere in the middle. It is the process of "encoding" some functionality into a language as a long set of instructions and doing some testing/fixing to make sure it works. Everything else is software development.

I'm well aware of how our industry and programming culture love to mix together analysis, design, requirements, coding and testing all into one giant lump; for most people these are one and the same operation. Just another day at the office.

I think that mixing these together is a huge mistake because the skill sets are very different from each other. Not to trivialize it, but programming -- such as implementing a function to calculate the Fibonacci sequence -- is reasonably well-understood and well-established. Depending on the functionality, there exists an algorithm or not. At worst, implementing the functionality may require the use of several different algorithms all combined together or modified slightly. Generally, for most types of code, examples already exist and can be modified to fit. The problem of function -> code can have its challenging moments, but ultimately it is not a hard problem if you know what you are building. The key is in knowing.
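
To give a sense of scale, a Fibonacci function of the kind mentioned above fits in a handful of lines. This is only a minimal sketch in Python; the language and the name `fibonacci` are illustrative choices, not anything the argument depends on.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0 and 1."""
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        # Slide the window forward: the next value is the sum of the last two.
        a, b = b, a + b
    return sequence


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Once you know what is being asked for, the code is a couple of variables and one loop.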

That is why I really like this differentiation. You see, when software developers discuss whether or not it is an art or a craft, or intrinsically hard, what they tend to do is blur the line between analysis and programming. Analysis is hard because we don't know what the users need or what will actually work, but once you have settled on a specific algorithm, "coding" it is not all that challenging. Sometimes it involves a bit of research (or should), but after that it's just work.


WHERE THE TROUBLE BEGINS

Well, sometimes. If you sit on enough development teams you quickly come to realize that many programmers have serious weaknesses. We, as techies, love the intricacies of tiny machinery like watches. All those little dials and gears and little things appeal to most programmers at a low level. Not surprisingly, our single greatest problem while programming is the tendency to "over-complicate" our solutions. We drift towards pedantic, complex solutions that come from over-thinking the problem. We like the fiddly bits, so we add them wherever possible. We are also "option" happy, adding in tonnes of them that never get used.

You'll see it in most code: tonnes of unnecessary variables, conditions and loops. Redundant copies, extra layers of handling, and buckets of "glue" code. Fiddly little bits on diagrams, excessive casting, big ugly useless comments, 2-inch-thick designs or manuals, etc. Even programmers who love to call themselves lazy will frequently implement 5,000 lines of code when a mere 200 might do.

It is an epidemic problem with programmers, and I've never met one that wasn't guilty in some form or another. If you think you don't do it, then you've probably not been coding long or hard enough; the simplest, most elegant answer is far simpler and more elegant than most programmers have even begun to realize.

Not only do we constantly over-shoot the code, we also build intricate and complex solutions that drive our users nuts. They're often looking for a quick simple solution, and instead we've built some monolithic all-encompassing power-hungry solution where even the simplest bit requires the memorization of masses of new terminology and a three-week course on how to apply it. Manuals, they like to say, are only there to document the design flaws. A reasonable viewpoint, I think.


TRANSFORMERS ARE MORE THAN JUST ROBOTS

Getting back to programming. If indeed you understand the steps necessary to implement your specific functionality, then it is not a particularly hard endeavour. In the end, for most languages, it's some number of variables, a clump of conditions and a few loops, the fewer the better. Programmers "love" to dive into writing some complex code, but most often it's either really simple and straight-forward, or there is a well-known algorithm to handle it. Most code is just tying things together and converting between one physical structure of the data and another. These days, the really complex stuff is buried in libraries, far away from most programmers' hands or eyes.

Even more simply, you can see any type of functionality as a transformation on some data. That makes it almost trivial: the data exists in the system or it needs to be loaded, then some algorithm is applied to transform it into some other structure. Then it is saved and/or written out. Programming, from that perspective is not particularly complex; unless we choose to make it so.
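
Seen that way, a typical piece of functionality might look like the sketch below. It is written in Python purely for illustration; the record layout (`dept`, `hours`) and the function names `load`, `transform` and `save` are hypothetical, and `save` returns text where a real system would write to a file or a database.

```python
import json


def load(text):
    """Load: parse raw JSON text into a list of records."""
    return json.loads(text)


def transform(records):
    """Transform: regroup flat records into total hours per department."""
    totals = {}
    for record in records:
        dept = record["dept"]
        totals[dept] = totals.get(dept, 0) + record["hours"]
    return totals


def save(totals):
    """Save: write the new structure back out as JSON text."""
    return json.dumps(totals, sort_keys=True)


raw = '[{"dept": "hr", "hours": 3}, {"dept": "it", "hours": 5}, {"dept": "hr", "hours": 4}]'
print(save(transform(load(raw))))  # {"hr": 7, "it": 5}
```

All of the "programming" lives in `transform`, and it is one loop and one dictionary.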

When it is complicated, we tend to find really simple reasons why that is true. The most common is that the programmer is making it too complicated: either they've misunderstood the problem or they've misunderstood how the tools work. I've seen enough programmers "flailing" at their keyboards over the years. There is some abstract aspect to programming that some people just never grasp, while others have to work hard to get better at it. Mostly, I think it's some type of anxiety, where people "think" that the problem is hard, so they skip right past the simple solution and start making it really complicated. A kind of programming fear-of-failure delusion. "It just can't be 'that' simple" we like to tell ourselves.

There are many people afflicted with this type of problem, but fear not if you are one; for most of them, coding gets simpler and easier with practice. The real trick is to keep going back and "simplifying" the code, not "adding" to it. E.g. if it doesn't work, don't try to "add in" more logic, instead start stripping it away until it is smaller and simpler. Removing code is the best tool for debugging. It may seem like a slower approach, but it is way way faster than flailing at it. I had a boss once who taught me by leaning over my shoulder and hitting the delete key over and over again. He'd nuke it and make me type it in again. It was the best programming lesson I ever learned (by the third time, you really get it).


FAR TOO CLEVER

Beyond intricate, some programmers gravitate to "clever". They get pulled into really clever ideas that seem like they are going to work really well. Well, at first they seem great. The problem with clever is that it is an extremely "low" level of working. Clever is not simple; in fact, it is nearly the opposite. It's a little bit of concentrated complexity all nicely bundled up into a neat programming package. That might work for writing, but it's the type of thing that you come back to "months" later and instantly regret.

Clever, you see, is just a waste of time at some point in the future. The problem is that to get to something clever, you probably had some cool inspiration. A light went off in your head, or a neat idea popped up in your mind. That's great, but it's not the normal way of thinking. Generally that causes a type of compressed complexity, a neatly packaged clever idea. That makes it a land-mine waiting to get stepped on.

Someone can easily mistake the point or functioning of the code, and in all likelihood, unless you're lucky enough to get fired, someday, at some point, when you least expect it, you'll have to go back in a rush and try to fix some stupid problem. That, by the way, is always the case with clever. You are essentially just setting yourself up, aren't you?

Given that, however, "abstraction" is not clever. It is a generalization of the purpose of the code, not some cute little syntax trick or something else tricky. When I say clever is bad, sometimes people take that to mean that "brute force" is good, but that's hardly what I mean either. Pounding out each and every instruction is a huge waste of time, and it's hard to maintain. Brute force is too specific and too large. Clever is too compressed, it took longer to write, and it's a land-mine.
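
One way to picture the three styles is a small, well-worn problem: the number of days in a month of a non-leap year. The sketch below is in Python and all of the names are hypothetical. The arithmetic in `days_clever` is a well-known puzzle formula, included only as an example of compressed complexity; the table version is one reading of what simple code at a reasonable level of abstraction could look like.

```python
def days_brute_force(month):
    """Brute force: every case pounded out by hand."""
    if month == 1:
        return 31
    if month == 2:
        return 28
    if month == 3:
        return 31
    if month == 4:
        return 30
    if month == 5:
        return 31
    if month == 6:
        return 30
    if month == 7:
        return 31
    if month == 8:
        return 31
    if month == 9:
        return 30
    if month == 10:
        return 31
    if month == 11:
        return 30
    if month == 12:
        return 31
    raise ValueError(f"month out of range: {month}")


def days_clever(month):
    """Clever: correct for 1..12, but good luck debugging it in a rush."""
    return 28 + (month + month // 8) % 2 + 2 % month + 2 * (1 // month)


# Simple: the knowledge is plain data, and the code just looks it up.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_table(month):
    """Look the answer up in a table that anyone can read and check."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return DAYS_IN_MONTH[month - 1]


# All three agree; only one of them is pleasant to come back to months later.
for m in range(1, 13):
    assert days_brute_force(m) == days_clever(m) == days_table(m)
```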

Good simple short code -- the definition of elegance -- that works at a reasonable level of abstraction so that it can be leveraged, is what does the best for the long term goals of a software development project. A great programmer is someone who can take a hard problem and make the resulting code look simple. It should be so obvious that it doesn't look like a lot of work.


FUNCTIONALLY FLAWED

Another really common problem draws its strength from our unfortunate desire to see programming as an 'art form'. You meet enough programmers who don't want to be engineers, so they don't want any process of any kind. Worse still, they want the creative "right" to pick a new and unique way of solving each problem, each time. Even if it's the same problem over and over again.

And so, by their inconsistency and lack of structure, they create around themselves an ever-increasing vortex of complexity. Mostly you see this with the cowboys, and their fast, yet dangerous band-aid approaches. Cut and pasters are another entertaining variety.

It's quick, it's fluid, it works for a while, but like any continual short-term strategy it builds up to the point where it becomes an uncontrollable nightmare.

Fundamentally software development is engineering. We are building something, and we do need to balance out the long-term work with the short-term pressures. Software is saved by the fact that its total ugliness is not visible (if it were there would be a lot of "fired" programmers), but that doesn't mean the effects won't be visible. When you are building "anything" you can only cheat for so long before it becomes unworkable. Sure, if it is a short "assembly" job of combining some pieces together to whack out a simple application for a couple of months, you can get away with a huge number of short-cuts, but once it becomes a multi-year, multi-developer project, each and every short-cut (even the ones that you don't think are actually short-cuts) builds up.

If and when they build up enough, they account for a significant number of project failures. Sadly, "sloppy process" failures are entirely preventable, but only by people who understand them.


BIG BALL OF MUD

If it is not the programmer or the chaos, then it is the functionality itself. It's either poorly specified, or perhaps even just a really "bad" idea. The real trouble in programming doesn't come from feeding lists of instructions into an abstract machine for execution. Nope. It comes from tying that back to the "real world".

People are irrational, messy and the source of huge problems. If the functionality is not well-defined or it is not "workable", the core reasons behind it almost always come down to people. Whether it be limited thinking, politics or egos doesn't really matter; it's all the same.

All software ultimately is for people to use, so it is actually easy to get the functionality back onto the right track: "pick something simple". Then specify it, in some format that makes it easy to see if it's complete or not. From there, it's just back to programming.

Once in a while, in order to get the system running, the core contains something extremely complex. Generally this is some type of engine or parser or processor or something extremely hard. The really heavy-duty programming can be tough, particularly if it is breaking new ground, but it rarely accounts for even a significant percentage of the overall system. Writing good heavyweight code generally involves a strong understanding of some complex discipline or the actual problem domain. Ultimately though, even the most complex "engine" breaks down into a large number of simple functions. The trick is not writing the pieces, it is getting them to all work together in some intricate, yet simple and elegant solution, a problem which is clearly "architectural" in nature and not really programming.

What hooks a lot of people is that they tackle complex functionality without considering architecture, so the result is a lot of hit-and-miss attempts to get it all working together properly. If you build the mechanics into the architecture at the general level, then the lower levels are just specific algorithms to transform data from one stage in the process into another. The code doesn't really fail; it's the architecture that convolutes the process and makes it messy.
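
As a rough sketch of that idea, the stages below are each trivial, and the only "architecture" is the ordered list that ties them together. The example is in Python; the `name,amount` input format and every function name are assumptions made up for illustration.

```python
def parse(line):
    """Stage 1: split a raw 'name,amount' line into named fields."""
    name, amount = line.split(",")
    return {"name": name.strip(), "amount": amount.strip()}


def convert(record):
    """Stage 2: turn the amount from text into a number."""
    return {"name": record["name"], "amount": float(record["amount"])}


def label(record):
    """Stage 3: produce the final display string."""
    return f"{record['name']}: {record['amount']:.2f}"


# The mechanics are decided once, at the general level: an ordered list of stages.
STAGES = [parse, convert, label]


def run(stages, data):
    """Feed the data through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data


print(run(STAGES, "widgets, 12.5"))  # widgets: 12.50
```

Each lower-level function only has to transform the data from one stage to the next; adding or reordering a stage is a change to the list, not to the functions.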

No architecture? No wonder you're having problems. You wouldn't build a house without first designing the internal frames, so why wouldn't you do the same for your code?


THE LAST FEW HANDS

Programming, then, by itself is relatively simple. That's hardly surprising, as you find that in a lot of specific problem domains many of the programmers are actually domain experts, not Computer Scientists. You don't need a Computer Science degree to write code. In a very real sense, that is why it is closely aligned with bookkeeping, even though I realize that a lot of people might take offense at that comparison. But, like it or not, great reams of domain-specific code are easily written by people from other disciplines. And, even more horrifying to admit, for your basic bread-and-butter medium-weight programming work, a degree in computer science is overkill. You don't have to know about Turing machines to create a screen in a GUI to accept human resources data. You don't need to understand the halting problem to write a social web-app. The expressibility of SQL; does it really matter?

These things have their place in software development, but not necessarily in most programming. They usually only come into play in the core of the technical aspect of a solution, something that is generally wrapped in a framework or infrastructure.

Programming still has its moments when time is tight and you are having trouble focusing, but for most people, after about five years of steady coding it mostly becomes instinctual. I know, there are still readers out there that have been at it a longer time and are still struggling, but if they are fair about why they are struggling, the reasons come down to not really knowing what they are building, as they are building it. It's personal, architectural or analysis, not programming. Really it's a bigger problem.

Software development, on the other hand, is extremely young, completely unfinished, and extremely complex. It's the type of thing that people just don't get, and it is really hard, even at its simplest level.

You learn this, intrinsically, when you end up in meetings with users who are insisting that the software work in a specific way, while you are quite aware that it is impossible. Not just difficult, but completely and utterly impossible. Yet, it becomes very difficult to explain why it won't work. The certainty is there from experience, yet the ability to simplify it and pass that knowledge on to someone else is lacking.

In that overlap between people and mathematics, the grey area in there is a largely unexplored, unknown world of fantastically complex problems that we haven't even begun to enunciate yet, let alone tackle. We are missing at least one if not many different sciences that make up the knowledge needed to build "reliable" complex systems. We're pretty much guessing at it right now, when we should be far more knowledgeable about what works and what doesn't.

Still, while there are many great problems left to solve in Computer Science, and there is still a whole 'process' left to create to solve the on-going "software crisis", the act of programming is not among the key problems. Our biggest issue with programming is our constantly confusing the issues, and trying to fit a one-size-fits-all approach to unify programming and software development. Getting back to my initial point, if you see them as different, then it becomes easier to see and deal with understanding their own unique issues. A bit of structure can be a grand thing.

6 comments:

  1. I think programming itself is still growing as a field. Witness the very active debate over dynamic vs static languages, etc. Admittedly, the ideas have been around forever - however, we are still learning how to practically apply them.

    I don't think it is useful to distinguish between programming and software development. The problem with doing that is it implies that there is a way to split the responsibilities - just give the programming to the junior guys (or India), and leave the heavy thought to the architects. That can work, but is inefficient because software development is a design process, and by its very nature highly iterative and communication-bandwidth intensive. (It is most efficient to have the designer also do the programming).

  2. Hi Steve,

    You're right, I think, in that programming is constantly changing as we discover and use different idioms (APL, Lisp, Prolog) and higher level abstractions (3rd GL, OO, multi-threading, etc.). But while it is ever evolving and will never stop, the act of encoding a large sequence of steps into some form of syntactical primitives is mostly constant. Java is a little different than assembler, but still similar. What N steps do you need to implement X?

    The convention is to not want to split out the different types of work, but I think it's the convention that is causing a lot of our problems. Let's face it, the industry as a whole is not functioning well. We're proud to have gone from a 30% success rate to a 50% one?

    There are relatively few good designers and analysts but there are lots of people who can program. Leveraging this would lead to better systems. The fear for most people seems to be that they don't want to be 'just a programmer', which is understandable but only if they are willing to pick up all of the different skills needed to be an overall software developer. Somebody doing analysis that doesn't understand it or the problem domain is just asking for trouble. Right now, if they can code a sort algorithm, we think they're automatically qualified for everything else. They're not, that's why it blows up so often.


    Paul.

  3. I am not a programmer or developer or teacher, but I have read a lot of what they are saying about their problems.
    And I have tried to find some kind of unifying logic in what they are doing.

    I have learnt a few things:
    A developer creates the design while a programmer implements it.
    Linear static text is too much information and too little structure, all in the same dimension.
    It's called "code" for a reason.
    The source makes a lousy specification - even if they call it "test".


    Source-oriented software makes it difficult to formulate the right questions and separate the answers in different semantical layers:

    What should the program do?
    How should I describe it to the user?

    What should happen in the program?
    How should I describe it to the programmer?

    How should I make it happen in the system?
    How should I describe it to the compiler, and what kind of support can I get from the tools when I am doing it?

    What can be done in the compiler instead of by the programmer?
    What should it look like to the programmer, and what must the programmer do to "compute" what it means on the other side of the compiler?

  4. Hi Reino,

    Thanks for the comments. I think you're right, you don't have to be actively building systems in order to have a real appreciation of the complexity of the problems that the software industry faces. Software is the juxtaposition of mathematical purity and human irrationality, sandwiched together at a haphazard moment, but constantly changing.

    It all comes down to how you express your solutions for providing functionality to allow users to play with their piles of data. Expression is the key to all of your questions. We just haven't reached a point yet where we've even touched on a tiny fraction of the different ways to express our systems. We keep getting distracted by the technology.

    Paul.

  5. A note about Abstractions

    Abstractions are culture, defined as conventions with semantics.
    They can be used to describe the program and reason about the program, without any need to define and implement the program (that's the responsibility of the evaluator and the compiler).

    Sometimes the culture in the language gives you the wrong abstractions, so you have to describe more of the semantics in the program.
    Then the language should give the developer more ways to define new abstractions, with new syntax and semantics in the evaluators, and more ways to select (and define and document) the right evaluator (or part of evaluator).

    That requires a standard way to define and document the abstractions and the semantics in the evaluators, in a composable way (nested evaluators), that can be understood in a standard evaluator.

    I think that is a function that very few programming languages are trying to support.

  6. Hi Reino,

    I keep coming back to the idea that the Arabs based their early counting systems around ten symbols (base ten), while the Romans based theirs around an ever expanding number of symbols (although they seemed to stop at eight).

    Both are abstractions for handling quantities in our world, yet they are very different. Although they both handle arithmetic, it is a far easier chore with the Arabic system. The merchants simplified the task, while the conquerors stuck with a cruder system. But is it only culture that shaped that choice?

    Underneath, both systems are equivalent and complete. They are pure mathematical abstractions, no matter how or why they came into being. So 2 + 3 = 5 = V = II + III is always true. Would the Romans have not eventually found base ten on their own? Once you find the path, culture may set your speed, but your goal is always the same.

    In modern computer languages, the choice of technology generally comes with sets of predefined cultural bias. You try not to use loops in APL, while you tend to use partial objects (beans) in Java. But in the same sense as above, these are just points on the overall path forward.

    I am guessing that no matter where we start we will all converge towards the same abstractions. The ones that really work.

    Huge systems exceed our grasp too quickly, so at another site, http://softblue.wetpaint.com, we've been slowly digging into the underlying essence of the design for software. Size, I think, is one of the paramount issues; modern systems are massive, making them easy to get off course. We are looking for ways to outline the design without committing to a huge amount of work first.

    I'm inclined to believe that first working in another simpler and more relaxed notation will allow the direction to be more appropriately shaped before it gets cast in stone in a more rigorous programming language.


    Paul.

