Saturday, October 17, 2009

Fundamental Laws

OK, technically I'm under the influence. No, not of drugs or alcohol, but rather of Number Theory. In the last few weeks I've been consuming as much Number Theory as I can get my hands on. Sure, it's having some perverse effect on the way I see other issues, but occasionally one has to forgive the rigor and just see it as a necessary impediment on the road to progress.

In thinking about it, I realized that Computer Science just doesn't have any strong "fundamental" theorems. Of course, "theorem" is a bit harsh (and Number Theory-ish) so I figure a softer title of "Laws" may be more appropriate. Here are some suggestions:


Fundamental Laws of Programming Structure:

1. All programming code is hard to write and has bugs.
2. The lower the code is in an architecture, the more you can re-use it.
3. The fastest way to develop code is by re-using it.
4. The best way to develop code is by deleting it.


Fundamental Laws of Spaghetti Code:

1. You can't understand it just by looking at it.
2. Repetitive code will always fall out of synchronization.
3. The work always gets harder and slower as it progresses.
4. Fixing one bug creates several more.
5. You can't enhance the code if you are spending all of your time fixing the bugs.
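Law 2 of this list is easy to demonstrate with a toy example. The two validators below are hypothetical; the point is that a rule copy-pasted into two places will eventually be "fixed" in one place and not the other:

```python
# Two hypothetical validators that both started out enforcing the same
# 32-character limit. Copy #2 was later "fixed" to 40; copy #1 was missed.

def validate_user(name):
    return len(name) <= 32   # copy #1: still the original limit

def validate_import(name):
    return len(name) <= 40   # copy #2: updated, now out of sync

# The copies silently disagree for any name of length 33..40:
name = "x" * 36
print(validate_user(name), validate_import(name))  # → False True
```

Nothing in the code announces the disagreement; it only shows up as a bug report when a record passes one check and fails the other.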


Fundamental Laws of Enhancing Software:

1. Adding new functionality, or upgrading existing functionality, is very easy.
2. Upgrading the data and its schema is very hard.
3. It is often impossible, or at least very expensive, to re-collect data or to "back-fill" it.
4. Getting the data collection right is far more important than getting the algorithms or code right.
5. Users care more about data than functionality.
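Laws 2 and 3 can be sketched in a few lines with an in-memory SQLite table (the table and column names here are made up for the example). The schema upgrade is one statement; the back-fill is the part no statement can do:

```python
import sqlite3

# A sketch of laws 2 and 3, using an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 24.50)])

# Upgrading the schema is the easy half: one statement.
conn.execute("ALTER TABLE orders ADD COLUMN region TEXT")

# Back-filling is the hard half: the region was never collected for the
# old rows, so the best we can do is a placeholder, never the real value.
conn.execute("UPDATE orders SET region = 'unknown' WHERE region IS NULL")
print(conn.execute("SELECT id, region FROM orders").fetchall())
# → [(1, 'unknown'), (2, 'unknown')]
```

The placeholders are permanent: unless the original events can be replayed from some other source, the missing data is simply gone.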


Fundamental Laws of Software Design:

1. Software development projects go on for as long as the software is actively used.
2. All software rusts (falls out of sync with modern technology), if it is not occasionally updated.
3. All code will eventually be "enhanced" to a newer, better version.
4. Software generally only gets more complex.
5. It is orders of magnitude easier to add complexity to code than it is to remove it. Once complexity is entrenched, it is very hard to remove.
6. There is no point initially agonizing over the perfect design; it is better to start with the simplest possible version and enhance it later.
7. The initial design phase for software should be as short as possible (but it should happen).


Fundamental Laws of Software Architecture:

1. A system is well-structured if and only if there are clear and consistent lines (APIs, libraries, components, etc.) separating and encapsulating each of the various sub-pieces within the system.
2. Well-structured software is easier to build.
3. Well-structured software is easier to test.
4. Well-structured software is easier to diagnose.
5. Well-structured software is easier to extend.
6. The software structure should match the development team's organizational structure.
7. The software structure should match the operational support services structure.
8. If there are any exceptions to the structural lines, then the software is not well-structured (only partially-structured).
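A toy sketch of what a "clear and consistent line" looks like in code, and what an exception to that line looks like (law 8). The class and its names are hypothetical:

```python
# A small encapsulated sub-piece: callers are meant to cross the line
# only through add() and get().

class UserStore:
    def __init__(self):
        self._rows = {}            # internal detail, behind the line

    def add(self, uid, name):
        self._rows[uid] = name

    def get(self, uid):
        return self._rows.get(uid)

store = UserStore()
store.add(1, "ada")
print(store.get(1))                # crosses the line through the API

# An exception to the structural line (law 8): reaching inside directly.
# One such call anywhere, and the internal layout can never change safely.
print(store._rows[1])
```

The second print works today, but it quietly converts an internal detail into a public contract, which is exactly how a well-structured system decays into a partially-structured one.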


Fundamental Laws of Commercial Software Products:

1. Not all code is industrial-strength code, even if it works.
2. The primary code is only a small percentage of the work (and the total lines-of-code).
3. Design is as important as functionality.
4. Packaging is nearly as important as functionality.
5. Most commercial products require more than 100,000 lines-of-code.
6. All commercial products require, at the very minimum, two man-years of effort.
7. Releasing industrial-strength code is cheaper than just releasing code.
8. Bugs are inevitable, so support and distribution are key parts of the product.


Fundamental Laws of Testing:

1. All code has bugs; software should never be assumed to be bug-free.
2. There is at least one bug for every 200 lines of changed code.
3. Catching a larger percentage of the bugs gets at least exponentially more time-consuming.
4. Every point of "integration" magnifies the number of bugs.
5. System level testing is the least expensive form of testing.
6. All testing resources are finite, and never enough to thoroughly test the entire application.
7. All tests have an inherent narrow scope (they test only a limited number of things).
8. All tests are "randomly" successful.
9. The results of testing don't change if neither the code, nor the test change.
10. All tests are code, and follow any of the normal laws of coding.
11. Tests are an expensive form of code.
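Laws 7 and 9 can be seen in miniature in any unit test. The function under test here is invented for the example:

```python
# A hypothetical function under test.
def word_count(text):
    return len(text.split())

def test_word_count():
    # Law 7: inherently narrow scope -- three inputs out of infinitely many.
    assert word_count("") == 0
    assert word_count("one") == 1
    assert word_count("two words") == 2

# Law 9: with the code and the test both unchanged, the outcome is fixed.
# Running it twice (or two thousand times) tells us nothing new.
test_word_count()
test_word_count()
print("both runs passed")
```

Law 10 is also visible here: the test is itself code, with its own chance of bugs (an assertion that checks the wrong value passes forever).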


I'll leave any discussion, justification or further information for upcoming blog posts (it gives me something to write about for the next few months).

Please feel free to add comments that are for, or against, any or all of these rules. Explanations would help too :-) If there are a lot of comments, this time I won't respond to each, but I may choose to quote them later in the follow-up posts.

12 comments:

  1. Hi,
    the first thing I can think is: there is some difference between math laws and these.

    1 The laws in math are about math, so they are about things inside math. Here, we have laws in a broader sense; they are common-sense phrases, because they involve concepts outside the original field.
    Examples: "useful" (where in math do you have a theorem about "useful"?), or "the best way" (where in math can you have a "best way"?), or "hard work" (where in math do you read "this is harder"?).

    2 This leads to the second point: definitions. In math everything is well defined; here, it cannot be. See the words I quoted in the preceding point ("useful", "you can't", etc.).

    3 So, why is this? Math is an independent universe; programming is not. Programming has an objective outside itself, math does not (or we can say it is not mandatory for math, though it has many practical applications).

    You can do math for physical reality (design a machine or a house) but, in reverse, you cannot create a program that has inside itself the motives of its own existence. Programming is an applied science.

    If I think of something else, I will write :-)

  2. Hi xlr8,

    Thanks for the comments. Yes, I could have easily been more rigorous. :-)

    Still, I tried to stay away from terms that were entirely ambiguous. "Hard", for example, implies an open requirement for extra resources (time, thinking, etc.). "Best way" implies that there is a higher precedence over "other ways". I also wanted to keep it short (so no long, long definitions).

    I picked the term "law" because I wanted to convey a sense of these things being inherently correct (most of them require little or no proof).

    I figured it was probably a good place for a discussion about which ones aren't "fundamental" or "well-defined" or even "true" :-)

    Paul.

  3. My only gripe is with this sentence:
    "2. All software rusts (falls out of sync with modern technology), if it is not occasionally updated."

    Sounds like a tautology: A is B unless otherwise.

    Maybe "software rusts, so it needs maintenance" would be clearer. =)

  4. Hi,
    it seems Seiti is lucky to have only one "gripe"; as for me, I could discuss every point in a separate post. :-)
    I guess Paul means software must often be updated not because of new customer requirements, but to follow changes in the underlying hardware or operating system.

    The most alien assertion for me to notice right now is:
    6. The software structure should match the development teams organizational structure.
    (and 7. The software structure should match the operational support services structure.)
    I would say the opposite: create the team based on your project (if you need a database get a db specialist, if you want a videogame you must have a 3D algorithms expert... for example)

  5. @Seiti: Yes, I think I should have just stopped with: "All software rusts, eventually"

    By "eventually" I mean that at some point you'll be unable to purchase the hardware/software combination required to run the software.

    I've seen this in operational environments a lot. Enough time goes by, and suddenly a company is scrambling to try to fix/update some ancient legacy application that they've relied on but forgotten about.

    @xlr8: If you're writing an application with a GUI and a back-end to the database, and you split your team into the GUI person, the transport person and the database person, then the structure of your team matches your architecture.

    For small teams it isn't that important, but in systems where there are hundreds of programmers it can be crucial, mostly because each sub-team will end up redoing pieces that should have been common, thus wasting a lot of time, and possibly causing a lot of bugs because the code is redundant.

    What I'm really trying to say with these two laws is that we have to factor the environment we're building in into the way we build things. If you want to utilize two remote teams to their maximum capacity, then you need to ensure (in some way) that they aren't just duplicating each other's efforts.

    In practice, every time I've worked with big teams (>20), the sheer amount of redundancy has been staggering. And it's often worse at the really big, well-known software shops. They throw away huge amounts of resources due to disorganization.


    Paul.

  6. Programming structure:

    - Programming code is hard because the easy part has already been automated in some way.
    - Programs have bugs because undecidability prevents automated checks.
    - Programs have bugs because they are written by humans, and errare humanum est.

    Spaghetti code:
    I think that characterizing "bad code" is interesting only if one says by which process the original code "degrades" or "rusts". For instance:
    "backwards compatibility is a poison for software"

    Enhancing software:

    This set of laws articulates around the data/code dichotomy, and boils down to the simple point that "data=hard, code=easy". It also conflicts with the first rule of programming structure: "all programming code is hard to write".

    This seems to me highly black-and-white, and therefore highly suspect.
    The 4th point in particular suffers from too many counterexamples: games, aircraft control systems, etc.

    Software design:

    It's interesting to look at the exceptions and counter-examples.
    For the first four rules, one may think of the various little Unix utilities, which have been mostly stable for years. The same goes for most embedded systems' code.

    Your first four rules apply to software that has no clearly delimited domain; therefore it can grow without limit, and bugs can be fixed and introduced in it forever.
    I think this is a key point. Software should have an initial release and a final release. Added functionality should go into another piece of software. It then forms a system, so the rules of the next set, software architecture, apply.

    Rule 6: design is a human activity, humans make mistakes, so it seems wiser to start with a minimal design.

    Commercial software:
    Figures in rules 5 and 6 look arbitrary; for what reason is a 50 KLOC, 6 man-month piece of software not commercial software? And how is it relevant to computer science or software engineering whether software is "sell-able"?

    Testing software:
    2. bugs/LOC is a constant at best.

    rule 9 is one of these little jewels of obviousness: if the result of the test changes when you believe neither the test nor the source has changed, then you're wrong; either the conditions of the test have changed in a subtle way, or the code has changed in an unexpected way (a build option was changed, compiler or build tool upgrade,...).

    I would add:
    The more a piece of code is reused, the fewer bugs it has.

  7. @Astrobe :
    - Programs have bugs because undecidability prevents automated checks
    Programs have bugs because humans make errors (misunderstandings with the customer, misunderstandings within the project team, lacking or incomplete specifications, design errors, coding errors...). That's life.

    That said, it is very true that we cannot be sure that all bugs are detected automatically, because of undecidability (thanks to Mr Gödel and Mr Turing for the theories behind it). But, even without any theory about programming, the number of combinations for the inputs of any non-trivial program is far too large to do exhaustive testing (could the number of combinations be a beginning for a characterization of "non-trivial" ?).

    And when the bug lies in the spec, well, we cannot write a JUnit test that checks that the spec is what the customer meant ;)

    "rule 9 is one of these little jewels of obviousness: if the result of the test changes when you believe neither the test nor the source has changed, then you're wrong; either the conditions of the test have changed in a subtle way, or the code has changed in an unexpected way (a build option was changed, compiler or build tool upgrade,...)."

    C and C++ define things like "undefined behaviour", where a program is "allowed" to do things at random. Thus, the behaviour of the program can change without any change in the environment (although in practice a given instance of undefined behaviour in a given program often causes the same behaviour).

  8. This comment has been removed by a blog administrator.

  9. @Paul W. Homer
    Your laws remind me of the ones Brooks wrote in his "Mythical Man-Month". I even wonder if the "software structure = team structure" law wasn't already in his set of laws. I do not have a copy anymore, so I'll have to check somewhere.

    Anyway, he tried to write some laws, and asked for studies to validate/invalidate them. I do not know whether companies/researchers have tried to do that.

    As Astrobe said, such laws have limits: context plays a large part in this. Are we talking about a general-public software publisher, an ERP publisher, in-house programs, video games, embedded systems, banking, weather simulations...? Those categories (as well as many others) have their own sets of constraints and requirements that influence the validity of "programming laws".

    Regarding the laws
    2. Upgrading the data and its schema is very hard.
    3. It is often impossible or at least, very expensive to re-collect data or to "back-fill" it.

    We could also define laws more specifically for data quality...

  10. Interesting post (as always).

    "2. Well-structured software is easier to build."

    While I can't argue with the above "law", it makes me think. If well-structured software really is easier to build, then why isn't everybody building it that way? Why is so much of our code, despite good intentions, ending up as "bad design"?
    Could it be that well-structured software is easy to build but requires good design, which is where the experienced difficulty lies?

  11. Thanks everyone for the comments. So far they've been excellent. Keep 'em coming :-)

    I'll do a follow-up post for each set of laws, starting with the most popular (controversial) ones first.


    Paul.

