Wednesday, February 16, 2005

Ramblings on Good-Code (Non complex code)

What you see is actually what you get.

In an earlier article of mine, I talked about programming style as a method of producing good code and harped on the idea that bad programming style can be identified by looking at the code it produces. Of course the obvious corollary of this is that good programming style can be identified the same way. My key identification trait was that bad code is inherently COMPLEX in a whole lot of ways, but most glaringly, and easiest to identify, bad code is VISUALLY complex. In this article, I go into more depth in defining what I call complex code and non-complex code.

1. White Space

When you take a look at a program's source code, the main things you see are the identifiers, keywords, etc. But there are also the things you see without knowing you're seeing them: whitespace. Now this is probably a hilarious proposition, but whitespace can make any piece of literary art (of which I say programming is one. You don't agree? BYTE me :P ) look good or bad.

If I compress all my code without adequate spaces in it, then in languages like C that have statement separators, the compiler will have no problem with that whatsoever. The problem is that programs should be written to be read by other humans. So while the compiler has no problem reading and compiling that code, another human will have enormous issues just trying to find where one thing ends and where the next thing begins. You will agree with me that this is important, but few people consciously make sure that their code is spaced properly.

Usually when I look at wrongly spaced code, or rather not so well spaced code, my lazy brain does a quick back track, and my mind goes into "Oh no!" mode. Then I usually shift closer to the screen and try to look more closely, while under my breath, secretly saying not so amazing things about (cursing? :P ) the person that constructed the horror I’m being forced to decipher. Wouldn't it be much better if I could lounge back in my seat and understand the code? At least for me, it would be way preferable.


/*...other code here...*/
long *c=(long *)malloc(SIZEOFLONGBUFFER);
for(int i=0;i<=MAXSIZE;i++){
array[i]+=*(c++);
printf("array[%d]=%d",i,array[i]);}
/*...other code here...*/

Compare this to:

/*...other code here...*/

long *c = (long *)malloc(SIZE_OF_LONG_BUFFER);

for (int i = 0; i <= MAX_SIZE; i++) {
    array[i] += *(c++);

    printf("array[%d]=%d", i, array[i]);
}

/*...other code here...*/

Correct me if I’m wrong, but don't these two code segments seem at first glance to be two totally different programs doing two completely different things? The first one seems to be a rather nifty program doing some complex stuff, but the second one looks like a regular cheap program.

Well... My point exactly. The first program looks complex, while the second one doesn't. PERIOD.

2. Variable Naming and code homogeneity

In the Unix world, lower case rules. I figure it’s because of laziness about hitting the SHIFT key. The problem arises when your variable names begin to get long. For instance, using lower case for 'variable' is not bad, but using lower case for 'aratherlongvariablename' is plain wicked, and forget it, there's no way I’m going to read that variable name each time I come across it. This is why I may not notice that there is actually another variable named 'aratherwrongvariablename' or another one named 'aratherlongmaredname'.

Hehe, see? You're already tired of deciphering the names. Of course, good programmers don't do stuff like this. They have various ways of solving this problem. UNIX and C code mainly uses (or used to use) the _underscore_, so the names above become 'a_rather_long_variable_name', 'a_rather_wrong_variable_name' and 'a_rather_long_mared_name'. These are much easier to look at. Java and C#, and I think OOP languages in general, prefer the capitalized approach: 'aRatherLongVariableName', 'aRatherWrongVariableName' and 'aRatherLongMaredName'. Well, I’ll leave you to figure out which you prefer (I prefer the underscore because of the simulated white space, since I can read it while lounging back in my seat, but that is another matter altogether, so let's not get religious now).

The other important part is that no matter which you prefer, the style you choose should be pervasive and consistent throughout any single project (and preferably across related projects, or projects done by the same team), so that the code comes across as pleasant.

When these styles are mixed, the results are rather unpleasant to the eye. Basically, mixing these styles makes the source code look haphazard, and introduces the "Oh No!" factor yet again. Probably more importantly, it sets a precedent of confusion and a non-thorough attitude for the life of the said project. It is then easier for a programmer coming behind you to make worse what you’ve started, and it will give a brilliant hacker a bad first impression of the project.

3. Code Density and Contained Units

Now this is a very interesting one for me: anytime I have more than a certain amount of code before me on the screen, I begin to panic! I don't know the particular code-per-screen density that finally gets to me, but it's there.

Dude!!! Get Real!!! Well, I’m getting very real, and when I write my programs, I try to ensure that I don't have too much before me at any time. Another good thing is that if you use white space properly, you won’t reach that point.

Well, this is just one part of the story. The other part for me is that once a source code file begins to get long, I panic while debugging it. Well... now you're certainly a joker! But wait, let me explain things a bit.

The human mind has a large capacity, but usually we don't remember a lot of things. In fact, we usually only remember things that make us think of other things that we like. This is called Association. If the human mind can associate a vast number of things with each other, it can remember them very clearly. The point I’m making is that I usually find myself thinking that a source code file is long anytime I begin to find unrelated information inside it.

Okay, so you don't get my subtle point yet? I'm talking about Code Modularity at its very aggressive best. By all means, use functions in programs, and let those functions be short and straight to the point. Let them do one thing and do it well, and then group related functions into a single file (for OOP guys, this goes for classes).

Imagine a file that contains functions for manipulating JPEG images, and at the same time contains functions that manipulate the serial port. Each time you open that file to look at it, your brain is pulled in two directions, and you have the feeling (at least I do) that this file is too long. On the other hand, if I split this file into two, the way any sane person should, apart from getting physically smaller files, I also stop my brain splitting in two directions. Another fact is that once I’ve split up the functionality in that file into two separate files, those two files can ONLY grow SO MUCH during the life of that project, hence capping your code density per file again.

On the OOP front, I don't have to cram all the Classes in my Namespace into one huge file called Namespace.bleh. I could, and probably should, split them up into various classname.bleh files or something like that. You'll find that doing this actually puts a real constraint on how large a source code file can become. Sometimes of course, it will still grow very large, but I’ve written a fair amount of software, and I find that once a particular file starts becoming very large, there is usually another file inside it itching to come out, or the coder is not being lazy enough to refactor repetitive tasks out. And the same goes for functions... once your function starts becoming large, there is usually another function inside it, dying to get out.
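To make the "function dying to get out" idea concrete, here is a minimal sketch in C. The function names (sum_values, max_value, spread_above_mean) are made up for illustration and are not from any real project; the point is only that each helper does one thing, and the caller shrinks to a short composition of them.

```c
#include <stddef.h>

/* Hypothetical helpers: each does one thing and fits on a screen. */

/* Sum of the first `count` values. */
long sum_values(const long *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }

    return total;
}

/* Largest of the first `count` values; returns 0 for an empty array. */
long max_value(const long *values, size_t count)
{
    if (count == 0) {
        return 0;
    }

    long largest = values[0];

    for (size_t i = 1; i < count; i++) {
        if (values[i] > largest) {
            largest = values[i];
        }
    }

    return largest;
}

/* The caller is a short composition of the helpers: how far the largest
   value sits above the (integer-truncated) mean. Returns 0 when empty. */
long spread_above_mean(const long *values, size_t count)
{
    if (count == 0) {
        return 0;
    }

    return max_value(values, count) - sum_values(values, count) / (long)count;
}
```

Had all of this been written as one long loop inside spread_above_mean, the summing and the maximum search would be tangled together; pulled apart, each piece can be read (and reused) while lounging back in your seat.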

4. Global Variables, Debug-ability and Function Parameters: (My beef with OOP Class Member data)

Usually, talking to an old programming hand, you hear such phrases as “GLOBAL variables are evil, avoid them at all cost!”. Hmmm... why do they say that?

Now imagine you have this 1000+ line source file (uhh... couldn't Point 3 have helped us reduce this? Unfortunately, not always, and like we’ve seen, if you keep related information together, it won’t actually come across as too large). You're stepping through a segment of a function somewhere around line 870, and you suddenly come across a variable with a value. You're trying to explain how it got that value, and you can't. You look at the function itself, and the variable wasn't declared there. You look at the function parameters, and it isn't one of those either. Alas, it has to be a global variable. Well... you scoot up to the top of the file, and true enough, it is defined as a global (or worse, an extern global!!!), but you still don't know where that value was modified, and you've set your break point so far below in the file.

At this point, you begin to hunt thru the file looking for where the variable may have picked up its value from. Finally you find it. Phew!!! Again you've wasted valuable time. If this is just a single variable or a few, no problem, but imagine when you have them everywhere in your program, you have to keep scooting upwards and downwards in a file, looking for various variables. I really don't like that.

My solution to this is to pass variables around via function parameters. This really helps debugging, since it’s probably easier to tell where a function got called, and hence where its parameters were set, than to find out where a global variable was assigned or modified. This is another time ‘shaving’ feature that I appreciate anytime I encounter it in a well designed program. Moral of the story... well... I don't know if Global Variables are EVIL or anything so exotic, but I do know that they can make my debugging session harder work than it should be.
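Here is a small sketch of the difference, in C. Everything in it (scale_factor, scaled_total_global, scaled_total) is a hypothetical name invented for this illustration. Both functions compute the same thing; the only difference is where the factor comes from.

```c
#include <stddef.h>

/* Hypothetical global: any function in any file could have changed it. */
long scale_factor = 1;

/* Version 1: the result silently depends on the global above. */
long scaled_total_global(const long *values, size_t count)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i] * scale_factor;
    }

    return total;
}

/* Version 2: everything the function needs arrives through its
   parameters, so the call site shows where `factor` came from. */
long scaled_total(const long *values, size_t count, long factor)
{
    long total = 0;

    for (size_t i = 0; i < count; i++) {
        total += values[i] * factor;
    }

    return total;
}
```

When the second version gives a surprising result, the hunt starts at the call site, one step up the stack in the debugger. With the first version, the hunt starts with a search through every place that might have touched scale_factor.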

Now what of my beef with OOP Class Member Data?

Well... OOP Classes make it possible to use Global variables under a new name, but of course the effects are the same. Without good design, Class Member Data pervades a class and turns debugging into a serious nightmare. It is usually a huge joy to me when I use a class that minimizes its use of Class Global Variables, and even when it uses them, sets their values in easily deduced places.

5. Libraries

A library is, in my opinion, the most important outcome of a software project. Why do I say this? Well, the problems you solve today are still going to be around tomorrow, so if you've solved one today, why tackle it all over again tomorrow? Why not just solve it once and for all today, and reuse your solution tomorrow?

This may seem like an obvious truth, but you'll be surprised how many software projects get finished these days, and there is ABSOLUTELY no library developed during the period it was being built. The excuses normally range from Time Budget, to Deadlines, to "those things need a bit more planning".

NONSENSE!!! That is my reply to those excuses. A good programmer is supposed to be always on the lookout for functionality that can be factored out. These functions/classes make up your library for solving specific problems.

For instance, I just started a personal project to help me manipulate Shellcode on Linux in various interesting ways. During the short time I was researching, I was adding functionality to the program as I progressed, but at each stage, as I added functionality, I was always asking... hmmm.... should this be a function? Won't I need to do this again? Aha! I've done this before! Uhh... I really don't want to solve this using this long method, so what have I written already that I can generalize to solve this?

The result of this continuous introspection, which I didn't have to stop and plan for, is a library with 5 functions at the time of this writing. It became obvious that I was saving myself work when I had to extend the program to manipulate some environment variable address, and I suddenly discovered I had already written the functions to do smaller parts of what I wanted to do. I just needed to combine them in the main program.

The art of creating libraries as you write, is so important to software development, because in my opinion, it marks a truly active mind that is constantly searching for innovative solutions. The other approach, is to plan the library and tackle it like a project on its own. Well... I guess this works too, but I’m beginning to see that libraries built while solving an actual problem seem to have more utility than libraries built to solve a problem. Hehe... that seems like a contradiction, but take some time to think of that and you will get what I mean.

That is also, in my opinion, the crowning difference between Top-Down and Bottom-Up programming. But so many other people have done justice to that topic that I won't bother delving into it, at least in this article.

6. Idea Cohesion and Single Point of truth (SPOT Rule)

I learnt about the SPOT rule in Eric Raymond's 'The Art Of Unix Programming'. It essentially says that one thing shouldn't happen in more than one place in a program. Of course, this is serious paraphrasing, but that's essentially what it means in my own understanding.

Practically, it means being on the lookout for multiple areas of code performing the same or almost the same task, which could easily be made into a function that exists in only one place and can be called from many places. This way, there's only one place in your code that you'll ever make changes to, and once you've located it, debugging and maintenance stay interesting and relatively stress-free.
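As a rough illustration in C (the names clamp_percent, volume_from_input and brightness_from_input are invented for this sketch, not taken from the book), imagine two input handlers that each used to carry their own copy of the same range check. Pulling the check into one function gives the valid range a single point of truth:

```c
/* Hypothetical example: the valid range is defined once, here. */
enum { PERCENT_MIN = 0, PERCENT_MAX = 100 };

/* The single place that knows how to force a value into range. */
int clamp_percent(int value)
{
    if (value < PERCENT_MIN) {
        return PERCENT_MIN;
    }

    if (value > PERCENT_MAX) {
        return PERCENT_MAX;
    }

    return value;
}

/* Both callers reuse the one check instead of repeating it. */
int volume_from_input(int raw_input)
{
    return clamp_percent(raw_input);
}

/* Maps a percentage onto a 0..255 brightness level. */
int brightness_from_input(int raw_input)
{
    return clamp_percent(raw_input) * 255 / PERCENT_MAX;
}
```

If the allowed range ever changes, only the enum and clamp_percent need to be touched, and both callers pick up the change.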

Well, idea cohesion is very similar, but it’s broader than just actual code functionality. It’s more like a Single Point of Logical Truth. What I mean by this is keeping closely related functionality (logically related in terms of the problem scope) together. This again may seem trivial to some, but imagine you're looking through a Class Definition that overloads operators, and right after the constructor you see an overloaded + operator, then you run into some property definitions, and some 3 to 4 functions later you run into the overloaded - operator. Hmm... sounds funny? Believe me, we sometimes do something akin to this. The problem is that it gets hard to predict where in the file a function will be. This may not seem important, but personally (and I believe lots of programmers are like this too), I like to think only once about something. I like to be able to say with 98% certainty that something will happen in a certain way, or be in a certain place, after I’ve studied the situation. This translates to my being able to intuitively know where in a jungle of source code a particular piece of functionality will likely be. It really saves me debugging time.

The other major part of this is grouping ONLY closely related functionality into ONE source file. I know I mentioned this in point (3) above, but I want to emphasize it here again. Being on the lookout for this works two ways: it helps tell you when your file is getting too large, and it can serve as an indicator that you should probably be building another library that is dependent on this one, instead of adding to this one.

Doing this also helps force you to properly define boundaries between your code components. This will come in handy especially during maintenance and debugging where it can help to keep things focused and simple.

7. File Naming

This one can go unnoticed for a long time if the developers on a project don't change, but once developers change, you find that it's helpful to give files the most obvious names possible. I was recently working on a C# project (legacy code) that had slight variations of a single data processing engine. Apparently, when the project started, the first processor class, AEngineClass, was built. The class file was also named AEngineClass.cs. As time went on, another class of the engine grew up for a slightly different scenario, and was called BEngineClassWeb, but (okay… I have no idea why) its file was named BEngineClass.cs. This may seem trivial, but currently, for that project, there are many of these XEngineClass.cs files, and you’re never sure which ones contain XEngineClass classes and which contain XEngineClassWeb classes. This recently took my PRECIOUS time because I was setting a break point in the wrong file!!! It would be simpler and better if all XEngineClass classes were stored in XEngineClass.cs files and XEngineClassWeb classes were stored in XEngineClassWeb.cs files.

Anyway, apart from that, giving your files obvious names can help you in refactoring code!!! I’ll say it again… naming your files correctly, with obvious functionality driven names, can help you refactor code. On a Java project I worked on recently, I had to build some custom libraries, and one of the classes was a Utilities class for collecting odd functions I could reuse across other projects. I started by dumping all kinds of functions there. Pretty soon, I had a couple of Persistent Data handling functions, so I decided it was time they had their own Class and file. I pulled them out into a separate file under a separate class. Once they were there, I started seeing a pattern in the code (that had always been under my nose, but was sort of cluttered up with other code), and immediately I refactored most of the functions there. Two things happened here. Firstly, I delayed pulling out the Persistent Data handlers into their own class and file till there was a real need for it (my Utilities class had gradually started looking like a Persistent_Data_and_Utilities class). Secondly, once the handlers were properly separated, refactoring needs became obvious because of reduced code density. It’s easier to see stuff when there’s almost nothing to look at.

Right now, I’m aggressively chasing away all the wrongly named files in any project I’m working on, and I’m seeing returns immediately, both for me and for my co-workers.

Conclusion: (For Now)

At this stage, I’ve (probably not exhaustively) described what makes me feel that a code base is complex, and that I’m wasting precious time on it, that could have been used to do more interesting things.

So far, I know I’ve said a lot that will make some folks grin with recognition and others shake their heads in pity (for my ramblings?). Well, I've tried to articulate my own observations as realistically as possible. The key here is in developing a good nose while programming. A good nose (hmm... I should write on that too) will help save your valuable time in future, and a bad nose... well... it just blows… literally :)

