Tuesday, December 06, 2005

Listening To Code: The importance of proper nomenclature

Is that even the spelling of Nomenclature up there? ;;)

[hmm… it apparently is correct]

 

Yesterday nite, as I implemented Pop() for yirae singly linked lists, I realized that my return value couldn’t be a ysNode as all my other functions till then had been doing.

Basically, I’d been doing stuff like:

 

ysNode n = NULL;

n = ysNode(n, 5);

 

As you can see, I pass in the node (which is actually the head node of the linked list), and I return the new head of the list. This works well, until I get to POP(), which needs to pull off the first item on the list, and traditionally return it, but unlink it from the list. I couldn’t do:

 

n = ysPop(n);

 

Doing that will loose the data I’m popping. Eventually, I had to do this instead:

 

int data = ysPop(&n);

 

And my declaration for ysPop is:

 

int ysPop(ysNode*);

 

I notice that this is very neat and efficient, and I don’t like my functions having conflicting calling semantics, so I start to implement all my other functions to match this semantic, since I can’t make Pop() match their semantics, so Push() becomes:

 

ysNode n = NULL;

ysPush(&n, 5);

 

succinct and powerfull!!!

 

There is just one problem… now, I have to remember to append the ampersand infront of all my nodes that I’m passing to the various functions… like so: ‘&n’ instead of just ‘n’, since I’m now passing a pointer to a ysNode. Ofcourse, I’m apt to forget that, and infact, many of my ‘make’ invocations remind me of that with an error/warning. Thinking of how to solve this, I begin to think of creating a typedef for ysNode*.

 

Before I continue, a bit of background on ysNode.

 

The definitions if you look in ysnode.c:

 

struct _ysNode {

            int data;

            struct _ysNode* next;

}

 

typedef _ysNode* ysNode;

 

Now, you can see what my basic Node element struct looks like. Ofcourse, I could have been using struct _ysNode throughout my functions, but my code will quickly become ungainly, so according to SAGE advice, I typedef struct _ysNode* as ysNode, and I’m using it throughout. Now, I realize I also have to have a ysNode*, which is essentially a struct ysNode**, I sat back and started thinking… hmm…. Have I made a mistake somewhere? Why do I find myself having to define ysNode*?

 

BAAMM!!! It struck me.

 

A linked list is a collection of items, each item having a reference (or knowing how) to find the next item. Each of these items is a NODE, but the NODES are a distinct ENTITY from the LIST itself. From the way my code is starting to look, there seems to be a LinkedList type struggling to emerge, infact, its shadow is already present in &n (which is a ysNode*), but I hadn’t recognized it yet. Immediately I realized this, I also realized that my public functions were doing the wrong things, namely: exposing nodes to the user of the library.

 

What do I mean. From what I just said previously, it would seem obvious that a NODE is contained in a LIST, but that could as well as just be an IMPLEMENTATION detail… I could by stroke of Genius (AKA Insanity ;) ), also come up with an implementation of Lists that doesn’t use NODES :), I just wonder what would be used, but anyway, the point is that the design is porous as it is exposing the NODES to all eyes. My *public* functions then, SHOULD operate on Lists, and not Nodes. The question is then, what is a list, and what is a node?

 

At the Domain level, a LIST is a collection of Nodes, but at the programmatic level, a LIST is just a pointer to the HEAD of a node… does this sound familiar? Try this, an Array is C, is just a Pointer to the first element of an Array!!! Yup, that a nice zinger… once I realized this, a lot of things just fell into place.

 

The title of this article, is taken from the fact that I WOULD have missed this, if I wasn’t paying attention to what the code was trying to say to me. And the code was able to talk to me clearer because I gave it a good voice to start with – THE NAMES OF THE VARIABLES N STRUCTURES. Its very important to properly name your objects to as accurately as possible model the problem domain you’re trying to solve. If you do this, the code gains a voice, so to speak, and you can read it back and learn from it. Just imagine if I’d stuck to my earlier plan of calling ysNode, yLinkedList. I would NEVER have had the paradigm shift I had in the way I did. Infact, immediately after naming it yLinkedList, I just thought… no… this doesn’t feel right, all I’m passing around are nodes, so I renamed them, and its come back to pay me good for that decision.

 

Its like this, I start out implementing functionality, with a particular rule of thumbs, along the way, I happen to implement some stuff that I don’t exactly plan for from the scratch (b/cos I’m paying attention to refactorings and good coding techniques), then a pattern emerges which I didn’t set out to create. I recognize the pattern and want to optimize it or at least refactor it properly, and in so doing I set out to understand the pattern. The understanding of this pattern, then teaches me something much more than what I knew at the onset of the project. This is my code beginning to talk to me.

 

This may all sound poetic and stuff, but in practice, these things happen all the time… your domain problem becomes clearer as you set out to define the problem in terms of a programmable solution. This is where bottoms-up coding is strongest… as you build small units, you can recognize the patterns that are forming and seek to understand them. If you follow Tops-Down coding, chances are that most of what you’re doing, you’ve already thought of it outside the advantage of being embedded in the problem’s solution, therefore, you’ll miss the tiny details, which are likely to make the solution *elegant*. Ofcourse you may argue that elegance in software is an unreal panacea, or a Holy Grail of sorts, and can’t be achieved in production scenarios, but only in small one on one projects… my answer to that is that, in a proper dev house, the *production software* is just a combination of many *one on one* modules – any project manager will agree with that description. The question then is how to make the coders benefit from the fact that they still end up working on their individual one-on-one projects.

 

Well… I’m still learning, and still growing, and by Jove… Data Structures seems to be a wonderfull way to end 2005 :-)

 

Essien

4 Comments:

At Wednesday, December 07, 2005 7:09:00 AM, Blogger Ugo said...

You can find the best linked list implementation within the linux kernel.

I personally admire their list.h implementation. Its a joy to read.

A generic doubly linked list implemented totally using #defines and the occasional static inline.

http://lxr.linux.no/source/include/linux/list.h

--Ugo

 
At Wednesday, December 07, 2005 7:13:00 AM, Blogger Ugo said...

Nice!

I just read the entire article. Your thoughts evolved pretty closely to the structure of the implementation in the URL I passed.

You should have a lot of fun deciphering it.

--Ugo

 
At Wednesday, December 07, 2005 8:03:00 AM, Blogger Ugo said...

This comment has been removed by a blog administrator.

 
At Wednesday, December 07, 2005 10:07:00 AM, Blogger Essien Ita Essien said...

Ugo,

I just looked at that link... and MY!!!

It's incredible to even do that as a set of macros... it does make a lot of sense though... a whole lot. I'm going to look closer at that stuff tonite, as i'm starting my work on doubly linked lists... and there are some concepts they introduced there which are somewhat new to me.

And hehehe, it gives me a warm feeling to see some of my thoughts albeit in a cleaner manner. :)

Hehe... that's why I keep you close as a pal... Keep your friend close, and your friends who know where to get information on data structures even closer ;)


thnx man

 

Post a Comment

<< Home