Saturday, June 25, 2005

Solution Layering

How many times have you actually felt that you were doing too much work while programming?

Hmmm... ok... I know we all feel overworked, and most of us derive a kind of sadistic pleasure from that overworked, pressurized, must-meet-deadline kinda crap :) but sometimes, it's plain unnecessary.

Actually, as you grow as a coder, you begin to relish 'elegant' solutions that not only showcase how smart you are, but also reduce the number of keystrokes you hit and stuff like that (at least, I crave that elegance in my solutions). What keeps hitting me every time is the importance of layering as an engineering art, and the importance of recognizing which functionality belongs in which layer and keeping it there.

I'll give a couple of examples.

First of all, imagine you have a database table with two fields [id, name]. You also have a program that is putting data into the database. Suppose you're reading the data from a couple of CSV files, collected from different secretaries in the same company. As you're about to start running your import script, you're suddenly alerted that there could be duplicate information in those CSVs, and you have to ensure it doesn't mess up the database. What do you do?

I'm not sure what the no-brainer answer is for a lot of people, but let's look at the alternatives.

1. Modify your script to hold the names in memory as it reads them, and compare new names with those already in memory to ensure duplicates don't occur (DON'T LAUGH :) )

2. Before each insert, try to retrieve the same name from the database, just to see if it's already there. If it's not there (it's unique), then insert.

3. Just make the stupid field in the database UNIQUE and don't touch your script at all.

Looking back at the three alternatives now, it's almost a no-brainer to see which is the most trivial and probably the better solution (at least for me, the programmer).

Let's look at these solutions again before I finalize my point.

Solution 1

In solution 1, we were trying to do EVERYTHING in our code. Basically, whether we like it or not, and whether we realize it or not, we're going to need to replicate some functionality of a database (an in-memory data store). Once we commit to that solution, we're introducing a LOT of possible bugs, since our once-simple import script will become an import script with an Embedded Database (albeit a poor man's database), and of course, since it's a poor man's database, our program and we the programmers will suffer from that decision. What will probably save us is realizing this, and remembering that our programs should usually STRIVE to DO ONE THING, and DO IT WELL. Also, if there is ANOTHER program/library that DOES WHAT WE WANT DONE BETTER than we do it, WE OWE IT TO OURSELVES to TRY TO USE IT. Enough said. (Did I hear someone say open source?)
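To make that concrete, here is a rough sketch of what Solution 1 might look like in Python. The function name, the one-name-per-row CSV layout and the id numbering are all made up for illustration; the original script isn't shown anywhere in this post.

```python
import csv
import io


def import_names(csv_texts):
    """Return (id, name) rows to insert, skipping names already seen in this run.

    Hypothetical helper: assumes each CSV row holds a single name.
    """
    seen = set()      # our home-made "index" of names
    rows = []
    next_id = 1
    for text in csv_texts:
        for record in csv.reader(io.StringIO(text)):
            if not record:
                continue              # skip blank lines
            name = record[0].strip()
            if name in seen:
                continue              # duplicate within this run
            seen.add(name)
            rows.append((next_id, name))
            next_id += 1
    return rows


rows = import_names(["Ada\nBob\n", "Bob\nCy\n"])
```

Notice that the set only knows about names seen during this run. Anything already sitting in the table from an earlier import slips straight past it, which is exactly the kind of bug a poor man's database invites.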

Solution 2

In solution 2, the change to our program is much simpler, because we're exploiting a property of our Database Server (querying for prior existence) before we insert. We can see that once we involve the Database System in any way, our headaches automagically reduce to a negligible minimum. This is almost acceptable, and in some cases of software layering it's perfectly OK. At least we've pushed the real data manipulation to the system that was designed for data manipulation... the database itself.
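A minimal sketch of Solution 2, assuming SQLite through Python's standard sqlite3 module. The table name people and the helper insert_if_absent are hypothetical, picked only for this example.

```python
import sqlite3


def insert_if_absent(conn, name):
    """Insert name only if no row with that name exists. Returns True if inserted."""
    found = conn.execute(
        "SELECT 1 FROM people WHERE name = ?", (name,)
    ).fetchone()
    if found is not None:
        return False                  # already there, skip it
    conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
    return True


conn = sqlite3.connect(":memory:")    # throwaway in-memory database
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

results = [insert_if_absent(conn, n) for n in ["Ada", "Bob", "Ada"]]
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```

The script is still the one deciding what counts as a duplicate, but the lookup itself is now the database's job. The price is an extra query for every row.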

Solution 3

The only reason why solution 2 is not the best solution is because Solution 3 is possible. Probably not in all cases, as there may be some stripped-down embedded database used in a low-resource application that lacks the ability we exploit in Solution 3, but as long as that ability exists, it is the preferred solution. Here, we totally shift EVERYTHING to the database layer. This is true layering. And our first advantage is ultimate simplicity... we DON'T ALTER OUR CODE AT ALL. Now, that is a benefit I'm proud of. On the other hand, we have to be aware that this can be done, else we can't benefit from it. This brings to mind one of the nuggets I came across once about being a good programmer. It said to LEARN ABOUT THE ADVANCED FEATURES OF YOUR LANGUAGE(s). The idea being that most of those advanced features are easier ways of solving very real problems.
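A sketch of Solution 3, assuming SQLite through Python's standard sqlite3 module and a hypothetical people table. One caveat worth stating: a UNIQUE column makes the database reject a duplicate insert with an error, so "not touching the script" assumes the script already carries on after a failed insert. The try/except below stands in for that assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")    # throwaway in-memory database
# The only real change: the name column is declared UNIQUE.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def import_names(conn, names):
    """Plain insert loop with no duplicate logic. Returns how many rows were rejected."""
    rejected = 0
    for name in names:
        try:
            conn.execute("INSERT INTO people (name) VALUES (?)", (name,))
        except sqlite3.IntegrityError:
            rejected += 1             # the database refused the duplicate
    return rejected


rejected = import_names(conn, ["Ada", "Bob", "Ada"])
stored = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]
```

The import loop never asks whether a name exists. The rule lives in the table definition, so every program that writes to this table gets the same protection for free.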

Now for a quick summary.

It's always beneficial for everyone if we design our systems to exploit the designed-for strengths of their various components. In fact, if we don't, we almost invariably build weaker systems. This is why working harder is not always a good thing, while working smarter is usually what actually solves the problem.

Now, at times, it is not obvious that there is a better layer to handle your problem, so what do you do? Well, I usually think about my implementations, and every time I find myself doing too much work, it *smells* wrong. I may not know the best solution immediately, but I do leave TODOs and other kinds of hints in my code, so if a wiser me or a wiser someone else comes along, that code can be improved later. It's usually better to roll out a quick, dirty solution within your time constraint than a better solution that blows the time budget. Your programming buddies may hail you for the better solution, but your boss will chew you out, and you may have cost your company an actual job.

The key is to COMMIT TO IMPROVEMENT. If you identify a problem that you don't have the mental resources to solve now, note it down, and solve it later. If, on the other hand, you CAN solve it now, please do, or your head WILL fly :)

Ok... now, repeat after me... "I am done reading this blog entry for today..."

