Friday, August 05, 2005

exml : first feelings...

First of all, libxml2 is a great library. I truly respect it.

OK… that said, the C API has too many gotchas… sheesh!!!

Let's say I have a file ‘root.xml’ that looks like this:

<root>
<node1>
<node2>
foo
</node2>

<node2>
bar
</node2>
</node1>
</root>


Using libxml2 directly, I can get a structure representing the whole document with

xmlDocPtr doc = xmlParseFile("root.xml");

To get the root node:

xmlNodePtr node = xmlDocGetRootElement(doc);

At this point, node points at the <root> element.

If I try to printf() node->name, it correctly reports ‘root’. No problemo.

My problem arises when I start trying to walk the tree. xmlNodePtr has a member called children,
and it's supposed to give me the child nodes. Anyways… if I do:

node = node->children;

I expect it to now be pointing at <node1>, right? Well, it's not… instead, when I try to printf() node->name at this point,
it gives me ‘text’.

OK… before some libxml2 guru crucifies me, I’m sure there’s something in the documentation I’ve not read, but hey… that’s the behaviour I expect… or am I missing something?
Don’t answer… I guess I am. I kept playing with this for a couple of hours, trying to recursively walk through any document… and this behaviour kept tripping me up.
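
For what it's worth, the ‘text’ node is most likely the newline between <root> and <node1>: by default libxml2 keeps that whitespace in the tree as a text node, so children points at it rather than at the first element. With the real library the usual fix is to skip any node whose type is not XML_ELEMENT_NODE (newer libxml2 releases also offer xmlFirstElementChild() and xmlNextElementSibling()). Here is a minimal sketch of that skipping logic. To keep it compilable without libxml2, it uses a hypothetical Node struct that only mirrors the xmlNode fields involved (type, name, children, next); the struct and the helper names are my own assumptions, not libxml2's API.

```c
#include <stddef.h>

/* Hypothetical stand-in for the few xmlNode fields used here.
 * This is NOT the real libxml2 struct. */
typedef enum { NODE_ELEMENT = 1, NODE_TEXT = 3 } NodeType;

typedef struct Node {
    NodeType type;
    const char *name;      /* "text" for text nodes, tag name for elements */
    struct Node *children; /* first child, of ANY type */
    struct Node *next;     /* next sibling, of ANY type */
} Node;

/* Walk the sibling chain starting at n until an element node is found.
 * Returns NULL if the chain ends first. */
Node *skip_to_element(Node *n) {
    while (n != NULL && n->type != NODE_ELEMENT)
        n = n->next;
    return n;
}

/* First child that is an element, ignoring whitespace text nodes. */
Node *first_element_child(Node *parent) {
    return parent ? skip_to_element(parent->children) : NULL;
}

/* Next sibling that is an element, ignoring whitespace text nodes. */
Node *next_element_sibling(Node *n) {
    return n ? skip_to_element(n->next) : NULL;
}
```

With helpers like these, first_element_child(root) lands on <node1> even though root->children itself is the whitespace text node.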

Anyways, I just thought… hmm… what is in exml? After all, what do I have to lose? I have the entire enlightenment cvs tree on my system anyways… So away I went.

Using exml, I was pleasantly surprised at how it matched up with my thinking and my expectations (I believe my expectations of APIs are usually very high).

Anyways… to get the entire document with exml I do:


EXML* doc = exml_new();
exml_file_read(doc, "root.xml");

Almost the same as libxml2, and a bit more verbose, I admit. I would have preferred a single constructor-style call that creates the document and reads the file in one go.

Now I can query the point the document is currently at, via functions… so I can do:

printf("tag = %s\nvalue = %s\n", exml_tag_get(doc), exml_value_get(doc));

Nice and easy… of course… you can still get an object that refers to the current node, by:

EXML_Node* node = exml_get(doc);

And the previous can be done with:

printf("tag = %s\nvalue = %s\n", node->tag, node->value);

So far, nothing really different.

The part that tripped me up with libxml2 was getting to the <node1> part of the document. EXML has very handy functions for moving around the document unambiguously. (I don’t know if libxml2 has these as well, but I didn’t wait to find out once I discovered exml’s ease of use.)

exml_down(), exml_up(), exml_next(), exml_next_nomove(), exml_goto(), exml_goto_top();

Each of these moves you in a specific direction without surprising results, and returns the tag of the node you land on as a char*, or NULL if there is no such node.
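
To make that navigation style concrete, here is a minimal sketch of a cursor over an element-only tree. Everything in it (the Elem and Cursor structs and the cursor_* names) is hypothetical and written for illustration; it is not exml's implementation, and the choice to leave the cursor in place when a move fails is my assumption, not something I have checked against exml.

```c
#include <stddef.h>

/* Hypothetical element-only tree: whitespace text nodes never appear,
 * so every move lands on a real element. */
typedef struct Elem {
    const char *tag;          /* element name, e.g. "node2" */
    const char *value;        /* text content, or NULL */
    struct Elem *parent;
    struct Elem *first_child;
    struct Elem *next;        /* next sibling element */
} Elem;

typedef struct {
    Elem *top;     /* document root */
    Elem *current; /* where the cursor is now; assumed non-NULL */
} Cursor;

/* Shared helper: if target exists, move there and return its tag;
 * otherwise leave the cursor where it is and return NULL. */
static const char *cursor_move(Cursor *c, Elem *target) {
    if (target == NULL)
        return NULL;
    c->current = target;
    return target->tag;
}

const char *cursor_down(Cursor *c)     { return cursor_move(c, c->current->first_child); }
const char *cursor_next(Cursor *c)     { return cursor_move(c, c->current->next); }
const char *cursor_up(Cursor *c)       { return cursor_move(c, c->current->parent); }
const char *cursor_goto_top(Cursor *c) { return cursor_move(c, c->top); }
```

The point of the design is that the caller never sees node types at all: each call either lands on an element and hands back its tag, or returns NULL.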

Still trying to drill down to ‘foo’ and ‘bar’ from our xml document, I would do this:

// I'm not checking for errors or freeing memory as I go; in real code, I'd do both.

EXML* doc = exml_new();
exml_file_read(doc, "root.xml"); // right now we have the whole document, and we're at <root>
exml_down(doc); // we're now at <node1>
exml_down(doc); // we're now at <node2>, so we can read and get 'foo'

printf("Expecting 'foo', got : '%s'\n", exml_value_get(doc));

exml_next(doc); // we're now at the second <node2>, so we can read and get 'bar'

printf("Expecting 'bar', got : '%s'\n", exml_value_get(doc));


Well… that’s it, basically… that’s just it.

All in all EXML.h is just 81 lines of defs, white space included. Also, I didn’t need any primer to figure out how to use this library. I just looked at the header file and it was clear (Rule Of Least Surprise).

To my lazy mind… this is THE XML C library for me right now. The only catch for some other folks may be that it depends on Ecore, another part of the EFL, which actually depends on Evas and EET as well. For me, though… I already depend on the EFL, so I don’t mind one bit.

Did I remember to mention that it’s built on top of libxml? Yup… it’s a very nice piece of work… building on the solid layer that is libxml2, but without all the gotchas.

Of course, there’s no built-in support (YET) for XPath, XSLT, and all the other goodies that libxml2 already has. Still, it works so far, and less than 15 minutes after looking at the header file for the first time, I already had the weather.com parsing library up to the point where I can interpret simple search queries…

Nice? You bet!!!

Essien, out.

2 Comments:

At Wednesday, January 18, 2006 10:00:00 PM, Blogger werkt said...

Happened upon your post when I was looking around for e-cvs posts about exml. I'm pleased the API was relatively easy to absorb by someone pretty far detached from the application it was designed for, and I've just recommitted after getting a minor cvs problem fixed. Hopefully I'll get around to implementing the xpath/xslt functions you mentioned, but in the meantime, keep up the good posts!

 
At Thursday, January 19, 2006 2:53:00 PM, Blogger Essien Ita Essien said...

Hey werkt!!!

Thanks for the comments... like I said... the API _IS_ compact, and it doesn't have any gotchas.

Good design call there matey. I work regularly with C, Python and C#, and I tend to have a very Pythonic expectation of APIs these days... (not that they be automatically garbage collected or something, but that they *make sense*), and exml is a very good layer on top of libxml. Good work.

Also, I noticed your recent CVS commits... cool.

Keep up the good work.

 
