Friday, September 12, 2008

Tagging Data - OrgNames

A few years ago I was working on a project where we were converting journal articles from an SGML DTD to an XML DTD. At that time we thought it might be useful to remove or add tagging, as the case may be, based on business needs. One of the elements deemed too complex was the address.

Addresses are one of the first things that we all learn to tag when taking that initial XML class, mostly because everyone has one (or three) and thinks they know all the pieces of it: Name, Street, City, Zipcode, etc. Of course, once you do something in the real world, that simplicity falls away.

I'll just focus on the piece that is most salient in my mind, the Organizational Name, or OrgName. This is the university, business, etc. where the author worked when (s)he wrote the paper. It normally came with several levels of Organizational Division elements, or OrgDiv. These were the departments, colleges, or divisions within the OrgName where the author specifically worked. Of course, there were times when there were several of these and deciding which was the OrgName and which was an OrgDiv was complicated.

For simplicity, and since nothing had ever been done with the different elements, the distinction between OrgDiv and OrgName was done away with, and the separate elements were replaced with comma-separated text. This simplified the initial markup of the documents and didn't affect the current (or foreseeable) outputs.
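To make the change concrete, here is a rough sketch of that kind of flattening in Python. Only the OrgName and OrgDiv names come from the project; the Affiliation wrapper element, the sample values, and the flatten_affiliation helper are made up for illustration, and this is not the conversion code we actually used.

```python
import xml.etree.ElementTree as ET

# Hypothetical "before" markup. Only OrgName and OrgDiv are real element
# names from the post; the Affiliation wrapper and the values are invented.
BEFORE = (
    "<Affiliation>"
    "<OrgDiv>Department of Chemistry</OrgDiv>"
    "<OrgDiv>College of Science</OrgDiv>"
    "<OrgName>Example University</OrgName>"
    "</Affiliation>"
)


def flatten_affiliation(xml_text):
    """Drop the OrgDiv/OrgName tags and join their text with commas."""
    root = ET.fromstring(xml_text)
    # Collect the text of each organizational element in document order.
    parts = [
        child.text.strip()
        for child in root
        if child.tag in ("OrgDiv", "OrgName") and child.text
    ]
    # Rebuild the wrapper with plain comma-separated text and no children.
    flat = ET.Element(root.tag)
    flat.text = ", ".join(parts)
    return ET.tostring(flat, encoding="unicode")


print(flatten_affiliation(BEFORE))
# <Affiliation>Department of Chemistry, College of Science, Example University</Affiliation>
```

Note what is lost: once the text is flattened, nothing in the markup says which piece was the OrgName and which were OrgDivs. That was acceptable for us only because no output depended on the distinction.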

Why is this important? Because when developing, choosing, or modifying a DTD, you must take into account more than just the "we might need it in the future" ideas. Those have to be weighed against the costs of marking up the document, handling the elements for output, and increased quality control time for staff.

It seems like tagging everything is the direction we should be going, but checking for a return on investment, even when it comes to markup, is always a good idea.
