Tuesday, September 16, 2008

To "tag" Or To "mark up"

There was a lively discussion recently on the XML-Dev listserv regarding the use of the terms 'tag' and 'mark up'. One of the participants has posted a synopsis of it.

I don't want to address the verb vs. noun question that was at the root of the discussion. I'm more interested in the choice between 'tag' and 'markup' (or, more correctly, 'mark up').

The question, to me, comes down to professional terminology. To be seen as 'XML-ish' it is probably best to use the term 'mark up', as in "I marked up the document". XML comes out of the publishing sector, where 'mark up' has a long history, so the term carries over.

At the same time, most non-XML professionals I've met use the sentence "I tagged the document" interchangeably with the previous one. Are they wrong? I don't think so, but that comes down to my impression of how they see themselves working with the document. For example:

What is this? <table>

What is this? <table>Ikea Round</table>

The first is a tag (a start tag) and the second is an element. So, if you have a document without any pointy-bracket tags in it and you start putting them around existing words, aren't you applying tags (to form elements)? Thus tagging the document?
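To make the distinction concrete, here are the pieces named with the terms the XML 1.0 specification itself uses. These are separate fragments rather than one document, and the `<table>` name simply reuses the example above.

```xml
<!-- Plain text before any markup is applied -->
Ikea Round

<!-- A start tag -->
<table>

<!-- An end tag -->
</table>

<!-- An element: start tag + content + end tag -->
<table>Ikea Round</table>

<!-- An empty-element tag: a single tag that is a complete element by itself -->
<table/>
```

Putting a start tag and an end tag around existing text is what turns that text into the content of an element, which is the act many people naturally describe as "tagging".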

In the end I have to agree with Debbie Lapeyre when she said:
"You may state which the pros find is best practice, but I think you should also admit there is a popular culture alternative. Those have a habit of taking over the language over the long term. (Contact IS a verb now, as well as a noun, for all that purists may howl.)"

Friday, September 12, 2008

Tagging Data - OrgNames

A few years ago I was working on a project where we were converting journal articles from an SGML DTD to an XML DTD. At that time we thought it might be useful to remove or add tagging, as the case may be, based on business needs. One of the elements that was deemed too complex was the address.

Addresses are one of the first things we all learn to tag when taking that initial XML class. That's mostly because everyone has one (or three) and thinks they know all the pieces of it: Name, Street, City, Zipcode, etc. Of course, once you apply it to real-world documents, that simplicity falls away.

I'll just focus on the piece that is most salient in my mind, the Organizational Name or OrgName. This is the university, business, etc. where the author worked when (s)he wrote the paper. It normally included several levels of Organizational Division elements, or OrgDiv. These were the departments, colleges, or divisions where the author specifically worked within the OrgName. Of course, there were times when there were several of these, and deciding which was the OrgName and which was an OrgDiv was complicated.

For simplicity, and since the different elements had never been used for anything, the distinction between OrgDiv and OrgName was done away with and replaced with commas. It simplified the initial markup of the documents and didn't affect the current (or foreseeable) outputs.
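A sketch of what that simplification might look like. Only the OrgName and OrgDiv element names come from the project described here; the `Address` wrapper, the sibling (rather than nested) arrangement of the divisions, and the institution names are hypothetical, and the rest of the address (street, city, and so on) is left out.

```xml
<!-- Before (hypothetical): divisions and organization tagged separately -->
<Address>
  <OrgDiv>Department of Chemistry</OrgDiv>
  <OrgDiv>College of Arts and Sciences</OrgDiv>
  <OrgName>Example University</OrgName>
</Address>

<!-- After (hypothetical): the OrgDiv/OrgName distinction replaced with commas -->
<Address>Department of Chemistry, College of Arts and Sciences, Example University</Address>
```

The flattened form is quicker to mark up and check. The trade-off is that any future output needing just the institution would have to split on commas and guess which piece is the OrgName.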

Why is this important? Because when developing, choosing, or modifying a DTD, you must take into account more than just the "we might need it in the future" ideas. Those have to be weighed against the costs of marking up the documents, handling the elements for output, and increased quality-control time for staff.

It may seem that tagging everything is the direction we should be going, but checking for a return on investment, even when it comes to markup, is always a good idea.