Tuesday, December 9, 2008

Of Twitter and Conferences

I am attending a couple days of the XML-In-Practice 2008 conference. And one thing I have noticed is a decided lack of laptops. Sure people have them but a lot of them are closed during the sessions. That is in stark contrast to Balisage where almost every attendee has a laptop and is taking notes or searching for a rebuttal to the speaker.

However, I did see that a few people are twittering, myself included. Bob DuCharme, Scott Abel, and Adam Hill have put up several posts each...in Scott's case it appears to be something of a torrent of tweets. The links above will take you to their Twitter home pages but, as Twitter is designed to do, these will have moved on from conference tweets by tomorrow.

If you'd like to get a view of the tweets from the conference try these links:

XML In Practice

#xml2008

The first is mostly used by Scott Abel; the second was coined by Bob DuCharme and picked up by Adam and me.

As usual, this is stream-of-consciousness stuff, as Twitter is prone to be. But I think the intrepid reader might find useful nuggets in there.

I have notes from several of the sessions that I hope to write up on Wednesday after I get back from the conference.

Tuesday, September 16, 2008

To "tag" Or To "mark up"

There was a lively discussion recently on the XML-Dev listserv regarding the use of the terms 'tag' and 'mark up'. A synopsis by one of the participants is available here.

I didn't want to address the verb vs. noun question that was the root of the discussion. I was more interested in the use of tag vs. markup (or more correctly, mark up).

The question, to me, comes down to professional terminology. To be seen as 'XML-ish' it is probably best to use the term 'mark up', as in "I marked up the document". Since XML comes out of the publishing sector and the term 'mark up' has a long history in that realm, the term carries over.

At the same time most non-XML professionals I've met use the sentence "I tagged the document" interchangeably with the previous one. Are they wrong? I don't think so but that comes down to my impression of how they think they are working with the document. For example:

What is this? <table>

What is this? <table>Ikea Round</table>

The first is a tag (a start tag) and the second is an element. So, if you have a document without any pointy-bracket tags in it and you start putting them around existing words, aren't you applying tags (to form elements)? Thus tagging the document?
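To make the tag/element distinction concrete, here is a minimal sketch in Python using the standard library parser. The function name `describe` and the sample string are my own illustration; nothing here comes from the XML-Dev thread.

```python
import io
import xml.etree.ElementTree as ET

def describe(xml_text):
    """Return (event, name) pairs in the order the parser meets each tag."""
    events = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        events.append((event, elem.tag))
    return events

sample = "<table>Ikea Round</table>"

# The parser reports two tags (a start tag and an end tag)...
events = describe(sample)

# ...but together with the content they make up one element.
element = ET.fromstring(sample)
```

The parser sees a start tag and an end tag as separate events, while `element` is the single thing those tags form around "Ikea Round".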

In the end I have to agree with Debbie Lapeyre when she said:
"You may state which the pros find is best practice, but I think you should also admit there is a popular culture alternative. Those have a habit of taking over the language over the long term. (Contact IS a verb now, as well as a noun, for all that purists may howl.)"

Friday, September 12, 2008

Tagging Data - OrgNames

A few years ago I was working on a project where we were converting journal articles from an SGML DTD to an XML DTD. At that time we thought it might be useful to remove or add tagging, as the case may be, based on business needs. One of the elements that was deemed to be too complex was the address.

Addresses are one of the first things that we all learn to tag when taking that initial XML class. It's mostly because everyone has one (or three) and thinks they know all the pieces of it: Name, Street, City, Zipcode, etc. Of course, once you do something in the real world that simplicity falls away.

I'll just focus on the piece that is most salient in my mind, the Organizational Name or OrgName. This is the university, business, etc. where the author worked when (s)he wrote the paper. It normally included several levels of Organizational Division elements or OrgDiv. These were the departments, colleges, and divisions where the author specifically worked within the OrgName. Of course there were times when there were several of these and where defining which was OrgName or OrgDiv was complicated.

For simplicity, and since nothing had ever been done with the different elements, the difference between OrgDiv and OrgName was done away with and replaced with commas. It simplified initial markup of the documents and didn't impact the current (or foreseeable) outputs.
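For the curious, here is a rough sketch in Python of what that kind of flattening might look like. It is not the conversion we actually ran: the wrapper element `Address`, the replacement element `Org`, and the sample values are all hypothetical names I made up for the illustration.

```python
import xml.etree.ElementTree as ET

def flatten_org(address):
    """Replace OrgDiv/OrgName children with one comma-separated Org element.

    "Org" is a hypothetical element name chosen for this sketch.
    """
    parts = []
    for child in list(address):  # copy the list so we can remove while looping
        if child.tag in ("OrgDiv", "OrgName"):
            text = (child.text or "").strip()
            if text:
                parts.append(text)
            address.remove(child)
    org = ET.Element("Org")
    org.text = ", ".join(parts)
    address.insert(0, org)
    return address

sample = ET.fromstring(
    "<Address>"
    "<OrgDiv>Department of Chemistry</OrgDiv>"
    "<OrgDiv>College of Science</OrgDiv>"
    "<OrgName>Example University</OrgName>"
    "<City>Springfield</City>"
    "</Address>"
)
flat = flatten_org(sample)
flat_text = flat.find("Org").text
```

The trade-off is visible right in the code: once the pieces are joined with commas, nothing downstream can tell a department from a university without re-parsing the string.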

Why is this important? Because, when developing, choosing, or modifying a DTD you must take into account more than just the "we might need it in the future" ideas. Those have to be weighed against costs of marking up the document, handling the elements for output, and increased quality control time for staff.

It seems like tagging everything is the direction we should be going, but checking for a return on investment, even when it comes to markup, is always a good idea.

Wednesday, August 20, 2008

Fonts and Circumstance

If you are a single employee company like me you know the desire to keep costs down. Since I work with XSL-FO for a good bit of my projects, I often get requests to use fonts outside the normally installed sets.

Obviously for XSL-FO the client is going to need the font installed on site. But for many of my projects I am doing work remotely and it is useful to have the font available locally. Like a recent project where the client was using the Frutiger font family (condensed type). I searched around a little bit and found the whole family for $339.00. That's a little out of my price range for a single project.

A little more google-fu (or if I had just read the Wikipedia article) and I discovered that Microsoft's MSReader application installs the Frutiger Linotype font family by default. So three minutes later I was up and running with a reasonable option for Frutiger Condensed.

Just thought I'd share that sometimes you can find a solid option for free.

Monday, August 18, 2008

Heard In The Halls

Just a link this time...in case people aren't finding it via Google (or your favorite search engine)

Heard In The Halls is a Markup Conference in August in Montreal tradition that I find particularly fun. It shines a light on one of the interesting points of Balisage.

(no, not the stupid one-liners, you cynic)

A blank couple sheets of paper where people write down things they overhear while attending the conference. It always ends up with strange statements that can be taken WAAAAY out of context. In fact, one of the heards this year was, "You can't say anything without it being taken out of context.".

Take a look at the link above and perhaps get a giggle or two. It is a VERY in-crowd kind of thing. But even some of them make sense to the mildly technologically literate.

Friday, August 15, 2008

But Wait There's More...

C. M. Sperberg-McQueen - Closing speech

XML 10 is a good time to have a look back at where we've been, where we wanted to go, where we actually went, and where we want to go from here.


There are two kinds of projects in the world
  1. Barn raisings - gather materials and people, do the thing, and then everybody goes home.
  2. community farming - gather materials and people, work together, but no one EVER goes home.
Standards development is more of the latter than the former. Revolutions are almost never barn raisings...success means that you are in charge...and then the work is never done.

The biggest problem is Semantics.
It is all those things that we don't know how to do really well. Goal should be to isolate substructures within Semantics to fix them and thus make them no longer part of Semantics.

"the road to hell is paved with compact syntax"

Next Year in Montreal.

Text retrieval of XML-encoded corpora: A lexical approach

Liam Quin - W3C - Paper

Text retrieval - building a persistent index that makes finding documents for things they contain more simple.
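To give a feel for what "an index that makes finding documents for things they contain" means, here is a toy sketch in Python. It is not Liam's system: it keeps everything in memory rather than persisting it, and the function name and sample documents are assumptions of mine.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # itertext() walks all text content and ignores the markup itself
        text = " ".join(root.itertext())
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

docs = {
    "a.xml": "<p>Markup is <em>fun</em></p>",
    "b.xml": "<p>Parsing markup takes work</p>",
}
index = build_index(docs)
```

Looking up `index["markup"]` then gives the ids of both documents without re-reading either one, which is the whole point of building the index up front.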

lq-text design - a text retrieval system he developed a long time ago and is seeing if it will work in the XML world.

He's doing a little demonstration of the index using "75 or 300 MB of data, I don't really recall".

I hope his paper has some of this code in it...might be interesting to examine...it appears he's making his own xml-database that is accessed without the need of XQuery...but only because the needs are so basic.

Is it useful? For him it is...

Is it as useful as XQuery - XQ can't do match highlighting. XQ can't mix with broken HTML and text. But overall XQuery is more useful...but perhaps not for concordances like he is using here.

How much to update to XML? Probably better to rebuild than retrofit.

Practical question - won't run under Windows (pipelines are implemented by running a program completely to a temporary file and then giving that to another program...in Unix the programs alternate)

Freedom to constrain: Where does attribute constraint come from, Mommy?

Syd Bauman - Brown University, Women Writers Project - Paper

"This paper does not represent an enormous amount of original research." Syd in a rare self-deprecating moment. (I kid, I had a great dinner with Syd last night...I would love to chat with him for hours, but David Dubin and the German guys(TM) had some great games to play instead)

EOA - Explanation of Acronyms

Freedom to Constrain:
Whose freedom am I talking about? The typical humanities project doing text encoding.
  • subject matter expert
  • XML expert
  • encoders (newkeying, post OCR, vendor)
  • proofreaders, web designers, managers, Research Assistants
Constrain what?
For this talk he is narrowing down to an enumerated list of possible values of an attribute.
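As a rough illustration of what constraining an attribute to an enumerated list amounts to, here is a small Python sketch. It is not how a schema language or a TEI ODD does it; the attribute name `rend`, its allowed values, and the function name are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical enumerated list: attribute name -> permitted values
ALLOWED = {"rend": {"italic", "bold", "underline"}}

def check_attributes(root, allowed):
    """Return (tag, attribute, value) for each value outside its enumerated list."""
    problems = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name in allowed and value not in allowed[name]:
                problems.append((elem.tag, name, value))
    return problems

doc = ET.fromstring(
    '<text><hi rend="italic">a</hi><hi rend="blinking">b</hi></text>'
)
problems = check_attributes(doc, ALLOWED)
```

In a real project the list would live in the schema rather than in a script, but the check being made is the same: a value is either on the list or it is flagged.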

Syd is an interesting speaker. He drives directly through topics and talks in a way that makes them accessible to me, which means they are likely easier for the rest of the room to understand. But then he throws in strange things like referencing his asthma (due to cats) and then throws a cat image on the screen. Which both works and doesn't work at the same time. For me it is a good presentation style...it gives my brain time to catch up with the topic while he branches off to some strange digression.

Schema languages are used for constraint....duh? Oh he was comparing it to spam filters....I think I get it.

Literate programming....one source file contains the program code and the end-user documentation. Single file advantages include that you are forced to think about documentation as you write the code...that is a GREAT idea...

TEI uses this concept in the ODD file, which uses the concept of declarative constraint with formal documentation language.

Comments/Questions:
Matt Johnson - LexisNexis:
How do we get this information exposed to the user without having to have an XML expert insert this into the file? Basically a nice user interface is required.

Translation between RDF and Topic Maps: Divide and Translate

Christo Dichev - Winston-Salem State University - Paper

Creation of a tool to reuse and integrate between existing Topic Maps and RDFs.

This is a good discussion of translation between different XML tagsets. What do you drop? What matches? What is duplication in the original/combination?
Peter Brown - Pensive - Paper
He has been speaking very interestingly on the situations and contexts of context. But my system would not find a wireless connection...Hotel Europa really needs to invest in 802.11g routers/access points...so I don't have much in the way of notes. But read his paper because it made some interesting points.

And because I have enjoyed listening to Peter talk during much of the week.

Slides will be available at www.pensive.eu/uid/0203


Thursday, August 14, 2008

Parser possibilities: Why write a markup parser?

Norman Smith - Science Applications International Corporation - Paper

Markup Parser: A parser that handles SGML, XML, WordML, etc. transparently.

It is a non-validating parser

Reasons not to write a parser: it is a lot of work, validating parsers are even more work.

Reasons to write one: use multiple meta-languages. Learning experience both of the markup language and maybe a programming language.

Structural metadata and the social limitation of interoperability: A sociotechnical view of XML and digital library standards development

Jerome McDonough - University of Illinois at Urbana-Champaign - Paper

"What I'm really going to try and convince you of that we have a tremendous problem in the XML world...we simply do not have enough standards in XML community."

There is a considerable number of different tagsets that need to be interconnected...but there is no standard of content (dates are the common example and the one he used here).

Socio-technical system theory - States that technology and social context are not separable.

This is kind of like a porthole view into the way librarians handle (and have handled) markup technologies.

Why is XML like a rope? - give them enough flexibility and they'll hang themselves.

Solutions?
Argues for treating translations as standards in the same way we treat tagsets as standards.

Also that education has to look into communication vs control.

Wordle Image

Just a little graphical display of the past few posts. I noticed the NCBI folks messing with this website between sessions and thought I'd do the same.

Kind of cool.

An Onion of Documents and Metadata

D. Matthew Kelleher - Y-12 - Paper

Must create an electronic inspection form using XML....But it must be usable by floor shop managers, look the same as the current forms, and be able to run out to HTML, back into XML, and then out again to PDF.

But it is not just the XML file, but it is also things like image, HTML, transform files, etc. So all this had to be packaged together.

They used Arbortext due to the interface's ease of use for non-XML users.

Workflow - Online Form collects data into an XML data file which is merged into an XML file and then output to the final document.

Next Steps: need to handle form management and automatic generation of HTML. Data validation and entry need to be improved. And of course, train the users....which is ALWAYS a challenge.

Linking Page Images To Transcriptions With SVG

Hugh A. Cayless - Carolina Digital Library and Archives - Paper

Preface: "There is very little practice or theory in this talk."

There is a lot of discussion in certain areas of the blogosphere regarding the problems of text-image linking. Hugh has an upcoming huge manuscript digitization project and might be able to use this theory/practice.

Goals:
  • Create an SVG overlay of the manuscript page image
  • Analyse the structure of the SVG document to detect lines, etc.
  • Link the groups so produced to structures in a TEI transcription
  • Display the results in a usable GUI
Inkscape (the SVG drawing tool) has a tracing tool...trace the bmp (or jpg, etc) and then plug the resulting SVG output into your XML application (freaking cool...didn't know Inkscape could do that).

He gathered tools from the open source community rather than writing his own monolithic thing.

potrace - takes a bitmap and converts it to a vector graphic image - the image has to be a bitmap (converted using ImageMagick). The output from potrace was not exactly what he wanted, so he used Inkscape from the command line to convert to absolute coordinates. Then he used XSLT to do a little cleanup (adding specifically named IDs).

He then used lxml - ElementTree and numpy - the script reads in the SVG produced by potrace and filtered through Inkscape, does some filtering, detects the lines, then serializes the results back to SVG and Javascript.
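I don't know how Hugh's script actually detects lines, so take the following only as a guess at the general idea: if each traced shape has a vertical position, shapes that sit close together vertically probably belong to the same line of writing. The function name, the `(id, y_center)` input format, and the `max_gap` threshold are all my own assumptions.

```python
def group_into_lines(shapes, max_gap=10.0):
    """Group (shape_id, y_center) pairs into lines by vertical proximity.

    A new line starts whenever the jump from the previous shape's
    y position is larger than max_gap.
    """
    lines = []
    previous_y = None
    for shape_id, y in sorted(shapes, key=lambda s: s[1]):
        if previous_y is None or y - previous_y > max_gap:
            lines.append([])
        lines[-1].append(shape_id)
        previous_y = y
    return lines

# Hypothetical y positions for five traced shapes on a page
shapes = [("p1", 12.0), ("p2", 14.5), ("p3", 52.0), ("p4", 49.0), ("p5", 90.0)]
lines = group_into_lines(shapes)
```

Each resulting group could then be given an ID and linked to a line in the transcription. Real manuscript pages with slanted or overlapping lines would need something much smarter than a fixed gap.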

Problems:
  • How much can be automated? Now not much is.
  • How deeply can this be analysed?
  • What is the best testing mechanism?
How to tell potrace the black/white cutoff is the major sticking point right now. Also image pre-processing isn't clearly defined and automation of linking/path disposal is pretty important.

His process is very preliminary but really sounds fascinating to me. I have tons of documents from my grandmothers' homes that I am in fear of losing....to either the elements or the trashcan. Not everyone in the family finds them useful. But if I could digitize them, at least I could maintain the content, if not the source material. I will have to look into aspects of this process.

Reconsidering Conventional Markup for Knowledge Representation

David Dubin - University of Illinois Graduate School of Library and Information Science in Champaign, Illinois - Paper

Looks like a theoretical paper....but it's a Dubin presentation so who knows?

Ok, he is discussing finding the youngest (closest to the leaves in the tree) ancestor of node X and node Y using a type of formal logic programming.
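David did this with formal logic, which I can't reproduce. As a plain procedural stand-in, here is a minimal Python sketch of finding the deepest common ancestor of two nodes. The function name and the sample document are my own, and this is not the approach from his paper.

```python
import xml.etree.ElementTree as ET

def deepest_common_ancestor(root, x, y):
    """Return the ancestor of both x and y that is closest to the leaves."""
    # ElementTree has no parent pointers, so build a child -> parent map
    parent = {child: node for node in root.iter() for child in node}

    def ancestors(node):
        chain = []
        while node in parent:
            node = parent[node]
            chain.append(node)  # nearest ancestor first
        return chain

    y_ancestors = set(ancestors(y))
    for candidate in ancestors(x):
        if candidate in y_ancestors:
            return candidate
    return None

doc = ET.fromstring(
    '<doc><sec id="s1"><p id="a"/><list><item id="b"/></list></sec>'
    '<sec id="s2"/></doc>'
)
x = doc.find(".//p")
y = doc.find(".//item")
result = deepest_common_ancestor(doc, x, y)
```

Because the ancestors of X are walked from nearest to farthest, the first one that is also an ancestor of Y is the "youngest" one in the sense used above.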

He's using formal logic statements to setup a more difficult finding experience. Unfortunately for you, dear reader, I have not used logic operators in a VERY long time, so I won't be able to provide anything beyond his paper.

If you are interested in using formal logic in your Prolog demo, then you know more about this than I ever will and should just go read the paper. ;-)

Wednesday, August 13, 2008

Dirty Laundry: Committee Disasters, What Happened, What We Learned

Caveat: I am typing this as quickly as I can...I am probably getting things wrong (names, dates, concepts, whatever). If it doesn't have quotes around it, it is completely a paraphrase and not their actual words. Think of this as just getting the atmosphere of the session, NOT some kind of historical record.

David Orchard - Currently Available for Employment:
One of the creators of XInclude. Was the BEA standards lead.
Dirty Laundry - URIs and non-use of URIs in web services

His story was about how the SOAP group would not embrace URIs. The lesson learned is that if a group doesn't buy into the feedback then there really isn't any point to give the feedback at all. Sounds to me that you can't force anything on a Working Group if they don't want it...that doesn't sound like a great way to get solid specifications. Maybe I don't want to know how the sausage is made.

James David Mason - Y-12 National Security Complex
In standards since 1981 when he attended the organizational meeting of V1(?). Charles Goldfarb was the chief evangelist behind SGML...and we wouldn't have SGML/XML without him...but...

James tells of seeing Goldfarb standing at a blackboard lecturing while two people were arguing loudly about something else, and someone else is yelling at Goldfarb. The only person with the correct response in the room was the person curled in a fetal position in the corner. Steve Newcomb is sitting next to me and states "have you ever been to a standards meeting?" me: "no" Steve: "He isn't exaggerating."

What did he learn? Committee chairmen have to be stubborn as a mule. Editors are not allowed to lock up a text and show it to no one else...in electronic form.

Lauren Wood -
Chair DOM working group
Her husband is Tim Bray, who was editor of the XML spec, Charles Goldfarb tried to get him to do something by working on her....obviously misunderstanding the professionalism of Lauren Wood.

The hardest thing to deal with is when you can't get companies to care about the topic of the committee. But sometimes when companies care too much it becomes a backroom kind of political games thing, where people would call around to committee members and their bosses to put pressure for their particular spec change or desire.

Microsoft wanted Tim Bray to be dropped as XML spec Editor because he was consulting with Netscape. But Softquad, where Lauren Wood was working, took the stand of suggesting as a replacement Eve Maler, who worked for Arbortext. What it comes down to is that people can do good technical work even if they are working for your competitor.

Mavis Cournane

In 2002 there was a European standards committee through Oasis where the after-market participants were non-technical and the automotive participants had no interest in changing their processes. So the process went nowhere and died. No spec was created; it was a failed committee spec. Then in 2007 a European Union legislative requirement came through stating that automotive manufacturers have to hold to the Oasis spec (which doesn't exist and which no one agreed to at the time). So this is a mess.

To fix the mess...Oasis created a new type called an Oasis format...and the auto makers are basically screwed.

What she learned: Curb the powers of the EU. A successful standard will have to have a win-win for the key implementors of the standard. There should be some way to stop failed standards from becoming law.

Jon Bosak -
Started with formal standards on DSSSL.

XML namespaces is his committee disaster. Namespaces were in play from almost the beginning...but not at the forefront like we have now. Until the WWW6 conference in Santa Clara. Tim Berners-Lee wanted to have a way to combine multiple repositories (?) in order to support something he called "the semantic web".

By 1997 the group tried to put the whole namespace thing behind them...but Tim wouldn't let it. By July of 1998 they had something but nothing that they were happy about. It arrived at W3C but Tim wouldn't approve it. Because they were using public identifiers for namespace implementation. In the end they were given one week to base namespaces on attributes based on an existing project of Microsoft. That's what we have here.

Jon doesn't blame Tim for this...although he wasn't happy with him at the time. The way W3C was set up at the time made everything roll through Tim, which caused these problems. He also said that there is no way of knowing if the public identifiers would have worked better. But pushing it through too quickly is a real problem.

And governance matters. We need to set up committees and then wait awhile to see how things work out. Pushing things isn't working.

Also be concerned about the transparency of the process. A lot of the problems came from hiding things.

Patrick Durusau -

A group of vendors wrote part of a copyright protection standard that they wanted to push through Oasis. It was a DRM standard...a horrible piece of work. But the opposition wasn't uniform. There were people who wanted to just say "hell no" and hold the process hostage. It eventually went on until there was a motion to dissolve the TC. One of the very few TCs that just died. So the original group took part of the standard, went to ISO, and got it as a standard. But Sony turned around and said, we aren't taking this standard...it sucks.

He compares this to OOXML. I don't understand the problem there. First time I heard about it was yesterday.

Microphone Comments (I didn't get them all):
David Lee - How does one get involved with a standards body?
Response: Jon Bosak and Lauren Wood - $$$ is an issue. But figure out what you are interested in, take a look and see if they will let you in, and if you really want to be in.

Simon St. Laurent -
He states that no one seems to be able to enforce their standards or even their own process or procedures. Is there any cause for hope in standards organizations? Simon seems very disturbed by the entire standards process. I wonder if he's written about it anywhere?
Response: Patrick says there have been informal conversations about reform but that's been the way it has been before.

Kurt Cagle - ISO has taken a significant hit in credibility due to the OOXML standard problem (what the hell happened?) How do you reestablish authority again after that kind of failing?
Response:
Jon Bosak - JTC1 is actually responsible but everyone seems to be blaming ISO. JTC1 is a combination of ISO and IEC. So it is hard to get credibility when the actual culprit isn't identified correctly.
Response:
James Mason - ISO does a lot of standards that have nothing to do with software or protocols or anything like that. So the processes and procedures may not always line up and they might be spread a bit thin at times.

Murray Altheim-
The motivation involved in becoming a part of a standard committee. Altruism is often a part of being a member. If you feel as a person that your skills can contribute to a standard that impacts millions of people then you have reached the highest part of charity.

Response: Jon Bosak- "Those of us who weren't put there as attack dogs for our company are there for the glory. Pure and simple." (I sure hope I got that out of context cause that doesn't seem to be the point Murray was making) - He clarified that by glory he means "between you and your maker", not between you and the rest of humanity.

Steve DeRose -
There is a human cost to doing what they are doing. I think that alludes back to what James was talking about earlier. Part of it is human nature (people get in a bad mood, it just happens). Almost everyone on the original XML committee threatened to walk out, but nobody did. Somehow they got past that and made it work. The way the committees are set up makes it difficult.
  1. There is a cost in $$ and time
  2. Experts may want to serve...but you serve at the will of the corporate interests
  3. Lack of respect from co-workers and academic institutions.
(Seriously? The people working on these things are geniuses. How can they be considered less than by others for taking on the challenge and REALLY understanding how this all works? I don't get it.)

How might we be able to change that bias?

State of the Art of Streaming: Why W3C XProc, W3C XSLT WGs and ISO SC34 WG 1 are looking closely at streaming

Mohamed Zergaoui - XProc & XSLT 2.0 WG - Paper

Define Streaming:
  • related to memory usage
  • related to input size
  • related to latency of process
It appears that the document size and the requirement to keep it in memory greatly increases the time of processing and thus the difficulty of constant streaming.
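To show the memory side of this in practice, here is a minimal sketch in Python of event-based processing that handles each record as it arrives instead of loading the whole document first. It is only an illustration of the general idea, not anything from Mohamed's paper; the function name and the `entry` records are assumptions of mine.

```python
import io
import xml.etree.ElementTree as ET

def count_records(stream, tag):
    """Count elements named `tag`, discarding each one's content once seen."""
    count = 0
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's text and children so they can be freed.
            # The emptied element itself stays attached to its parent.
            elem.clear()
    return count

data = io.BytesIO(
    b"<log><entry>a</entry><entry>b</entry><entry>c</entry></log>"
)
total = count_records(data, "entry")
```

Each record is acted on as soon as its end tag is read, which speaks to the latency point as well: there is no waiting for the end of the input before the first result.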

Murray Altheim: How to deal with petabytes of data in streaming? Isn't the stream being returned in an event model? Why couldn't there be a model where you take every single hit on the query and not wait until the end?
Response: Not in scope of the XML but having an entry should trigger a usage...but might be a problem of defining what WG should be handling this.

Kurt Cagle: What's happening with DSDL? The site seems out of date on a lot of points.
Response: ODF and OOXML are taking up most of the time and perhaps they can get back to work now that 2 years have been lost. He seemed tentative about saying that.

Alex Milowski: Has anybody done the work to test XPath 2 within the concept of streaming?
Response: XPath has many different subsets of stream and it is going to be a problem. But there are papers out there talking about that very subject.

Xmlsh - a command language (shell) based on the philosophy of Unix Shells designed for XML

David A. Lee - Epocrates - Paper

Why? - Unix was a radical paradigm shift and 40 years of 'progress' have eroded the core design fundamentals. Data types are not byte/line streams, tools have not evolved with the data (XML), and working with XML is WAY too complicated.

Open Source/closed development - test it out, he needs the feedback but no coding assistance.
www.xmlsh.org

Pure Java, Saxon 9, Log4J, Optional external OS commands (not reinventing the wheel)

Why not write new commands? - The shell is aging....focus is on streams and byte streams...he wants to use XML natively.

He does a bunch of live demos...can't show that, but the link above does take you to the code. If you were here you could have gotten it on a tiny 512 USB stick.

Some annoying things but the one that seems to upset him the most is that console IO is limited in Java...no pure java implementation of clear screen...

Really, this is a personal project that everyone can try out. He has it working...but he'd love to see how it works for other people. Check it out.

Questions/Comments:
Murray Altheim: Has he looked into Groovy scripting?
Response: more interested in a scripting environment...implementing a Groovy interface seems trivial. Something missing right now is the lack of documentation for calling it from another environment...but that is something for later.

It isn't ready to scale...and David is aware of that. He's hoping to learn something from other speakers about non-serialized XML.

Alex Milowski: Have you looked at XProc?
Response: He is looking for command line interface and XProc seems to be something that requires building entire documents for the XProc. But XProc to xmlsh seems doable.

Kurt Cagle:
Have you thought of integrating with eXist?
Response: I hadn't even heard of eXist until yesterday, but I definitely am going to look into it.

Another Attendee: Unicode support?
Response: It is a goal...since it is Java it is native, but the parser is UTF-8 and testing hasn't occurred on the extended sets. No fundamental reason why it can't be end-to-end and I want it to be.

Same Attendee: Memory management?
Response: XQuery serialized to a native XML dbase. eXist might assist this problem.

Topic Maps In Near Real-Time

Sam Hunting - Universal Pantograph - Paper

Update: Simon St. Laurent has more well thought out discussions of the proceedings.
Rejects the search paradigm...he only finds 'finding' interesting.

States that his blog is part of the top 5k in the world. I must figure out if I already have that blog in my reader.

He is presenting with a working Drupal application not a set of slides.

Just in case you ever meet Sam, be aware that his presentation here was a topic map of 'The Criminal Bush Administration'. So, I doubt he's much of a Republican.

I apologize in advance....I just don't get this Topic Map stuff. But he is discussing some 'markup' related things so I'll focus on that.

Looks like Sam also attended another conference this year. Eschacon happened March 28-30, 2008...and this appears to have been the official T-Shirt.

Sam was wearing it during his presentation. You'd think DFH means something in XML land where everything has an abbreviation...but you'd be wrong. DFH is Dirty F-ing Hippy.

His Drupal application has validation of the wiki markup (his assertions)...which it seems that wiki-media does not have. Which seems like a serious deficiency if true.

Also something to know about Sam...he just said "I LOVE administering Drupal sites". I think that is a pretty useful thing to know about a programmer of any stripe.

He shows how much of the pieces of his Drupal application function. Basically it is a smooth application for creating a real-time (just hit submit) website.

He had a couple hours of material but Steve Newcomb keeps badgering him about wrapping up...and now we are overtime.

12 minutes ago Steve asked for 10 minutes for questions...this is funny the old guard is haranguing him into submission...I think they are going to pull the video feed any moment.

Tuesday, August 12, 2008

XQuery Tips and Tricks - Kurt Cagle - Nocturne

Discussion on oXygen vs. Stylus Studio. Appears to come down to price and responsiveness. Kurt also mentioned that if you are comfortable with the Stylus Studio interface then you are fine, but he likes the oXygen interface.

He's using eXist as the database. Reminds people that images and binaries are accessible from the database as well.

<Pizza Break - Thank You Anonymous Donor>

One of the coolest features: "You can store the XQueries themselves within the database."

GEEK OUT! He is using creation of characters for a D&D like game as the example....Chaotic Neutral Cleric for me!

Tip: declare function local:randchoose($ctermset as node()) as xs:string{
Defining the function as local allows you to keep that namespace private and have another namespace for the public output.

Tip: {$gender}
Embed XQuery in XML by using the {} operator.

Tip: string{$character/gender}
Remember that XPath returns element() and attribute() (and others) not text, unless specifically requested. Could easily cause problems that aren't immediately clear in a browser.
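The same trap exists outside XQuery. As an analogy only (this is Python's standard library, not eXist or XQuery, and the sample data is made up), a path lookup hands back a node object, and you have to ask for the text explicitly:

```python
import xml.etree.ElementTree as ET

character = ET.fromstring("<character><gender>female</gender></character>")

# A path lookup returns the element node, not its text...
node = character.find("gender")
as_node = str(node)  # something like "<Element 'gender' at 0x...>"

# ...so the string value has to be requested explicitly.
as_text = character.findtext("gender")
```

If `as_node` ends up in your output where `as_text` was intended, nothing fails loudly, which is exactly why this kind of mistake can go unnoticed.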

conf.xml in eXist:
setup - serializer, scheduler, automatic XSL generation, indentation, XSLT transformer, validation mode, XQuery modules section...optional modules (turn them on - uncomment them)
then run the build routine.

Tip: in eXist -
  • util and request modules are good to get to know.
  • transform allows the use of XSLT on XML in the XQuery. So use XQuery to collect the data and XSLT to transform it...easier recursion.
  • sql module allows you to get connections from just about anywhere with a URI
I'm not sure there was a lot of information for people using a database other than eXist or people who are pretty knowledgeable about XQuery. But it was pretty cool to watch him code in front of us. I like seeing people who do their jobs well, do their jobs well.

Evening Break

Well that day at Balisage went by quickly. I wonder why? The speakers were interesting and the topics were intriguing...perhaps time really does fly when one is having fun.

I had the chance to sit at lunch with several people who had bad trips to Montreal and several who had uneventful ones...but it appears that there is always someone with a worse flying story than I have...I guess I need to travel more.

I've just finished a Skype call with my wife and daughters and I will head out in search of a meal soon. It'll have to be a quick one because 7pm is both Kurt Cagle presenting on XQuery tips and tricks and the W3Québec user group. I was looking forward to the user group discussion but I will have to choose...and since XQuery is more likely to help pay the bills...

But I do wish I could hear about using XML in cool web sites. Hopefully someone else blogs about that session.

BTW, I noticed Simon St. Laurent taking pictures today. I hope he posts them soon to Flickr.

Office Suite Markup: Is It Worthwhile

Patrick Durusau - Paper

He is the Editor of ODF....so I imagine he thinks it is worthwhile.

Open Office Allows:
  • We are interested in explicit semantics. But users just want to get things done.
  • Interchange
  • Application Independence
  • Long term preservation of content
Service Pack 2 for Office 2007 will have native ODF support

Lots of argument with his conclusions that everyone making their own XML is a good thing. My computer crashed (damn Windows) so I am behind on the conversation. Liam and Tommie had some great back-n-forth with Patrick.

ODF tags are presentational? Patrick says they aren't all presentational but isn't specific about it. Sounds like most of the tags are presentational...which is not how I recall XML being used to its full potential.

Wendell: "What you left out is that these formats are REALLY ugly."

This talk brought up a lot of discussion but I think it can be mostly narrowed down to:
  1. How does this help me as an XML developer?
  2. Microsoft Office is pervasive...it is going to be difficult to get through that?

Optimized Cartesian product: A Hybrid approach to derivation-chain checking in XSD 1.1

Maurizio Casimirri - University of Bologna - Paper

Here comes the technical stuff. I'll try to keep up but check the paper if you are really interested.

XSD 1.0 has no support for co-constraints (if some type of something then this situation)

XSD 1.1 has an xs:alternative element that allows for a test of a node.

Basically he is describing how XSD 1.1 will determine the co-constraints based on the new elements.

Now he is describing how to use both the static and dynamic methods of detecting schema errors to form a hybrid method which might expose some errors that might not appear in the document instance.

Questions:
Murray Altheim - there's no way to go up the tree?
Response: Nope, it is specified to stay within parent. <at least I hope that is what they meant>

Note: multiple xs:alternative elements are processed in order. Thus an xs:alternative element with a blank test (and thereby the default) would always pass the test and not allow any later alternatives to be processed.
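
To convince myself of the ordering point, here is a toy sketch of first-match-wins selection in Python. It only models the behaviour as described in the note above; it is not an XSD processor, and every name in it (select_type, the type names, the "kind" attribute) is hypothetical.

```python
def select_type(attrs, alternatives):
    """Return the type name of the first alternative whose test passes.

    Each alternative is a (test, type_name) pair. A test of None stands in
    for an xs:alternative with no test, i.e. the default.
    """
    for test, type_name in alternatives:
        if test is None or test(attrs):
            return type_name
    return None


def is_image(attrs):
    return attrs.get("kind") == "image"


def is_video(attrs):
    return attrs.get("kind") == "video"


# Default placed in the middle: the video alternative can never be reached.
shadowed = [(is_image, "ImageType"), (None, "DefaultType"), (is_video, "VideoType")]

# Default placed last: every test gets its chance first.
ordered = [(is_image, "ImageType"), (is_video, "VideoType"), (None, "DefaultType")]

print(select_type({"kind": "video"}, shadowed))
print(select_type({"kind": "video"}, ordered))
```

With the default in the middle, the video case falls into DefaultType; move the default to the end and it resolves to VideoType.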

I think this one stumped much of the audience. Wow, that makes me feel better.

XML: It was not televised after all...

Eduardo Gutentag - Sun - Paper

Not a history lesson but really it will be a history lesson.

I like his presentation slides...basically he is putting his note reminders on screen like "[Insert daughter's story here]"...cute.

He also keeps changing his title and the conference title on the bottom of the screen...I hope he posts the slides....but perhaps you have to be here to get the joke.

Paul Trevithick, in 1997, called XML "the revenge of the 40-somethings".

Eduardo is providing a story of the beginnings of XML in the late 90's by way of quotes from people at the time.

"It is basically about the owner of the content. The content belongs to the creator not the makers of the document creator tools." Jon Bosak 1998, Paris.

Heard in the hall - Any priest who hears that will say 'oy vey'. - Eduardo Gutentag

XML Agenda:
  • Free as in beer
  • Free as in Spartacus
  • Open interface
  • Interchangeable = interoperable
  • You own the content you produce
The use of XML in OpenOffice enabled the Linux open source explosion because they could use it within the Linux environment without having to reboot into Windows to read.

"I have a beard and every so often I like to use it." - Eduardo (The Prophet)

The very concept of a static document may simply vanish...You start it, anybody can contribute to it, modify it, contribute to it, own it.

Questions/Comments
"if my project falls on its feet, I've got another one" - Murray Altheim

Digital library application for meta-sharing between mesh enabled (1-laptop-per-child) should they use a CC-licensed idea or nothing...Eduardo advocates nothing...

Wendell Piez -
We have these property rights because we don't want people to take what we've created and profit from it to the exclusion of us. It seems like Wendell is channeling Cory Doctorow by way of the Iliad.
Response: The issue of the defence mechanism is an important one. There are many ways of defending yourself...non-assertion covenants?

Someone else:
A big problem in the digital space is that digital items are non-fungible. I have a book and I can sell it, give it away and then not have it. But with a digital item I can give it away and still have it in pristine format.

Jon Bosak -
A lot of the reason for saying those things in 1998 Paris is just to cause trouble.

He states that he has actually come around to stating that PDFs should be used instead of XML (at least that is what I think he said). He has a paper on it. Read it here.

Informal Ontology Design: A Wiki-Based Assertion Framework

Murray Altheim - Innovation Centre, National Digital Library, National Library of New Zealand - Paper

There is middle ground between formal knowledge representation and 'what humans do'.

"Wikis are really about sharing information with other people and collaboratively working on a document set."

He threw up a slide of a cat holding its ears and did a LOL Cat joke...and then launched into a discussion of epistemology.

Robert Brandom - big fan of this guy...inferentialism...wow, I am so confused.

Dealing with the problem of organization of information...libraries are really good at classification.

Faceted classification is dynamically creating a system based on its different properties.

"if anyone is familiar with Z39.19" - seriously who is familiar with this?

He is doing a project that is merging two separate sets of data about the genealogy of native New Zealanders. Lots of flowcharts about the system that leads to the wiki engine.

He is giving a presentation of the live page at the Library.

Of course the live presentation died...so he jumps around and says "imagine a map that is spinning and jumping around"...good way to handle a presentation difficulty.

Talk about something that needs context:
"If you were trying to assert your whānau with your hapū, you might be interested in assertions."

What I got out of this is that wikis are useful and cool...and can do cool stuff.

Questions:
Liam Quinn-
Is the translation feature bi-directional? Yes.

Wendell Piez -
"I am so envious, I wish I could do your work."
Problem of complexity...a good game has balance between rules and simplicity. Problem of sufficient and useful complexity...how do you balance?

Response: It would be best shown by having it up and running for a year and seeing how people use it. Also there is a set of 'roles' that can be added to make the information more complex but that isn't necessary.

Steven Newcomb -
"I am in awe of what you are doing" (That's pretty cool praise) Has something that uses a wiki to create topic maps.
Technical question - Why use JSP wiki?

Response: Because I started with it because the program I began with is a Java-based program. He used what he knew and it grew mostly organically from there.

RESTful Services: How XQuery and AtomPub Are About To Change Your World

Kurt Cagle - O'Reilly Networks - Paper

The Rise of RESTful Services
"The web is not about process". It is a giant database predicated on the idea that every address is a multifaceted thing, that never gives it true shape but only hints to those who are patient.

"How do we handle talking about everything?" Good question...I hope he answers it.

Defines resource:
  • unique
  • at least one representation
  • more attributes beyond ID
  • potential schema - maybe not a real schema but at least definable
  • provides context (defined later)
  • reachable within addressable universe
Not a resource: Mailing Address, $32.76, A Gallon of Gasoline, June 25, 1963, love

Resource: web site, resume, aircraft, song, transaction, employee

In the right context the items in the first list might be resources but it would be a stretch.

Representations: "The resource is the elephant, representation is what the blind men tell you about the elephant."

REST - Representational State Transfer - reflection of the state of a given resource in a given form.
  • resource centric
  • publishing oriented verbs: Get, Put, Post, Delete
What you are dealing with is a database...you are getting things, putting things, deleting things from the web.

Groves? Doesn't define...I'll link back later when I find something online.

The importance of a collection is that it is a set of related resources and a resource in and of itself. A data set is a collection.

RESTful Services:
  • URIs (Uniform Resource Identifiers) to differentiate between representations
  • only a few basic things you can do with it...Get, Put, Delete, etc
Treat the web as a database
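
To make "treat the web as a database" concrete for myself, here is a toy sketch in Python. It is an in-memory stand-in, not real HTTP and not anything from Kurt's slides; the ResourceStore class, the URIs, and the XML snippets are all made up for illustration. The point is only that the four verbs line up with reading, replacing, creating, and removing records.

```python
from itertools import count


class ResourceStore:
    """Toy in-memory stand-in for a collection of web resources."""

    def __init__(self):
        self._items = {}       # URI -> current representation
        self._ids = count(1)   # used to mint URIs for posted resources

    def get(self, uri):
        """GET: read the representation at a URI (None if absent)."""
        return self._items.get(uri)

    def put(self, uri, representation):
        """PUT: create or replace the representation at a known URI."""
        self._items[uri] = representation

    def post(self, collection_uri, representation):
        """POST: add a new member to a collection; the store picks the URI."""
        uri = f"{collection_uri}/{next(self._ids)}"
        self._items[uri] = representation
        return uri

    def delete(self, uri):
        """DELETE: remove the resource; True if something was removed."""
        return self._items.pop(uri, None) is not None


store = ResourceStore()
uri = store.post("/characters", "<character>cleric</character>")
store.put(uri, "<character>paladin</character>")
current = store.get(uri)
removed = store.delete(uri)
after = store.get(uri)
```

Notice there is no "promoteCharacter" call anywhere: the service is just the resource plus a handful of verbs.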

Exploring XQuery
Data Abstraction 2010: XQuery may augment or even replace traditional server languages.

Basics of XQuery: shell around XPath 2.0 with SET manipulation, custom defined functions, modularization, schema aware...(at least in 1.1)

Basic FLWOR comparison to SQL....

XQuery Modularization: makes libraries possible, namespaced, promotes OOP-like programming (I'd like to know more about that)

<He's FLYING through the presentation...looks like he needed another 30 minutes....>

Talks about extension modules for eXist...points out that SQL can be integrated into XQuery thus making it possible to work on a SQL database from an XML database.

XQuery server language - this is coming...avoids the plumbing....consolidate multiple data streams...

The Prescription of XRX (XQuery/REST/XForms)
collapse multi-tiered data architecture

XQuery, REST, and Atom to be a data interchange mechanism. (AtomPub) Provides a "bus" of resources for a set of links that are 'out there'. Provides a method for 'paging' blocks of content.

Closing

Slide link from the presentation.

BTW, due to the time constraints he mostly spoke the slides...so you can get a lot of the presentation from the slides.

Questions
SOA vs RESTful - there is lots of SOA that is RESTful...there isn't a fundamental difference between SOA and ROA.
Response: SOA was hijacked by the XML-RPC people. The fundamental difference is that REST treats the thing you are dealing with as a resource not a service.

Sam Hunting - Production sites to point to?
Response: See the slides

Liam Quinn - W3C
Points out the list at W3C that shows ~50 implementations of XQuery

Alex Bolusky
eXist has an AtomPub module ("I know cause I wrote it") - eXist runs their wiki off an AtomPub so they are using it.
Response: eXist is becoming impressive for what it does. Doesn't seem as scalable as MarkLogic.

Cool vs. Useful

Why the Balisage: The Markup Conference?

"Because it is expensive" -

Tommie says we are not limiting our scope to XML. So why not a Knowledge conference? Because knowledge is too broad and markup is very directed. It is a way to focus our attention.

That makes sense.

There are many badges at this conference; Speaker, Committee, Staff, but there is also Listener. Tommie wants us all to know that Listener is pretty important too.

She tells of publishers (unnamed) that are converting their newfangled XML to SGML and then running it through their production system....and they don't want anyone to know. Cause it isn't cool.

"Did you know that Schemas are cool and DTDs aren't?" But publishers are telling her that they'll be using the Schemas, "soon". They are afraid that people will find out they aren't "cool".

"Let's pretend there is a mainstream XML community." An average XML-er:
  1. Works with fairly structured data
  2. Wants to get things done quickly and efficiently
  3. Dubious about mixed-content
  4. Doesn't want to hear about things you can't do in XML (like overlapping structures)
"So imagine my surprise that overlapping markup is...cool."

"Only an idiot would do ". "These people aren't experts, they are fashionistas." "That's the sort of thing that we really don't want to do here."

"This is not an environment for sales but tell us what's interesting about your topic."

Tommie tells about two techies from major XML companies who said: "We'll get it worked out to interchange data and then it is just a small step to international standard." Drew a big laugh from the crowd.

"Topic maps are cool but it doesn't have any edges." "Semantic web is a HUGE number of cool ideas, some of them mutually exclusive."

Tommie takes us on a review of the new search engine Cuil and its use of photos that seem pretty darn random.

And the reason for this digression..."keep your eye out for nonsense like this at Balisage".

"There is a big difference between 'true' and 'useful'." "Keep a close eye out for Useful."

Comments:

Syd Bauman - "The Laws of Cool: Knowledge Work and the Culture of Information" by Alan Liu

Balizage

Steve Newcomb welcomes us to Balizage (with a 'Z'). He tells us that he was seduced into this madness by the military industrial complex. He also tells us that we are the keepers of the world's standards...although I think NIST might disagree.

James David Mason is introducing the timer to the conference. "We'll allot you some time and it counts down. When the red goes away, so do you."

Debbie Lapeyre: "The parade of the organizing committee is both moderately boring and moderately useful. And if you ask us a question we'll all answer, 'Ask Tommie'". If you need an impromptu room just walk up to the desk and say "I need a room for the night".

Is it just me or does that sound dirty?

Debbie also introduces us to the concept of posters. I'll try to take a few pictures of the ones I might understand.

Michael Sperberg-McQueen now steps to the podium. "This is not the first conference on markup, in August, in Montreal." "How many people are here for the first time?" Almost a third of the hands went up. His suggestion for getting to know people; "Pretend you are the host of the party and that it is your job to make sure everyone has someone to talk to."

Debbie is now introducing Tommie Usdin, "Somebody really interesting."

Monday, August 11, 2008

I've arrived

I'm sure you were all worried.

Four things I've learned today or that might help others in the future:

  1. NEVER buy a cheese steak from an airport vendor
  2. When arriving via air in Montreal be prepared to WALK to the Documentation Area
  3. When taking the Airport Shuttle Bus make sure to tell them you want to go to Sheraton: Le Centre. That way you can just walk a block and a half to the Hotel Europa.
  4. When checking in at the Hotel Europa get the i-hotel access code BEFORE going up to your room.
But the important thing for me is that I am here. I am looking forward to the conference tomorrow. I understand that a few other people did not make it out at a reasonable hour yesterday either. Perhaps we can all commiserate together?

Sunday, August 10, 2008

Sitting In A Railway Station,

Got a ticket for my destination...

Well actually it is an airport but I do have a ticket. And, approximately five hours after it was scheduled to depart I believe we actually will. I say this because I believe they will have a riot if they move us from one more gate or tell us one more time "we'll tell you when the plane arrives". But maybe that is just my cynicism about Post-9/11 air travel.

So my hope is that I reach Montreal in time for a quick nap and then the Versioning Symposium. If not, I hope someone else blogs about it.

"In Soviet Russia you wait. You do not ask why you wait." - Anonymous

Update: Bad thunderstorms at the Philly airport. The plane appears to be sitting 20 yards from the gate, but we have no idea what is going on. Communication is not US Air's strong suit.

Update II: Well I am still in PA. I'll be trying to get out on a flight this afternoon. But it doesn't look good. Perhaps I should just drive?

Saturday, August 2, 2008

Balisage Poster

I registered for Balisage: The Markup Conference (which I still contend is repeating oneself) and almost immediately received an email from Debbie Lapeyre asking me to put up a poster. Being that I have a hard time saying no to Debbie (she can be persuasive) I agreed.

Thank god she didn't ask me to get up and speak at the Speaker's Corner at the Versioning Symposium.

So part of my idea involves a light-hearted look at the connections between things like DocBook, the NCBI DTD, and DITA (and others). But I was stymied by what to call them. Since I am steeped in NCBI I normally refer to these things as 'tag sets'. But I know that DocBook calls itself a schema and DITA is an architecture.

The former is too confusing for my tastes and the latter sounds like a marketing professional infiltrated the working group. So I asked a professional colleague of mine what she thought. She falls on the tag set side as well, but gave me the profound statement of the month:

"But there is nothing you can do that won't sound stupid to someone."

I think, as long as I keep the 'someone's to under half the room, I am in good shape.

Wednesday, July 16, 2008

Mini-bio

I came to the markup community by a circuitous route. In college I thought I was going to be a psychologist. My B.A. is in psychology and I basically double majored in it, taking almost every class available at my alma mater. Then I went to graduate school and learned what psychologists actually do all day; gradually guide people to overcoming their problems.

That's an oversimplification of it and I do enjoy solving problems but not as a guide rather as an active participant. So I had an opportunity to work in Residence Life at my graduate school and jumped at the chance to stay on another year. Of course that career turned out to be less exciting than my graduate school days had suggested. The problem with grad school is they teach you how to be a Director of Residence Life and not how to deal with the day-to-day concerns of running a residence hall or dealing with University politics.

After a couple years of being an adult and living in the halls with college students, my patience wore thin. But psychology led me to The American Psychological Association and finally into their Electronic Publishing (now Full-Text Serials) Group. There I was first exposed to SGML, then quickly XML, XSLT 1.0 and the slippery slope that is markup lay out before me.

While at APA I earned my CompSci BS and learned more and more about markup and publishing. The time came for me to move, both professionally and physically, and my family and I relocated to the Philadelphia, PA suburbs. I started my own business and began working with various companies to forward their conversion and publishing goals. In the last year I've begun using XQuery more and more and had the opportunity to attend a MarkLogic developer training.

I don't consider myself an expert at any of these technologies. I know the experts. I've worked with the experts. I consider myself more of the journeyman apprentice. There are things that I know well and things that I know a little and, as I learn more, many things I know nothing about. Freelancing has taught me that there are more technologies in the world than anyone could possibly know, so focusing on XML and related techs seems like a good bet.

In many ways I've never left school. I've been using XSL-FO (or just XSL, depending on how dogmatic you want to be) for 4 years now and I still come upon properties that I've never had reason to use. XSLT still surprises me in how much of a "programming language" it really is. (Try calculating the Fibonacci numbers sometime and you'll see what I mean) And XQuery is challenging me left and right to think differently about things I thought I knew in XSLT.

If I weren't learning things anymore what reason is there to continue?

Greetings and Salutations

As this is the opening post to my company blog I imagine it is best to begin by introducing myself.

My name is Mark Shellenberger and I am a 'Markup Guy'. What does that mean to you? Well, as the reader of this blog it means that the majority of the posts here will be centered around the XML world. That means I will:
  • Discuss an interesting thread on one of the major XML-technology listservs
  • Post a technique that I used to solve a problem
  • Link to interesting uses of XML
  • Semi-live blog any XML conferences I attend
  • Link to other blogged XML conferences
  • Discuss the business of being a business
What you won't see here is World of Warcraft techniques, political rants, questionable humor, vacation photos, or sports rants. You can check out any of the previous links for my friends' blogs on those topics.

I'll do a mini-bio post soon but in the meantime see my company's website; Manorfield Consulting, LLC. And, if you happen to need XSLT, XQuery, XSL-FO, or any type of XML work, please get in touch with me through my contacts there.