Friday, August 15, 2008

Text retrieval of XML-encoded corpora: A lexical approach

Liam Quin - W3C - Paper

Text retrieval: building a persistent index that makes it simpler to find documents by what they contain.
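To make the idea of an index concrete, here is a minimal sketch of an inverted index in Python. It is not the speaker's code: the function names and sample documents are made up for illustration, the tokenizer is deliberately naive (it treats tag names as words too), and a real system would write the index to disk so it persists between runs.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to a list of (doc_name, word_position) postings."""
    index = defaultdict(list)
    for name, text in docs.items():
        # Naive tokenizer: every run of word characters counts, markup included.
        for pos, match in enumerate(re.finditer(r"\w+", text)):
            index[match.group().lower()].append((name, pos))
    return dict(index)


def find_documents(index, word):
    """Return the sorted names of documents that contain the word."""
    return sorted({name for name, _ in index.get(word.lower(), [])})


# Hypothetical sample corpus, keyed by file name.
docs = {
    "a.xml": "<p>Text retrieval of XML corpora</p>",
    "b.xml": "<p>Broken HTML and plain text</p>",
}
index = build_index(docs)
print(find_documents(index, "text"))
print(find_documents(index, "xml"))
```

Storing word positions alongside document names is what makes phrase searches and keyword-in-context output possible later on.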

lq-text design: a text retrieval system he developed a long time ago; he is seeing whether it will work in the XML world.

He's doing a little demonstration of the index using "75 or 300 MB of data, I don't really recall".

I hope his paper has some of this code in it; it might be interesting to examine. It appears he's making his own XML database that is accessed without the need for XQuery, but only because the needs are so basic.

Is it useful? For him it is...

Is it as useful as XQuery? XQuery can't do match highlighting, and it can't mix with broken HTML and text. Overall XQuery is more useful, but perhaps not for concordances like the ones he is using here.
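For anyone unfamiliar with concordances, here is a rough keyword-in-context sketch in Python with the match highlighted in brackets. It is my own illustration, not anything from the talk: the function name, the bracket style, and the sample sentence are all assumptions, and it scans raw text rather than using an index.

```python
import re


def concordance(text, word, width=20):
    """Return one line per match, showing `width` characters of context each side."""
    lines = []
    pattern = r"\b%s\b" % re.escape(word)
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Brackets stand in for match highlighting.
        lines.append(f"{left}[{m.group()}]{right}")
    return lines


sample = "The index finds text. Text retrieval is old."
for line in concordance(sample, "text", width=6):
    print(line)
```

Because it works on plain strings, the same approach does not care whether the input is well-formed XML, broken HTML, or plain text.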

How much work would it take to update it for XML? Probably better to rebuild than retrofit.

Practical question: it won't run under Windows (there, pipelines are implemented by running one program to completion, writing to a temporary file, and then giving that file to the next program; in Unix the programs alternate).
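The two pipeline styles described in that answer can be sketched in Python. This only illustrates the difference as I understood it from the talk; it is not a claim about how any particular operating system implements pipes. The producer and consumer commands are hypothetical stand-ins for the real tools.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for two stages of a text-retrieval pipeline.
PRODUCER = [sys.executable, "-c", "print('alpha'); print('beta')"]
CONSUMER = [sys.executable, "-c",
            "import sys; print(sys.stdin.read().upper(), end='')"]


def run_via_temp_file():
    """Run the producer to completion, then feed its saved output to the consumer."""
    with tempfile.TemporaryFile() as tmp:
        subprocess.run(PRODUCER, stdout=tmp, check=True)
        tmp.seek(0)
        done = subprocess.run(CONSUMER, stdin=tmp,
                              stdout=subprocess.PIPE, check=True)
    return done.stdout.decode()


def run_via_pipe():
    """Start both programs at once, connected by a pipe, so they run side by side."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(CONSUMER, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    producer.stdout.close()  # the consumer now holds the only read end
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode()


if __name__ == "__main__":
    print(run_via_temp_file().split())
    print(run_via_pipe().split())
```

Both functions produce the same output; the difference is that the temporary-file version needs space for the whole intermediate result and cannot start the second stage until the first has finished.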
