Getting started

From l-processor

XQuery is the standard query language developed by the W3C to "combine documents, databases, Web pages and almost anything else". Although it was originally designed to query XML documents, it has evolved to manipulate and transform other kinds of data as well, such as plain-text and JSON files. The W3C maintains a useful list of XQuery resources, including books.

Install an XQuery processor

There are many XQuery processors available. The best-known free ones include BaseX, eXist-db, and Zorba.

eXist-db and Zorba even offer online demos of their processors. Once you have downloaded and installed a package, you can access the processor either through a GUI or from the command line. BaseX and eXist-db offer intuitive GUIs: double-click the BaseX.jar file (the name may vary slightly between releases) in the BaseX folder, or open eXide (the eXist-db GUI) in your browser at http://localhost:8080/exist/apps/eXide/index.html.

If you prefer the command line, add the program's directory to your $PATH variable and type the command name followed by the path of the file containing the script to run:

$ basex /Users/home/yourXquery.xq

$ zorba /Users/home/yourXquery.xq

Alternatively, if you would rather not modify your $PATH variable, change into the directory containing the processor and type:

$ ./basex /Users/home/yourXquery.xq

$ ./zorba /Users/home/yourXquery.xq

Detailed instructions on installing and running each processor are available on its website.
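To check that a processor is installed and reachable from the command line, you could first run a trivial script. The file name hello.xq and its content below are only an example, not something shipped with any processor:

```xquery
xquery version "3.0";

(: hello.xq: a hypothetical smoke-test script; any file name will do :)
let $words := ("Hello", "XQuery")
(: string-join() concatenates the strings, inserting the separator ", " :)
return <greeting>{ string-join($words, ", ") }</greeting>
```

Running, for instance, `basex hello.xq` should output a greeting element containing the text "Hello, XQuery". The exact serialization (for example, whether an XML declaration is prepended) depends on the processor.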

Use functions or install the library module

Use individual functions

Each function documented in this wiki can be copied and pasted into a GUI, or saved to a file and run from the command line. The following example shows how to use the function lp:grc-nonword-tokenize(). In this case, two things have to be added to the bare function: (i) the declaration of the namespace prefix d, which is needed because the elements of the queried file (https://goo.gl/n2aIh8) are in the TEI namespace; (ii) a call to the function at the end of the script, since a function declaration does nothing until it is invoked. You can also run this script in the Zorba online demo.

xquery version "3.0" encoding "utf-8";

(: Added for this example: the queried TEI file is in this namespace :)
declare namespace d="http://www.tei-c.org/ns/1.0";
declare namespace lp="http://l-processor.org";

(:~
 : Separate tokens divided by nonword characters (i.e., Unicode
 : categories Punctuation, Separators, and Others)
 : Copyright http://creativecommons.org/licenses/by-nc-sa/4.0/
 :
 : @author Giuseppe G. A. Celano
 : @version 1.0
 : @see http://l-processor.org/w/grc
 : @see http://l-processor.org/w/grc;lp:grc-nonword-tokenize
 : @param $x the node containing some text to tokenize
 :)
declare function lp:grc-nonword-tokenize($x as node()*)
as item()*
{
  <tokens>{
    for $c in $x//text()
    (: \W+ matches runs of punctuation, separator and other characters :)
    for $g in tokenize($c, "\W+")
    (: tokenize() yields "" when a text node starts or ends with a nonword
       character; skip these instead of emitting empty strings :)
    where $g ne ""
    return <t>{$g}</t>
  }</tokens>
};

(: Added for this example: the function call :)
lp:grc-nonword-tokenize(
  doc("https://goo.gl/n2aIh8")//d:div[@subtype="book"][@n="1"]//d:div[@subtype="section"][@n="3"]
)
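To see what the function returns without depending on the remote file, you could replace the function call at the end of the script above with an inline test fragment. The l element below is a hypothetical input containing the opening words of the Iliad; the prolog (namespace and function declarations) stays the same:

```xquery
(: Hypothetical test input, inlined so that no network access is needed :)
let $line := <l>μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος</l>
return lp:grc-nonword-tokenize($line)
```

The result should be a tokens element with five t children, one each for μῆνιν, ἄειδε, θεὰ, Πηληϊάδεω and Ἀχιλῆος; how it is indented depends on the processor's serialization settings.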

Use the library module

(forthcoming)
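Until the library module is released, the following is only a generic sketch of how XQuery library modules work, not the l-processor module itself. A library module is a file (here the hypothetical lp.xqm) that begins with a module namespace declaration instead of a query body and contains only declarations:

```xquery
xquery version "3.0";

(: lp.xqm: hypothetical library module file, for illustration only :)
module namespace lp = "http://l-processor.org";

(: Functions in a library module must be in its target namespace (lp:) :)
declare function lp:grc-nonword-tokenize($x as node()*) as element(tokens)
{
  <tokens>{
    for $c in $x//text()
    for $g in tokenize($c, "\W+")
    where $g ne ""
    return <t>{$g}</t>
  }</tokens>
};
```

A main module could then load it with `import module namespace lp = "http://l-processor.org" at "lp.xqm";` and call lp:grc-nonword-tokenize() without copying the function declaration. How the location hint after `at` is resolved (relative file path, module repository, and so on) varies between processors.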