http://exist-db.org/xquery/kwic
KWIC module: formats query results to display keywords in context (KWIC). A configurable amount of text is displayed to the left and right of a matching keyword (or phrase). The module works with all indexes that support match highlighting (matches are tagged with an <exist:match> element). This includes the old full text index, the new Lucene-based full text index, as well as the NGram index.

The kwic:summarize() function represents the main entry point into the module. To have more control over the text extraction context, you can also call kwic:get-summary() instead. For example, the following snippet will only print the first match within a given set of context nodes ($ancestor):
let $matches := kwic:get-matches($hit)
for $ancestor in $matches/ancestor::para | $matches/ancestor::title | $matches/ancestor::td
return
    kwic:get-summary($ancestor, ($ancestor//exist:match)[1], $config)
kwic:get-context($root as element(), $match as element(exist:match), $mode as xs:string) as node()*
Retrieve the following and preceding text chunks for a given match.
kwic:substring($node as item(), $start as xs:int, $count as xs:int) as item()?
Like fn:substring, but takes a node argument. If the node is an element, a new element is created with the same node-name as the old one and the shortened text content.
kwic:display-text($text as text()?) as node()?
kwic:callback($callback as function?, $node as node(), $mode as xs:string) as xs:string?
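The signature above suggests that a callback receives the node being processed and a mode string, and returns an optional string. A minimal sketch of such a callback follows. It assumes that the mode values are "before" and "after", that the data is the Shakespeare sample collection stored at /db/shakespeare with a Lucene index on SPEECH, and that the eXist-db version in use accepts XQuery 3.0 function references (older releases may need util:function instead). The name local:filter is hypothetical.

```xquery
xquery version "3.0";

import module namespace kwic="http://exist-db.org/xquery/kwic";

(: Hypothetical filter: drop speaker names and stage directions from the
   context, and pad all other text with a space so words do not run together.
   Assumption: $mode is "before" for left-hand context, "after" otherwise. :)
declare function local:filter($node as node(), $mode as xs:string) as xs:string? {
    if ($node/parent::SPEAKER or $node/parent::STAGEDIR) then
        ()
    else if ($mode eq "before") then
        concat($node, " ")
    else
        concat(" ", $node)
};

(: Assumption: sample data and query term, chosen only for illustration. :)
for $hit in collection("/db/shakespeare")//SPEECH[ft:query(., "nation")]
return
    kwic:summarize($hit, <config width="40"/>, local:filter#2)
```

Returning the empty sequence from the callback leaves that node out of the displayed context.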
kwic:truncate-previous($root as node(), $node as node()?, $truncated as item()*,
$max as xs:int, $chars as xs:int, $callback as function?)
Generate the left-hand context of the match. Returns a sequence of nodes and strings, whose total string length is less than or equal to $max characters. Note: this function calls itself recursively until $node is empty or the returned sequence has the desired total string length.
kwic:truncate-following($root as node(), $node as node()?, $truncated as item()*,
$max as xs:int, $chars as xs:int, $callback as function?)
Generate the right-hand context of the match. Returns a sequence of nodes and strings, whose total string length is less than or equal to $max characters. Note: this function calls itself recursively until $node is empty or the returned sequence has the desired total string length.
kwic:string-length($nodes as item()*) as xs:integer
Computes the total string length of the nodes in the argument sequence.
kwic:get-summary($root as node(), $node as element(exist:match),
$config as element(config)) as element()
kwic:get-summary($root as node(), $node as element(exist:match),
$config as element(config), $callback as function?) as element()
Print a summary of the match in $node. Output a predefined amount of text to the left and the right of the match.
kwic:expand($hit as element()) as element()
Expand the element in $hit. Creates an in-memory copy of the element and marks all matches with an exist:match tag, which will be used by all other functions in this module. You need to call kwic:expand before kwic:get-summary. kwic:summarize will call it automatically.
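The call order described above (expand first, then summarize a chosen match) can be sketched as follows. This is an illustration under assumptions, not the module's canonical usage: the collection path /db/shakespeare, the SPEECH element, the query term, and the presence of a Lucene index are all assumed, and the exist prefix is taken to be predeclared by eXist-db as in the snippet near the top of this page.

```xquery
xquery version "3.0";

import module namespace kwic="http://exist-db.org/xquery/kwic";

(: Assumption: width is the only config attribute needed here. :)
let $config := <config width="40"/>
(: Assumption: sample collection with a Lucene index on SPEECH. :)
for $hit in collection("/db/shakespeare")//SPEECH[ft:query(., "nation")]
(: kwic:expand creates an in-memory copy with matches tagged as exist:match. :)
let $expanded := kwic:expand($hit)
(: Pick only the first match of each hit. :)
let $first := ($expanded//exist:match)[1]
where exists($first)
return
    kwic:get-summary($expanded, $first, $config)
```

Passing the expanded copy as $root keeps the match and its context in the same in-memory tree, which is what kwic:get-summary needs to collect the surrounding text.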
kwic:get-matches($hit as element()) as element(exist:match)*
Return all matches within the specified element, $hit. Matches are returned as exist:match elements. The returned nodes are part of a new document whose root element is a copy of the specified $hit element.
kwic:summarize($hit as element(), $config as element(config)) as element()*
kwic:summarize($hit as element(), $config as element(config),
$callback as function?) as element()*
Main function of the KWIC module: takes the passed element and returns an XHTML fragment containing a chunk of text before and after the first full text match in the node. The optional config parameter is used to configure the behaviour of the function:

    <config width="character width" table="yes|no" link="URL to which the match is linked"/>

By default, kwic:summarize returns an XHTML fragment with the following structure:

    <p xmlns="http://www.w3.org/1999/xhtml">
        <span class="previous">Text before match</span>
        <a href="passed URL if any" class="hi">The highlighted term</a>
        <span class="following">Text after match</span>
    </p>

If table=yes is passed with the config element, a tr table row will be returned instead of the p element (using the same class names).
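As a rough illustration of the config element in use, the sketch below requests table rows and wraps them in an XHTML table. The collection path, the SPEECH element, the query term, and the link target view.xql are assumptions made for the example and do not come from the module itself.

```xquery
xquery version "3.0";

import module namespace kwic="http://exist-db.org/xquery/kwic";

(: Assumption: sample collection with a Lucene index on SPEECH;
   view.xql is a hypothetical page that displays the full hit. :)
let $rows :=
    for $hit in collection("/db/shakespeare")//SPEECH[ft:query(., "nation")]
    return
        kwic:summarize($hit, <config width="40" table="yes" link="view.xql"/>)
return
    (: The rows are computed before the constructor so that the XHTML default
       namespace declared on the table does not apply to the path expression. :)
    <table xmlns="http://www.w3.org/1999/xhtml">{ $rows }</table>
```

Dropping the table attribute (or setting it to "no") from the config element yields the default paragraph structure instead of rows.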