Stanford CoreNLP annotation pipeline XQuery Module

An XQuery module that integrates the Stanford CoreNLP annotation pipeline library suite into eXist-db. You can install the package via the package manager in the eXist-db dashboard, or build it yourself.

Examples

The module currently supports creating a Named Entity Recognition (NER) classifier model and running some of the pipeline tools, including the NER classifier, on your documents.

Named entity recognition and classification

To create an NER classifier from your own document's data, follow these steps:

  1. Upload your word processor document via "tokenize and format" to have it tokenized and formatted. You will receive a spreadsheet of the tokenized text.
  2. Annotate the spreadsheet you received in step 1.
  3. If you do not annotate the whole text, create a new spreadsheet containing only the annotated part.
  4. Upload the annotated spreadsheet via "train the classifier" to train the classifier model. The model will be returned to you within a minute or two, depending on the size of the text you provided.
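The annotation and trimming in steps 2 and 3 can be sketched in Python, for example to check a spreadsheet exported as CSV before uploading it. This is a minimal sketch under stated assumptions: the two-column layout (token in the first column, your label in the second) and the helper names `annotated_rows` and `to_ner_tsv` are hypothetical and not part of this module's API. The tab-separated output mirrors the one-token-per-line format that Stanford NER's `CRFClassifier` reads with `map = word=0,answer=1`, where `O` marks tokens that are not entities.

```python
import csv
import io


def annotated_rows(csv_text: str) -> list[tuple[str, str]]:
    """Keep only (token, label) pairs whose label cell is filled in (step 3).

    Hypothetical layout: column 1 holds the token, column 2 your label.
    """
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) < 2:
            continue  # no label column at all: treat the row as unannotated
        token, label = record[0].strip(), record[1].strip()
        if token and label:
            rows.append((token, label))
    return rows


def to_ner_tsv(rows: list[tuple[str, str]]) -> str:
    """Render pairs as 'token<TAB>label' lines, one token per line."""
    return "".join(f"{token}\t{label}\n" for token, label in rows)


# Usage: the last two rows have no label and are dropped.
sample = "Goethe,PERSON\nlived,O\nin,O\nWeimar,LOCATION\n.,O\nUnlabelled,\ntext,\n"
print(to_ner_tsv(annotated_rows(sample)), end="")
```

Filtering on the label cell rather than on row position keeps the sketch independent of where the annotated part sits in the text; if your annotated part is not contiguous, check the result by hand before uploading.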