Application Server Configuration
(2Q21)
This section deals with the configuration of the eXist-db Application Server in eXist-db's
main configuration file conf.xml
.
Main configuration file
The main configuration file for eXist-db
is called
conf.xml
, which is loaded from the root directory of the
distribution (as specified by the system property exist.home
).
The configuration file conf.xml
is divided into twelve
sections:
-
<db-connection>
: Configures the storage back-end. -
<lock-manager>
: Configures the Lock Manager. -
<repository>
: Settings for the package repository. -
<binary-manager>
: Settings for the Binary Manager. -
<indexer>
: Controls the indexing process. -
<scheduler>
: Job scheduler for system or user jobs such as backups. -
<parser>
: Default settings for parsing structured documents. -
<serializer>
: Default settings for the serializer (external data representation). -
<transformer>
: Default settings for the XSLT Transformer. -
<validation>
: Settings for XML validation. -
<xquery>
: Enable and configure extension modules that contain XQuery functions. -
<xupdate>
: Configuration options related to XUpdate processing.
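As a rough orientation, the outline below shows how the twelve sections listed above could sit side by side in conf.xml. It is only a sketch: the root element name exist is an assumption not stated in this section, and the contents of every section are omitted.

```xml
<!-- Outline only: root element name is assumed, section contents are omitted. -->
<exist>
    <db-connection><!-- storage back-end --></db-connection>
    <lock-manager><!-- Lock Manager --></lock-manager>
    <repository><!-- package repository --></repository>
    <binary-manager><!-- Binary Manager --></binary-manager>
    <indexer><!-- indexing process --></indexer>
    <scheduler><!-- system and user jobs --></scheduler>
    <parser><!-- parsing defaults --></parser>
    <serializer><!-- serialization defaults --></serializer>
    <transformer><!-- XSLT transformer --></transformer>
    <validation><!-- XML validation --></validation>
    <xquery><!-- XQuery engine and extension modules --></xquery>
    <xupdate><!-- XUpdate processing --></xupdate>
</exist>
```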
The following sections describe the most commonly modified of the above elements, including how to change the default behavior of eXist-db's handling of whitespace characters.
db-connection element
This element contains basic default storage settings for eXist-db, including
memory and system limits. Only one <db-connection>
should be specified. An
example configuration for the native back-end is shown below:
<db-connection cacheSize="48M" collectionCache="24M" database="native" files="../data" pageSize="4096" nodesBuffer="-1">
<pool min="1" max="15" sync-period="240000" wait-before-shutdown="60000"/>
<!-- default-permissions collection="0775" resource="0775" / -->
<recovery enabled="yes" sync-on-commit="no" group-commit="no" size="100M" journal-dir="../data"/>
<watchdog query-timeout="-1" output-size-limit="10000"/>
<default-permissions collection="0775" resource="0775"/>
</db-connection>
db-connection attributes
-
database
-
This attribute selects a database system type. Since relational database back-ends are no longer supported by the current release of eXist, only
native
is available. -
files
-
This attribute specifies the directory where the native back-end will keep its database files, and so it is necessary that this directory exists. If a relative path is specified, it will be based on the root directory as defined in the
exist.home
system property. If this data directory does not have write permissions (see User Authentication and Access Control), eXist will internally switch to read-only mode such that any attempt to change the database will throw an exception. -
cacheSize
-
This attribute sets the maximum amount of main memory used by all page buffers (i.e. assuming all page buffers are at full capacity). The database uses this parameter to calculate the maximum size of each internal cache. You can increase this value if your system allows for greater memory use.
While indexing documents, eXist will reserve the amount of memory specified in cacheSize - even if not all caches are filled - and will not use it for temporary data.
The cacheSize should not be more than half of the size of the JVM heap size (set by the JVM
-Xmx
parameter). If the JVM heap is smaller than 512 megabytes, the cacheSize should be even smaller, e.g. 1/3 of the heap size. -
collectionCache
-
Determines the size of the collection cache, which is a separate caching space. Usually this setting does not need to be changed unless you really have more than a few thousand collections in the db. Increase it carefully, maybe up to 128M.
-
pageSize
-
This specifies the number of bytes used for internal data and B-tree pages. This should be equal to or a multiple of the page size used by the filesystem (usually a multiple of 4096).
-
nodesBuffer
-
Size of the temporary buffer used by eXist for caching index data while indexing a document. If set to -1, eXist will use the entire free memory to buffer index entries and will flush the cache once the memory is full.
If set to a value > 0, the buffer will be fixed to the given size. The specified number corresponds to the number of nodes the buffer can hold, in thousands. Usually, a good default could be
nodesBuffer="1000"
. The default setting, nodesBuffer="-1", can be problematic if you frequently need to store large documents in a multi-user environment. In this case, the index operation may consume most of the memory resources, which means that concurrent threads will be slowed down or even come to a halt.
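To illustrate the sizing rules above, the following sketch assumes a JVM started with -Xmx2048m (an assumption for this example, not a recommendation). The cacheSize stays well below half of that heap, and nodesBuffer is fixed at the suggested 1000 so that indexing a large document cannot claim all free memory. The remaining values are copied from the example earlier in this section.

```xml
<!-- Assumes a 2048 MB JVM heap (-Xmx2048m); cacheSize is kept below half of it. -->
<db-connection cacheSize="512M" collectionCache="24M" database="native"
               files="../data" pageSize="4096" nodesBuffer="1000">
    <!-- pool, recovery, watchdog and default-permissions go here -->
</db-connection>
```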
db-connection/pool element
These settings control the internal database connection pool.
-
min
|max
-
These options specify the minimum and maximum size of the connection pool. This pool restricts the number of parallel (basic) operations that can be executed by the database. Settings should be somewhere between 1 and 20.
Please note that this has nothing to do with the HTTP and XMLRPC server settings - these servers have their own connection pools.
-
sync-period
-
This option defines how often the database will flush its internal buffers to disk (in milliseconds). The sync-thread will interrupt normal database operation after the specified time and write all dirty pages to disk. It also writes a checkpoint to the transaction log. In case of a database crash, only transactions which started after the last checkpoint have to be redone or rolled back. The sync-period should thus not be set too long.
-
wait-before-shutdown
-
This option specifies the maximum amount of time (in milliseconds) that the database will allow for any running processes to complete upon database shutdown. After that, eXist will try to kill the remaining processes.
If wait-before-shutdown is set to a positive number, eXist will stop the db after the specified timeout, even if there were still running database operations. In this case, no checkpoint will be written to the transaction log. If there were any open transactions, eXist will trigger a recovery run after restart.
If wait-before-shutdown is set to -1, eXist will not shut down before all active database operations returned. This is a safe setting, but it may require a manual intervention to stop the JVM.
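A possible pool configuration is sketched below. The values are illustrative only: the pool size stays within the suggested range of 1 to 20, buffers are flushed every two minutes, and running operations get one minute to finish at shutdown before eXist stops the db.

```xml
<!-- Illustrative values; this element is nested inside <db-connection>. -->
<pool min="1" max="20" sync-period="120000" wait-before-shutdown="60000"/>
```

Replacing the last value with -1 would trade a guaranteed shutdown time for the safety of waiting until all active operations have returned.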
db-connection/query-pool element
This element configures the global pool for compiled XQuery expressions. For each XQuery, up to a maximum number of compiled copies is kept in the pool, and a copy is removed if it has not been used within the defined timeout. The XQuery pool is multi-threaded.
<query-pool>
Attributes:
-
max-stack-size
-
The maximum number of queries in the query-pool.
-
size
-
The number of copies of the same query kept in the query-pool. Value
"-1"
effectively disables caching. Queries cannot be shared between threads; each thread needs a private copy of a query. -
timeout
-
The amount of time that a query will be cached in the query-pool in milliseconds.
-
timeout-check-interval
-
The time between checking for timed out queries. For value
"-1"
the timeout is switched off, so cached queries remain in the cache forever.
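The four attributes above might be combined as in the following sketch. All numbers are illustrative assumptions rather than documented defaults; the timeout values are in milliseconds.

```xml
<!-- Illustrative values; this element is nested inside <db-connection>. -->
<query-pool max-stack-size="64" size="128" timeout="120000"
            timeout-check-interval="30000"/>
```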
db-connection/recovery element
This element configures the journalling and recovery of the database. With
recovery enabled, the database is able to recover from an unclean database
shutdown due to, for example, power failures, OS reboots, and hanging processes.
For this to work correctly, all database operations must be logged to a journal
file. The location, size and other parameters for this file can be set using the
<recovery>
element.
<recovery>
Attributes:
-
enabled
-
If this attribute is set to
yes
, automatic recovery is enabled. -
size
-
This attribute sets the maximum allowed size of the journal file. Once the journal reaches this limit, a checkpoint will be triggered and the journal will be cleaned. However, the database waits for running transactions to return before processing this checkpoint. In the event one of these transactions writes a lot of data to the journal file, the file will grow until the transaction has completed. Hence, the size limit is not enforced in all cases.
-
journal-dir
-
This attribute sets the directory where journal files are to be written. If no directory is specified, the default path is to the
data
directory. -
sync-on-commit
-
This attribute determines whether or not to protect the journal during operating system failures. That is, it determines whether the database forces a file-sync on the journal after every commit. If this attribute is set to
yes
, the journal is protected against operating system failures. However, this will slow performance - especially on Windows systems. If set to no
, eXist will rely on the operating system to flush out the journal contents to disk. In the worst case scenario, in which there is a complete system failure, some committed transactions might not have yet been written to the journal, and so will be rolled back. -
group-commit
-
If set to
yes
, eXist will not sync the journal file immediately after every transaction commit. Instead, it will wait until the current file buffer (32kb) is really full. This can speed up eXist on some systems where a file sync is an expensive operation (mainly Windows XP; not necessary on Linux). However,
group-commit="yes"
will increase the chance that an already committed operation is rolled back after a database crash. -
force-restart
-
Try to restart the db even if crash recovery failed. This is dangerous because there might be corruptions inside the data files. The transaction log will be cleared, all locks removed and the db re-indexed.
Set this option to
yes
if you need to make sure that the db is online, even after a fatal crash. Errors encountered during recovery are written to the log files. Scan the log files to see if any problems occurred. - consistency-check
-
If set to
yes
, a consistency check will be run on the database if an error was detected during crash recovery. This option requires force-restart to be set to yes
, otherwise it has no effect. The consistency check outputs a report to the directory {files}/sanity and, if inconsistencies are found in the db, it writes an emergency backup to the same directory.
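As a sketch of how these attributes interact, the fragment below favors durability and availability over raw write speed: the journal is synced on every commit, and the db is restarted and checked even if crash recovery fails. The values are an illustrative combination, not a recommended default; note that consistency-check only takes effect because force-restart is also set to yes.

```xml
<!-- Illustrative combination; this element is nested inside <db-connection>. -->
<recovery enabled="yes" sync-on-commit="yes" group-commit="no" size="100M"
          journal-dir="../data" force-restart="yes" consistency-check="yes"/>
```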
db-connection/watchdog element
This is the global configuration for the query watchdog. The watchdog monitors all query processes, and can terminate any long-running queries if they exceed one of the predefined limits. These limits are as follows:
<watchdog>
Attributes:
-
query-timeout
-
This attribute sets the maximum amount of time (expressed in milliseconds) that the query can take before it is killed. The setting can be overridden in an XQuery by specifying the option
exist:timeout
:
declare option exist:timeout "time-in-ms";
Please check the documentation on XQuery options.
-
output-size-limit
-
This attribute limits the size of XML fragments constructed using XQuery, and thus sets the maximum amount of main memory a query is allowed to use. This limit is expressed as the maximum number of nodes allowed for an in-memory DOM tree. The purpose of this option is to avoid memory shortages on the server in cases where users are allowed to run queries that produce very large output fragments. The setting can be overridden in an XQuery by specifying the option
exist:output-size-limit
:
declare option exist:output-size-limit "size-hint";
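A watchdog that enforces both limits globally could look like the sketch below. The two-minute timeout is an illustrative assumption; the node limit is taken from the example earlier in this section.

```xml
<!-- Illustrative: kill queries after 120 s, cap in-memory fragments at 10000 nodes. -->
<watchdog query-timeout="120000" output-size-limit="10000"/>
```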
db-connection/default-permissions element
Specifies the default permissions for all resources and collections in eXist
(see User Authentication and Access
Control). When this is not configured, the default mode
(similar to the Unix chmod
command) is set to
0775
in the resource
and
collection
attributes. A different default value may be set
for a database instance. Local overrides are also possible.
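For example, an instance that should be more restrictive than the 0775 default might use something like the following. The modes shown are hypothetical and only illustrate the syntax; the element is nested inside the db-connection element, as in the example earlier in this section.

```xml
<!-- Hypothetical modes: collections rwxr-xr-x, resources rw-r--r--. -->
<default-permissions collection="0755" resource="0644"/>
```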
lock-manager element
This element contains settings for eXist-db's Lock Manager and Lock Table. Most of these Lock Manager settings should not be modified unless the eXist-db core development team suggests otherwise.
lock-table/@disabled
-
Disables the database Lock Table which tracks database locks. The Lock Table is enabled by default and allows reporting on database locking via JMX.
Tracking locks via the Lock Table imposes a small overhead per-Lock. Once users have finished testing their system to ensure correct operation, they may wish to disable this in production to ensure the absolute best performance.
Unless necessary, it is recommended to leave this enabled.
document/@use-path-locks
-
Experimental: Causes path locks to be used for documents as well as collection locks.
This has a performance and concurrency impact, but will ensure that you cannot have deadlocks between Collections and Documents.
Unless necessary, it is recommended to leave this at its default value.
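The two settings described above could be written out as in the sketch below, which keeps the Lock Table enabled and path locks for documents switched off. The element nesting follows the paths given above; the true/false value spelling is an assumption, since this section does not state the accepted values.

```xml
<!-- Sketch: value spelling (true/false) is assumed, not confirmed by this section. -->
<lock-manager>
    <lock-table disabled="false"/>
    <document use-path-locks="false"/>
</lock-manager>
```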
indexer element
This element sets parameters on how XML files are to be indexed by eXist. An example configuration is shown below:
<indexer caseSensitive="yes" index-depth="5" preserve-whitespace-mixed-content="no" suppress-whitespace="none">
<modules>
<module id="ngram-index" file="ngram.dbx" n="3" class="org.exist.indexing.ngram.NGramIndex"/>
<!-- <module id="spatial-index" connectionTimeout="10000" flushAfter="300" class="org.exist.indexing.spatial.GMLHSQLIndex"/> -->
<module id="lucene-index" buffer="32" class="org.exist.indexing.lucene.LuceneIndex"/>
<!-- The following index can be used to speed up 'order by' expressions by pre-ordering a node set. -->
<module id="sort-index" class="org.exist.indexing.sort.SortIndex"/>
<!-- New range index based on Apache Lucene. Replaces the old range index which is hard-wired into eXist core. -->
<module id="range-index" class="org.exist.indexing.range.RangeIndex"/>
<!-- The following module is not really an index (though it sits in the index pipeline). It gathers relevant statistics on the distribution of elements in the database, which can be used by the query optimizer for additional optimizations. -->
<!-- <module id="index-stats" file="stats.dbx" class="org.exist.storage.statistics.IndexStatistics" /> -->
</modules>
<!-- Default index settings. Default settings apply if there's no collection-specific configuration for a collection. -->
<index>
<!-- settings go here -->
</index>
</indexer>
indexer attributes
-
caseSensitive
-
Specifies whether string comparisons are to be case-sensitive. This option applies to XPath equality tests (i.e. the
=
operator), as well as functions such as contains()
, starts-with()
and ends-with()
. This setting does not apply to operators or functions of the full-text index (e.g.
&=
, |=
, near()
) nor the n-gram index, which are never case-sensitive. Warning:
Setting
caseSensitive="no"
violates the XQuery specs! The option should be regarded as a dirty workaround, which will be removed in the future. Please use the n-gram or full-text indexes for case-insensitive queries or - if that is impossible - specify a collation. -
suppress-whitespace
-
Specifies how the
<indexer>
is to treat whitespace at the start or end of a character sequence. This option only applies to newly stored files, and therefore changing it has no effect on previously stored documents. Possible values for this attribute are:-
leading
- Suppresses leading whitespace. -
trailing
- Suppresses trailing whitespace. -
both
- Suppresses leading and trailing whitespace. -
none
- Preserves all whitespace.
Note that suppressing whitespace at the start or end of character sequences does effectively change the document!
-
-
preserve-whitespace-mixed-content
-
Controls how ignorable whitespace is handled. If set to
no
, ignorable whitespace, e.g. between the end tag of an element and the start tag of another, will not be stored into the persistent DOM. This leads to a smaller DOM and usually increases the readability of the XML. Ignorable whitespace is not considered as a part of the logical document model, so removing it doesn't change the document. -
tokenizer
-
This attribute invokes the Java class used to tokenize a string into a sequence of single words or tokens, which are stored to the full-text index. Currently only
SimpleTokenizer
is available. -
index-depth
-
This attribute specifies the depth of the DOM index, or the tree level up to which elements will be added to the index. For example, a value of
2
results in the document root node and all its child elements being indexed; a value of 1
only indexes the root node. The DOM index maps unique node identifiers to the nodes' storage locations in the DOM file. Generating this index is time- and memory-consuming. It is furthermore primarily needed to access nodes by their unique node identifier, for example, when serializing XML data for query results or XUpdate - which are operations not normally considered time-critical. Moreover, most XPath expressions can do without this index since they use short-cuts to access the node directly.
Normally only top-level elements are added to the DOM index, whereas attributes and text nodes are always excluded. This results in much smaller index sizes and, consequently, a smaller
dom.dbx
file size. Usually, setting the index-depth
to a value of 2
offers a reasonable compromise between index size and performance. However, if your documents are deeply-structured, you might consider increasing this setting to a level of 3, 4 or 5. For example, if the longest path from the document root to an element node has more than ten node levels, an
index-depth
setting of 4
or 5
would probably help to increase overall query performance for some types of queries. -
validation
-
This attribute defines the default setting for the validation of documents by the XML parser. If it is set to
no
, documents will never be validated against an existing DTD or schema. A value of auto
will leave document validation to the SAX parser.
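As an illustration of the attributes above, the sketch below targets a hypothetical collection of deeply nested documents: index-depth is raised to 4, and leading and trailing whitespace is stripped from newly stored files. The values are assumptions chosen for the example; the nested modules and index elements are omitted.

```xml
<!-- Illustrative values for deeply structured documents. -->
<indexer caseSensitive="yes" index-depth="4" preserve-whitespace-mixed-content="no"
         suppress-whitespace="both">
    <!-- modules and default index settings go here -->
</indexer>
```

Because suppress-whitespace only applies to newly stored files, documents already in the database keep their whitespace until they are stored again.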
indexer/modules element
This section configures optional indexing modules. Beginning with version 1.2,
eXist features a modularized indexing architecture, which allows new indexes to
be plugged into the indexing pipeline. The <modules>
section lists and
configures the indexes that will be available to the database:
<modules>
<module id="ngram-index" class="org.exist.indexing.ngram.NGramIndex" file="ngram.dbx" n="3"/>
<!-- <module id="spatial-index" class="org.exist.indexing.spatial.GMLHSQLIndex" connectionTimeout="10000" flushAfter="300" /> -->
</modules>
The only common attributes for each <module>
element are
class
and id
. The other attributes, as
well as any nested elements, are specific to the index implementation. Detailed
information is available in the article on Configuring Database Indexes.
indexer/stopwords element
The file
attribute for this element points to a file
containing a list of stop-words. Stop-words are not added
to the full-text index.
indexer/index element
This configuration element specifies the default index settings. These settings are applied if neither the collection nor any of its ancestors provide a collection configuration.
Configuring indexes via the default settings is not
recommended. If you need a global collection configuration, store one for the
root collection /db
. For more information, see Configuring Indexes.
scheduler element
This section is used to configure asynchronous jobs with eXist's internal scheduler. Three types of jobs are supported:
- startup jobs
-
Startup jobs are executed once during database startup, but before the database becomes available. These jobs are synchronous. The database is blocked to outside requests and no other operations will run at the same time.
- system jobs
-
System jobs require the database to be in a consistent state. The scheduler will run them in an exclusive environment. Once the job is triggered, the database will block all new requests and wait for running operations to complete. It then executes the job. All other database operations will be stopped until the job returns or throws an exception. Any exception will be caught and a warning written to the log.
- user jobs
-
User jobs may be scheduled at any time and may be mutually exclusive or non-exclusive.
Below is an example which configures a backup job as a system task:
<job type="system" name="databackup" class="org.exist.storage.DataBackup" period="120000">
<parameter name="output-dir" value="backup"/>
<parameter name="suffix" value=".zip"/>
<parameter name="prefix" value="backup-"/>
<parameter name="collection" value="/db"/>
<parameter name="user" value="admin"/>
<parameter name="password"/>
<parameter name="zip-files-max" value="28"/>
</job>
Each job is configured in a <job>
element which accepts a number of
standard attributes:
job attributes
-
type
-
The type of the job to schedule. Must be either
startup
, system
or user
. -
class
-
If the job is written in Java this should be the name of the class that extends either
-
org.exist.scheduler.StartupJob
-
org.exist.storage.SystemTask
-
org.exist.scheduler.UserJavaJob
-
-
xquery
-
If the job is written in XQuery (not suitable for system jobs) this should be a path to the XQuery stored in the database, e.g.
/db/myCollection/myJob.xql
. XQuery jobs will be launched under the guest
account initially. The running XQuery may switch permissions through calls to xmldb:login()
. -
cron-trigger
-
Defines a firing pattern for the job using
cron
style syntax. Not applicable to start-up jobs. -
unschedule-on-exception
-
Either
true
(default) or false
. If true
and an exception is encountered, the job is unscheduled for further execution until a restart. Otherwise, the exception is ignored. -
period
-
Can be used to define an explicit period for firing the job instead of a
cron
style syntax. Expressed in milliseconds. Not applicable to start-up jobs. -
delay
-
Can be used for periodic jobs to delay the start of a job. If unspecified, jobs will start as soon as the database and scheduler are initialised.
-
repeat
-
Can be used for periodic jobs to define how many periods a job should be executed. If unspecified, jobs will repeat indefinitely.
Every job can take additional parameters, which are passed as name/value pairs.
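The attributes above can be combined in several ways; the sketch below shows two hypothetical user jobs. The job names, the parameter, and the Java class org.example.jobs.CacheWarmupJob are invented for illustration, and the cron expression assumes a Quartz-style format with a leading seconds field, which this section does not confirm.

```xml
<scheduler>
    <!-- Hypothetical XQuery user job, intended to fire daily at 02:00.
         Assumes a Quartz-style cron expression (seconds field first). -->
    <job type="user" name="nightly-report" xquery="/db/myCollection/myJob.xql"
         cron-trigger="0 0 2 * * ?" unschedule-on-exception="false">
        <parameter name="mode" value="full"/>
    </job>
    <!-- Hypothetical Java user job: first run delayed by 30 s, then every 60 s,
         limited by repeat. The class name is a placeholder. -->
    <job type="user" name="cache-warmup" class="org.example.jobs.CacheWarmupJob"
         period="60000" delay="30000" repeat="10"/>
</scheduler>
```

Since XQuery jobs start under the guest account, the stored query in the first job would need to call xmldb:login() itself if it requires further permissions.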
serializer element
The serializer is responsible for serializing XML documents or document fragments back into XML. This configuration element defines default settings for various parameters, which can also be specified programmatically. All settings can be overridden by XQuery serialization options.
serializer attributes
-
enable-xinclude
-
This attribute determines whether
<xinclude>
tags are to be expanded during serialization. Setting the value to false
will leave <xinclude>
tags unexpanded. -
enable-xsl
-
Setting this attribute to
true
tells the serializer to pass its output to an XSL stylesheet when it encounters an XSL processing-instruction at the start of the document. -
add-exist-id
-
This attribute tells the serializer to add additional debug attributes to each element. This information includes the internal identifier of the node and source document. Values:
-
all
- Adds debug information to every node in the output. -
element
- Adds debug information to top-level elements only. -
none
(default) - Disables debugging feature.
-
-
indent
-
By default, the serializer pretty-prints the resulting XML source code. Setting this option to
no
disables pretty-printing. -
match-tagging-elements
-
The database can highlight matches in the text content of a node by tagging the matching text string with
<exist:match>
. This only works for XPath expressions using the full-text index. Set the parameter to yes
to enable this feature.
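A serializer element using the attributes described above might look like the following sketch. The combination is illustrative, and it assumes that the yes/no spelling used for indent is accepted for the other boolean attributes as well, which this section does not confirm.

```xml
<!-- Illustrative defaults; yes/no spelling is assumed for all boolean attributes. -->
<serializer enable-xinclude="yes" enable-xsl="no" add-exist-id="none"
            indent="yes" match-tagging-elements="no"/>
```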
transformer element
This section determines which XSLT processor will be used by eXist. By default, eXist relies on Saxon.
validation element
Defines the default validation settings active when parsing XML and links to catalog files. Catalog files are used to locate DTDs, schemas and resolve external entities in general.
Please refer to the corresponding documentation on XML Validation.
xupdate element
Inserting new nodes into a document can lead to fragmentation in the DOM storage
file. eXist will thus trigger a de-fragmentation run if the fragmentation exceeds a
certain limit. The frequency of such de-fragmentation runs can be configured in the
<xupdate>
section. The main parameter is called
allowed-fragmentation
:
<xupdate allowed-fragmentation="20" enable-consistency-checks="no"/>
xupdate attributes
-
allowed-fragmentation
-
This attribute defines the maximum number of page splits allowed within a document before a de-fragmentation run is triggered.
-
enable-consistency-checks
-
This attribute is for debugging purposes only. If the parameter is set to
yes
, a consistency check will be run on modified documents after every XUpdate request. This checks whether the persistent DOM is complete, and all pointers in the structural index point to valid storage addresses that contain valid nodes.
xquery element
<xquery enable-java-binding="no" enable-query-rewriting="no" enforce-index-use="always" disable-deprecated-functions="no" raise-error-on-failed-retrieval="no" backwardCompatible="no">
<builtin-modules>
<!-- Default Modules -->
<module class="org.exist.xquery.functions.util.UtilModule" uri="http://exist-db.org/xquery/util"/>
<!-- ... more modules ... -->
</builtin-modules>
</xquery>
The <xquery>
section is used to enable/disable certain core features of
the XQuery engine. It also lists the XQuery extension modules that will be known to
the query engine by default.
xquery attributes
-
enable-java-binding
-
Set to
yes
to enable the java binding. Giving users full access to all Java classes should be considered a security risk and the feature is thus disabled by default. -
disable-deprecated-functions
-
Set to
yes
to disable XQuery functions marked as deprecated. -
enforce-index-use
-
Controls whether available range indexes should be used if only some collections in the context set define a matching index. Available settings are:
-
always
to always use an index, even if it does not apply to the entire set of collections being queried. -
strict
to only use indexes if they are defined for the entire collection set.
For example, if you have two collections:
/db/one
and /db/two
, and you define a range index on a certain element <node>
in /db/one
, but not in /db/two
, the query engine would not use the index with setting strict
if you query both collections. At compile time, eXist doesn't know if <node>
exists in both collections and will not use the index if it determines that an index definition applies to only a part of the collection set being queried. To use the index, you would need to start your XPath expression with a call to collection()
, selecting the correct collection with the index defined. If
enforce-index-use
is set to always
, the query engine only checks if one collection in the collection set has a matching index defined on it. This may lead to an incomplete query result if one forgets certain collections. In other words, when
enforce-index-use
is set to "always", it is the query writer's responsibility to make sure indexes are defined properly. But experience has shown it is easier for users to understand that a certain result is incomplete because an index is missing, whereas they have difficulty seeing that a performance issue is caused by inconsistent indexing. -
-
raise-error-on-failed-retrieval
-
Set to
yes
if a call to doc()
, xmldb:document()
, collection()
or xmldb:xcollection()
should raise an error (FODC0002) when an XML resource cannot be retrieved. Set to
no
if a call to doc()
, xmldb:document()
, collection()
or xmldb:xcollection()
should return an empty sequence when an XML resource cannot be retrieved. -
enable-query-rewriting
-
The query engine can often achieve considerable performance improvements by rewriting an XQuery expression into a more efficient form (see the documentation about indexing). However, these features are relatively new. If you have doubts about the correctness of a query result, you may temporarily set
enable-query-rewriting
to no
and see if the result changes in any way. If it does, you have hit a bug which should be reported. -
backwardCompatible
-
Set to
yes
to enable XPath 1.0 backwards compatibility. The setting mainly affects automatic type conversions, which were less strict in XPath 1.0 than in later versions.
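As a variation on the example at the start of this section, the sketch below shows a stricter setup: indexes are only used when they cover the whole collection set, and a failed document retrieval raises an error instead of returning an empty sequence. The attribute values are an illustrative choice, and the module list is omitted.

```xml
<!-- Illustrative stricter variant; builtin-modules omitted for brevity. -->
<xquery enable-java-binding="no" enable-query-rewriting="yes"
        enforce-index-use="strict" disable-deprecated-functions="no"
        raise-error-on-failed-retrieval="yes" backwardCompatible="no">
    <builtin-modules>
        <!-- module declarations go here -->
    </builtin-modules>
</xquery>
```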
xquery/builtin-modules element
This section lists the XQuery extension modules which will be known to the query engine. The modules in this list can be imported into a query without specifying a location. For example:
<module class="org.exist.xquery.modules.file.FileModule" uri="http://exist-db.org/xquery/file"/>
This establishes a static mapping between the module URI for the file module and the Java class which implements it. When using that module, it is sufficient to provide the correct URI in the import. Specifying a location is not needed, as in:
import module namespace file="http://exist-db.org/xquery/file";
Instead of providing a Java class, one can also specify a src
URI
which must point to the XQuery source code of the module, for instance:
<module uri="http://exist-db.org/xquery/kwic" src="resource:org/exist/xquery/lib/kwic.xql"/>
For the src
attribute, eXist understands the same types of URIs as in an
ordinary XQuery import statement.
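Putting the two variants together, a builtin-modules section that registers both Java-based and XQuery-based modules could be sketched as follows. It only reuses the module declarations already shown in this section.

```xml
<builtin-modules>
    <!-- Modules implemented as Java classes -->
    <module class="org.exist.xquery.functions.util.UtilModule" uri="http://exist-db.org/xquery/util"/>
    <module class="org.exist.xquery.modules.file.FileModule" uri="http://exist-db.org/xquery/file"/>
    <!-- Module implemented in XQuery, located through its src URI -->
    <module uri="http://exist-db.org/xquery/kwic" src="resource:org/exist/xquery/lib/kwic.xql"/>
</builtin-modules>
```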