REST-Style Web API
(2Q19)
eXist-db provides a REST-style (or RESTful) API over HTTP, which offers a simple and quick way to access the database. All you need to use this API is an HTTP client, which nearly all programming languages and environments provide. Alternatively, you can simply use a web browser.
Introduction
In the standard eXist-db configuration, the system listens for REST requests at:
http://localhost:8080/exist/rest/
The server treats all HTTP request paths as paths to a database collection (instead of the file system). Relative paths are resolved relative to the database root collection. For instance:
http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml
For this URL, the server receives an HTTP GET request for the resource hamlet.xml in the collection /db/shakespeare/plays. It looks for this collection and checks whether the resource is available. If so, it retrieves its contents and sends them back to the client. If the document does not exist, an HTTP 404 (Not Found) status response is returned.
To keep the interface simple, the basic database operations are directly mapped to HTTP request methods wherever possible:
- GET: Retrieves a resource or collection from the database. XQuery and XPath queries may also be specified using GET's optional parameters applied to the selected resource. See GET Requests.
- PUT: Uploads a resource to the database. If required, collections are created automatically and existing resources are overwritten. See PUT Requests.
- DELETE: Removes a resource (document or collection) from the database. See DELETE Requests.
- POST: Submits an XML fragment in the content of the request. This fragment specifies the action to take: it can be either an XUpdate document or a query request. Query requests are used to pass complex XQuery expressions that are too large to be URL-encoded. See POST Requests.
When eXist-db runs as a stand-alone server, that is, when the database has been started with the shell script bin/server.sh (Unix) or the batch file bin/server.bat (Windows/DOS), HTTP access is provided by a simple, built-in web server. This web server has limited capabilities, restricted to the basic operations defined by eXist's REST API: GET, POST, PUT and DELETE.
When running in a servlet context (the usual way of starting eXist-db), the same server functionality is provided by the EXistServlet. Both the stand-alone server and the servlet rely on the Java class org.exist.http.RESTServer to do the actual work.
HTTP Authentication
Authentication is done through the HTTP Basic authentication mechanism, so only authenticated users can access the database. If no username and password are specified, the server assumes the "guest" user identity, which has limited capabilities. If the submitted username is not known, or an incorrect password is submitted, an error page (403 - Forbidden) is returned.
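Because the mechanism is plain HTTP Basic authentication, any client can supply credentials by sending an Authorization header. The following Python sketch (standard library only) builds such a request; the user name admin and the password secret are placeholder assumptions, not values to rely on. Leaving the header out makes the request run as guest.

```python
import base64
import urllib.request


def authenticated_request(url, user, password):
    """Build a GET request that carries an HTTP Basic Authorization header."""
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder credentials (assumption): replace with a real eXist-db account.
req = authenticated_request(
    "http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml",
    "admin", "secret")

# Sending it would be: urllib.request.urlopen(req).read()
# Unknown users or wrong passwords surface as urllib.error.HTTPError.
```

Sending the header up front, instead of using urllib's HTTPBasicAuthHandler, avoids depending on a 401 challenge from the server, which may never come when the server can fall back to the guest identity.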
GET Requests
If the server receives an HTTP GET request, it first checks the request for known parameters. If no known parameters are given, it tries to locate the collection or document specified by the database path in the URI and returns a representation of this resource to the client.
When the located resource is XML, the returned Content-Type is application/xml; for binary resources it is application/octet-stream.
If the path resolves to a database collection, the retrieved results are returned as an XML fragment. For example:
<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
<exist:collection name="/db/xinclude" owner="guest" group="guest" permissions="rwur-ur-u">
<exist:resource name="disclaimer.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
<exist:resource name="sidebar.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
<exist:resource name="xinclude.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
</exist:collection>
</exist:result>
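A client usually needs the resource names from such a listing. This Python sketch reads them with xml.etree.ElementTree; the function name list_collection is hypothetical, and it only looks at exist:resource elements directly inside the listed collection.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"


def list_collection(xml_text):
    """Return (collection name, [resource names]) from a REST collection listing."""
    root = ET.fromstring(xml_text)
    coll = root.find(f"{{{EXIST_NS}}}collection")
    resources = [r.get("name") for r in coll.findall(f"{{{EXIST_NS}}}resource")]
    return coll.get("name"), resources


LISTING = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
  <exist:collection name="/db/xinclude" owner="guest" group="guest" permissions="rwur-ur-u">
    <exist:resource name="disclaimer.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
    <exist:resource name="sidebar.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
    <exist:resource name="xinclude.xml" owner="guest" group="guest" permissions="rwur-ur--"/>
  </exist:collection>
</exist:result>"""

name, docs = list_collection(LISTING)
print(name, docs)  # /db/xinclude ['disclaimer.xml', 'sidebar.xml', 'xinclude.xml']
```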
If an xml-stylesheet processing instruction is found in a requested XML document, the database will try to apply this stylesheet before returning the document. A relative path is resolved relative to the location of the source document. For example, if the document hamlet.xml, which is stored in the collection /db/shakespeare/plays, contains the XSLT processing instruction:
<?xml-stylesheet type="application/xml" href="shakes.xsl"?>
The database will load the stylesheet from /db/shakespeare/plays/shakes.xsl.
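The two resolution rules (a processing-instruction href against the document's collection, an _xsl parameter against the root collection /db) come down to path arithmetic. This Python sketch assumes both values are database paths rather than URIs with a scheme; the function names are hypothetical.

```python
import posixpath


def resolve_stylesheet(doc_path, href):
    """Resolve an xml-stylesheet href against the collection holding doc_path.

    Assumes href is a database path, not a URI with a scheme."""
    if href.startswith("/"):
        return posixpath.normpath(href)  # already absolute
    collection = posixpath.dirname(doc_path)
    return posixpath.normpath(posixpath.join(collection, href))


def resolve_xsl_param(value):
    """Resolve an _xsl parameter; relative paths start at the root collection /db.

    Returns None for _xsl=no, which disables stylesheet processing."""
    if value == "no":
        return None
    if value.startswith("/"):
        return posixpath.normpath(value)
    return posixpath.normpath(posixpath.join("/db", value))


print(resolve_stylesheet("/db/shakespeare/plays/hamlet.xml", "shakes.xsl"))
# /db/shakespeare/plays/shakes.xsl
```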
GET accepts the following optional request parameters (which must be URL-encoded):
- _xsl=XSL Stylesheet: Applies an XSL stylesheet to the requested resource. A relative path is considered relative to the database root collection. This option overrides any XSL stylesheet processing instructions found in the source XML file.
  Setting _xsl to no disables any stylesheet processing. This is useful for retrieving unprocessed XML from documents that have a stylesheet declaration.
  Warning: If your document has a valid XSL stylesheet declaration, the web browser may still decide to apply the XSL. In this case, passing _xsl=no has no visible effect, though the XSL is now rendered by the browser, not by eXist.
- _query=XPath/XQuery Expression: Executes the specified query. The collection or resource referenced in the request path is added to the set of statically known documents for the query.
- _indent=yes | no: Whether to return indented, pretty-printed XML. The default value is yes.
- _encoding=Character Encoding Type: Sets the character encoding for the resulting XML. The default value is UTF-8.
- _howmany=Number of Items: Specifies the maximum number of items to return from the result sequence. The default value is 10.
- _start=Starting Position in Sequence: Specifies the index position of the first item in the result sequence to return. The default value is 1.
- _wrap=yes | no: Specifies whether the returned query results must be wrapped in a parent <exist:result> element. The default value is yes.
- _source=yes | no: Specifies whether the query should display its source code instead of being executed. The default value is no. See the <allow-source> section in descriptor.xml about explicitly allowing this behaviour.
- _cache=yes | no: If set to yes, the results of the current query are stored in a session on the server. A session id will be returned with the response. Subsequent requests can pass this session id via the _session parameter. If the server finds a valid session id, it will return the cached results instead of re-evaluating the query. See Cached Query Results below.
- _session=session id: Specifies a session id returned by a previous query request. Query results will be read from the cached session.
- _release=session id: Releases the session identified by session id.
As an example, the following URI finds all <SPEECH> elements in the collection /db/shakespeare with "JULIET" as the <SPEAKER>. As specified, it returns 5 items from the result sequence, starting at position 3:
http://localhost:8080/exist/rest/db/shakespeare?_query=//SPEECH[SPEAKER=%22JULIET%22]&_start=3&_howmany=5
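Hand-encoding such URLs is error-prone, so a client would normally let a library do the percent-encoding. This Python sketch builds the same kind of query URL with urllib.parse; the helper name query_url is hypothetical and the base address is the default one from the Introduction.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8080/exist/rest"  # default location from the Introduction


def query_url(collection, query, start=1, howmany=10, wrap=True):
    """Build a GET URL that evaluates `query` against `collection`."""
    params = {
        "_query": query,
        "_start": start,
        "_howmany": howmany,
        "_wrap": "yes" if wrap else "no",
    }
    # quote (instead of the default quote_plus) turns spaces into %20 and
    # percent-encodes /, [, ], = and double quotes in the expression.
    return f"{BASE}{collection}?{urlencode(params, quote_via=quote)}"


print(query_url("/db/shakespeare", '//SPEECH[SPEAKER="JULIET"]', start=3, howmany=5))
# http://localhost:8080/exist/rest/db/shakespeare?_query=%2F%2FSPEECH%5BSPEAKER%3D%22JULIET%22%5D&_start=3&_howmany=5&_wrap=yes
```

Encoding every reserved character is stricter than the hand-written URL above, but it yields the same query once the server decodes the parameter.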
PUT Requests
Documents can be stored or updated in the database using an HTTP PUT request. The request URI points to the location where the document must be stored. As defined by the HTTP specification, an existing document at the specified path will be updated. Any collections in the path that do not exist are created automatically.
For example, the following Python script stores a document in the database. The target collection (for example db/test) and the file to store are passed on the command line; the collection is created if it does not exist. Note that the HTTP header field Content-Type is set to application/xml, since otherwise the document would be stored as a binary resource.
# Usage: python store.py db/test hamlet.xml
import http.client
import os.path
import sys
from urllib.parse import quote

collection = sys.argv[1].strip('/')
path = sys.argv[2]

print("reading file %s ..." % path)
with open(path, 'rb') as f:
    xml = f.read()

# Use the file name (without directories) as the document name
doc = os.path.basename(path)
print("storing document %s to collection %s ..." % (doc, collection))

con = http.client.HTTPConnection('localhost', 8080)
# Content-Type must be application/xml; http.client sets Content-Length itself
con.request('PUT', '/exist/rest/%s/%s' % (quote(collection), quote(doc)),
            body=xml, headers={'Content-Type': 'application/xml'})
res = con.getresponse()
body = res.read()
con.close()

# A successful PUT is typically answered with 201 Created; accept 200 as well
if res.status in (200, 201):
    print("Ok.")
else:
    print("An error occurred: %d %s" % (res.status, res.reason))
    print(body.decode('utf-8', 'replace'))
    sys.exit(1)
DELETE Requests
DELETE removes a collection or resource from the database.
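A DELETE needs nothing but the method and the path. This Python sketch builds such a request with urllib.request; the path /db/test/hamlet.xml and the helper name are illustrative assumptions.

```python
import urllib.request

BASE = "http://localhost:8080/exist/rest"


def delete_request(path):
    """Build a DELETE request for a document or collection path."""
    return urllib.request.Request(BASE + path, method="DELETE")


req = delete_request("/db/test/hamlet.xml")
# urllib.request.urlopen(req) would send it. The guest user may lack the
# required permission, so an Authorization header may be needed as well.
```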
POST Requests
POST requests require an XML fragment in the content of the request. This fragment specifies the action to take.
- If the root node of the fragment uses the XUpdate namespace (http://www.xmldb.org/xupdate), the fragment is sent to the XUpdateProcessor to be processed.
- Otherwise, the root node must be in the namespace for eXist requests (http://exist.sourceforge.net/NS/exist). The fragment is then interpreted as an extended query request. Extended query requests can be used to post complex XQuery scripts that are too large to be encoded in a GET request.
The structure of the POST XML request is as follows:
<query xmlns="http://exist.sourceforge.net/NS/exist" start="[first item to be returned]" max="[maximum number of items to be returned]" cache="[yes|no: create a session and cache results]" session-id="[session id as returned by previous request]">
<text>
[XQuery expression]
</text>
<properties>
<property name="[name1]" value="[value1]"/>
</properties>
</query>
The root element query identifies the fragment as an extended query request. The XQuery expression for this request is enclosed in the text element. The start, max, cache and session-id attributes have the same meaning as the corresponding GET parameters _start, _howmany, _cache and _session (see GET Requests).
You may have to enclose the XQuery expression in a CDATA section (i.e. <![CDATA[ ... ]]>) to avoid parsing errors.
Optional output properties, such as indent for pretty-printing, can be passed in the <properties> element.
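Generating the request document with an XML library sidesteps the escaping problem, because the library escapes <, > and & in the query text itself. This Python sketch follows the structure shown above (including its session-id attribute name); the helper names build_query and post_query are hypothetical, and the exist prefix is used instead of a default namespace, which is equivalent XML.

```python
import urllib.request
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"
ET.register_namespace("exist", EXIST_NS)  # serialize with an exist: prefix


def q(name):
    """Qualified element name in the eXist request namespace."""
    return f"{{{EXIST_NS}}}{name}"


def build_query(xquery, start=1, max_items=10, cache=False, session_id=None,
                properties=None):
    """Serialize an extended query request document."""
    root = ET.Element(q("query"), {"start": str(start), "max": str(max_items)})
    if cache:
        root.set("cache", "yes")
    if session_id is not None:
        root.set("session-id", str(session_id))
    text = ET.SubElement(root, q("text"))
    # ElementTree escapes <, > and & itself, so no CDATA section is needed.
    text.text = xquery
    if properties:
        props = ET.SubElement(root, q("properties"))
        for name, value in properties.items():
            ET.SubElement(props, q("property"), {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


def post_query(url, payload):
    """Wrap the payload in a POST request with an XML content type."""
    return urllib.request.Request(url, data=payload.encode("utf-8"), method="POST",
                                  headers={"Content-Type": "application/xml"})


payload = build_query("for $s in //SPEECH return <hit>{$s/SPEAKER}</hit>",
                      max_items=20, properties={"indent": "yes"})
req = post_query("http://localhost:8080/exist/rest/db/", payload)
# urllib.request.urlopen(req).read() would return the <exist:result> document.
```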
The following Perl example sends a POST query request:
require LWP::UserAgent;
require HTTP::Request;

$URL = 'http://localhost:8080/exist/rest/db/';
# The query is wrapped in CDATA so that &= and <hit> are not parsed as XML
$QUERY = <<END;
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns="http://exist.sourceforge.net/NS/exist"
    start="1" max="20">
    <text><![CDATA[
        for \$speech in //SPEECH[LINE &= 'corrupt*']
        order by \$speech/SPEAKER[1]
        return
            <hit>{\$speech}</hit>
    ]]></text>
    <properties>
        <property name="indent" value="yes"/>
    </properties>
</query>
END

$ua = LWP::UserAgent->new();
$req = HTTP::Request->new(POST => $URL);
$req->content_type('application/xml');
$req->content($QUERY);
$res = $ua->request($req);
if ($res->is_success) {
    print $res->content . "\n";
} else {
    print "Error:\n\n" . $res->status_line . "\n";
}
The returned query results are enclosed in an <exist:result> element:
<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" hits="2628" start="1" count="10">
<SPEECH>
<SPEAKER>
BERNARDO
</SPEAKER>
<LINE>
Who's there?
</LINE>
</SPEECH>
... more items follow ...
</exist:result>
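The counters on the wrapper are what a client needs to page through a large result. This Python sketch reads them and computes the _start value of the next chunk; the helper names are hypothetical, and it accepts both the unprefixed attributes shown here and the exist:-prefixed ones that appear in the cached-results example.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"


def result_summary(xml_text):
    """Read paging counters and items from an <exist:result> wrapper."""
    root = ET.fromstring(xml_text)

    def counter(name):
        # Accept exist:hits as well as an unprefixed hits attribute.
        value = root.get(f"{{{EXIST_NS}}}{name}", root.get(name))
        return int(value) if value is not None else None

    return {"hits": counter("hits"), "start": counter("start"),
            "count": counter("count"), "items": list(root)}


def next_start(summary):
    """1-based _start for the following chunk, or None when all hits were returned."""
    following = summary["start"] + summary["count"]
    return following if following <= summary["hits"] else None


SAMPLE = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist"
    hits="2628" start="1" count="10">
  <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>
</exist:result>"""

summary = result_summary(SAMPLE)
print(summary["hits"], summary["count"], next_start(summary))  # 2628 10 11
```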
Calling Stored XQueries
The REST interface supports executing stored XQueries on the server. If the target resource of a GET or POST request is a binary resource with the mime-type application/xquery, the REST server will try to compile and execute it as an XQuery script. The script has access to the entire HTTP context, including parameters and session attributes.
Stored XQueries are a good way to provide dynamic views of data or to create small services. But they can do more: because you can also store binary resources such as images, CSS stylesheets or JavaScript files in a database collection, it is entirely possible to serve a complex application out of the database. For an example, see Using XQuery for Web Applications on the demo server.
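Storing and calling a script therefore takes two ordinary requests: a PUT with the application/xquery content type, then a GET whose parameters the script can read. This Python sketch builds both; the path /db/test/hello.xq and the one-line script are hypothetical, the script assumes eXist's request module (request:get-parameter), and depending on the eXist-db version the stored resource may also need execute permission.

```python
import urllib.request
from urllib.parse import urlencode

BASE = "http://localhost:8080/exist/rest"

# Hypothetical script: echoes a request parameter via eXist's request module.
SCRIPT = '<greeting>Hello, {request:get-parameter("name", "world")}</greeting>\n'


def store_xquery(path, source):
    """PUT the script as a resource with mime-type application/xquery."""
    return urllib.request.Request(BASE + path, data=source.encode("utf-8"),
                                  method="PUT",
                                  headers={"Content-Type": "application/xquery"})


def call_xquery(path, **params):
    """GET the stored script; the query parameters are visible to it."""
    query = "?" + urlencode(params) if params else ""
    return urllib.request.Request(BASE + path + query)


store = store_xquery("/db/test/hello.xq", SCRIPT)
call = call_xquery("/db/test/hello.xq", name="Juliet")
# Send them in order with urllib.request.urlopen(store), then urlopen(call).
```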
Cached Query Results
When executing queries using GET or POST, the server can cache query results in a server-side session. These results are held in memory.
Memory consumption is low for query results that reference nodes stored in the database, and high for nodes constructed within the XQuery itself.
To create a session and store query results, pass _cache=yes with a GET request or set the attribute cache="yes" within the XML payload of a POST query request. The server will execute the query as usual. If the result sequence contains more than one item, the entire sequence will be stored in a newly created session.
The id of the created session is included in the response. For requests that return an <exist:result> wrapper element, the session id is specified in the exist:session attribute. The session id is also available in the HTTP header X-Session-Id.
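A client can therefore pick up the session id from either place. This Python sketch prefers the header and falls back to the wrapper attribute; the function name is hypothetical, and with http.client the headers object would be the response's case-insensitive res.headers.

```python
import xml.etree.ElementTree as ET

EXIST_NS = "http://exist.sourceforge.net/NS/exist"


def session_id(headers, body):
    """Return the cache session id, preferring the X-Session-Id header."""
    sid = headers.get("X-Session-Id")
    if sid is None and body:
        # Fall back to the exist:session attribute on the <exist:result> wrapper.
        sid = ET.fromstring(body).get(f"{{{EXIST_NS}}}session")
    return sid


BODY = ('<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" '
        'exist:hits="3406" exist:start="1" exist:count="10" exist:session="2"/>')
print(session_id({}, BODY))  # 2
```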
The following example shows the HTTP header and the <exist:result> tag returned by the server:
HTTP/1.1 200 OK
Date: Thu, 01 May 2008 16:28:16 GMT
Server: Jetty/5.1.12 (Linux/2.6.22-14-generic i386 java/1.6.0_03
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Tue, 29 Apr 2008 20:34:33 GMT
X-Session-Id: 2
Content-Type: application/xml; charset=UTF-8
Content-Length: 4699

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" exist:hits="3406" exist:start="1" exist:count="10" exist:session="2">
...
</exist:result>
The session id can be passed with subsequent requests to retrieve further chunks of data without re-evaluating the query. For a GET request, pass the session id in the _session parameter. For a POST request, add an attribute session="sessionId" to the XML content of the request.
If the session does not exist or has timed out, the server will re-evaluate the query. The timeout is set to 2 minutes.
A session can be deleted by sending a GET request to an arbitrary collection URL. Pass the session id in the _release parameter:
http://localhost:8080/exist/rest/db?_release=0
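Putting the GET parameters together, a client fetches later chunks with _session and finally releases the session. This Python sketch builds both URLs; the helper names are hypothetical. It resends _query with each chunk request, so that the server can re-evaluate the query if the session has already timed out.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8080/exist/rest"


def next_chunk_url(collection, query, session, start, howmany=10):
    """GET URL for a later chunk of a cached result sequence."""
    # The query is sent again so an expired session can be re-evaluated.
    params = {"_query": query, "_session": session,
              "_start": start, "_howmany": howmany}
    return f"{BASE}{collection}?{urlencode(params, quote_via=quote)}"


def release_url(session, collection="/db"):
    """GET URL that releases a cached session (any collection URL works)."""
    return f"{BASE}{collection}?{urlencode({'_release': session})}"


print(next_chunk_url("/db/shakespeare", "//SPEECH", "2", 11))
# http://localhost:8080/exist/rest/db/shakespeare?_query=%2F%2FSPEECH&_session=2&_start=11&_howmany=10
print(release_url(2))  # http://localhost:8080/exist/rest/db?_release=2
```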