How to Search¶
Resources are searchable with a specially-formatted search request to their “browse” URL. For example, you may submit a request to the root URL:
GET / HTTP/1.1
And receive a response like this:
HTTP/1.1 200 OK
...
{
"resources": {
"browse": {
"genre": "/genres/",
},
"view": {
"genre": "/genre-view/id?"
},
}
}
Thus in this case clients should use the /genres/
URL to search for a genre. Most of the rest
of this section describes how to format the search request to obtain desired results.
The Cantus API also allows searching for resources of unknown type. For this use the :http:search:`/(browse.all)/` URL.
Search Request¶
The request body for a search request must be a JSON string. One member is required,
"query"
, which holds the search query in a string. The Cantus-Specific Extension Headers section describes
which headers may be used to modify the formatting of search results. The following request headers
may also be incorporated directly in the SEARCH request body, as indicated below:
- X-Cantus-Include-Resources as
"include-resources"
- X-Cantus-Fields as
"fields"
- X-Cantus-Per-Page as
"per-page"
- X-Cantus-Page as
"page"
- X-Cantus-Sort as
"sort"
If a header and request body member hold conflicting values (e.g., X-Cantus-Page: 10
in the
headers but "page": 10
in the request body) the server MUST use the value from the request body,
disregarding the value from the header, even if the request body’s value is invalid and the header’s
value is valid. The same error response codes SHOULD be provided as for an error in the
corresponding header.
The grammar of the "query"
string is described below in <cantus query grammar>.
Example Search Query¶
The following search request example yields all the chants on a folio of the same manuscript. You
may “flip the pages” of the manuscript by changing the “folio” value in the "query"
string.
SEARCH /(browse.chants)/ HTTP/1.1
X-Cantus-Per-Page: 50
{
"query": "+source_id:123614 +folio:042r",
"sort": "sequence,asc"
}
The following query is also possible, replacing the source_id
field with source
. The server
will automatically search for the source_id
on behalf of the user agent, but this is obviously
more error-prone than using the source_id
directly, if it is known to the user agent. Refer to
the lengthier discussion below for more information.
SEARCH /(browse.chants)/ HTTP/1.1
X-Cantus-Per-Page: 50
{
"query": '+source:"Klosterneuburg, Augustiner-Chorherrenstift - Bibliothek, 1010" +folio:042r',
"sort": "sequence,asc"
}
Although it would be a nice touch, you cannot use the X-Cantus-Page
header to “flip the pages”
in the manuscript. Furthermore, if the X-Cantus-Per-Page
header is not set manually to an
arbitrarily high value, users may inadvertently miss some chants on some pages.
“cantus” Query Grammar¶
The “cantus” query grammar is inspired by Solr’s “standard query parser,” but differs notably. Unlike with Solr, query parameters are part of the request body rather than the URL—as required by the HTTP search method. Also, query parameters are modified and added by the server implementation before being sent to Solr, in order to fetch the expected results. Therefore, even if a server implementation does use Solr, which is not required by this API, clients should not expect their query will be submitted to the Solr server verbatim. This query grammar will be retained even if additional grammars become available in future versions of the API.
All parameters belong in the “query” member of the request body, described in Syntax in the “query” String.
The fields available depends on the resource type being queried (refer to the relevant Resource Types subsection for more information). Some fields—those that refer to a resource type—also have a variant suffixed with “_id”to allow more accurate ID-based Query. For those resources, ID-based filtering is preferred; otherwise a Name-based Query will happen.
For example, a query at the /(browse.source)/
URL may use the following content-based fields:
id, title, siglum, provenance_detail, date, source_status_desc, summary, liturgical_occasions,
description, indexing_notes, and indexing_date. In addition, the following fields correspond to
another resource, so they may be used in ID-based filtering with an “_id” suffix, or in a name-based
sub-query: rism, provenance, century, notation_style, editors, indexers, proofreaders, segment,
and source_status.
In all cases, any unknown, invalid, or inapplicable data are ignored. If all data are ignored, an
empty result body will be provided. For example, a search to the /(browse.source)/
URL for
{'query': '+city:Waterloo'}
will always return no results because Source resources do not have
a “city” field.
Syntax in the “query” String¶
The syntax of this string is kept as close as possible to that of the Solr standard query parser. The “query” string MUST NOT use URL encoding, but it SHOULD be escaped in the same way as any other JavaScript string.
You may include search terms the following ways:
- Term searches by using that word (e.g.,
antiphon
). Beware this does not match similar terms, or partial terms—”antiphoner” will not be included in the results of this search. - Phrase searches with
"
(e.g.,'"of bingen"'
will not match “bingen” unless preceded by “of”). Note that this requires double-quote marks; single-quote marks will not work. - Wildcard with
?
and*
, matching a single character and zero or more characters, respectively. You may want to use the*
wildcard more often than not, since not using it may lead to fewer results than expected. - Fuzzy searches by appending
~
, which returns results arbitrarily similar to a term. For example,antiphon~
would also match “antiphoner.” - Proximity searching with
~
and an integer, as in"manuscript available"~5
, which matches “manuscript is available” and “manuscript is freely available.” - Range searches, as in
date:[1300 TO 1400}
matches the “date” field between 1300 and 1400, including 1300 itself but not 1400 itself. May also use alphabetically ordered ranges. - Boosting term or phrases with
^
and a positive number. The default boost value is 1. The higher a term’s score including boost, the higher it will appear in the default sort (that is, unless the sort field is changed). - Field specification with
:
, as in'incipit:*deus*'
, which will return every Chant where “deus” is part of the “incipit” field. - Boolean operators
&&
,!
, and||
, or their word equivalentsAND
,NOT
, andOR
, which must be capitalized. - Requirement operators
+
and-
. which require that a term is or is not present in the results, respectively. The default (not using these symbols) means that a term is optional, though documents matching more terms will have a higher relevance score. - Grouping with
()
, as in'(cat AND breading) OR silliness'
.
Refer to this page for more complete descriptions.
Fetching a Resource with Its “id”¶
It is possible to fetch a single resource with a known “id” value using a SEARCH
query,
though we recommend you use the resource’s URL. For example, /(browse_indexer)/14
will fetch
the Indexer with an “id” of 14
. This requires less server-side processing, and reduces the
chance of other query parameters interfering. However, the “id” field is still useful in a
SEARCH
query to obtain a range. For example, id:[14 TO 16]
will return the resources
with “id” of 14
, 15
, and 16
.
ID-based Query¶
When you want to limit search results by a particular resource and you know its “id,” use an ID-based filter. This search strategy is more accurate than name-based sub-queries, so we prefer it whenever possible.
For example, if the “id” of the “Antiphon” genre is 122
, the “id” of the “Ljubljana,
Nadškofijski arhiv (Archiepiscopal Archives), 19 (olim 18)” source is 123659
, and the “id” of
the “Jacobi” feast is 2378
, we can find all the Jacobi antiphons in that manuscript with the
following query:
SEARCH /(browse.chants)/ HTTP/1.1
{"query": "genre_id:122 AND source_id:123659 AND feast_id:2378"}
The equivalent name-based query follows:
SEARCH /(browse.chants)/ HTTP/1.1
{"query": 'genre:antiphon AND source:"Ljubljana, Nadškofijski arhiv (Archiepiscopal Archives), 19 (olim 18)" AND feast:Jacobi'}
Name-based Query¶
When you want to limit search results by a particular resource but you do not know the “id,” you can use a name-based sub-query to avoid submitting two queries. For example, to search for Easter antiphons that mention “jesus” in the incipit, you might submit this query:
SEARCH /(browse.chant)/ HTTP/1.1
{
"incipit": "jesus",
"feast": "pascha",
"genre": "antiphon",
}
On the server side, the “_name” fields are first replaced with the corresponding “_id” fields by running a search on the appropriate resource type where the “_name” field is “name,” and using all the returned “id” values in a final search. For example, the preceding example is equivalent to submitting the following three queries:
SEARCH /(browse.feast)/ HTTP/1.1
{"name": "pascha"}
<!-- returns one feast with an id of "08020100" -->
SEARCH /(browse.genre)/ HTTP/1.1
{"name": "antiphon"}
<!-- returns one genre with an id of "422" -->
SEARCH /(browse.chant)/ HTTP/1.1
{
"incipit": "jesus",
"feast_id": "08020100",
"genre_id": "422",
}
The benefit of a name-based sub-query is that using fewer requests means transmitting less data and getting results sooner. The disadvantage is that the results may be much less useful if the “field_name” result provides many more results, or unexpected results. The preceding search, for example, returns results associated with the “Pascha Annotinum” feast, which is not Easter. Because it is virtually impossible for a client or server to predict whether users are running into this problem, ID-based filtering is preferred whenever a resource “id” is available.
Unsearchable Resource Types¶
I decided it did not make sense to search for these—users will always want to search something else too.
- Portfolio
- Segment
- Source Status