CORD19 Search Help

CORD19 Schema

The data indexed in this search engine is CORD19 metadata. It has the following schema, whose fields can be referenced in queries.

  • abstract: the full text of the document's abstract.
  • authors: the list of authors contributing to the document.
  • doi: the DOI number of the document.
  • journal: the journal the document was published in.
  • license: the distribution license associated with the publication of the document.
  • publish_time: the date the document was published to the associated journal, in the form YYYY-MM-DDTHH:mm:ssZ. For example: '2010-01-06T00:00:00Z', meaning January 6th, 2010.
  • title: the title of the document.
  • topic_top_name: the topic terms most closely related to the document, according to MITRE's analysis.
  • top_topic_score: the similarity score of the topic terms most closely related to the document, according to MITRE's analysis.

Note that although journal, license, and topic_top_name are available as Filter options to the left of the search interface, they can also be specified manually in the query box. Specifying filter fields in a query allows multiple values of the same field to be applied at once.

Examples:

  • Query: authors:"Milton, Donald K" AND title:"Biodefense" will return documents where "Biodefense" is in the title and Donald Milton was a contributing author.
  • Query: license:cc-by OR license:unk will return documents released under either of the specified licenses.
  • Query: abstract:[* TO *] will return documents where the abstract is not blank.
  • Query: -abstract:[* TO *] will return documents where the abstract is blank.
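
The publish_time format from the schema can be combined with the bracketed range syntax used in the examples above. The following is a minimal Python sketch of building such a query string; the helper names solr_timestamp and publish_time_range are hypothetical, and it assumes that publish_time accepts range queries in this search engine.

```python
from datetime import datetime, timezone


def solr_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def publish_time_range(start: datetime, end: datetime) -> str:
    """Build an inclusive range query on publish_time (hypothetical helper)."""
    return f"publish_time:[{solr_timestamp(start)} TO {solr_timestamp(end)}]"


# Documents published in the first quarter of 2020.
query = publish_time_range(
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 3, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(query)
```

The resulting string can be pasted into the query box as-is.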

Overview of this Search Engine's Query Syntax

This search tool is built on the open source Apache Solr/Lucene search engine, so it supports a rich and expressive query syntax as well as very simple queries.

Terms, Phrases, and Operators

Search queries can be constructed from terms and operators, where terms describe the words or phrases being sought, and operators indicate ways those terms can be combined.

A term is either a single word or a multi-word quoted phrase. Single-word terms look like this:

  • Query: virus
  • Query: influenza

A phrase is a group of words surrounded by double quotes:

  • Query: "corona virus"
  • Query: "influenza virus"

Multiple terms can be combined together with Boolean operators to form more complex queries, such as:

  • Query: "influenza virus" AND vaccine
    • Requires that both of these are present in the documents returned.
  • Query: "influenza virus" OR "corona virus"
    • Requires that at least one of these is present in the documents returned.
  • Query: vaccine NOT "therapy"
    • Requires that the word vaccine is present in the documents returned, and that the term "therapy" is not.
  • Query: +vaccine therapy
    • The + symbol is known as the required operator: the term that immediately follows it must be present in a document for that document to match. In this case, a document that contains therapy is not returned unless vaccine is also present; therapy itself is optional.

When specifying Boolean operators with the keywords AND, OR and NOT, they must appear in all uppercase.
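
The effect of each operator can be pictured with a toy model in Python. This is only an illustration of which documents match, not of how Solr evaluates or ranks queries; the four-document corpus and the matching helper are made up for the example.

```python
# Hypothetical corpus: each document is reduced to the set of words it contains.
DOCS = {
    "d1": {"vaccine", "therapy"},
    "d2": {"vaccine"},
    "d3": {"therapy"},
    "d4": {"influenza"},
}


def matching(predicate):
    """Return the sorted ids of documents whose word set satisfies the predicate."""
    return sorted(doc_id for doc_id, words in DOCS.items() if predicate(words))


# vaccine AND therapy: both terms must be present.
both = matching(lambda w: "vaccine" in w and "therapy" in w)
# vaccine OR therapy: at least one term must be present.
either = matching(lambda w: "vaccine" in w or "therapy" in w)
# vaccine NOT therapy: vaccine present, therapy absent.
excluded = matching(lambda w: "vaccine" in w and "therapy" not in w)
# +vaccine therapy: vaccine is required, therapy is optional.
required = matching(lambda w: "vaccine" in w)

print(both, either, excluded, required)
```

In this model, +vaccine therapy returns the same documents as a search for vaccine alone; in a real search the optional term would typically only influence how the results are ranked.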

Querying Specific Fields

The queries shown above search all fields that have been indexed, but sometimes it is useful to search only one of the fields captured in each document. Searches can take advantage of fields to add precision to queries. For example, you can search for a term only in a specific field, such as a title field. To specify a field, type the field name followed by a colon (":") and then the term you are searching for within the field.

For example, suppose an index contains two fields, title and abstract. If you want to find a document called "The Right Way" which contains the text "don’t go this way," you could use the following search query:

  • Query: title:"The Right Way" AND abstract:go

The field applies only to the term that directly follows it, so the query title:The Right Way looks for "The" in the title field only, and for "Right" and "Way" wherever they appear within a document. To search for the whole title, quote it as a phrase, as in the query above.
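
When queries are assembled programmatically, quoting multi-word values avoids the pitfall described above. A minimal Python sketch follows; field_clause is a hypothetical helper, and it assumes single-word values contain no special characters.

```python
def field_clause(field: str, value: str) -> str:
    """Return field:value, quoting the value when it contains whitespace."""
    if any(ch.isspace() for ch in value):
        # Escape backslashes and double quotes so the phrase stays intact.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


query = " AND ".join(
    [field_clause("title", "The Right Way"), field_clause("abstract", "go")]
)
print(query)
```

Without the quotes, only the first word of the title would be tied to the title field.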

More about Terms: Wildcards

Terms can contain wildcards that stand for a single character or for multiple characters. Wildcards are specified like this:

  • Query: te?t
    • The ? wildcard matches any single character. This would match both "text" and "test".
  • Query: te*
    • The * wildcard matches any sequence of characters, including none. This would match "test", "testing", "tests", "text", "texts", etc.

Wildcard characters can be applied to single terms, but not to search phrases.
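
The two wildcards behave much like shell-style patterns, so Python's fnmatch module gives a rough feel for them. This is an analogy only: fnmatch also treats square brackets specially, which the query syntax does not, and the search engine matches against indexed terms rather than raw words. The word list is made up for the example.

```python
from fnmatch import fnmatchcase

words = ["test", "text", "tent", "testing", "tests", "team", "toast"]

# te?t: exactly one character between "te" and "t".
single = [w for w in words if fnmatchcase(w, "te?t")]

# te*: "te" followed by any sequence of characters.
multi = [w for w in words if fnmatchcase(w, "te*")]

print(single)
print(multi)
```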

More about Terms: Fuzzy Searches

Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde (~) symbol at the end of a single-word term. For example, to search for a term similar in spelling to "roam," use the fuzzy search:

  • Query: roam~

This search will match terms like roams, foam, and foams. It will also match the word "roam" itself.

An optional distance parameter specifies the maximum number of edits allowed, between 0 and 2, defaulting to 2. For example:

  • Query: roam~1

This will match terms like "roams" and "foam", but not "foams", because that word needs two character edits (insertions, deletions, or replacements) to become "roam" instead of just one.
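
The edit counts above can be checked with a small Python sketch of the classic Levenshtein distance. This is an illustration rather than Solr's own implementation, which may count edits slightly differently (for example, by treating a swap of two adjacent characters as a single edit); the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and replacements."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # replace, or keep if equal
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(term, candidates, max_edits=2):
    """Keep the candidates within max_edits of the term."""
    return [c for c in candidates if edit_distance(term, c) <= max_edits]


candidates = ["roam", "roams", "foam", "foams", "roaming"]
print(fuzzy_matches("roam", candidates))               # like roam~
print(fuzzy_matches("roam", candidates, max_edits=1))  # like roam~1
```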

Proximity Searches

A proximity search looks for terms that are within a specific distance from one another. To perform a proximity search, add the tilde character (~) and a numeric value to the end of a search phrase. For example, to search for the words "apache" and "jakarta" within 10 words of each other in a document, use the search:

  • Query: "jakarta apache"~10

The distance here is the number of term "movements" needed to match the specified phrase. In the example above, if "apache" and "jakarta" were 10 positions apart in a field, but "apache" appeared before "jakarta", more than 10 term movements would be required to bring the terms together with "apache" immediately to the right of "jakarta", so the document would not match.
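
One way to picture term movements for a two-word phrase is the simplified Python model below. It is a sketch under the assumption that a movement shifts one term by one word position; slop_needed is a hypothetical helper, not part of Solr, and longer phrases are not covered.

```python
def slop_needed(first_pos: int, second_pos: int) -> int:
    """Movements needed so the first phrase term sits immediately before the second.

    first_pos and second_pos are the word positions in the document of the
    first and second terms of the quoted phrase.
    """
    return abs(second_pos - first_pos - 1)


# Phrase "jakarta apache": jakarta is the first term, apache the second.
adjacent_in_order = slop_needed(first_pos=0, second_pos=1)
apart_in_order = slop_needed(first_pos=0, second_pos=10)
apart_reversed = slop_needed(first_pos=10, second_pos=0)

print(adjacent_in_order, apart_in_order, apart_reversed)
```

Under this model, a document with the terms in reversed order 10 positions apart needs more than 10 movements, which agrees with the description above.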

Range Searches

A range search specifies a range of values for a field, with a lower bound and an upper bound. The query matches documents whose values for the specified field or fields fall within the range. Range queries can be inclusive or exclusive of the upper and lower bounds. Values are compared lexicographically, except on numeric fields. For example, the range queries below match documents whose top_topic_score field has a value between 0.5 and 1.0.

  • Query: top_topic_score:[0.5 TO 1.0]
    • This query is inclusive of both values.
  • Query: top_topic_score:{0.5 TO 1.0}
    • This query is exclusive of both values.
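
The difference between square brackets and curly braces can be expressed as a small Python check. This is a sketch of the comparison only; in_range is a hypothetical helper.

```python
def in_range(value: float, low: float, high: float, inclusive: bool = True) -> bool:
    """Mimic [low TO high] when inclusive, {low TO high} otherwise."""
    if inclusive:
        return low <= value <= high
    return low < value < high


scores = [0.4, 0.5, 0.75, 1.0]

# Like top_topic_score:[0.5 TO 1.0]
inclusive_hits = [s for s in scores if in_range(s, 0.5, 1.0)]

# Like top_topic_score:{0.5 TO 1.0}
exclusive_hits = [s for s in scores if in_range(s, 0.5, 1.0, inclusive=False)]

print(inclusive_hits, exclusive_hits)
```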

Escaping Special Characters

This Solr-powered search engine interprets some characters as having special meaning when they appear in a query:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : /

To make the search engine interpret any of these characters literally, rather than as a special character, precede the character with a backslash character (\). For example, to search for "(1+1):2" without having Solr interpret the plus sign and parentheses as special characters for formulating a sub-query with two terms, escape the characters by preceding each one with a backslash:

  • Query: \(1\+1\)\:2
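
If search strings come from user input, the escaping can be automated. The Python sketch below is a hypothetical helper, not a Solr API; it escapes every character from the list above one at a time (so each & and | is escaped individually) and, as an added assumption, the backslash itself.

```python
# Characters from the list above, plus the backslash (an assumption).
SPECIAL_CHARS = set('\\+-&|!(){}[]^"~*?:/')


def escape_query(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query("(1+1):2"))
```

Escaping more characters than strictly necessary is the cautious choice here, since an escaped ordinary character is still read literally.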

Grouping Terms to Form Sub-Queries

Solr/Lucene supports using parentheses to group clauses to form sub-queries. This can be very useful if you want to control the Boolean logic for a query. The query below searches for either "jakarta" and "website", or "apache" and "website":

  • Query: (jakarta OR apache) AND website

This adds precision to the query, requiring that the term "website" exist, along with either the term "jakarta" or "apache."

Grouping Clauses within a Field

To apply two or more Boolean operators to a single field in a search, group the Boolean clauses within parentheses. For example, the query below searches for a title field that contains both the word "return" and the phrase "pink panther":

  • Query: title:(+return +"pink panther")
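
A grouped field query like this one can also be generated from a list of terms. The following Python sketch uses hypothetical helper names and assumes the terms contain no special characters that need escaping.

```python
def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def require_all_in_field(field: str, terms: list) -> str:
    """Build field:(+term1 +term2 ...), requiring every term in that field."""
    clauses = " ".join("+" + quote_if_phrase(term) for term in terms)
    return f"{field}:({clauses})"


query = require_all_in_field("title", ["return", "pink panther"])
print(query)
```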