This document describes best practices for implementing RESTful NLP web services that rely on the NLP Interchange Format (NIF). "NIF is an RDF/OWL-based format that aims to achieve interoperability between NLP tools, language resources and annotations." As a proof of concept, we have implemented NIF wrappers for the Stanford POS tagger and the Stanford parser. Both wrappers are licensed under Creative Commons Attribution 4.0 International.

This document was published by the Best Practices for Multilingual Linked Open Data community group. It is not a W3C Standard nor is it on the W3C Standards Track.

Natural Language Processing Interchange Format (NIF)

NIF is an RDF-based format. The classes for representing linguistic data are defined in the NIF Core Ontology. All ontology classes are derived from the main class nif:String, which represents strings of Unicode characters.

One important subclass of nif:String is nif:Context. It represents a text in its entirety and holds the characters of this text in the nif:isString property. There are several classes (e.g. nif:Word, nif:Phrase, nif:Sentence) for representing partitions of a text; which one to use depends on the unit of annotation. All such subunits have a property nif:referenceContext pointing to their respective nif:Context instance. Furthermore, their position inside the context is specified using the nif:beginIndex and nif:endIndex properties. The actual substring represented by these units can be specified using the nif:anchorOf property. Annotations such as POS tags or relation types (see below) can be added as properties to the respective nif:String objects.
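
The relationship between nif:isString, nif:beginIndex, nif:endIndex and nif:anchorOf can be illustrated with a minimal Python sketch. The function name anchor_of is an illustrative assumption and not part of NIF; offsets are treated as zero-based with an exclusive end, which is how they appear in the examples later in this document.

```python
def anchor_of(context_text: str, begin_index: int, end_index: int) -> str:
    """Return the substring that a NIF subunit denotes inside its context.

    context_text plays the role of nif:isString; begin_index and end_index
    play the roles of nif:beginIndex and nif:endIndex.
    """
    if not 0 <= begin_index <= end_index <= len(context_text):
        raise ValueError("offsets lie outside the context string")
    return context_text[begin_index:end_index]


# Offsets taken from the example sentence used in this document.
context = "This is a sample sentence"
word_offsets = [(0, 4), (5, 7), (8, 9), (10, 16), (17, 25)]
anchors = [anchor_of(context, begin, end) for begin, end in word_offsets]
print(anchors)
```

A check of this kind can be used by a service to reject input in which a nif:anchorOf value disagrees with the offsets of its subunit.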

NIF individuals are identified by URIs following a nif:URIScheme, which restricts the URI's syntax. For example, a URI following RFC 5147 consists of a prefix string followed by "#char=x,y", where x and y are the start and end positions of the string in its context. For nif:Context URIs, y can be omitted or set to the total number of characters in the text.
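
As a minimal sketch of this URI scheme, the following Python helpers build and parse such identifiers. The function names are illustrative assumptions. The sketch also assumes that an omitted end position keeps the trailing comma ("#char=0,"), following the range syntax of RFC 5147, and it handles only the range form.

```python
import re

# prefix, then "#char=", a start position, a comma, and an optional end position.
_CHAR_RANGE = re.compile(r"^(?P<prefix>[^#]*)#char=(?P<begin>\d+),(?P<end>\d*)$")


def make_rfc5147_uri(prefix, begin, end=None):
    """Build a URI of the form prefix#char=begin,end (end may be omitted)."""
    if begin < 0 or (end is not None and end < begin):
        raise ValueError("invalid character range")
    return f"{prefix}#char={begin},{'' if end is None else end}"


def parse_rfc5147_uri(uri):
    """Split a URI into (prefix, begin, end); end is None when omitted."""
    match = _CHAR_RANGE.match(uri)
    if match is None:
        raise ValueError(f"not an RFC 5147 character range URI: {uri!r}")
    end = match.group("end")
    return match.group("prefix"), int(match.group("begin")), int(end) if end else None


print(make_rfc5147_uri("uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db", 0, 4))
```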

Recommended service parameters

NIF services should conform to the NIF 2.0 public API specification.

The following parameters are supported by a specification-compliant service. Required parameters must be specified by the user for the service to function. Optional parameters can be omitted, in which case the service uses default values.

Required: Optional:

Furthermore, we recommend implementing the info parameter, which, according to the NIF API specification, can be used to output all implemented parameters if info=true. In addition, we recommend that this output also include the supported parameters and their default values.
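
One way a service might handle info=true alongside default values is sketched below in Python. The parameter table, its names and its defaults are placeholders chosen for illustration only; the authoritative list is the one in the NIF 2.0 public API specification. The shape of the returned dictionaries is likewise an assumption.

```python
# Hypothetical parameter table: names and defaults are placeholders,
# not the authoritative list from the NIF 2.0 public API specification.
PARAMETERS = {
    "input": {"required": True, "default": None},
    "informat": {"required": False, "default": "turtle"},
    "outformat": {"required": False, "default": "turtle"},
    "info": {"required": False, "default": "false"},
}


def resolve_parameters(query):
    """Describe the service if info=true, else merge the query with defaults."""
    if query.get("info", "false").lower() == "true":
        # Report every implemented parameter together with its default value.
        return {"info": PARAMETERS}
    missing = [
        name
        for name, spec in PARAMETERS.items()
        if spec["required"] and name not in query
    ]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    resolved = {name: spec["default"] for name, spec in PARAMETERS.items()}
    resolved.update(query)  # user-supplied values override the defaults
    return {"parameters": resolved}


print(resolve_parameters({"info": "true"}))
```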

Further recommended parameters, which are not part of the NIF API specification, are the following:

Log Messages

NIF services should generate log messages in RDF format using the RDF Logging Ontology (RLOG). An rlog message is of type rlog:Entry and should contain the properties rlog:level, rlog:date and rlog:message.
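
A minimal Python sketch of how such a log message could be serialized as Turtle is given below. The namespace URI, the class spelling rlog:Entry, the set of level names and the use of a blank node are assumptions that should be checked against the RDF Logging Ontology; the function name rlog_entry is hypothetical.

```python
from datetime import datetime, timezone

# Assumed namespace of the RDF Logging Ontology.
RLOG_NS = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/rlog#"
# Assumed level individuals; verify against the ontology before relying on them.
LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def rlog_entry(level, message, when=None):
    """Serialize one log message as a Turtle document."""
    if level not in LEVELS:
        raise ValueError(f"unknown log level: {level}")
    when = when or datetime.now(timezone.utc)
    # Escape characters that may not appear raw in a short Turtle string literal.
    escaped = message.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return (
        f"@prefix rlog: <{RLOG_NS}> .\n"
        "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .\n\n"
        "[] a rlog:Entry ;\n"
        f"   rlog:level   rlog:{level} ;\n"
        f'   rlog:date    "{when.isoformat()}"^^xsd:dateTime ;\n'
        f'   rlog:message "{escaped}" .\n'
    )


print(rlog_entry("ERROR", "No nif:Context found in the input."))
```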

We recommend generating a log entry in the following cases:

Example Implementation

Wrapping the Stanford POS Tagger

Given the content of a file named example.ttl:

@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25>
a             nif:Context , nif:RFC5147String , nif:Sentence ;
nif:isString  "This is a sample sentence"^^xsd:string .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "This"^^xsd:string ;
nif:beginIndex        "0"^^xsd:int ;
nif:endIndex          "4"^^xsd:int ;
nif:nextWord          <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:sentence	      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:referenceContext  <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "is"^^xsd:string ;
nif:beginIndex        "5"^^xsd:int ;
nif:endIndex          "7"^^xsd:int ;
nif:nextWord          <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
nif:previousWord      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ;
nif:sentence	      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:referenceContext  <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "a"^^xsd:string ;
nif:beginIndex        "8"^^xsd:int ;
nif:endIndex          "9"^^xsd:int ;
nif:nextWord          <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
nif:previousWord      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:sentence	      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:referenceContext  <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "sample"^^xsd:string ;
nif:beginIndex        "10"^^xsd:int ;
nif:endIndex          "16"^^xsd:int ;
nif:nextWord          <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25> ;
nif:previousWord      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
nif:sentence	      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:referenceContext  <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "sentence"^^xsd:string ;
nif:beginIndex        "17"^^xsd:int ;
nif:endIndex          "25"^^xsd:int ;
nif:previousWord      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
nif:sentence	      <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:referenceContext  <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

Our web service wrapping the Stanford POS tagger can be invoked via curl using the following example call:
curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordPOSTaggerWebService/NifStanfordPOSTagger -d v=true --data-urlencode i="$(<example.ttl)"

The input is expected to be in NIF format and to contain at least one nif:Context element as well as a set of nif:Word elements. The service reads the nif:anchorOf values of all nif:Word elements belonging to a given nif:Context found in the input and passes them to the Stanford POS tagger. Each word is then annotated by adding a nif:posTag property, with the POS tag as a literal value, to the nif:Word.
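
The annotation step described above can be sketched in Python as follows. This is not the wrapper's actual implementation: words are modelled as plain dictionaries keyed by NIF property names, the tagger is an injected function, toy_tagger is a stand-in for the Stanford POS tagger, and sorting the words by nif:beginIndex before tagging is an assumption.

```python
from collections import defaultdict


def annotate_pos(words, tagger):
    """Add a posTag to every word, tagging the words of each context together."""
    by_context = defaultdict(list)
    for word in words:
        by_context[word["referenceContext"]].append(word)
    for context_words in by_context.values():
        # Assumption: restore text order before handing tokens to the tagger.
        context_words.sort(key=lambda w: w["beginIndex"])
        tokens = [w["anchorOf"] for w in context_words]
        tags = tagger(tokens)
        if len(tags) != len(tokens):
            raise ValueError("tagger must return exactly one tag per token")
        for word, tag in zip(context_words, tags):
            word["posTag"] = tag
    return words


def toy_tagger(tokens):
    """Stand-in for a real POS tagger; it only knows the example sentence."""
    lookup = {"This": "DT", "is": "VBZ", "a": "DT", "sample": "NN", "sentence": "NN"}
    return [lookup.get(token, "NN") for token in tokens]


context_uri = "uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25"
words = [
    {"anchorOf": "is", "beginIndex": 5, "referenceContext": context_uri},
    {"anchorOf": "This", "beginIndex": 0, "referenceContext": context_uri},
]
annotate_pos(words, toy_tagger)
print([(w["anchorOf"], w["posTag"]) for w in words])
```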

The example output of the service is shown below:
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "This"^^xsd:string ;
nif:beginIndex        "0"^^xsd:int ;
nif:endIndex          "4"^^xsd:int ;
nif:nextWord          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:posTag            "DT"^^xsd:string ;
nif:referenceContext  <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7>
a                     nif:Word , nif:RFC5147String ;
nif:anchorOf          "is"^^xsd:string ;
nif:beginIndex        "5"^^xsd:int ;
nif:endIndex          "7"^^xsd:int ;
nif:nextWord          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
nif:posTag            "VBZ"^^xsd:string ;
nif:previousWord      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ;
nif:referenceContext  <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25>
a             nif:Context , nif:RFC5147String , nif:Sentence ;
nif:isString  "This is a sample sentence"^^xsd:string .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "sample"^^xsd:string ;
nif:beginIndex        "10"^^xsd:int ;
nif:endIndex          "16"^^xsd:int ;
nif:nextWord          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25> ;
nif:posTag            "NN"^^xsd:string ;
nif:previousWord      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
nif:referenceContext  <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9>
a                     nif:Word , nif:RFC5147String ;
nif:anchorOf          "a"^^xsd:string ;
nif:beginIndex        "8"^^xsd:int ;
nif:endIndex          "9"^^xsd:int ;
nif:nextWord          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
nif:posTag            "DT"^^xsd:string ;
nif:previousWord      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:referenceContext  <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "sentence"^^xsd:string ;
nif:beginIndex        "17"^^xsd:int ;
nif:endIndex          "25"^^xsd:int ;
nif:posTag            "NN"^^xsd:string ;
nif:previousWord      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
nif:referenceContext  <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

Wrapping the Stanford Parser

Our web service wrapping the Stanford dependency parser can be invoked via curl using the following example call, where the input is assumed to be given in a Turtle file called input.ttl.

curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordParserWebService/NifStanfordParser -d v=true --data-urlencode i="$(<input.ttl)"

The service can be used to parse input that is already POS-tagged. That is, it expects the input to be in NIF format and to contain

The words are ordered by context (using nif:referenceContext) and position (using nif:beginIndex) in order to reconstruct the original texts. The service then passes the annotated input to the Stanford parser. For each dependency relation in the parse, a nif:dependency property is added to the relation's head, with the URI of the dependent word as its object. Because a word can have only one head, the type of the relation is recorded as a literal in the nif:dependencyRelationType property of the dependent word.
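
The ordering and annotation steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the wrapper's code: words are plain dictionaries, the parser is an injected function returning (head position, dependent position, relation type) triples, and toy_parser merely reproduces the relations of the example sentence instead of calling the Stanford parser.

```python
from itertools import groupby

PREFIX = "uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db"
CONTEXT = f"{PREFIX}#char=0,25"


def make_word(begin, end, text, tag):
    """Build a dictionary standing in for one POS-tagged nif:Word."""
    return {"uri": f"{PREFIX}#char={begin},{end}", "referenceContext": CONTEXT,
            "beginIndex": begin, "anchorOf": text, "posTag": tag}


def annotate_dependencies(words, parser):
    """Attach dependency edges to heads and relation types to dependents."""
    # Order by context, then by position, to reconstruct the original texts.
    ordered = sorted(words, key=lambda w: (w["referenceContext"], w["beginIndex"]))
    for _, group in groupby(ordered, key=lambda w: w["referenceContext"]):
        text_words = list(group)
        tagged = [(w["anchorOf"], w["posTag"]) for w in text_words]
        for head, dependent, relation in parser(tagged):
            # The head lists its dependents; the dependent stores the relation type.
            text_words[head].setdefault("dependency", []).append(text_words[dependent]["uri"])
            text_words[dependent]["dependencyRelationType"] = relation
    return words


def toy_parser(tagged_tokens):
    """Stand-in parser: every word depends on the last word of the example."""
    head = len(tagged_tokens) - 1
    relations = ["nsubj", "cop", "det", "nn"]
    return [(head, position, relation) for position, relation in enumerate(relations)]


words = [make_word(17, 25, "sentence", "NN"), make_word(0, 4, "This", "DT"),
         make_word(5, 7, "is", "VBZ"), make_word(8, 9, "a", "DT"),
         make_word(10, 16, "sample", "NN")]
annotate_dependencies(words, toy_parser)
print(words[0]["dependency"])
```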

@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4>
a                           nif:RFC5147String , nif:Word ;
nif:anchorOf                "This"^^xsd:string ;
nif:beginIndex              "0"^^xsd:int ;
nif:dependencyRelationType  "nsubj"^^xsd:string ;
nif:endIndex                "4"^^xsd:int ;
nif:nextWord                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:posTag                  "DT"^^xsd:string ;
nif:referenceContext        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7>
a                           nif:Word , nif:RFC5147String ;
nif:anchorOf                "is"^^xsd:string ;
nif:beginIndex              "5"^^xsd:int ;
nif:dependencyRelationType  "cop"^^xsd:string ;
nif:endIndex                "7"^^xsd:int ;
nif:nextWord                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
nif:posTag                  "VBZ"^^xsd:string ;
nif:previousWord            <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ;
nif:referenceContext        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25>
a             nif:Context , nif:RFC5147String , nif:Sentence ;
nif:isString  "This is a sample sentence"^^xsd:string .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16>
a                           nif:RFC5147String , nif:Word ;
nif:anchorOf                "sample"^^xsd:string ;
nif:beginIndex              "10"^^xsd:int ;
nif:dependencyRelationType  "nn"^^xsd:string ;
nif:endIndex                "16"^^xsd:int ;
nif:nextWord                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25> ;
nif:posTag                  "NN"^^xsd:string ;
nif:previousWord            <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
nif:referenceContext        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9>
a                           nif:Word , nif:RFC5147String ;
nif:anchorOf                "a"^^xsd:string ;
nif:beginIndex              "8"^^xsd:int ;
nif:dependencyRelationType  "det"^^xsd:string ;
nif:endIndex                "9"^^xsd:int ;
nif:nextWord                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
nif:posTag                  "DT"^^xsd:string ;
nif:previousWord            <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:referenceContext        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence                <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25>
a                     nif:RFC5147String , nif:Word ;
nif:anchorOf          "sentence"^^xsd:string ;
nif:beginIndex        "17"^^xsd:int ;
nif:dependency        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ,
                      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ,
                      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ,
                      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
nif:endIndex          "25"^^xsd:int ;
nif:posTag            "NN"^^xsd:string ;
nif:previousWord      <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
nif:referenceContext  <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
nif:sentence          <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

Chaining

Because one of the services described above (the tagger) produces output that the other (the parser) relies on, they can be used to demonstrate the integration of NIF-compliant NLP services.

The following piped call combines the calls from the previous two examples. It invokes the tagger, which produces the output of Example 1, and passes this POS-annotated NIF data to the parser. The output is the same as in Example 2.
curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordPOSTaggerWebService/NifStanfordPOSTagger -d v=true --data-urlencode i="$(<example.ttl)" \
| curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordParserWebService/NifStanfordParser -d v=true --data-urlencode i@-
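
The same chaining can be sketched in Python using only the standard library. The function names call_service and chain are hypothetical, the query parameters v and i simply mirror the curl calls above, and the response is assumed to be UTF-8 encoded. The call argument is injectable so that the chaining logic can be exercised without network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def call_service(service_url, nif_turtle):
    """Send NIF data to a service via GET and return the response body as text."""
    query = urlencode({"v": "true", "i": nif_turtle})
    with urlopen(f"{service_url}?{query}") as response:
        return response.read().decode("utf-8")  # assumption: UTF-8 response


def chain(service_urls, nif_turtle, call=call_service):
    """Feed the output of each service into the next one, in order."""
    for service_url in service_urls:
        nif_turtle = call(service_url, nif_turtle)
    return nif_turtle


# Offline demonstration with a fake call function instead of real HTTP requests.
def fake_call(url, data):
    return f"{url}({data})"


print(chain(["tagger", "parser"], "input", call=fake_call))
```

To run the chain against the services, pass the two service URLs from the curl example and omit the call argument.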

References

[RFC5147]
RFC 5147 URL: https://tools.ietf.org/html/rfc5147
[Stanford POS Tagger]
Stanford POS Tagger URL: http://nlp.stanford.edu/software/tagger.shtml
[Stanford Parser]
Stanford Parser URL: http://nlp.stanford.edu/software/lex-parser.shtml
[NIF Core Ontology]
NIF Core Ontology URL: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html
[NIF 2.0 public API specification]
NIF 2.0 public API specification URL: http://persistence.uni-leipzig.org/nlp2rdf/specification/api.html