Service NLP

Carlos Badenes edited this page Mar 31, 2016 · 1 revision

Some Natural Language Processing (NLP) tasks have been externalized as a service in order to reuse common functionality and optimize the use of resources. The service wraps existing NLP libraries such as GATE or Stanford CoreNLP, as well as some particular functionalities of its own.

In short, it offers:

  • tokenization: Splits a stream of text into tokens, i.e. words and symbols.
  • sentence splitting: Splits a sequence of tokens into sentences.
  • lemmatization: Generates the word lemmas for all tokens.
  • stemming: Reduces each token to the morphological root of the word.
  • part-of-speech: Labels tokens with their POS tag based on both their definition and their context.
  • entity recognition: Identifies entities such as Person, Organization, Location, Time and Numerical expressions.
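
To make the first few tasks concrete, here is a minimal, self-contained Python sketch of tokenization, sentence splitting and stemming. It is only an illustration under simplifying assumptions: the function names (`tokenize`, `split_sentences`, `stem`), the regular expression and the suffix list are hypothetical, and none of it reflects how the service or its underlying libraries actually implement these steps.

```python
import re

# Assumption: a token is either a run of word characters or a single
# non-space symbol. Real tokenizers handle many more cases.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

# Assumption: only these symbols close a sentence.
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split a stream of text into tokens, i.e. words and symbols."""
    return TOKEN_PATTERN.findall(text)


def split_sentences(tokens):
    """Group a sequence of tokens into sentences."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a closing symbol
        sentences.append(current)
    return sentences


def stem(token):
    """Toy suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    tokens = tokenize("The service splits text. It works!")
    for sentence in split_sentences(tokens):
        print([stem(token) for token in sentence])
```

Lemmatization, part-of-speech tagging and entity recognition depend on dictionaries or trained models, which is why the service delegates them to existing libraries rather than to rules like these.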

Currently, it is implemented both as an internal resource and as an external resource, with two interfaces: a WS-REST interface for public clients and a Thrift-based interface for internal clients.

[Figure: service-nlp]
