Giuseppe Attardi edited this page Aug 19, 2015 · 4 revisions

A network expects as input a feature-vector representation of the data. This representation is produced by a Converter, an instance of the class:

```python
class Converter(object):
    def add(self, extractor): ...
    def size(self): ...
    def convert(self, sentence: list): ...
    def lookup(self, sentence, out=None): ...
    def update(self, sentence, gradients): ...
```

Method convert() converts a sentence (a list of tokens) into a feature vector. To do so, the converter uses a list of feature extractors, which are added to it through the method add(). The overall feature vector is the concatenation of the feature vectors produced by all the extractors.
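The concatenation described above can be pictured with a small sketch. It assumes that each extractor returns one feature ID per token and that lookup() maps an ID to a list of floats; `DictExtractor` and the toy weight tables are hypothetical, and this simplified convert() returns one concatenated vector per token rather than reproducing the library's actual code.

```python
class Converter(object):
    """Sketch: for every token, concatenate the vectors of all added extractors."""

    def __init__(self):
        self.extractors = []

    def add(self, extractor):
        self.extractors.append(extractor)

    def convert(self, sentence):
        # one list of feature IDs per extractor, each aligned with the tokens
        ids_per_extractor = [ex.extract(sentence) for ex in self.extractors]
        vectors = []
        for i in range(len(sentence)):
            vec = []
            for ex, ids in zip(self.extractors, ids_per_extractor):
                vec.extend(ex.lookup(ids[i]))  # append this extractor's vector
            vectors.append(vec)
        return vectors


class DictExtractor(object):
    """Hypothetical extractor: feature ID from a function, weights from a table."""

    def __init__(self, feature_fn, table):
        self.feature_fn = feature_fn  # token -> feature ID
        self.table = table            # feature ID -> list of floats

    def extract(self, tokens):
        return [self.feature_fn(tok) for tok in tokens]

    def lookup(self, feature):
        return self.table[feature]


# toy extractors: a 2-dim "word" feature and a 1-dim "is capitalized" feature
words = DictExtractor(lambda t: 0 if t.lower() == "the" else 1,
                      {0: [0.1, 0.2], 1: [0.5, 0.6]})
caps = DictExtractor(lambda t: int(t[:1].isupper()), {0: [0.0], 1: [1.0]})

conv = Converter()
conv.add(words)
conv.add(caps)
result = conv.convert(["The", "cat"])
print(result)  # [[0.1, 0.2, 1.0], [0.5, 0.6, 0.0]]
```

The size of each token's vector is the sum of the vector sizes of the individual extractors (here 2 + 1 = 3).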

Method update() is invoked during training to adjust the feature weights along the gradients computed by backpropagation.
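One way to picture this update is as a gradient-descent step on the weight vectors of the features that occur in the sentence. The sketch assumes plain stochastic gradient descent, weights stored in a dictionary keyed by feature ID, and a hypothetical `learning_rate` parameter; the rule the library actually applies may differ.

```python
def update(table, feature_ids, gradients, learning_rate=0.1):
    """Plain SGD step on the weight vectors of the features seen in a sentence.

    table:       dict mapping feature ID -> list of floats (the feature weights)
    feature_ids: one feature ID per token, as produced by an extractor
    gradients:   one gradient vector per token, from backpropagation
    """
    for fid, grad in zip(feature_ids, gradients):
        # move the weights against the gradient; repeated IDs accumulate updates
        table[fid] = [w - learning_rate * g for w, g in zip(table[fid], grad)]


table = {0: [1.0, 1.0], 1: [0.0, 0.0]}
update(table, [0, 1, 0], [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]], learning_rate=0.5)
print(table)  # {0: [0.0, 0.0], 1: [-0.25, -0.25]}
```

Feature 0 occurs twice in the sentence, so it receives both gradient steps, while features absent from the sentence are left unchanged.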

A feature extractor is a class that inherits from the abstract class Extractor, which has the following interface:

```python
class Extractor(object):
    def extract(self, tokens): ...
    def lookup(self, feature): ...
    def save(self, file): ...
    def load(self, file): ...
```

Method extract(), applied to a list of tokens, extracts features from each token and returns a list of IDs for those features. The argument is a list of tokens rather than a single token, since features may depend on consecutive tokens. For instance, a gazetteer extractor needs to look at a sequence of tokens to determine whether they form an entry in its dictionary.
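The gazetteer case shows why extract() needs the whole token list: a multi-word entry such as "New York" can only be recognized by looking at several tokens together. The class below is a hypothetical sketch that assigns feature ID 1 to every token covered by a dictionary entry and 0 otherwise; the actual GazetteerExtractor may encode its features differently.

```python
class GazetteerSketch(object):
    """Hypothetical gazetteer extractor: ID 1 for tokens inside a known entry, else 0."""

    def __init__(self, entries):
        # store each entry as a tuple of lowercase tokens, e.g. ("new", "york")
        self.entries = {tuple(e.lower().split()) for e in entries}
        self.max_len = max((len(e) for e in self.entries), default=1)

    def extract(self, tokens):
        lowered = [t.lower() for t in tokens]
        ids = [0] * len(tokens)
        for start in range(len(tokens)):
            # try every span length up to the longest dictionary entry
            for length in range(1, self.max_len + 1):
                span = tuple(lowered[start:start + length])
                if len(span) == length and span in self.entries:
                    for i in range(start, start + length):
                        ids[i] = 1
        return ids


gaz = GazetteerSketch(["New York", "Paris"])
print(gaz.extract(["I", "left", "New", "York", "for", "Paris"]))  # [0, 0, 1, 1, 0, 1]
print(gaz.extract(["York"]))                                      # [0]
```

"York" on its own gets no feature, because only the two-token sequence matches the entry; a per-token interface could not make that distinction.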

Method lookup() returns the vector of weights for a given feature.
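A natural representation for these weights is a table with one row per feature ID, so that lookup() is just a row access. The sketch below assumes that layout, with a hypothetical `SuffixSketch` class, a hypothetical vector size `dim`, and small random initial weights; it is not the library's implementation.

```python
import random


class SuffixSketch(object):
    """Hypothetical extractor: one feature ID per known suffix, 0 for none."""

    def __init__(self, suffixes, dim=3, seed=0):
        # feature ID 0 is reserved for tokens without a known suffix
        self.ids = {s: i + 1 for i, s in enumerate(suffixes)}
        self.by_length = sorted(suffixes, key=len, reverse=True)
        rng = random.Random(seed)
        # one small random weight vector per feature ID, including ID 0
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(len(suffixes) + 1)]

    def extract(self, tokens):
        ids = []
        for tok in tokens:
            word = tok.lower()
            # prefer the longest matching suffix
            match = next((s for s in self.by_length if word.endswith(s)), None)
            ids.append(self.ids[match] if match is not None else 0)
        return ids

    def lookup(self, feature):
        # the weight vector of a feature is simply a row of the table
        return self.table[feature]


ex = SuffixSketch(["ing", "ed", "s"])
ids = ex.extract(["Walking", "cats", "played", "cat"])
print(ids)                     # [1, 3, 2, 0]
print(len(ex.lookup(ids[0])))  # 3
```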

Methods save() and load() store the Extractor data to disk and reload it.
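A minimal way to implement this pair is to serialize the extractor's state with Python's standard `pickle` module. The sketch assumes the state consists of a vocabulary and a weight table (the class `PicklableExtractor` is hypothetical); the file format used by the real extractors may differ, and pickle files should only be loaded from trusted sources.

```python
import io
import pickle


class PicklableExtractor(object):
    """Hypothetical extractor whose state is a vocabulary and a weight table."""

    def __init__(self, vocab=None, table=None):
        self.vocab = vocab or {}  # feature string -> feature ID
        self.table = table or []  # feature ID -> weight vector

    def save(self, file):
        # write all extractor data to a file opened in binary write mode
        pickle.dump((self.vocab, self.table), file)

    def load(self, file):
        # restore the data previously written by save()
        self.vocab, self.table = pickle.load(file)


original = PicklableExtractor({"the": 0, "cat": 1}, [[0.1, 0.2], [0.3, 0.4]])
buffer = io.BytesIO()  # stands in for a file opened with open(path, "wb")
original.save(buffer)
buffer.seek(0)

restored = PicklableExtractor()
restored.load(buffer)
print(restored.vocab, restored.table)  # {'the': 0, 'cat': 1} [[0.1, 0.2], [0.3, 0.4]]
```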

Currently available Extractors include:

  • Embeddings, extracts the word embeddings of the tokens
  • CapsExtractor, extracts capitalization features from the tokens
  • PrefixExtractor, extracts a feature representing whether the token has a prefix present in a list of common prefixes
  • SuffixExtractor, extracts a feature representing whether the token has a suffix present in a list of common suffixes
  • GazetteerExtractor, extracts features from the gazetteers typically used in named entity recognition (NER)
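As an illustration of a capitalization feature, the sketch below maps each token to one of five classes. The class set (lowercase, title case, all caps, mixed case, no letters) and the names `caps_feature` and `CapsSketch` are assumptions for illustration; the actual CapsExtractor may use different categories.

```python
# hypothetical capitalization classes, used as feature IDs
LOWER, TITLE, UPPER, MIXED, NONE = range(5)


def caps_feature(token):
    """Map a token to a capitalization class ID."""
    if not any(c.isalpha() for c in token):
        return NONE   # digits, punctuation, etc.
    if token.islower():
        return LOWER  # "costs"
    if token.isupper():
        return UPPER  # "NATO"
    if token[0].isupper() and token[1:].lower() == token[1:]:
        return TITLE  # "The"
    return MIXED      # "iPhone"


class CapsSketch(object):
    """Hypothetical extractor returning one capitalization ID per token."""

    def extract(self, tokens):
        return [caps_feature(t) for t in tokens]


print(CapsSketch().extract(["The", "NATO", "iPhone", "costs", "42"]))  # [1, 2, 3, 0, 4]
```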
