
Neural Named Entity Recognition and Slot Filling

This component solves the Named Entity Recognition (NER) and Slot-Filling tasks with different neural network architectures. To read about NER without slot filling, please refer to README_NER.md. This component is intended for the DSTC 2 Slot-Filling task. In most cases, the NER task can be formulated as:

Given a sequence of tokens (words and possibly punctuation symbols), provide a tag from a predefined set of tags for each token in the sequence.

For the NER task there are some common types of entities used as tags: persons, locations, organizations, expressions of time, quantities, and monetary values.

In this component, the entity types correspond to the slot types of the DSTC 2 dataset, such as the food and location types used in the example below.

Furthermore, to distinguish adjacent entities with the same tag, many applications use the BIO tagging scheme. Here “B” denotes the beginning of an entity, “I” stands for “inside” and is used for all words comprising the entity except the first one, and “O” means the absence of an entity. Example with dropped punctuation:

Restaurant  O
in          O
the         O
west        B-LOC
of          O
the         O
city        O
serving     O
modern      B-FOOD
european    I-FOOD
cuisine     O

In the example above, FOOD means the food tag, LOC means the location tag, and “B-” and “I-” are prefixes identifying the beginnings and continuations of entities.
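To make the tagging scheme concrete, here is a minimal, hypothetical sketch (not part of this component) that groups BIO-tagged tokens back into entity spans:

def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, phrase) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith('B-'):
            # A "B-" tag always starts a new entity, closing any open one.
            if current_type is not None:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith('I-') and current_type == tag[2:]:
            # An "I-" tag continues the entity opened by the matching "B-" tag.
            current_tokens.append(token)
        else:
            # An "O" tag (or inconsistent "I-") closes any open entity.
            if current_type is not None:
                entities.append((current_type, ' '.join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        entities.append((current_type, ' '.join(current_tokens)))
    return entities

tokens = 'Restaurant in the west of the city serving modern european cuisine'.split()
tags = ['O', 'O', 'O', 'B-LOC', 'O', 'O', 'O', 'O', 'B-FOOD', 'I-FOOD', 'O']
print(bio_to_entities(tokens, tags))  # [('LOC', 'west'), ('FOOD', 'modern european')]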

Slot Filling can be formulated as:

Given an entity of a certain type and a set of all possible values of this entity type, provide a normalized form of the entity.

In this component, the Slot-Filling task is solved by a Levenshtein-distance search across all known entities of the given type. Example:

There is an entity of “food” type:

chainese

It is definitely misspelled. The set of all known food entities is {‘chinese’, ‘russian’, ‘european’}. The nearest known entity from the given set is chinese. So the output of the Slot Filling system should be chinese.
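A minimal sketch of this kind of lookup (an illustration of the idea, not the component's actual implementation):

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def fill_slot(value, known_values):
    """Return the known entity nearest to the (possibly misspelled) value."""
    return min(known_values, key=lambda known: levenshtein(value, known))

print(fill_slot('chainese', {'chinese', 'russian', 'european'}))  # chinese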

Configuration of the model

Configuration of the model can be performed in code or in a JSON configuration file. To train the model you need to specify four groups of parameters: “dataset_reader”, “dataset”, “chainer”, and “train”.

In the subsequent text we show the parameter specification in the config file. However, the same notation can be used to specify parameters in code by replacing the JSON with a Python dictionary.
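For example, the dataset reader section shown below corresponds to this Python dictionary (a sketch; the keys and values are taken from the JSON fragments in this document):

# The JSON fragments in this document map directly onto Python dictionaries,
# e.g. the dataset reader section becomes:
config = {
    'dataset_reader': {
        'name': 'dstc2_datasetreader',
        'data_path': 'dstc2'
    }
}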

Dataset Reader

The dataset reader is a class which reads and parses the data. It returns a dictionary with three fields: “train”, “test”, and “valid”. The basic dataset reader is “ner_dataset_reader”; for the DSTC 2 dataset, “dstc2_datasetreader” is used. The dataset reader part of the config should look like:

"dataset_reader": {
    "name": "dstc2_datasetreader",
    "data_path": "dstc2"
} 

where “name” refers to the dataset reader class and “data_path” is the path to the folder with the DSTC 2 dataset.

Dataset

For simple batching and shuffling you can use “dstc2_ner_dataset”. The part of the configuration file for the dataset looks like:

"dataset": {
    "name": "dstc2_ner_dataset"
}

There are no additional parameters in this part.
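Conceptually, the dataset component does something like the following sketch (an illustration of batching and shuffling, not the actual dstc2_ner_dataset code):

import random

def batch_generator(data, batch_size, shuffle=True):
    """Yield batches of (tokens, tags) examples, optionally in shuffled order."""
    order = list(range(len(data)))
    if shuffle:
        random.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]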

Chainer

The chainer part of the configuration file contains the specification of the neural network model and supplementary components such as vocabularies. The chainer part must have the following form:

"chainer": {
    "in": ["x"],
    "in_y": ["y"],
    "pipe": [
      ...
    ],
    "out": ["y_predicted"]
  }

The inputs and outputs must be specified in the pipe. “in” means regular input that is used in both inference and train modes. “in_y” is used for training and usually contains ground-truth answers. The “out” field stands for the model prediction. The model inside the pipe must have an output variable named “y_predicted” so that “out” knows where to get predictions.

The major part of “chainer” is “pipe”. The “pipe” contains the model and vocabularies. First, we define the vocabularies needed to build the neural network:

"pipe": [
    {
        "id": "word_vocab",
        "name": "default_vocab",
        "fit_on": ["x"],
        "level": "token",
        "save_path": "ner_conll2003_model/word.dict",
        "load_path": "ner_conll2003_model/word.dict"
    },
    {
        "id": "tag_vocab",
        "name": "default_vocab",
        "fit_on": ["y"],
        "level": "token",
        "save_path": "ner_conll2003_model/tag.dict",
        "load_path": "ner_conll2003_model/tag.dict"
    },
    {
        "id": "char_vocab",
        "name": "default_vocab",
        "fit_on": ["x"],
        "level": "char",
        "save_path": "ner_conll2003_model/char.dict",
        "load_path": "ner_conll2003_model/char.dict"
    },
    ...
]

Parameters for the vocabulary are:

- id — the name under which the vocabulary can be referenced in other parts of the pipe
- name — the registered name of the vocabulary class (here “default_vocab”)
- fit_on — the part of the data the vocabulary is built on (["x"] for tokens, ["y"] for tags)
- level — the level of tokenization: “token” or “char”
- save_path — the path to the file where the vocabulary will be saved
- load_path — the path to the file the vocabulary will be loaded from

Vocabularies are used for holding sets of tokens, tags, or characters. They assign indices to the elements of the given sets and allow conversion from tokens to indices and vice versa. Conversions of this kind are needed to perform lookups in embedding matrices and to compute cross-entropy between predicted probabilities and target values. For each vocabulary the “default_vocab” model is used. The “fit_on” parameter defines on which part of the data the vocabulary is built: ["x"] stands for the x part of the data (tokens) and ["y"] stands for the y part (tags). We can also assemble character-level vocabularies by changing the value of the “level” parameter from “token” to “char”.
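A minimal sketch of what such a vocabulary does (hypothetical and simplified; the real “default_vocab” also handles saving and loading via “save_path” and “load_path”):

class Vocab:
    """Bidirectional mapping between items (tokens, tags, or chars) and indices."""
    def __init__(self, items):
        self.item2idx = {item: idx for idx, item in enumerate(sorted(set(items)))}
        self.idx2item = {idx: item for item, idx in self.item2idx.items()}

    def encode(self, items):
        """Convert items to indices, e.g. for embedding lookup."""
        return [self.item2idx[item] for item in items]

    def decode(self, indices):
        """Convert indices back to items, e.g. for reading off predicted tags."""
        return [self.idx2item[idx] for idx in indices]

tag_vocab = Vocab(['O', 'B-LOC', 'B-FOOD', 'I-FOOD'])
print(tag_vocab.encode(['O', 'B-FOOD']))  # [3, 0]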

The network is defined by the following part of JSON config:

"pipe": [
    ...
    {
        "in": ["x"],
        "in_y": ["y"],
        "out": ["y_predicted"],
        "main": true,
        "name": "dstc_slotfilling",
        "learning_rate": 1e-3,
        "save_path": "ner/dstc_ner_model",
        "load_path": "ner/dstc_ner_model",
        "word_vocab": "#word_vocab",
        "tag_vocab": "#tag_vocab",
        "char_vocab": "#char_vocab",
        "filter_width": 7,
        "embeddings_dropout": true,
        "n_filters": [
          64,
          64
        ],
        "token_embeddings_dim": 64,
        "char_embeddings_dim": 32,
        "use_batch_norm": true,
        "use_crf": true
    }
]

All network parameters are:

- in — the name of the input variable (tokens)
- in_y — the name of the variable with ground-truth tags, used for training
- out — the name of the output variable (predicted tags)
- main — whether this component is the main one to train in the pipe
- name — the registered name of the model (here “dstc_slotfilling”)
- learning_rate — the learning rate used during training
- save_path / load_path — the paths for saving and loading the model
- word_vocab, tag_vocab, char_vocab — references to the vocabularies defined above
- filter_width — the width of the convolutional filters
- embeddings_dropout — whether to use dropout on the embeddings
- n_filters — the number of convolutional filters in each layer
- token_embeddings_dim — the dimensionality of token embeddings
- char_embeddings_dim — the dimensionality of character embeddings
- use_batch_norm — whether to use batch normalization
- use_crf — whether to use a CRF layer on top of the network

After the “chainer” part you should specify the “train” part:

"train": {
    "epochs": 100,
    "batch_size": 64,

    "metrics": ["ner_f1"],
    "validation_patience": 5,
    "val_every_n_epochs": 5,

    "log_every_n_epochs": 1,
    "show_examples": false
}

Training parameters are:

- epochs — the maximum number of training epochs
- batch_size — the number of samples in a batch
- metrics — the metrics used to evaluate the model on the validation set (here “ner_f1”)
- validation_patience — how many validations without improvement to wait before early stopping
- val_every_n_epochs — how often (in epochs) to validate the model
- log_every_n_epochs — how often (in epochs) to log training metrics
- show_examples — whether to show prediction examples during logging
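The early-stopping behaviour implied by “validation_patience” can be sketched as follows (an illustration of the logic, not the library's actual training loop):

def should_stop(val_scores, patience):
    """Stop when the best score has not improved for `patience` validations."""
    if len(val_scores) <= patience:
        return False
    best = max(val_scores)
    # Stop if the best score occurred more than `patience` validations ago.
    return best not in val_scores[-patience:]

print(should_stop([0.80, 0.85, 0.84, 0.84, 0.83], patience=2))  # True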

And now all parts together:

{
  "dataset_reader": {
    "name": "dstc2_datasetreader",
    "data_path": "dstc2"
  },
  "dataset": {
    "name": "dstc2_ner_dataset",
    "dataset_path": "dstc2"
  },
  "chainer": {
    "in": ["x"],
    "in_y": ["y"],
    "pipe": [
      {
        "id": "word_vocab",
        "name": "default_vocab",
        "fit_on": ["x"],
		"level": "token",
        "save_path": "vocabs/word.dict",
        "load_path": "vocabs/word.dict"
      },
      {
        "id": "tag_vocab",
        "name": "default_vocab",
        "fit_on": ["y"],
		"level": "token",
        "save_path": "vocabs/tag.dict",
        "load_path": "vocabs/tag.dict"
      },
      {
        "id": "char_vocab",
        "name": "default_vocab",
        "fit_on": ["x"],
		"level": "char",
        "save_path": "vocabs/char.dict",
        "load_path": "vocabs/char.dict"
      },
      {
        "in": ["x"],
        "in_y": ["y"],
        "out": ["y_predicted"],
        "main": true,
        "name": "dstc_slotfilling",
        "learning_rate": 1e-3,
        "save_path": "ner/dstc_ner_model",
        "load_path": "ner/dstc_ner_model",
        "word_vocab": "#word_vocab",
        "tag_vocab": "#tag_vocab",
        "char_vocab": "#char_vocab",
        "verbouse": true,
        "filter_width": 7,
        "embeddings_dropout": true,
        "n_filters": [
          64,
          64
        ],
        "token_embeddings_dim": 64,
        "char_embeddings_dim": 32,
        "use_batch_norm": true,
        "use_crf": true
      }
    ],
    "out": ["y_predicted"]
  },
  "train": {
    "epochs": 100,
    "batch_size": 64,

    "metrics": ["slots_accuracy"],
    "validation_patience": 5,
    "val_every_n_epochs": 5,

    "log_every_n_epochs": 1,
    "show_examples": false
  }
}

Train and use the model

Please see an example of training an NER model and using it for prediction:

from deeppavlov.core.commands.train import train_model_from_config
from deeppavlov.core.commands.infer import interact_model

PIPELINE_CONFIG_PATH = 'configs/ner/slotfill_dstc2.json'

# Train the model described in the config and save it to the specified save_path.
train_model_from_config(PIPELINE_CONFIG_PATH)

# Start an interactive session with the trained model.
interact_model(PIPELINE_CONFIG_PATH)

This example assumes that the working directory is deeppavlov.