(6) RASA NLU Intent Classifiers

M.S. in Computer Science and Technology, Beijing University of Posts and Telecommunications

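RASA first classifies the intent of the user's current message and then, taking the conversation history into account, chooses an action. Intent classification is therefore the foundation for the policy selection that follows.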

RASA supports the following intent classifiers:

MitieIntentClassifier

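This classifier is built on MitieNLP, so MitieNLP must be present in the pipeline together with a tokenizer. MitieIntentClassifier does its own featurization, so configuring a separate featurizer is not required. In short, it is a multi-class linear SVM with a sparse linear kernel. For details of the algorithm, see: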

SklearnIntentClassifier

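This classifier uses scikit-learn for intent recognition. It is also SVM-based; the difference is that the SVM's hyperparameters are tuned with grid search. For more on grid search, see: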
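Grid search simply tries every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below only enumerates those combinations with the Python standard library; it does not call scikit-learn or train anything. The values mirror the SklearnIntentClassifier configuration shown in this section, and `param_grid` and `expand_grid` are illustrative names, not part of RASA.

```python
from itertools import product

# Candidate values, mirroring the SklearnIntentClassifier configuration
# in this section (C, kernels, gamma).
param_grid = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "gamma": [0.1],
}


def expand_grid(grid):
    """Return every combination of hyperparameter values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = expand_grid(param_grid)
for candidate in candidates:
    print(candidate)
```

With six values for C and one value each for the kernel and gamma, the grid contains six candidates. In a real grid search each candidate would be scored by cross-validation and the best one kept.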

When using SklearnIntentClassifier, the SVM hyperparameters can be set in the pipeline configuration, as follows:

pipeline:
- name: "SklearnIntentClassifier"
  # Specifies the list of regularization values to
  # cross-validate over for C-SVM.
  # This is used with the ``kernel`` hyperparameter in GridSearchCV.
  C: [1, 2, 5, 10, 20, 100]
  # Specifies the kernel to use with C-SVM.
  # This is used with the ``C`` hyperparameter in GridSearchCV.
  kernels: ["linear"]
  # Gamma parameter of the C-SVM.
  "gamma": [0.1]
  # We try to find a good number of cross folds to use during
  # intent training, this specifies the max number of folds.
  "max_cross_validation_folds": 5
  # Scoring function used for evaluating the hyper parameters.
  # This can be a name or a function.
  "scoring_function": "f1_weighted"

KeywordIntentClassifier

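A simple intent classifier based on keyword matching, suitable for small projects with few intents. When there are many intents and they are closely related, the keyword classifier cannot tell them apart.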

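Matching works as follows: each training example is used, as a whole sentence, as a keyword that is searched for in the user's message. The training data therefore needs careful design. A keyword that is too long rarely matches the user's message, and one that is too short does not discriminate well between intents.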
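The matching idea can be sketched in a few lines of Python. This is a simplified illustration, not RASA's implementation: the training examples, the `classify_by_keyword` helper, the whole-word matching, and the case-insensitive search are all assumptions made for the example.

```python
import re

# Hypothetical training data: each training example is itself a keyword
# and maps to the intent it was listed under.
training_examples = {
    "hello": "greet",
    "good morning": "greet",
    "bye": "goodbye",
    "see you": "goodbye",
}


def classify_by_keyword(text, keyword_to_intent):
    """Return the intent of the first keyword found in `text`, or None."""
    for keyword, intent in keyword_to_intent.items():
        # Search for the whole training example as a phrase, ignoring case.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return intent
    return None


print(classify_by_keyword("Hello there", training_examples))
print(classify_by_keyword("ok, see you tomorrow", training_examples))
print(classify_by_keyword("What is the weather?", training_examples))
```

A long training example such as "good morning" only fires when the user types exactly that phrase, while a very short one would fire in many unrelated messages, which is the trade-off described above.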

DIETClassifier

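DIET is short for Dual Intent and Entity Transformer. It handles two tasks in dialogue understanding together: intent classification and entity recognition. DIET is trained in a purely supervised way; the key point is that it does not require large-scale pre-training. It is reported to outperform fine-tuned BERT while training about six times faster. Its input consists of dense and/or sparse feature vectors for the user message and, optionally, for the intents. Its output is the entities, the intent, and their confidence scores.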

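The DIET architecture is built on a Transformer shared by both tasks. For entity recognition, the Transformer's output sequence is fed into a conditional random field (CRF) tagging layer on top, which predicts for each token the probability of each BIOE tag. For intent classification, the Transformer output for the complete utterance and the intent labels are embedded into a single semantic vector space. A dot-product loss maximizes the similarity with the target label and minimizes the similarity with negative samples. For details of the DIET algorithm, see: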
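To make the dot-product loss more concrete, here is a minimal sketch of a softmax-style loss over dot-product similarities, written in plain Python. It is a simplification for illustration and not RASA's implementation; the three-dimensional embeddings and the function names are invented for the example.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax_similarity_loss(utterance, positive_label, negative_labels):
    """Cross-entropy over similarities, with the correct label at index 0."""
    sims = [dot(utterance, positive_label)]
    sims += [dot(utterance, neg) for neg in negative_labels]
    # Numerically stable log-sum-exp of all similarities.
    max_sim = max(sims)
    log_norm = max_sim + math.log(sum(math.exp(s - max_sim) for s in sims))
    # Low loss when the correct label has the highest similarity.
    return log_norm - sims[0]


# Hypothetical embeddings in the shared semantic space.
utterance = [1.0, 0.5, 0.0]
greet = [0.9, 0.6, 0.1]     # the correct intent
goodbye = [-0.8, 0.1, 0.3]  # sampled negative
thanks = [0.0, -0.7, 0.5]   # sampled negative

loss_correct = softmax_similarity_loss(utterance, greet, [goodbye, thanks])
loss_wrong = softmax_similarity_loss(utterance, goodbye, [greet, thanks])
print(loss_correct < loss_wrong)
```

Minimizing this quantity pushes the utterance embedding toward its target label and away from the sampled negatives, which is the behavior the paragraph above describes.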

To use DIETClassifier for intent classification only, set entity_recognition to False. To use it for entity recognition only, set intent_classification to False. By default both are True, so DIETClassifier performs both tasks.

Many hyperparameters are available for tuning the model. If you want to tune it, start with the following parameters:

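epochs: the number of times the algorithm sees the training data (default: 300). One epoch is one forward pass and one backward pass over all training examples. Sometimes the model needs more epochs to learn properly. The fewer the epochs, the faster the model trains.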

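hidden_layers_sizes: defines the number of feed-forward layers, and their output dimensions, for user messages and intents (default: text: [], label: []). Each entry in a list corresponds to one feed-forward layer. For example, with text: [256, 128], two feed-forward layers are added in front of the transformer. The vectors of the input tokens (from the user message) are passed through these layers; the first layer has an output dimension of 256 and the second 128. With an empty list (the default), no feed-forward layers are added. Use only positive integers. Powers of two are the usual choice, with each value less than or equal to the one before it.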

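embedding_dimension: the output dimension of the embedding layers used inside the model (default: 20). The architecture uses several embedding layers. For example, the vectors for the complete utterance and for the intent are passed through an embedding layer before they are compared and the loss is computed.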

number_of_transformer_layers: the number of transformer layers to use (default: 2). This corresponds to the number of transformer blocks in the model.

transformer_size: the number of units in the transformer (default: 256). The vectors coming out of the transformer have dimension transformer_size.

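weight_sparsity: the fraction of kernel weights that are set to 0 in all feed-forward layers of the model (default: 0.8). The value should be between 0 and 1. With weight_sparsity set to 0, no kernel weights are set to 0 and the layer acts as a standard feed-forward layer. Do not set weight_sparsity to 1, because all kernel weights would then be 0 and the model could not learn.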
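The effect of weight_sparsity can be illustrated with a random 0/1 mask applied to a layer's kernel. The sketch below is an assumption about how such a mask could be built, not RASA's actual code; `make_sparsity_mask` and `zero_fraction` are hypothetical helpers.

```python
import random


def make_sparsity_mask(rows, cols, weight_sparsity, seed=0):
    """Build a 0/1 mask; each entry is 0 with probability `weight_sparsity`."""
    rng = random.Random(seed)
    return [
        [0.0 if rng.random() < weight_sparsity else 1.0 for _ in range(cols)]
        for _ in range(rows)
    ]


def zero_fraction(mask):
    """Fraction of mask entries equal to 0, i.e. weights that are switched off."""
    cells = [m for row in mask for m in row]
    return cells.count(0.0) / len(cells)


# Multiplying a kernel element-wise by the mask zeroes the masked weights.
for sparsity in (0.0, 0.8, 1.0):
    mask = make_sparsity_mask(100, 100, sparsity)
    print(sparsity, round(zero_fraction(mask), 2))
```

At 0.0 the mask is all ones and the layer behaves like a standard dense layer; at 1.0 every weight is masked out, so nothing can be learned.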

In general, tuning these parameters is enough to obtain a good model. Other parameters can also be adjusted; see the table below.

+---------------------------------+------------------+--------------------------------------------------------------+
| Parameter                       | Default Value    | Description                                                  |
+=================================+==================+==============================================================+
| hidden_layers_sizes             | text: []         | Hidden layer sizes for layers before the embedding layers    |
|                                 | label: []        | for user messages and labels. The number of hidden layers is |
|                                 |                  | equal to the length of the corresponding list.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| share_hidden_layers             | False            | Whether to share the hidden layer weights between user       |
|                                 |                  | messages and labels.                                         |
+---------------------------------+------------------+--------------------------------------------------------------+
| transformer_size                | 256              | Number of units in transformer.                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_transformer_layers    | 2                | Number of transformer layers.                                |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_attention_heads       | 4                | Number of attention heads in transformer.                    |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_key_relative_attention      | False            | If 'True' use key relative embeddings in attention.          |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_value_relative_attention    | False            | If 'True' use value relative embeddings in attention.        |
+---------------------------------+------------------+--------------------------------------------------------------+
| max_relative_position           | None             | Maximum position for relative embeddings.                    |
+---------------------------------+------------------+--------------------------------------------------------------+
| unidirectional_encoder          | False            | Use a unidirectional or bidirectional encoder.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_size                      | [64, 256]        | Initial and final value for batch sizes.                     |
|                                 |                  | Batch size will be linearly increased for each epoch.        |
|                                 |                  | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_strategy                  | "balanced"       | Strategy used when creating batches.                         |
|                                 |                  | Can be either 'sequence' or 'balanced'.                      |
+---------------------------------+------------------+--------------------------------------------------------------+
| epochs                          | 300              | Number of epochs to train.                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| random_seed                     | None             | Set random seed to any 'int' to get reproducible results.    |
+---------------------------------+------------------+--------------------------------------------------------------+
| learning_rate                   | 0.001            | Initial learning rate for the optimizer.                     |
+---------------------------------+------------------+--------------------------------------------------------------+
| embedding_dimension             | 20               | Dimension size of embedding vectors.                         |
+---------------------------------+------------------+--------------------------------------------------------------+
| dense_dimension                 | text: 128        | Dense dimension for sparse features to use.                  |
|                                 | label: 20        |                                                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| concat_dimension                | text: 128        | Concat dimension for sequence and sentence features.         |
|                                 | label: 20        |                                                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_negative_examples     | 20               | The number of incorrect labels. The algorithm will minimize  |
|                                 |                  | their similarity to the user input during training.          |
+---------------------------------+------------------+--------------------------------------------------------------+
| similarity_type                 | "auto"           | Type of similarity measure to use, either 'auto' or 'cosine' |
|                                 |                  | or 'inner'.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+
| loss_type                       | "softmax"        | The type of the loss function, either 'softmax' or 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| ranking_length                  | 10               | Number of top actions to normalize scores for loss type      |
|                                 |                  | 'softmax'. Set to 0 to turn off normalization.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity     | 0.8              | Indicates how similar the algorithm should try to make       |
|                                 |                  | embedding vectors for correct labels.                        |
|                                 |                  | Should be 0.0 < ... < 1.0 for 'cosine' similarity type.      |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_negative_similarity     | -0.4             | Maximum negative similarity for incorrect labels.            |
|                                 |                  | Should be -1.0 < ... < 1.0 for 'cosine' similarity type.     |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_maximum_negative_similarity | True             | If 'True' the algorithm only minimizes maximum similarity    |
|                                 |                  | over incorrect intent labels, used only if 'loss_type' is    |
|                                 |                  | set to 'margin'.                                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| scale_loss                      | False            | Scale loss inverse proportionally to confidence of correct   |
|                                 |                  | prediction.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+
| regularization_constant         | 0.002            | The scale of regularization.                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| negative_margin_scale           | 0.8              | The scale of how important it is to minimize the maximum     |
|                                 |                  | similarity between embeddings of different labels.           |
+---------------------------------+------------------+--------------------------------------------------------------+
| weight_sparsity                 | 0.8              | Sparsity of the weights in dense layers.                     |
|                                 |                  | Value should be between 0 and 1.                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate                       | 0.2              | Dropout rate for encoder. Value should be between 0 and 1.   |
|                                 |                  | The higher the value the higher the regularization effect.   |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate_attention             | 0.0              | Dropout rate for attention. Value should be between 0 and 1. |
|                                 |                  | The higher the value the higher the regularization effect.   |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_sparse_input_dropout        | True             | If 'True' apply dropout to sparse input tensors.             |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_dense_input_dropout         | True             | If 'True' apply dropout to dense input tensors.              |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs | 20               | How often to calculate validation accuracy.                  |
|                                 |                  | Set to '-1' to evaluate just once at the end of training.    |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples  | 0                | How many examples to use for hold out validation set.        |
|                                 |                  | Large values may hurt performance, e.g. model accuracy.      |
+---------------------------------+------------------+--------------------------------------------------------------+
| intent_classification           | True             | If 'True' intent classification is trained and intents are   |
|                                 |                  | predicted.                                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| entity_recognition              | True             | If 'True' entity recognition is trained and entities are     |
|                                 |                  | extracted.                                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_masked_language_model       | False            | If 'True' random tokens of the input message will be masked  |
|                                 |                  | and the model has to predict those tokens. It acts like a    |
|                                 |                  | regularizer and should help to learn a better contextual     |
|                                 |                  | representation of the input.                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_directory       | None             | If you want to use tensorboard to visualize training         |
|                                 |                  | metrics, set this option to a valid output directory. You    |
|                                 |                  | can view the training metrics after training in tensorboard  |
|                                 |                  | via 'tensorboard --logdir <path-to-given-directory>'.        |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_level           | "epoch"          | Define when training metrics for tensorboard should be       |
|                                 |                  | logged. Either after every epoch ('epoch') or for every      |
|                                 |                  | training step ('minibatch').                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| featurizers                     | []               | List of featurizer names (alias names). Only features        |
|                                 |                  | coming from the listed names are used. If list is empty      |
|                                 |                  | all available features are used.                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| checkpoint_model                | False            | Save the best performing model during training. Models are   |
|                                 |                  | stored to the location specified by `--out`. Only the one    |
|                                 |                  | best model will be saved.                                    |
|                                 |                  | Requires `evaluate_on_number_of_examples > 0` and            |
|                                 |                  | `evaluate_every_number_of_epochs > 0`                        |
+---------------------------------+------------------+--------------------------------------------------------------+
| split_entities_by_comma         | True             | Splits a list of extracted entities by comma to treat each   |
|                                 |                  | one of them as a single entity. Can either be `True`/`False` |
|                                 |                  | globally, or set per entity type, such as:                   |
|                                 |                  | ```                                                          |
|                                 |                  | ...                                                          |
|                                 |                  | - name: DIETClassifier                                       |
|                                 |                  |   split_entities_by_comma:                                   |
|                                 |                  |     address: True                                            |
|                                 |                  |     ...                                                      |
|                                 |                  | ...                                                          |
|                                 |                  | ```                                                          |
+---------------------------------+------------------+--------------------------------------------------------------+
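The table says that with batch_size given as a pair such as [64, 256], the batch size is increased linearly for each epoch. One way to read that is sketched below. The formula and the `batch_size_for_epoch` helper are assumptions made for illustration and may differ from what RASA does internally.

```python
def batch_size_for_epoch(epoch, batch_size, epochs):
    """Return the batch size for a zero-based `epoch` index.

    `batch_size` is either a constant int or an [initial, final] pair.
    """
    if isinstance(batch_size, int):
        return batch_size
    first, last = batch_size
    if epochs <= 1:
        return first
    # Linear interpolation from `first` (epoch 0) to `last` (final epoch).
    return int(first + epoch * (last - first) / (epochs - 1))


for epoch in (0, 150, 299):
    print(epoch, batch_size_for_epoch(epoch, [64, 256], epochs=300))
print(batch_size_for_epoch(10, 8, epochs=300))
```

Passing a single integer, as the table notes, keeps the batch size constant for the whole run.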

FallbackClassifier

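When the intent classification confidence is low, this classifier decides whether to predict the nlu_fallback intent. Note that FallbackClassifier always comes after another intent classifier in the pipeline and examines the intent and confidence produced by that classifier. If the predicted intent's confidence is below threshold, or if the confidence scores of the two top-ranked intents are close to each other, FallbackClassifier triggers the fallback.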

The response to the fallback intent can be implemented with a rule:

rules:
- rule: Ask the user to rephrase in case of low NLU confidence
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase

FallbackClassifier has the following configuration parameters:

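threshold: the threshold for predicting the nlu_fallback intent. If the confidence of the intent predicted by the preceding intent classifier is below threshold, FallbackClassifier returns the nlu_fallback intent with a confidence of 1.0.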

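ambiguity_threshold: if the difference between the confidence scores of the two top-ranked intents is less than ambiguity_threshold, FallbackClassifier returns the nlu_fallback intent with a confidence of 1.0.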
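The two checks can be summarized in a short Python sketch. It follows the description above rather than RASA's source code; the `apply_fallback` function, the shape of `intent_ranking`, and the default values 0.3 and 0.1 are assumptions chosen for illustration.

```python
NLU_FALLBACK_INTENT = "nlu_fallback"


def apply_fallback(intent_ranking, threshold=0.3, ambiguity_threshold=0.1):
    """Return the final (intent, confidence) pair.

    `intent_ranking` is a list of (intent_name, confidence) pairs from the
    preceding classifier, sorted from highest to lowest confidence.
    """
    top_name, top_confidence = intent_ranking[0]
    # Check 1: the best prediction is not confident enough.
    if top_confidence < threshold:
        return NLU_FALLBACK_INTENT, 1.0
    # Check 2: the two best predictions are too close to call.
    if len(intent_ranking) > 1:
        second_confidence = intent_ranking[1][1]
        if top_confidence - second_confidence < ambiguity_threshold:
            return NLU_FALLBACK_INTENT, 1.0
    return top_name, top_confidence


print(apply_fallback([("greet", 0.90), ("goodbye", 0.05)]))
print(apply_fallback([("greet", 0.20), ("goodbye", 0.10)]))
print(apply_fallback([("greet", 0.45), ("goodbye", 0.40)]))
```

The first call keeps the original intent, the second falls back because the top confidence is below the threshold, and the third falls back because the top two intents are too close.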

Edited on 2021-01-08 21:20
