TensorFlow Object Detection API Source Code (2): Component Overview

清欢守护者


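0. Preface

References:

CSDN article: Tensorflow object detection API source code reading notes: basic classes (1)
If you are completely new to object detection, I suggest starting with a different, easier implementation: SSD-TensorFlow.

Main content:

An introduction to some of the components in the core folder.
Using SSDMetaArch, a subclass of DetectionModel, as an example of how the components in core are put to use.

PS: My understanding is still incomplete and there may be mistakes. If you find any, please let me know.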

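1. Component Overview

A brief introduction to several modules in the core folder, plus SSDFeatureExtractor.
The purpose of each component may not be obvious at first. I suggest getting a rough picture of the overall object detection pipeline first, and then studying each component in detail.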

1.1. anchor_generator.py

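Source: anchor_generator.py (in the core folder).
Main class: AnchorGenerator.
Purpose: generates the default anchors.

The bbox values that a model predicts directly are usually offsets relative to the default anchors.
In SSD, for example, several default anchors are generated at every location of each of several feature maps.

Other: the concrete implementations live in anchor_generators.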
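
To make the idea of default anchors concrete, here is a minimal sketch in plain Python. It is not the API's implementation: the function name grid_anchors, the single scale per feature map, and the normalized [ymin, xmin, ymax, xmax] layout are all assumptions chosen for illustration.

```python
import math


def grid_anchors(fm_height, fm_width, scale, aspect_ratios):
    """Return [ymin, xmin, ymax, xmax] anchors in normalized image coordinates.

    Hypothetical helper: one anchor per aspect ratio is centered on every
    feature-map cell.
    """
    anchors = []
    for row in range(fm_height):
        for col in range(fm_width):
            # Center of the cell, normalized to [0, 1].
            cy = (row + 0.5) / fm_height
            cx = (col + 0.5) / fm_width
            for ratio in aspect_ratios:
                # ratio = width / height; the anchor area stays scale ** 2.
                h = scale / math.sqrt(ratio)
                w = scale * math.sqrt(ratio)
                anchors.append([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])
    return anchors


anchors = grid_anchors(fm_height=2, fm_width=2, scale=0.5, aspect_ratios=[1.0, 2.0])
print(len(anchors))  # 2 * 2 cells * 2 ratios = 8
print(anchors[0])    # [0.0, 0.0, 0.5, 0.5]
```

The model then predicts an offset for each of these anchors instead of predicting absolute boxes.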

1.2. minibatch_sampler

Source: minibatch_sampler.py and balanced_positive_negative_sampler.py
Main classes:

MinibatchSampler (from minibatch_sampler.py)
BalancedPositiveNegativeSampler (from balanced_positive_negative_sampler.py)

Purpose:

MinibatchSampler: subsamples a minibatch based on some criterion.
BalancedPositiveNegativeSampler: a subclass of MinibatchSampler, mainly used to control the ratio of positive to negative examples.
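
The balancing idea can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function balanced_subsample, its parameters, and the use of a seeded random generator are hypothetical and do not mirror the API's signatures.

```python
import random


def balanced_subsample(labels, batch_size, positive_fraction=0.5, seed=0):
    """Return a boolean mask that selects at most batch_size examples.

    labels: list of bools, True for positive examples.
    At most positive_fraction * batch_size positives are kept, and the rest
    of the budget is filled with negatives.
    """
    rng = random.Random(seed)
    positives = [i for i, is_pos in enumerate(labels) if is_pos]
    negatives = [i for i, is_pos in enumerate(labels) if not is_pos]
    num_pos = min(len(positives), int(positive_fraction * batch_size))
    num_neg = min(len(negatives), batch_size - num_pos)
    chosen = set(rng.sample(positives, num_pos) + rng.sample(negatives, num_neg))
    return [i in chosen for i in range(len(labels))]


labels = [True] * 3 + [False] * 20
mask = balanced_subsample(labels, batch_size=8)
print(sum(mask))  # 8 examples selected: 3 positives and 5 negatives
```

Because only 3 positives exist, all of them are kept and the remaining 5 slots go to randomly chosen negatives.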

1.3. box_coder.py

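Source: box_coder.py (in the core folder).
Main class: BoxCoder.
A bbox is mainly represented in one of two ways:

Coordinates relative to the whole image.
Coordinates relative to the current anchor.

Purpose: converts a bbox between these two representations.
Other: the concrete implementations live in box_coders.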
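
As an illustration of the two representations, the sketch below encodes a box relative to an anchor and decodes it back, using the center/size parameterization commonly used by Faster R-CNN style coders. It is a simplified assumption, not the API's code: the helper names are mine and the scale factors that a real coder may apply are left out.

```python
import math


def center_size(box):
    """Convert [ymin, xmin, ymax, xmax] to (ycenter, xcenter, height, width)."""
    ymin, xmin, ymax, xmax = box
    height, width = ymax - ymin, xmax - xmin
    return ymin + height / 2, xmin + width / 2, height, width


def encode(box, anchor):
    """Express an image-relative box as offsets relative to an anchor."""
    ya, xa, ha, wa = center_size(anchor)
    y, x, h, w = center_size(box)
    return [(y - ya) / ha, (x - xa) / wa, math.log(h / ha), math.log(w / wa)]


def decode(code, anchor):
    """Invert encode: turn anchor-relative offsets back into a box."""
    ya, xa, ha, wa = center_size(anchor)
    ty, tx, th, tw = code
    y, x = ty * ha + ya, tx * wa + xa
    h, w = math.exp(th) * ha, math.exp(tw) * wa
    return [y - h / 2, x - w / 2, y + h / 2, x + w / 2]


anchor = [0.0, 0.0, 0.5, 0.5]
box = [0.1, 0.1, 0.5, 0.7]
code = encode(box, anchor)
restored = decode(code, anchor)
print(code)
print(restored)  # equal to box up to floating point error
```

A box that coincides with its anchor encodes to all zeros, which is why the network only has to learn small corrections.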

1.4. box_list

Source: box_list.py and box_list_ops.py.
Main class: BoxList (from box_list.py).
Purpose:

BoxList: stores a collection of bboxes.
box_list_ops.py: a set of operations that take BoxList objects as input.
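
The split between a container and free functions operating on it can be sketched as follows. SimpleBoxList is a hypothetical plain-Python stand-in that I wrote for illustration; the real BoxList wraps tensors, and I only assume here that boxes are stored as [ymin, xmin, ymax, xmax].

```python
class SimpleBoxList:
    """A tiny stand-in for a box container: boxes plus optional per-box fields."""

    def __init__(self, boxes):
        if any(len(box) != 4 for box in boxes):
            raise ValueError('Each box must be [ymin, xmin, ymax, xmax].')
        self._data = {'boxes': [list(box) for box in boxes]}

    def num_boxes(self):
        return len(self._data['boxes'])

    def get(self):
        return self._data['boxes']

    def add_field(self, name, values):
        if len(values) != self.num_boxes():
            raise ValueError('A field needs exactly one value per box.')
        self._data[name] = list(values)

    def get_field(self, name):
        return self._data[name]


def area(boxlist):
    """An operation kept outside the class; it takes a box list as input."""
    return [(ymax - ymin) * (xmax - xmin)
            for ymin, xmin, ymax, xmax in boxlist.get()]


boxes = SimpleBoxList([[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 1.0, 0.5]])
boxes.add_field('scores', [0.9, 0.3])
print(boxes.num_boxes())           # 2
print(area(boxes))                 # [0.25, 0.5]
print(boxes.get_field('scores'))   # [0.9, 0.3]
```

Keeping the operations as free functions means new ones can be added without touching the container class.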

1.5. box_predictor.py

Source: box_predictor.py (in the core folder).
Main classes (the first is the base class of the others):

BoxPredictor
RfcnBoxPredictor
MaskRCNNBoxPredictor
ConvolutionalBoxPredictor
WeightSharedConvolutionalBoxPredictor

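Purpose:

Takes feature maps as input and produces predictions.
The predictions mainly consist of location information and class information for each box.
These predictions are then fed into the loss function.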

1.6. losses.py

Source: losses.py (in the core folder).
Main classes:

Loss
WeightedL2LocalizationLoss
WeightedSmoothL1LocalizationLoss
WeightedIOULocalizationLoss
WeightedSigmoidClassificationLoss
SigmoidFocalClassificationLoss
WeightedSoftmaxClassificationLoss
WeightedSoftmaxClassificationAgainstLogitsLoss
BootstrappedSigmoidClassificationLoss
HardExampleMiner

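Purpose:

Loss: the base class of all loss functions.
In object detection, the loss mainly consists of a localization error (XxxLocalizationLoss) and a classification error (XxxClassificationLoss).
HardExampleMiner: selects a subset of regions for backpropagation and ignores the rest.

This technique is used to improve model performance; the face detection model MTCNN, for example, uses it.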
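
Two of these ideas are easy to show in isolation: a smooth L1 localization loss and the selection of hard examples. The sketch below is my own simplification. The function names and the top-k rule are assumptions; the real HardExampleMiner has more options than this.

```python
def smooth_l1(prediction, target, delta=1.0):
    """Quadratic near zero, linear for large errors (Huber-style loss)."""
    diff = abs(prediction - target)
    if diff < delta:
        return 0.5 * diff * diff
    return delta * (diff - 0.5 * delta)


def mine_hard_examples(losses, num_hard_examples):
    """Keep the num_hard_examples largest losses and zero out the rest.

    Zeroed entries contribute no gradient, so only the hard examples
    take part in backpropagation.
    """
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    keep = set(order[:num_hard_examples])
    return [loss if i in keep else 0.0 for i, loss in enumerate(losses)]


print(smooth_l1(0.5, 0.0))  # 0.125 (quadratic region)
print(smooth_l1(3.0, 1.0))  # 1.5 (linear region)
print(mine_hard_examples([0.1, 2.0, 0.5, 1.5], num_hard_examples=2))
# [0.0, 2.0, 0.0, 1.5]
```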

1.7. matcher.py

Source: matcher.py (in the core folder).
Main classes:

Match
Matcher

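Purpose:

Determines, from a similarity matrix, how two sets of bboxes are matched to each other.

Each column is matched to at most one row.
A column can be in one of three states: match, no_match, ignore.

Match stores the matching result.
Matcher implements the matching procedure.

Other: the concrete Matcher implementations live in matchers.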
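
A threshold-based argmax matcher is one way to produce these three states. The sketch below is a plain-Python illustration under my own assumptions: the function name, the two thresholds, and the encoding of no_match as -1 and ignore as -2 are choices made for this example.

```python
def argmax_match(similarity, matched_threshold, unmatched_threshold):
    """Match every column to at most one row.

    similarity[i][j] is the similarity between row i (e.g. a ground truth
    box) and column j (e.g. an anchor).
    Returns one entry per column: a row index for a match, -1 for
    no_match, -2 for ignore.
    """
    num_rows, num_cols = len(similarity), len(similarity[0])
    results = []
    for j in range(num_cols):
        column = [similarity[i][j] for i in range(num_rows)]
        best_row = max(range(num_rows), key=lambda i: column[i])
        best = column[best_row]
        if best >= matched_threshold:
            results.append(best_row)   # match
        elif best < unmatched_threshold:
            results.append(-1)         # no_match
        else:
            results.append(-2)         # ignore: between the two thresholds
    return results


similarity = [[0.8, 0.1, 0.45],
              [0.2, 0.3, 0.40]]
print(argmax_match(similarity, matched_threshold=0.5, unmatched_threshold=0.4))
# [0, -1, -2]
```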

1.8. standard_fields.py

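Source: standard_fields.py (in the core folder).
Purpose: holds a set of constants, such as the BoxListFields and DetectionResultFields keys that appear in the code of section 2.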

1.9. region_similarity_calculator.py

Source: region_similarity_calculator.py (in the core folder).
Main classes:

RegionSimilarityCalculator
IouSimilarity
NegSqDistSimilarity
IoaSimilarity

Purpose: computes the similarity between pairs of bboxes.
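
IoU (intersection over union) is the most familiar of these similarity measures. Here is a minimal plain-Python sketch for a single pair of boxes, assuming the [ymin, xmin, ymax, xmax] layout; the function name iou is mine.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes have an empty intersection.
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou([0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0]))  # 1 / 7
print(iou([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]))  # 1.0
print(iou([0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0]))  # 0.0
```

Applying such a function to every (ground truth, anchor) pair yields the similarity matrix that the matcher consumes.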

1.10. target_assigner.py

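Source: target_assigner.py (in the core folder).
Main contents:

Class: TargetAssigner.
Functions: create_target_assigner, batch_assign_targets.

Purpose:

In object detection, the ground truth and the predictions do not correspond one to one. The job of TargetAssigner is to transform the ground truth, based on the given bboxes (the anchors), so that the result lines up one to one with the predictions.
The English docstring explains this well: for a given set of anchors (bounding boxes) and groundtruth detections (bounding boxes), to assign classification and regression targets to each anchor as well as weights to each anchor.

Steps:

Compute the similarity between every anchor and every ground truth box (using RegionSimilarityCalculator).
Select matches according to some criterion (using Matcher).
Convert the bbox representation (using BoxCoder) to obtain the transformed localization ground truth (one to one with the predicted locations).
Obtain the transformed classification ground truth (one to one with the predicted classes).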
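
The four steps can be strung together in a small plain-Python sketch. Everything here is a simplified assumption for illustration: the helper names, the single IoU threshold, the one-hot layout with background at index 0, and the center/size encoding without scale factors.

```python
import math


def iou(a, b):
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0])) *
             max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def encode(box, anchor):
    ha, wa = anchor[2] - anchor[0], anchor[3] - anchor[1]
    h, w = box[2] - box[0], box[3] - box[1]
    dy = (box[0] + h / 2) - (anchor[0] + ha / 2)
    dx = (box[1] + w / 2) - (anchor[1] + wa / 2)
    return [dy / ha, dx / wa, math.log(h / ha), math.log(w / wa)]


def assign_targets(anchors, gt_boxes, gt_classes, num_classes, threshold=0.5):
    """Return per-anchor (cls_targets, reg_targets, reg_weights).

    gt_classes holds 1-based class ids; index 0 of each one-hot vector
    is reserved for background.
    """
    cls_targets, reg_targets, reg_weights = [], [], []
    for anchor in anchors:
        # Step 1: similarity between this anchor and every ground truth box.
        sims = [iou(gt, anchor) for gt in gt_boxes]
        # Step 2: match the anchor to its most similar ground truth box.
        best = max(range(len(sims)), key=lambda i: sims[i]) if sims else -1
        one_hot = [0.0] * (num_classes + 1)
        if best >= 0 and sims[best] >= threshold:
            # Step 3: localization target in anchor-relative coordinates.
            reg_targets.append(encode(gt_boxes[best], anchor))
            reg_weights.append(1.0)
            # Step 4: classification target of the matched box.
            one_hot[gt_classes[best]] = 1.0
        else:
            reg_targets.append([0.0] * 4)
            reg_weights.append(0.0)  # unmatched anchors carry no box loss
            one_hot[0] = 1.0         # background
        cls_targets.append(one_hot)
    return cls_targets, reg_targets, reg_weights


anchors = [[0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0]]
cls_t, reg_t, reg_w = assign_targets(
    anchors, gt_boxes=[[0.0, 0.0, 1.0, 1.0]], gt_classes=[2], num_classes=2)
print(cls_t)  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
print(reg_w)  # [1.0, 0.0]
```

The output has exactly one target per anchor, which is what makes it comparable with the per-anchor predictions.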

1.11. SSDFeatureExtractor

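Source: ssd_meta_arch.py.
This class is not in core; it is in the same module as SSDMetaArch, which is covered later.
Purpose:

Preprocesses the input images.
Takes the preprocessed images as input and produces a list of feature maps.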

class SSDFeatureExtractor(object):
  """SSD Feature Extractor definition."""

  # Most constructor arguments are used while building the model
  def __init__(self,
               is_training,
               depth_multiplier,
               min_depth,
               pad_to_multiple,
               conv_hyperparams_fn,
               reuse_weights=None,
               use_explicit_padding=False,
               use_depthwise=False,
               override_base_feature_extractor_hyperparams=False):
    # Assignment statements omitted...

  # Preprocesses the images
  # Input: float data of shape [batch, height, width, channels]
  # Output: preprocessed_inputs, true_image_shapes
  # preprocessed_inputs: float data of shape [batch, height, width, channels]
  # true_image_shapes: int32 data of shape [batch, 3]
  @abstractmethod
  def preprocess(self, resized_inputs):
    pass

  # Extracts the feature maps
  # Input: the output of preprocess above, i.e. float data of shape [batch, height, width, channels]
  # Output: a list of tensors, each of shape [batch, height_i, width_i, depth_i]
  @abstractmethod
  def extract_features(self, preprocessed_inputs):
    raise NotImplementedError

2. SSDMetaArch Source Code Analysis

The source is in ssd_meta_arch.py.

2.1. Constructor

The constructor is very informative: it shows which components are needed to build an SSDMetaArch instance.

def __init__(self,
           is_training,
           anchor_generator,  # AnchorGenerator object
           box_predictor,  # BoxPredictor object
           box_coder,  # BoxCoder object
           feature_extractor,  # SSDFeatureExtractor object
           matcher,  # Matcher object
           region_similarity_calculator,  # RegionSimilarityCalculator object
           encode_background_as_zeros,  # boolean
           negative_class_weight,
           image_resizer_fn,
           non_max_suppression_fn,  # the function batch_multiclass_non_max_suppression from post_processing.py
           score_conversion_fn,  # usually converts logits into predictions, e.g. softmax
           classification_loss,  # Loss object
           localization_loss,  # Loss object
           classification_loss_weight,  # float
           localization_loss_weight,  # float
           normalize_loss_by_num_matches,  # boolean
           hard_example_miner,  # HardExampleMiner object
           add_summaries=True,
           normalize_loc_loss_by_codesize=False,
           freeze_batchnorm=False,
           inplace_batchnorm_update=False,
           add_background_class=True,
           random_example_sampler=None,  # BalancedPositiveNegativeSampler object
           ):
    super(SSDMetaArch, self).__init__(num_classes=box_predictor.num_classes)

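    # The many plain assignments are omitted here
    # They all look like self._xxx = xxx
    ...

    # Build the TargetAssigner that will later produce the training targets
    # (unmatched_cls_target is defined in the omitted code above)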
    self._target_assigner = target_assigner.TargetAssigner(
        self._region_similarity_calculator,
        self._matcher,
        self._box_coder,
        negative_class_weight=negative_class_weight,
        unmatched_cls_target=unmatched_cls_target)

2.2. preprocess

Flow:

Resize the images with the _image_resizer_fn passed to the constructor.
Preprocess the images with the _feature_extractor passed to the constructor.

def preprocess(self, inputs):
    if inputs.dtype is not tf.float32:
      raise ValueError('`preprocess` expects a tf.float32 tensor')
    with tf.name_scope('Preprocessor'):
      # Process the input data with _image_resizer_fn
      outputs = shape_utils.static_or_dynamic_map_fn(
          self._image_resizer_fn,
          elems=inputs,
          dtype=[tf.float32, tf.int32])
      resized_inputs = outputs[0]
      true_image_shapes = outputs[1]

      # Preprocess the data with _feature_extractor
      return (self._feature_extractor.preprocess(resized_inputs),
              true_image_shapes)
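
To show why preprocess returns both the images and true_image_shapes, here is a plain-Python sketch of the two stages. Both helpers are hypothetical stand-ins: pad_image plays the role of an image resizer that pads with zeros, and scale_pixels assumes a feature extractor that maps pixel values from [0, 255] to [-1, 1]. Neither is taken from the API.

```python
def pad_image(image, target_height, target_width):
    """Zero-pad an H x W x C nested-list image to the target size.

    Returns (padded_image, true_shape) where true_shape records the
    [height, width, channels] of the real content before padding.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    zero = [0.0] * channels
    padded = [row + [list(zero) for _ in range(target_width - width)]
              for row in image]
    padded += [[list(zero) for _ in range(target_width)]
               for _ in range(target_height - height)]
    return padded, [height, width, channels]


def scale_pixels(image):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return [[[value / 127.5 - 1.0 for value in pixel] for pixel in row]
            for row in image]


image = [[[255.0, 0.0, 127.5]]]  # a single 1 x 1 RGB image
padded, true_shape = pad_image(image, target_height=2, target_width=2)
preprocessed = scale_pixels(padded)
print(true_shape)          # [1, 1, 3]
print(preprocessed[0][0])  # [1.0, -1.0, 0.0]
```

After padding, the tensor shape alone no longer says where the real image ends, so the true shape has to be carried along for later clipping.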

2.3. predict

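Flow:

Extract feature maps with _feature_extractor (it plays the role of the base network in the SSD paper and does not include producing predictions from the feature maps).
Get the shapes of the input data and of the feature maps.
Build the default anchors with _anchor_generator.
Produce predictions from the feature maps with _box_predictor.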

def predict(self, preprocessed_inputs, true_image_shapes):
    batchnorm_updates_collections = (None if self._inplace_batchnorm_update
                                     else tf.GraphKeys.UPDATE_OPS)
    with slim.arg_scope([slim.batch_norm],
                        is_training=(self._is_training and
                                     not self._freeze_batchnorm),
                        updates_collections=batchnorm_updates_collections):
      with tf.variable_scope(None, self._extract_features_scope,
                             [preprocessed_inputs]):
        # Extract the feature maps
        feature_maps = self._feature_extractor.extract_features(
            preprocessed_inputs)
      # Get the feature map shapes
      feature_map_spatial_dims = self._get_feature_map_spatial_dims(
          feature_maps)
      # Get the shape of the input data
      image_shape = shape_utils.combined_static_and_dynamic_shape(
          preprocessed_inputs)

      # Generate the default anchors
      self._anchors = box_list_ops.concatenate(
          self._anchor_generator.generate(
              feature_map_spatial_dims,
              im_height=image_shape[1],
              im_width=image_shape[2]))

      # Produce predictions from the feature maps
      prediction_dict = self._box_predictor.predict(
          feature_maps, self._anchor_generator.num_anchors_per_location())

      # Collect the predictions and pack them into a dict
      box_encodings = tf.concat(prediction_dict['box_encodings'], axis=1)
      if box_encodings.shape.ndims == 4 and box_encodings.shape[2] == 1:
        box_encodings = tf.squeeze(box_encodings, axis=2)
      class_predictions_with_background = tf.concat(
          prediction_dict['class_predictions_with_background'], axis=1)
      predictions_dict = {
          'preprocessed_inputs': preprocessed_inputs,
          'box_encodings': box_encodings,
          'class_predictions_with_background':
          class_predictions_with_background,
          'feature_maps': feature_maps,
          'anchors': self._anchors.get()
      }
      self._batched_prediction_tensor_names = [x for x in predictions_dict
                                               if x != 'anchors']
      return predictions_dict
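
The tf.concat calls along axis=1 merge the per-feature-map predictions into one long list of anchors. A quick way to see what that implies is to count the anchors. The feature map sizes and anchors-per-location values below are illustrative assumptions, not values read from a specific config.

```python
def total_anchors(feature_map_shapes, anchors_per_location):
    """Number of anchors after concatenating all feature maps.

    Each feature map of size (h, w) with k anchors per location
    contributes h * w * k anchors.
    """
    return sum(h * w * k
               for (h, w), k in zip(feature_map_shapes, anchors_per_location))


# Hypothetical pyramid of six feature maps.
shapes = [(19, 19), (10, 10), (5, 5), (3, 3), (2, 2), (1, 1)]
per_location = [3, 6, 6, 6, 6, 6]
print(total_anchors(shapes, per_location))  # 1917
```

With these numbers, box_encodings would have shape [batch, 1917, 4], and the anchors and the class predictions would share the same 1917-long axis.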

2.4. postprocess

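Flow:

Read the results produced by predict.
Convert the bbox representation of the predictions.
Convert the logits in the predictions into scores (usually with the softmax function).
Filter the bboxes with the NMS algorithm.
Pack the results to return.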

def postprocess(self, prediction_dict, true_image_shapes):
    # Validate the input
    if ('box_encodings' not in prediction_dict or
        'class_predictions_with_background' not in prediction_dict):
      raise ValueError('prediction_dict does not contain expected entries.')


    with tf.name_scope('Postprocessor'):
      # Read the predictions
      preprocessed_images = prediction_dict['preprocessed_inputs']
      box_encodings = prediction_dict['box_encodings']
      class_predictions = prediction_dict['class_predictions_with_background']

      # Decode the bbox information
      detection_boxes, detection_keypoints = self._batch_decode(box_encodings)
      detection_boxes = tf.expand_dims(detection_boxes, axis=2)

      # Convert logits into scores
      detection_scores_with_background = self._score_conversion_fn(
          class_predictions)

      detection_scores = tf.slice(detection_scores_with_background, [0, 0, 1],
                                  [-1, -1, -1])

      additional_fields = None
      if detection_keypoints is not None:
        additional_fields = {
            fields.BoxListFields.keypoints: detection_keypoints}

      # Filter the bboxes with NMS
      (nmsed_boxes, nmsed_scores, nmsed_classes, _, nmsed_additional_fields,
       num_detections) = self._non_max_suppression_fn(
           detection_boxes,
           detection_scores,
           clip_window=self._compute_clip_window(
               preprocessed_images, true_image_shapes),
           additional_fields=additional_fields)

      # Pack the results to return
      detection_dict = {
          fields.DetectionResultFields.detection_boxes: nmsed_boxes,
          fields.DetectionResultFields.detection_scores: nmsed_scores,
          fields.DetectionResultFields.detection_classes: nmsed_classes,
          fields.DetectionResultFields.num_detections:
              tf.to_float(num_detections)
      }
      if (nmsed_additional_fields is not None and
          fields.BoxListFields.keypoints in nmsed_additional_fields):
        detection_dict[fields.DetectionResultFields.detection_keypoints] = (
            nmsed_additional_fields[fields.BoxListFields.keypoints])
      return detection_dict
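
Two steps of postprocess are worth seeing in isolation: turning logits into scores while dropping the background column, and greedy NMS. The sketch below is plain Python for a single class and a single image. It is my own simplification with hypothetical helper names; the function that the API actually plugs in works on batches and on multiple classes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def iou(a, b):
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0])) *
             max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5, max_detections=100):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) == max_detections:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Index 0 is the background class, so it is sliced off after softmax.
scores_with_background = softmax([2.0, 1.0, 0.1])
foreground_scores = scores_with_background[1:]

boxes = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.9], [2.0, 2.0, 3.0, 3.0]]
print(nms(boxes, scores=[0.9, 0.8, 0.7]))  # [0, 2]
```

The second box overlaps the first one heavily and has a lower score, so it is suppressed, while the third box is far away and survives.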

2.5. loss

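Flow:

Handle keypoints (I have not yet figured out what keypoints are).
Obtain the training targets (used later to compute the loss; done via TargetAssigner).
Subsample the examples a second time (if a positive-to-negative ratio has to be enforced, it happens in this step).
Compute the localization loss and the classification loss separately (via Loss subclass objects).
Handle hard examples.
Pack the results to return.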

def loss(self, prediction_dict, true_image_shapes, scope=None):
    with tf.name_scope(scope, 'Loss', prediction_dict.values()):
      # keypoints handling
      keypoints = None
      if self.groundtruth_has_field(fields.BoxListFields.keypoints):
        keypoints = self.groundtruth_lists(fields.BoxListFields.keypoints)

      # Obtain the training targets (used later to compute the loss)
      weights = None
      if self.groundtruth_has_field(fields.BoxListFields.weights):
        weights = self.groundtruth_lists(fields.BoxListFields.weights)
      (batch_cls_targets, batch_cls_weights, batch_reg_targets,
       batch_reg_weights, match_list) = self._assign_targets(
           self.groundtruth_lists(fields.BoxListFields.boxes),
           self.groundtruth_lists(fields.BoxListFields.classes),
           keypoints, weights)
      if self._add_summaries:
        self._summarize_target_assignment(
            self.groundtruth_lists(fields.BoxListFields.boxes), match_list)

      # Subsample the examples a second time
      # If a positive-to-negative ratio has to be enforced, it happens here
      if self._random_example_sampler:
        batch_sampled_indicator = tf.to_float(
            shape_utils.static_or_dynamic_map_fn(
                self._minibatch_subsample_fn,
                [batch_cls_targets, batch_cls_weights],
                dtype=tf.bool,
                parallel_iterations=self._parallel_iterations,
                back_prop=True))
        batch_reg_weights = tf.multiply(batch_sampled_indicator,
                                        batch_reg_weights)
        batch_cls_weights = tf.multiply(batch_sampled_indicator,
                                        batch_cls_weights)

      # Compute the localization loss and the classification loss separately (via Loss subclass objects)
      location_losses = self._localization_loss(
          prediction_dict['box_encodings'],
          batch_reg_targets,
          ignore_nan_targets=True,
          weights=batch_reg_weights)
      cls_losses = ops.reduce_sum_trailing_dimensions(
          self._classification_loss(
              prediction_dict['class_predictions_with_background'],
              batch_cls_targets,
              weights=batch_cls_weights),
          ndims=2)

      # Hard example handling
      if self._hard_example_miner:
        (localization_loss, classification_loss) = self._apply_hard_mining(
            location_losses, cls_losses, prediction_dict, match_list)
        if self._add_summaries:
          self._hard_example_miner.summarize()
      else:
        if self._add_summaries:
          class_ids = tf.argmax(batch_cls_targets, axis=2)
          flattened_class_ids = tf.reshape(class_ids, [-1])
          flattened_classification_losses = tf.reshape(cls_losses, [-1])
          self._summarize_anchor_classification_loss(
              flattened_class_ids, flattened_classification_losses)
        localization_loss = tf.reduce_sum(location_losses)
        classification_loss = tf.reduce_sum(cls_losses)

      # Optionally normalize by number of positive matches
      normalizer = tf.constant(1.0, dtype=tf.float32)
      if self._normalize_loss_by_num_matches:
        normalizer = tf.maximum(tf.to_float(tf.reduce_sum(batch_reg_weights)),
                                1.0)

      localization_loss_normalizer = normalizer
      if self._normalize_loc_loss_by_codesize:
        localization_loss_normalizer *= self._box_coder.code_size
      localization_loss = tf.multiply((self._localization_loss_weight /
                                       localization_loss_normalizer),
                                      localization_loss,
                                      name='localization_loss')
      classification_loss = tf.multiply((self._classification_loss_weight /
                                         normalizer), classification_loss,
                                        name='classification_loss')

      # Pack the results to return
      loss_dict = {
          str(localization_loss.op.name): localization_loss,
          str(classification_loss.op.name): classification_loss
      }
    return loss_dict
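
The normalization at the end of loss is easier to follow with numbers. The plain-Python sketch below mirrors only that arithmetic for the case without hard example mining. The function name and the sample values are assumptions, and the optional division by the box code size is left out.

```python
def normalized_losses(location_losses, cls_losses, reg_weights,
                      localization_loss_weight=1.0,
                      classification_loss_weight=1.0,
                      normalize_loss_by_num_matches=True):
    """Sum the per-anchor losses and scale them as in the final step.

    reg_weights is 1.0 for matched anchors and 0.0 otherwise, so its sum
    is the number of positive matches.
    """
    normalizer = 1.0
    if normalize_loss_by_num_matches:
        # Never divide by less than 1, even when nothing matched.
        normalizer = max(sum(reg_weights), 1.0)
    localization_loss = (localization_loss_weight / normalizer *
                         sum(location_losses))
    classification_loss = (classification_loss_weight / normalizer *
                           sum(cls_losses))
    return localization_loss, classification_loss


loc, cls = normalized_losses(
    location_losses=[0.5, 0.0, 1.5, 0.0],
    cls_losses=[1.0, 0.5, 2.0, 0.5],
    reg_weights=[1.0, 0.0, 1.0, 0.0])
print(loc, cls)  # 1.0 2.0
```

Dividing by the number of matches keeps the loss scale comparable between images with few and with many ground truth boxes.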

Edited on 2018-06-22 21:03


