object detect api fasterrcnn OOM #3697

Closed
shartoo opened this issue Mar 22, 2018 · 10 comments

shartoo commented Mar 22, 2018

Here is my basic information

system summary

  • Windows: 10 (16 GB RAM)
  • GPU : GeForce GTX 1080 Ti
  • cuda : 8.0
  • cudnn: 7.1
  • tensorflow-gpu: 1.5
  • python: 3.5

data summary

  • number of samples: 10000 (split 70% training / 30% validation)
  • size of train.records: 744MB
  • size of val.records: 334MB
  • image (jpg)
    • shape: 640×480 (resized with OpenCV; see the sketch after this list)
    • size: 35kb to 1275kb
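A minimal sketch of that OpenCV resize step (the file names are placeholders, not from the original report):

import cv2

img = cv2.imread("sample.jpg")          # loads as a BGR array of shape (H, W, 3)
resized = cv2.resize(img, (640, 480))   # note: cv2.resize takes (width, height)
cv2.imwrite("sample_resized.jpg", resized)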

configuration of training

I'm using faster_rcnn_inception_resnet_v2.config copied from the object detection sample config files. Here are the details:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 8
        width_stride: 8
      }
    }
    first_stage_atrous_rate: 2
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.6
    # modify from 300 to  600
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 17
    maxpool_kernel_size: 1
    maxpool_stride: 1
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.7
        iou_threshold: 0.3
        max_detections_per_class: 10
        max_total_detections: 40
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 4
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 0
            learning_rate: .0003
          }
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: ""
  from_detection_checkpoint: true
  num_steps: 100000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: ""
  }
  label_map_path: ""
}

eval_config: {
  num_examples: 1000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: ""
  }
  label_map_path: ""
  shuffle: false
  num_readers: 1
}

As you can see, batch_size = 4 (decreasing it from 64 to 4 made no difference).

error log

Here is the error log

INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From D:\workspace\compet\ipcr3\object_detection\trainer.py:176: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From D:\workspace\compet\ipcr3\object_detection\builders\optimizer_builder.py:105: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
C:\Python35\lib\site-packages\tensorflow\python\ops\gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-03-22 15:03:47.552158: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-03-22 15:03:47.890510: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-03-22 15:03:47.890825: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path D:/workspace/compet/ipcr3/data/ICPR3part1/tf_ckpt\model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
2018-03-22 15:05:22.078875: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 124.51MiB.  Current allocation summary follows.
2018-03-22 15:05:22.079202: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\bfc_allocator.cc:627] Bin (256): 	Total Chunks: 730, Chunks in use: 666. 182.5KiB allocated for chunks. 166.5KiB in use in bin. 36.1KiB client-requested in use in bin.
......
2018-03-22 15:05:23.036772: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\bfc_allocator.cc:683] Sum Total of in-use chunks: 8.52GiB
2018-03-22 15:05:23.036933: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\bfc_allocator.cc:685] Stats: 
Limit:                  9280555582
InUse:                  9147851008
MaxInUse:               9226589952
NumAllocs:                    5341
MaxAllocSize:           1278345216

2018-03-22 15:05:23.037782: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\bfc_allocator.cc:277] ****************************************************************************************************
2018-03-22 15:05:23.038097: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[16,38,50,384]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[4,75,100,1088]
	 [[Node: FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat_1/block17_8/add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat_1/block17_7/Relu, FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat_1/block17_8/mul)]]
	 [[Node: gradients/FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat_1/block17_16/Branch_0/Conv2d_1x1/BatchNorm/FusedBatchNorm_grad/tuple/control_dependency_2/_5985 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_21725_gradients/FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat_1/block17_16/Branch_0/Conv2d_1x1/BatchNorm/FusedBatchNorm_grad/tuple/control_dependency_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'FirstStageFeatureExtractor/InceptionResnetV2/InceptionResnetV2/Repeat_1/block17_8/add', defined at:

other information

I can train on other datasets with this same configuration without problems, but training fails on the dataset described above no matter what parameter changes I make. Can somebody help me out? Thank you!
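One general TF 1.x mitigation for OOM errors like the one above is to stop TensorFlow from reserving nearly all GPU memory up front. A minimal sketch, assuming you drive the session yourself (the Object Detection API's train.py builds its own session config, so this would need to be wired in there); it only helps if the model itself actually fits:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving ~all of it at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap the fraction of GPU memory TensorFlow may claim:
# config.gpu_options.per_process_gpu_memory_fraction = 0.8

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop ...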


TheFlashover commented Mar 23, 2018

Try to change

train_config: {
  batch_size: 4

to

train_config: {
  batch_size: 1

in faster_rcnn_inception_resnet_v2.config


shartoo commented Mar 24, 2018

@TheFlashover Thank you for your advice, but the error remains the same.


kirk86 commented Mar 27, 2018

@TheFlashover @shartoo What I've found really strange is that, for me, all Faster R-CNN configs work fine with batch_size=1, but as soon as I change that to 2, everything breaks down:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,500,625,3] vs. shape[1] = [1,500,500,3]
         [[Node: concat_1 = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Preprocessor/sub, Preprocessor_1/sub, gradients/Gather_grad/concat/axis)]]
         [[Node: gradients/FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block1/unit_1/bottleneck_v1/shortcut/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_4233 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_13772...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I can't understand why that's happening, and the documentation is scarce on this point.
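For what it's worth, the shapes in that ConcatOp error suggest the cause: keep_aspect_ratio_resizer preserves each image's aspect ratio, so two images with different aspect ratios come out at different sizes and cannot be stacked into one batch tensor. The helper below is only a sketch of the resizer's documented scaling rule (exact rounding may differ):

def keep_aspect_ratio_shape(h, w, min_dim=600, max_dim=1024):
    # Scale so the short side reaches min_dim, then clamp so the long
    # side does not exceed max_dim.
    scale = min_dim / min(h, w)
    if max(h, w) * scale > max_dim:
        scale = max_dim / max(h, w)
    return round(h * scale), round(w * scale)

print(keep_aspect_ratio_shape(480, 600))  # (600, 750)
print(keep_aspect_ratio_shape(480, 480))  # (600, 600) -- a different shape,
                                          # so a batch of the two cannot be built

With batch_size=1 no stacking ever happens, which would explain why those configs work.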

@HelloWorldzyy

@kirk86 Hi, I've hit the same problem. Did you solve it? Could you give me a solution?


kirk86 commented Mar 28, 2018

@HelloWorldzyy I don't understand, are you trying to mock me? First you downvote a legitimate problem, and then you ask for a solution. I can't really understand what you're trying to achieve.


kellenf commented Jul 24, 2018

@kirk86 Thanks! Your comment solved my problem, but what you said is not entirely accurate: when my dataset has only one object class to detect, the batch size can be 8, 16, and so on.
However, when my dataset has 7 object classes, the batch size can only be 1!
I don't know why, but it must be caused by some detail of Faster R-CNN; I will read the paper again carefully. If you can give me the answer, I will appreciate it very much!

@mawanda-jun

Hi,
I don't know if you've solved your problem yet, but the issue is that your training set is made up of images of different sizes. Since the networks inside the algorithm need input matrices of equal dimensions, the error arises.
I suggest you change your resizer from:
keep_aspect_ratio_resizer { min_dimension: 600 max_dimension: 1024 }
to:
fixed_shape_resizer { width: 600 height: 800 }
or whatever dimensions you want; it depends on how much RAM and computational power you have.
I'm not an expert, but this worked for me.
Remember also that, in general, the bigger the batch size, the better the results, but this holds only as long as all your batches fit in RAM (or GPU memory). If your PC starts swapping, you should reduce the resizer shape or the batch size, since otherwise you'll end up slowing down your computation due to throughput problems.

Hope this helps; if somebody is more skilled than me, listen to them!
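As a concrete sketch, that suggestion would slot into the image_resizer block of the original config like this (600×800 are just the values proposed above; tune them to your memory budget):

model {
  faster_rcnn {
    image_resizer {
      fixed_shape_resizer {
        # every image is resized to exactly this shape, so a batch always
        # stacks tensors of identical dimensions, whatever the batch_size
        width: 600
        height: 800
      }
    }
    # ... rest of the model config unchanged ...
  }
}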

@dshahrokhian

I want to add an additional option to the ones mentioned above. As a summary, there are 3 possible solutions:

  1. Add pad_to_max_dimension: true in keep_aspect_ratio_resizer:
keep_aspect_ratio_resizer {
  pad_to_max_dimension: true
}
  2. Change batch size to 1:
train_config: {
  batch_size: 1
}
  3. Use fixed_shape_resizer instead of keep_aspect_ratio_resizer:
fixed_shape_resizer { width: <pixels> height: <pixels> }
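Option 1 written out in full, keeping the dimensions from the original config (padding every resized image up to the maximum size is what gives every batch element the same static shape):

image_resizer {
  keep_aspect_ratio_resizer {
    min_dimension: 600
    max_dimension: 1024
    # pad each resized image to max_dimension x max_dimension so all
    # images in a batch share one shape
    pad_to_max_dimension: true
  }
}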

@pedramtehranchi

Same problem.

@tensorflowbutler
Member

Hi there,
We are checking to see if you still need help on this, as it seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.
