Description
Rectangular inference is implemented by default in `detect.py`. This reduces inference time roughly in proportion to the extra letterboxed area a square image carries compared to a rectangular image padded to the nearest multiple of 32. On zidane.jpg, for example, CPU inference time (on a 2018 MacBook Pro) drops from 1.01 s to 0.63 s, a 37% reduction, corresponding to a 38% reduction in image area (416x416 to 256x416).
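For illustration, here is a minimal letterbox sketch of the approach (hypothetical helper, not the repo's exact implementation), assuming OpenCV and NumPy: resize the longest side to `img_size`, then pad the shorter side up to the nearest multiple of 32:

```python
import cv2
import numpy as np

def letterbox_rect(img, img_size=416, stride=32, color=(114, 114, 114)):
    """Resize longest side to img_size, pad shortest side to a multiple of stride."""
    h, w = img.shape[:2]
    r = img_size / max(h, w)                    # scale ratio
    new_h, new_w = round(h * r), round(w * r)   # resized, unpadded dimensions
    img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    pad_h = (stride - new_h % stride) % stride  # padding up to next stride multiple
    pad_w = (stride - new_w % stride) % stride
    top, bottom = pad_h // 2, pad_h - pad_h // 2
    left, right = pad_w // 2, pad_w - pad_w // 2
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)

# zidane.jpg (1280x720): resized to 416x234, height padded to 256 -> 256x416
```

For zidane.jpg this yields 256x416, matching the rectangular inference output below.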
Square Inference
Letterboxes to 416x416 squares.
```
python3 detect.py  # 416 square inference

Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x416 1 handbags, 3 persons, 1 buss, Done. (0.999s)
image 2/2 data/samples/zidane.jpg: 416x416 1 ties, 2 persons, Done. (1.008s)
```
Rectangular Inference
Letterboxes the longest image dimension to 416 and pads the shorter dimension to the nearest multiple of 32.
```
python3 detect.py  # 416 rectangular inference

Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.767s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.632s)
```
| | zidane.jpg | bus.jpg |
| --- | --- | --- |
| square inference | 416x416 | 416x416 |
| rectangular inference | 256x416 | 416x320 |
| original size | 1280x720 | 810x1080 |
glenn-jocher commented on Apr 24, 2019

Rectangular training example in the works, first batch of COCO. This is a bit complicated, as we need to letterbox all images in a batch to the same size, and some of the images are pulled simultaneously by parallel dataloader workers. So part of this process is determining a priori the batch index each image belongs to (`shuffle=False` for now), and then letterboxing it to the minimum viable 32-multiple for the most square image in that batch. This should be included in our upcoming v7 release, with enormous training speed improvements (about 1/3 faster on mixed aspect ratio datasets like COCO).
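A rough sketch of that a-priori assignment (hypothetical names, assuming NumPy; the actual implementation lives in `utils/datasets.py`): sort images by aspect ratio so similar shapes share a batch, then derive one 32-multiple shape per batch:

```python
import numpy as np

def batch_shapes(wh, batch_size=16, img_size=416, stride=32):
    """wh: (n, 2) array of original image (width, height) pairs."""
    ar = wh[:, 1] / wh[:, 0]            # aspect ratios h/w
    order = ar.argsort()                # group similar shapes into the same batch
    ar = ar[order]
    n = len(ar)
    bi = np.arange(n) // batch_size     # batch index of each image
    nb = bi[-1] + 1                     # number of batches
    shapes = np.ones((nb, 2))           # (h, w) scale factors; default: square
    for i in range(nb):
        ari = ar[bi == i]
        mini, maxi = ari.min(), ari.max()
        if maxi < 1:                    # whole batch is wide (h < w)
            shapes[i] = [maxi, 1]
        elif mini > 1:                  # whole batch is tall (h > w)
            shapes[i] = [1, 1 / mini]
    return np.ceil(shapes * img_size / stride).astype(int) * stride, order
```

A batch of 1280x720 images, for example, gets shape [256, 416] rather than [416, 416].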
glenn-jocher commented on Apr 29, 2019

Rectangular training results on `coco_100img.data`. The speedup was not material in this case because CUDA was constantly re-optimizing at each batch due to `benchmark` being set to True, and the dataset of 100 images had only 6 batches, each with a different shape. The speedup should be more impactful on larger training sets. Individual batches were timed as fast as 0.189 s here, vs 0.240 s for 416 square training on a V100.

yolov3/train.py, line 64 in 7e6e189

Rectangular training can be accessed here:

yolov3/utils/datasets.py, lines 146 to 148 in 7e6e189
glenn-jocher commented on Apr 29, 2019

Rectangular inference is now working in our latest iDetection iOS app build! This is a screenshot recorded today at 192x320: inference on vertical 4K-format, 16:9 aspect ratio iPhone video. This pushes the performance to realtime 30 FPS!! This means that we now have YOLOv3-SPP running in realtime on an iPhone Xs using rectangular inference, a worldwide first as far as we know.

[screenshot: photo.jpg]
dakdouky commented on Dec 5, 2019

Hi @glenn-jocher,

I'm trying rectangular training with `rect=True`, but the tensors during training are all square, starting with the input `torch.Size([16, 3, 416, 416])`. What could be the problem? I'd expect the shapes to be the nearest multiples of 32 for both image dimensions.

What should `img_size` be in the line:

`self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32.).astype(np.int) * 32`

I also noticed that images look rectangular in `test_batch.jpg` but square in `train_batch.jpg`. Does this mean that rectangular training is unsupported?
glenn-jocher commented on Dec 5, 2019

@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a single mosaic. You can disable it on this line:

yolov3/utils/datasets.py, line 408 in e27b124
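For intuition, a greatly simplified sketch of a 4-image mosaic (hypothetical helper; the actual loader picks a random mosaic center and remaps labels accordingly):

```python
import numpy as np

def mosaic4(imgs, size=416, fill=114):
    """Tile 4 images (each already letterboxed to size x size) into a 2x2 mosaic."""
    canvas = np.full((2 * size, 2 * size, 3), fill, dtype=np.uint8)
    for k, im in enumerate(imgs):
        r, c = divmod(k, 2)  # grid position of image k
        canvas[r * size:(r + 1) * size, c * size:(c + 1) * size] = im
    return canvas            # square output, regardless of rect=True
```

The square mosaic output is why `rect=True` has no visible effect while mosaic is enabled.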
dakdouky commented on Dec 5, 2019

Yes, I disabled it, but the images are still `416x416` squares.

glenn-jocher commented on Dec 5, 2019
@MOHAMEDELDAKDOUKY your repo may be out of date. `git clone` a fresh copy and try again.
dakdouky commented on Dec 5, 2019

Done, still getting this with `rect=True`, mosaic and augmentation disabled.

[screenshot: image.png]

(55 remaining items)
autograd500 commented on Oct 17, 2023

I'm very confused about rectangular training. In yolov5/utils/dataloaders.py, lines 545-568, it fills the images in the same batch into a square shape. Why is it called rectangular training? What do the rectangular images above show?

Thank you in advance.
glenn-jocher commented on Oct 17, 2023
@autograd500 rectangular training refers to the process of letterboxing images in a batch to a common size with a minimum viable multiple of 32 for the most square image. The term "rectangular" here is used to indicate that the images in the batch may have different dimensions, resulting in a rectangular shape after letterboxing. The images shown in the example demonstrate this process, where each image is letterboxed to the same size within the batch. This approach is used to optimize training speed, especially for datasets with mixed aspect ratios like COCO. I hope this clarifies the concept for you. Let me know if you have any further questions.
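As a concrete check of the `batch_shapes` line quoted earlier, using the thread's own zidane.jpg numbers (1280x720, aspect ratio 720/1280 = 0.5625) at `img_size=416`:

```python
import numpy as np

shape = np.array([[0.5625, 1.0]])                    # (h, w) scale for a 1280x720 image
print(np.ceil(shape * 416 / 32.0).astype(int) * 32)  # -> [[256 416]]
```

This reproduces the 256x416 rectangular inference shape from the issue description.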
autograd500 commented on Oct 18, 2023

Thanks for the answer; I still have the following question:

When `maxi < 1`, the images within the batch have `s[1]/s[0] < 1`. But the final image size is `[img_size * maxi, img_size]`, where `s[0] < s[1]`. So the aspect ratio of the images changed after letterboxing?
glenn-jocher commented on Oct 18, 2023

@autograd500 yes, in rectangular training the aspect ratio of the images within the batch can be adjusted during letterboxing. The code you shared is responsible for setting the training image shapes based on the aspect ratios of the images.

When `maxi < 1`, it means that the width of the image (`s[0]`) is greater than the height (`s[1]`), resulting in an aspect ratio less than 1. In this case, the code sets the image shape to `[maxi * img_size, img_size]`, which means that the width will be scaled down and the height will remain the same after letterboxing.

So, to answer your question: yes, the aspect ratio of the images can change after letterboxing to achieve a consistent shape within the batch. Let me know if you have any further questions.
autograd500 commented on Oct 18, 2023

To achieve a consistent shape within the batch, you could also set the image shape to `[img_size, maxi * img_size]`. In that case, the aspect ratio of the images can stay consistent. Intuitively that process seems better to me, because the proportions of the images are not broken.

Why not set the image shape to `[img_size, maxi * img_size]`? Doesn't the aspect ratio of the images matter?
glenn-jocher commented on Oct 18, 2023

@autograd500 thank you for your question and suggestion. The aspect ratio of the images does indeed matter in object detection tasks. When training models like YOLOv3, maintaining the original aspect ratio of the images helps preserve the proportions of objects in the scene.

The current approach of setting the image shape to `[maxi * img_size, img_size]` when `maxi < 1` is aimed at ensuring a consistent shape within the batch while still allowing for some variation in aspect ratios. This approach strikes a balance between maintaining the proportions of objects and achieving a common size for efficient batch processing.

However, your idea of setting the image shape to `[img_size, maxi * img_size]` is interesting and worth considering. It could provide a different trade-off between aspect ratio consistency and preserved object proportions. The choice between the two approaches may depend on the specific requirements and characteristics of the dataset being used.

Thank you for raising this point; it's valuable feedback that could be explored in future enhancements. Let us know if you have any more questions or suggestions.
autograd500 commented on Oct 18, 2023

No questions for the time being; if any come up, I will consult you again.

Thank you very much for your reply!
glenn-jocher commented on Oct 18, 2023
@autograd500 hi there,
You're welcome! I'm glad I could help. If you have any more questions or need further assistance in the future, please don't hesitate to reach out. Have a great day!
SimonHKPU commented on May 16, 2024

@glenn-jocher If I want to train a model with input images of size 512x288 and I want the model's input to be fixed, similar to 640x640, what should I do? Why does `--rect` cause each batch to have different widths and heights? Aren't neural network inputs supposed to be of fixed size? Thank you.
glenn-jocher commented on May 16, 2024

Hi there!

To train a model with a fixed input size of 512x288, you will need to modify the `img_size` in your training configuration to `[512, 288]` and deactivate the `--rect` training option. This setup will ensure that all your inputs are reshaped to 512x288 regardless of their original sizes.

The `--rect` training option allows for rectangular training, where each batch adjusts its shape according to the aspect ratios of the images within that batch. This is beneficial for mixed aspect ratio datasets, reducing padding and potentially speeding up training. However, the neural network still processes images of a consistent size within each batch.

If you require fixed dimensions for all inputs, simply setting `img_size` without the `--rect` option should address your needs; see the example command after this comment.

Hope this clears up your query! Let me know if there's anything else you'd like to discuss. 🌟
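For reference, a minimal fixed-size command of that shape, built only from flags that appear in the usage message quoted in the next comment (dataset and weights paths are placeholders):

```
python train.py --imgsz 512 --epochs 100 --data your_data.yaml --weights yolov5s.pt
```

Note that `--imgsz` takes a single integer, a point that comes up in the follow-up below.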
SimonHKPU commented on May 16, 2024

@glenn-jocher Sorry, it doesn't work. It shows:

```
usage: train.py [-h] [--weights WEIGHTS] [--cfg CFG] [--data DATA] [--hyp HYP] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--imgsz IMGSZ] [--rect] [--resume [RESUME]] [--nosave] [--noval] [--noautoanchor] [--evolve [EVOLVE]] [--bucket BUCKET] [--cache [CACHE]] [--image-weights] [--device DEVICE]
                [--multi-scale] [--single-cls] [--optimizer {SGD,Adam,AdamW}] [--sync-bn] [--workers WORKERS] [--project PROJECT] [--name NAME] [--exist-ok] [--quad] [--cos-lr] [--label-smoothing LABEL_SMOOTHING] [--patience PATIENCE] [--freeze FREEZE [FREEZE ...]] [--save-period SAVE_PERIOD]
                [--local_rank LOCAL_RANK] [--entity ENTITY] [--upload_dataset [UPLOAD_DATASET]] [--bbox_interval BBOX_INTERVAL] [--artifact_alias ARTIFACT_ALIAS]
train.py: error: unrecognized arguments: 288
```
SimonHKPU commented on May 17, 2024

@glenn-jocher Excuse me, do you have any idea about this question?
glenn-jocher commented on May 17, 2024

@Chenplushao hey there!

It looks like you tried to specify separate width and height using `--img 512 288`, but `train.py` expects a single number for the `--imgsz` argument, which sets both the width and height to the same value.

If you need different dimensions and want them fixed, you'll have to modify the model configuration file and adjust the input dimensions directly there, as YOLO typically expects square inputs. The other option is to resize your images to be square, maintaining their aspect ratio through padding, before training; see the sketch after this comment.

If you have any other questions or need further clarification, feel free to ask. Happy coding! 😊
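As a rough sketch of that pad-to-square preprocessing (hypothetical helper, assuming OpenCV):

```python
import cv2

def pad_to_square(img, color=(114, 114, 114)):
    """Pad the shorter side with constant borders so the image becomes square."""
    h, w = img.shape[:2]
    d = abs(h - w)
    top, bottom = (d // 2, d - d // 2) if h < w else (0, 0)
    left, right = (d // 2, d - d // 2) if w < h else (0, 0)
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)

# a 512x288 frame becomes 512x512; train with --imgsz 512 and no --rect
```

A 512x288 frame becomes 512x512, which can then be trained at a single fixed `--imgsz 512` without `--rect`.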
SimonHKPU commented on May 17, 2024

Thank you sir! Have a nice day!

glenn-jocher commented on May 17, 2024

@Chenplushao You're welcome, and thank you! If you need any more help down the line, don't hesitate to reach out. Have a fantastic day! 😊