Search code, repositories, users, issues, pull requests...

opened

on Apr 22, 2019

· edited by glenn-jocher

Member

Rectangular inference is implemented by default in detect.py. This reduces inference time proportionally to the amount of letterboxed area padded onto a square image vs a 32-minimum multiple rectangular image. On zidane.jpg, for example, CPU inference time (on a 2018 MacBook Pro) reduces from 1.01s to 0.63s, a 37% reduction, corresponding to a 38% reduction in image area (416x416 to 256x416).

Square Inference

Letterboxes to 416x416 squares.

python3 detect.py  # 416 square inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x416 1 handbags, 3 persons, 1 buss, Done. (0.999s)
image 2/2 data/samples/zidane.jpg: 416x416 1 ties, 2 persons, Done. (1.008s)

Rectangular Inference

Letterboxes to 416 along longest image dimension, pads shorter dimension to minimum multiple of 32.

python3 detect.py  # 416 rectangular inference
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.767s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.632s)

`zidane.jpg`	`bus.jpg`
416x416	416x416
256x416	416x320
1280 × 720	810 × 1080

added

mentioned this

Training on non-squared images #126

self-assigned this

on Apr 24, 2019

MemberAuthor

Rectangular training example in the works, first batch of COCO. This is a bit complicated as we need to letterbox all images in the batch to the same size, and some of the images are being pulled simultaneously by parallel dataloader workers. So part of this process is determining apriori the batch index that each image belongs to (shuffle=False now), and then letterboxing it to the minimum viable 32 mulitple for the most square image in that batch. This should be included in our upcoming v7 release, with enormous training speed improvements (about 1/3 faster on mixed aspect ratio datasets like COCO).

MemberAuthor

Rectangular training results on coco_100img.data. Speedup was not material in this case because CUDA was constantly optimizing at each batch due to benchmark set to True, and the dataset of 100 images had only 6 batches, each with a different shape. Speedup should be more impactful on larger training sets. Individual batches were timed as fast as 0.189 seconds here vs 0.240 seconds for 416 square training using a V100.

yolov3/train.py

Line 64 in 7e6e189

torch.backends.cudnn.benchmark = True # unsuitable for multiscale

Rectangular training can be accessed here:

yolov3/utils/datasets.py

Lines 146 to 148 in 7e6e189

    
           # Rectangular Training  https://github.com/ultralytics/yolov3/issues/232 
        
           self.train_rectangular = False 
        
           if self.train_rectangular:

MemberAuthor

Rectangular inference is now working in our latest iDetection iOS App build! This is a screenshot recorded today at 192x320, inference on vertical 4k format 16:9 aspect ratio iPhone video. This pushes the performance to realtime 30 FPS!! This means that we now have YOLOv3-SPP running in realtime on an iPhone Xs using rectangular inference! This is a worldwide first as far as we know.

mentioned this

on Apr 29, 2019

Worse performance in this repo than in a yolov3 pytorch implementation AlexeyAB/darknet#2914

mentioned this

on Jun 14, 2019

Confused about network's input size support, img_size in the code, and "rectangular training"? #332

MichaelCong

mentioned this

on Jul 4, 2019

dataloader #361

mentioned this

on Aug 15, 2019

Image size #456

dakdouky

Hi @glenn-jocher,

I'm trying rectangular training with rect=True but the tensors during training are all square starting with the input torch.Size([16, 3, 416, 416]), what could be the problem?

I'd expect the shapes to be the nearest multiples of 32 for both image dimensions.

What should be img_size in the line:
self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32.).astype(np.int) * 32

I also noticed that images look rectangular in the test_batch.jpg but square in train_batch.jpg, does this mean that rectangular training is unsupported?

MemberAuthor

@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a mosaic. You can disable this on this line:

yolov3/utils/datasets.py

Line 408 in e27b124

    
           mosaic = True and self.augment  # load 4 images at a time into a mosaic (only during training)

dakdouky

@MOHAMEDELDAKDOUKY training uses a mosaic loader, which loads 4 images at a time into a mosaic. You can disable this on this line:

yolov3/utils/datasets.py

Line 408 in e27b124

mosaic = True and self.augment # load 4 images at a time into a mosaic (only during training)

Yes, I disabled it but the images are still squares of 416x416.

MemberAuthor

@MOHAMEDELDAKDOUKY your repo may be out of date. git clone a new version and try again.

dakdouky

@MOHAMEDELDAKDOUKY your repo may be out of date. git clone a new version and try again.

Done, still getting this with rect=True, mosaic and augmentation disabled.

55 remaining items

Rectangular training example in the works, first batch of COCO. This is a bit complicated as we need to letterbox all images in the batch to the same size, and some of the images are being pulled simultaneously by parallel dataloader workers. So part of this process is determining apriori the batch index that each image belongs to (shuffle=False now), and then letterboxing it to the minimum viable 32 mulitple for the most square image in that batch. This should be included in our upcoming v7 release, with enormous training speed improvements (about 1/3 faster on mixed aspect ratio datasets like COCO).

I'm very confused about rectanglar training. In the yolov5/utils/dataloaders.py Line545-568, it fills the images in the same batch into a square shape.Why is it called rectangular training? What do the rectangular images above show?

Thank you in advance.

MemberAuthor

@autograd500 rectangular training refers to the process of letterboxing images in a batch to a common size with a minimum viable multiple of 32 for the most square image. The term "rectangular" here is used to indicate that the images in the batch may have different dimensions, resulting in a rectangular shape after letterboxing. The images shown in the example demonstrate this process, where each image is letterboxed to the same size within the batch. This approach is used to optimize training speed, especially for datasets with mixed aspect ratios like COCO. I hope this clarifies the concept for you. Let me know if you have any further questions.

Thanks for the answer, I still have the following questions:

# Rectangular Training
        if self.rect:
            # Sort by aspect ratio
            s = self.shapes  # wh
            ar = s[:, 1] / s[:, 0]  # aspect ratio
            irect = ar.argsort()
            self.im_files = [self.im_files[i] for i in irect]
            self.label_files = [self.label_files[i] for i in irect]
            self.labels = [self.labels[i] for i in irect]
            self.segments = [self.segments[i] for i in irect]
            self.shapes = s[irect]  # wh
            ar = ar[irect]

            # Set training image shapes
            shapes = [[1, 1]] * nb
            for i in range(nb):
                ari = ar[bi == i]
                mini, maxi = ari.min(), ari.max()
                if maxi < 1:
                    shapes[i] = [maxi, 1]
                elif mini > 1:
                    shapes[i] = [1, 1 / mini]

            self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride

When maxi < 1, the images s[1]/s[0] <1 within the batch . But the final image size is [img_size * maxi, img_size], where s[0] < s[1]. So, the aspect ratio of images changed after letterboxing?

MemberAuthor

@autograd500 yes, in rectangular training, the aspect ratio of the images within the batch can be adjusted during the process of letterboxing. The code you shared is responsible for setting the training image shapes based on the aspect ratios of the images.

When maxi < 1, it means that the width of the image (s[0]) is greater than the height (s[1]), resulting in an aspect ratio less than 1. In this case, the code sets the image shape to [maxi * img_size, img_size], which means that the width will be scaled down and the height will remain the same after letterboxing.

So, to answer your question, yes, the aspect ratio of the images can change after letterboxing to achieve a consistent shape within the batch. Let me know if you have any further questions.

To achieve a consistent shape within the batch, it can also set the image shape to [img_size, maxi * img_size]. In this case, the aspect ratio of the images can be consistent. I intuitively feel that such process is better, because the proportions of the images are not broken.

Why not set the image shape to [img_size, maxi * img_size]？Doesn’t the aspect ratio of the images matter？

MemberAuthor

@autograd500 thank you for your question and suggestion. The aspect ratio of the images does indeed matter in object detection tasks. When training models like YOLOv3, maintaining the original aspect ratio of the images can help preserve the proportions of objects in the scene.

The current approach of setting the image shape to [maxi * img_size, img_size] when maxi < 1 is aimed at ensuring a consistent shape within the batch while still allowing for some variation in aspect ratios. This approach strikes a balance between maintaining the proportions of objects and achieving a common size for efficient batch processing.

However, your idea of setting the image shape to [img_size, maxi * img_size] is interesting and worth considering. It could potentially provide a different trade-off between aspect ratio consistency and preserved object proportions. The choice between the two approaches may depend on the specific requirements and characteristics of the dataset being used.

Thank you for your contribution and for raising this point. It's valuable feedback that could be explored further in future enhancements. Let us know if you have any more questions or suggestions.

Glenn Jocher

There is no questions for the time being, if there is, I will consult you again.

Thank you very much for your reply！

MemberAuthor

@autograd500 hi there,

You're welcome! I'm glad I could help. If you have any more questions or need further assistance in the future, please don't hesitate to reach out. Have a great day!

Glenn Jocher

@glenn-jocher If I want to train a model with input images of size 512x288 and I want the model's input to be fixed, similar to 640x640, what should I do? Why does --rect cause each batch to have different widths and heights? Aren't the neural network inputs supposed to be of fixed size? Thank you.

MemberAuthor

Hi there!

To train a model with a fixed input size of 512x288, you will need to modify the img_size in your training configuration to [512, 288] and deactivate the --rect training option. This setup will ensure that all your inputs are reshaped to 512x288 regardless of their original sizes.

The --rect training option allows for rectangular training, where each batch can adjust its shape according to the aspect ratios of the images within that batch. This is beneficial for mixed aspect ratio datasets, reducing padding and potentially speeding up training. However, the neural network still processes images of a consistent size within each batch.

If you require fixed dimensions for all inputs, simply setting img_size without the --rect option should address your needs. Here’s an example command:

python train.py --img 512 288 --batch-size 16 --data dataset.yaml --weights yolov3.pt

Hope this clears up your query! Let me know if there's anything else you'd like to discuss. 🌟

@glenn-jocher Sorry, it doesn't work.It shows
usage: train.py [-h] [--weights WEIGHTS] [--cfg CFG] [--data DATA] [--hyp HYP] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--imgsz IMGSZ] [--rect] [--resume [RESUME]] [--nosave] [--noval] [--noautoanchor] [--evolve [EVOLVE]] [--bucket BUCKET] [--cache [CACHE]] [--image-weights] [--device DEVICE]
[--multi-scale] [--single-cls] [--optimizer {SGD,Adam,AdamW}] [--sync-bn] [--workers WORKERS] [--project PROJECT] [--name NAME] [--exist-ok] [--quad] [--cos-lr] [--label-smoothing LABEL_SMOOTHING] [--patience PATIENCE] [--freeze FREEZE [FREEZE ...]] [--save-period SAVE_PERIOD]
[--local_rank LOCAL_RANK] [--entity ENTITY] [--upload_dataset [UPLOAD_DATASET]] [--bbox_interval BBOX_INTERVAL] [--artifact_alias ARTIFACT_ALIAS]
train.py: error: unrecognized arguments: 288

@glenn-jocher Excuse me, do you have any idea about this question

@glenn-jocher Sorry, it doesn't work.It shows usage: train.py [-h] [--weights WEIGHTS] [--cfg CFG] [--data DATA] [--hyp HYP] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--imgsz IMGSZ] [--rect] [--resume [RESUME]] [--nosave] [--noval] [--noautoanchor] [--evolve [EVOLVE]] [--bucket BUCKET] [--cache [CACHE]] [--image-weights] [--device DEVICE] [--multi-scale] [--single-cls] [--optimizer {SGD,Adam,AdamW}] [--sync-bn] [--workers WORKERS] [--project PROJECT] [--name NAME] [--exist-ok] [--quad] [--cos-lr] [--label-smoothing LABEL_SMOOTHING] [--patience PATIENCE] [--freeze FREEZE [FREEZE ...]] [--save-period SAVE_PERIOD] [--local_rank LOCAL_RANK] [--entity ENTITY] [--upload_dataset [UPLOAD_DATASET]] [--bbox_interval BBOX_INTERVAL] [--artifact_alias ARTIFACT_ALIAS] train.py: error: unrecognized arguments: 288

MemberAuthor

@Chenplushao hey there!

It looks like you tried to specify separate width and height using --img 512 288, but train.py expects a single number for the --imgsz argument, which sets both the width and height to the same value.

If you need different dimensions and want them to be fixed, you'll have to modify the model configuration file and adjust the input dimensions directly there, as YOLO typically expects square inputs. The other option is to ensure your images are resized to be square while maintaining their aspect ratio through padding before training.

If you have any other questions or need further clarification, feel free to ask. Happy coding! 😊

@Chenplushao hey there!

It looks like you tried to specify separate width and height using --img 512 288, but train.py expects a single number for the --imgsz argument, which sets both the width and height to the same value.

If you need different dimensions and want them to be fixed, you'll have to modify the model configuration file and adjust the input dimensions directly there, as YOLO typically expects square inputs. The other option is to ensure your images are resized to be square while maintaining their aspect ratio through padding before training.

If you have any other questions or need further clarification, feel free to ask. Happy coding! 😊

Thank you sir！Have a nice day！