With fl_gamma=0.5 and alpha=0.5/0.25, we get the error below:
WARNING: non-finite loss, ending training tensor([9.14797, nan, 0.00000, nan], device='cuda:0')
After I set the parameters to fl_gamma=2 and alpha=0.25, the network trains but fails to converge:
Epoch gpu_mem GIoU obj cls total targets img_size
76/272 4.97G 2.39 2.73e-06 0 2.39 82 416: 100%|██████████████████████████████████████| 55/55 [00:14<00:00, 3.77it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:04<00:00, 1.09s/it]
all 391 409 0.0278 0.139 0.00872 0.0464
Epoch gpu_mem GIoU obj cls total targets img_size
77/272 4.97G 2.35 2.73e-06 0 2.35 83 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.59it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:04<00:00, 1.02s/it]
all 391 409 0.0463 0.169 0.0249 0.0727
Epoch gpu_mem GIoU obj cls total targets img_size
78/272 4.97G 2.36 2.71e-06 0 2.36 83 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.58it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:05<00:00, 1.40s/it]
all 391 409 0.0199 0.147 0.00453 0.0351
Epoch gpu_mem GIoU obj cls total targets img_size
79/272 4.97G 2.35 2.72e-06 0 2.35 84 416: 100%|██████████████████████████████████████| 55/55 [00:14<00:00, 3.74it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:05<00:00, 1.29s/it]
all 391 409 0.0146 0.132 0.00409 0.0262
Epoch gpu_mem GIoU obj cls total targets img_size
80/272 4.97G 2.33 2.71e-06 0 2.33 85 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.66it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:04<00:00, 1.03s/it]
all 391 409 0.0613 0.152 0.0397 0.0873
Epoch gpu_mem GIoU obj cls total targets img_size
81/272 4.97G 2.35 2.74e-06 0 2.35 83 416: 100%|██████████████████████████████████████| 55/55 [00:14<00:00, 3.68it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:05<00:00, 1.46s/it]
all 391 409 0.0137 0.112 0.00248 0.0244
Epoch gpu_mem GIoU obj cls total targets img_size
82/272 4.97G 2.36 2.72e-06 0 2.36 80 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.65it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:05<00:00, 1.40s/it]
all 391 409 0.0159 0.115 0.00383 0.0279
Epoch gpu_mem GIoU obj cls total targets img_size
83/272 4.97G 2.33 2.78e-06 0 2.33 77 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.59it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:05<00:00, 1.31s/it]
all 391 409 0.0288 0.174 0.0126 0.0495
Epoch gpu_mem GIoU obj cls total targets img_size
84/272 4.97G 2.34 2.74e-06 0 2.34 99 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.59it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:05<00:00, 1.40s/it]
all 391 409 0.0225 0.147 0.00658 0.039
Epoch gpu_mem GIoU obj cls total targets img_size
85/272 4.97G 2.34 2.73e-06 0 2.34 86 416: 100%|██████████████████████████████████████| 55/55 [00:15<00:00, 3.65it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:03<00:00, 1.03it/s]
all 391 409 0.0492 0.149 0.0127 0.074
Epoch gpu_mem GIoU obj cls total targets img_size
86/272 4.97G 2.32 2.78e-06 0 2.32 79 416: 100%|██████████████████████████████████████| 55/55 [00:14<00:00, 3.78it/s]
Class Images Targets P R mAP@0.5 F1: 100%|████████████████████████████████████████| 4/4 [00:04<00:00, 1.04s/it]
all 391 409 0.0303 0.139 0.00757 0.0498
What's more, I only have one class, and I use the command below:
python train.py --cfg cfg/yolov3-tiny.cfg --arc Fdefault
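For context on the two parameters being tuned above: in the standard focal loss (Lin et al., 2017), gamma controls how strongly easy examples are down-weighted and alpha balances positive against negative targets. A minimal sketch of that weighting (illustrative only, not this repo's exact FocalLoss code):

```python
import torch

def focal_weight(p_t, gamma=2.0, alpha_t=0.25):
    # Per-element focal modulation applied on top of BCE: alpha_t * (1 - p_t) ** gamma.
    # p_t is the predicted probability of the true class for each element.
    return alpha_t * (1.0 - p_t) ** gamma

p_t = torch.tensor([0.1, 0.5, 0.9, 0.99])   # hard -> easy examples
print(focal_weight(p_t, gamma=2.0))         # easy examples are strongly down-weighted
print(focal_weight(p_t, gamma=0.5))         # much gentler down-weighting
```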
glenn-jocher commented on Jan 29, 2020
@pprp there is about zero obj loss in your second example, so obviously the network will never learn obj this way.
glenn-jocher commented on Jan 29, 2020
@pprp also, if focal loss produces worse results, then clearly don't use it.
pprp commented on Jan 30, 2020
What should I do if I want to use focal loss?
glenn-jocher commented on Jan 30, 2020
@pprp try different settings.
pprp commented on Jan 31, 2020
Thank you very much. I will try to fix this problem.
glenn-jocher commented on Jan 31, 2020
@pprp by the way, I was looking at the focal loss function. I think the reduction setting may need an update now that the loss reduction functions are set to sum rather than mean, so there may be a bug here that is our fault. I'll try to push an update today.
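A sketch of the kind of reduction issue being described: if the wrapper hard-codes a mean while the surrounding loss code now expects a sum, the focal terms end up on the wrong scale. One way to make the wrapper reduction-aware (class name and details are illustrative, not the actual commit):

```python
import torch
import torch.nn as nn

class FocalLossSketch(nn.Module):
    # Illustrative reduction-aware focal wrapper around BCEWithLogitsLoss:
    # remember the wrapped criterion's reduction, weight per element, reduce at the end.
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super().__init__()
        self.reduction = loss_fcn.reduction        # 'mean' or 'sum'
        loss_fcn.reduction = 'none'                # needed for per-element weighting
        self.loss_fcn, self.gamma, self.alpha = loss_fcn, gamma, alpha

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)           # per-element BCE
        p = torch.sigmoid(pred)
        p_t = true * p + (1 - true) * (1 - p)      # probability of the true class
        alpha_t = true * self.alpha + (1 - true) * (1 - self.alpha)
        loss = loss * alpha_t * (1.0 - p_t) ** self.gamma
        if self.reduction == 'sum':
            return loss.sum()
        if self.reduction == 'mean':
            return loss.mean()
        return loss

BCEobj = nn.BCEWithLogitsLoss(reduction='sum')
lobj_fn = FocalLossSketch(BCEobj, gamma=1.5, alpha=0.25)   # inherits the 'sum' reduction
```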
glenn-jocher commented on Jan 31, 2020
@pprp ok, the fix is done in 189c704. Can you git pull and try training again, starting from the default focal loss parameters?
pprp commented on Feb 1, 2020
Thanks for your reply, I will retrain tomorrow and inform you of the final result.
pprp commented on Feb 2, 2020
@glenn-jocher I tried the fixed version but get the same problem.
I use your default focal loss parameters:
If I use Fdefault, the network gets the non-finite loss error.
If I use uFBCE, the network does not converge.
glenn-jocher commented on Feb 3, 2020
@pprp ah ok. Well, it seems focal loss is not the best choice for your problem. I recommend you stick to the repo defaults (i.e. --arc default). They are the defaults for a reason.
FranciscoReveriano commented on Feb 4, 2020
From experience, @pprp, focal loss is usually not the best way to go. I don't know what you are training on, but I would recommend either increasing the img-size, lowering the initial learning rate by an order of magnitude, or lowering the training IoU threshold.
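A rough sketch of how those adjustments could be made; the --img-size flag and the hyp keys 'lr0' (initial learning rate) and 'iou_t' (IoU threshold for building training targets) are assumptions about this repo's interface, so verify them against your local train.py before relying on this:

```python
# Illustrative only: tweak training hyperparameters along the lines suggested above.
hyp = {'lr0': 0.01, 'iou_t': 0.225}   # assumed example starting values

hyp['lr0'] /= 10        # lower the initial learning rate by an order of magnitude
hyp['iou_t'] = 0.15     # lower the IoU threshold used to assign training targets

# A larger input resolution would be passed on the command line, e.g.:
#   python train.py --cfg cfg/yolov3-tiny.cfg --img-size 608
```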
pprp commented on Feb 14, 2020
In my problem, I want to use focal loss to balance the positive and negative samples.
I have a question about lobj in the compute_loss function: can you tell me why the objectness loss is calculated between the output and the GIoU? Does this have an effect on the focal loss?
glenn-jocher commented on Feb 16, 2020
@pprp this is experimental. I think we will revert to the original formulation; we are currently testing the effect of the change. Focal loss is independent of this, though.
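For readers following along, a rough sketch of the two objectness-target formulations being discussed (dummy shapes and names; treat it as an illustration of the idea, not the exact compute_loss source):

```python
import torch
import torch.nn as nn

# Dummy setup: batch=2, anchors=3, 13x13 grid; one matched target for illustration.
pobj = torch.randn(2, 3, 13, 13)                     # predicted objectness logits
b, a, gj, gi = torch.tensor([0]), torch.tensor([1]), torch.tensor([5]), torch.tensor([7])
giou = torch.tensor([0.62])                          # GIoU of the matched predicted box
BCEobj = nn.BCEWithLogitsLoss()

# Original formulation: matched anchors get a hard objectness target of 1.0.
tobj = torch.zeros_like(pobj)
tobj[b, a, gj, gi] = 1.0
lobj_hard = BCEobj(pobj, tobj)

# Experimental formulation asked about above: use the clamped GIoU as a soft target,
# so better-localized boxes receive a higher objectness target.
tobj = torch.zeros_like(pobj)
tobj[b, a, gj, gi] = giou.clamp(0)
lobj_soft = BCEobj(pobj, tobj)
```

Either way, the objectness loss is still a BCE (optionally wrapped in focal loss) between the predicted objectness and tobj, so focal loss applies on top of whichever target is used.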
github-actions commented on Mar 18, 2020
This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.