Description
I am training Darknet YOLO-V3 on a cat-dog dataset. During training, the following error occurs. Can someone help me? The output is:
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 256 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 256 0.089 BFLOPs
14 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
15 conv 21 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 21 0.004 BFLOPs
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
19 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
22 conv 21 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 21 0.007 BFLOPs
23 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
576
Floating point exception (core dumped)
How can I solve this?
Activity
Title changed from "Error: STB Reason: can't fopen" to "Error: Floating point exception (core dumped)"
TacoMeatless commented on Feb 14, 2019
I am having the exact same problem, but I cannot even get to the first "Learning Rate" line.
I have gone through my cfg file line by line too.
damnko commented on Apr 12, 2019
Same problem here, trying to train darknet on CPU.
kalyco commented on Apr 30, 2019
Same same. I migrated it from CPU to a GPU Gcloud instance but am still seeing the floating point issue. Wondering if the annotation text file conversion from BBox to Yolo got messed up somewhere.
harshthakkar01 commented on May 14, 2019
I have the same problem and I followed this.
I believe the floating point exception occurs because batch/subdivisions in the .cfg file is not an integer. I changed the values so that the result is an integer, and it started working.
@ashnaeldho could you verify whether this helps?
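The check described above can be sketched as a small pre-flight script. This is only a sketch, assuming the cfg keeps `batch=` and `subdivisions=` in its [net] section and that both default to 1 when absent; the function names `read_net_options` and `check_batch_subdivisions` are made up for illustration and are not part of Darknet.

```python
def read_net_options(cfg_text):
    """Collect key=value pairs from the [net] section of a Darknet-style cfg."""
    options = {}
    section = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section in ("net", "network") and "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def check_batch_subdivisions(cfg_text):
    """Return a list of problems with batch/subdivisions (empty if none)."""
    opts = read_net_options(cfg_text)
    # Assumption: a missing key behaves like 1.
    batch = int(opts.get("batch", 1))
    subdivisions = int(opts.get("subdivisions", 1))
    problems = []
    if batch <= 0 or subdivisions <= 0:
        problems.append(
            f"batch={batch} and subdivisions={subdivisions} must both be positive"
        )
    elif batch % subdivisions != 0:
        problems.append(
            f"batch={batch} is not a multiple of subdivisions={subdivisions}"
        )
    return problems


if __name__ == "__main__":
    example = "[net]\nbatch=64\nsubdivisions=24\n"
    for problem in check_batch_subdivisions(example):
        print("cfg problem:", problem)
```

An empty result only means batch divides evenly by subdivisions; it does not rule out the other causes mentioned in this thread.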
kalyco commented on May 14, 2019
@harshthakkar01 that is the tutorial I followed too.
After fixing an incorrect path on my training set per my comment here, I was still having issues with the CPU defaulting over GPU.
I ended up needing to stop using CMake because it was improperly configuring my Makefile, which needs to include
I've also seen this comment which has helped other people to add
to .bashrc
Ti-tanium commented on Jun 14, 2019
I came across the same problem.
It was caused by a small mistake in the .data file.
It was supposed to point to the train.txt file like the following, but mine didn't.
Hope this helps.
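A quick way to catch this kind of mistake before training is to check that the `train` entry of the .data file resolves to a non-empty list of existing images. The following is a rough sketch, assuming the .data file uses plain `key = value` lines; the helper names `parse_data_file` and `check_list_file` and the `base_dir` parameter are hypothetical and not part of Darknet.

```python
import os


def parse_data_file(text):
    """Parse 'key = value' lines from the text of a .data file."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def check_list_file(data_text, key="train", base_dir="."):
    """Return a description of the problem, or None if the list looks usable."""
    entries = parse_data_file(data_text)
    if key not in entries:
        return f"no '{key}' entry in .data file"

    def resolve(path):
        # Relative paths are resolved against base_dir (an assumption:
        # usually the directory darknet is launched from).
        return path if os.path.isabs(path) else os.path.join(base_dir, path)

    list_path = resolve(entries[key])
    if not os.path.isfile(list_path):
        return f"'{key}' points to a missing file: {list_path}"
    with open(list_path) as fh:
        images = [ln.strip() for ln in fh if ln.strip()]
    if not images:
        return f"'{key}' list is empty: {list_path}"
    not_found = [p for p in images if not os.path.isfile(resolve(p))]
    if not_found:
        return f"{len(not_found)} image path(s) in {list_path} do not exist"
    return None
```

Calling `check_list_file(open("obj.data").read())` from the directory where training is started would return `None` when the list looks usable; `obj.data` is just a placeholder name here.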
ixtiyoruz commented on Jan 10, 2020
In my case, the problem occurred when I compiled darknet with cmake. I switched to compiling with make, and then it worked.
maykulkarni commented on Jan 12, 2020
In my case, it worked after setting subdivisions to a bigger number (4).
cloudy-sfu commented on Mar 3, 2020
If batch_size is also a low number, 4 for example, it still doesn't work. However, a higher batch_size with lower subdivisions often leads to the error "GPU out of memory".
maykulkarni commented on Mar 3, 2020
@cloudy-sfu subdivisions is basically the number of mini-batches each batch is split into before it is passed to the model. For example, a batch_size of 64 with subdivisions of 64 means only 1 image is passed at a time. I have corrected my previous message: what I meant was to increase the number of subdivisions, not decrease it.
fprotopapa commented on Jul 18, 2020
The error occurred because an external script created an empty train.txt file. Since I have encountered this script several times on the web, I am adding the workaround here. When populating the test and train files, the script searches the current directory, but the path to the data isn't added.
for pathAndFilename in glob.iglob(os.path.join(current_dir, "*.jpg")):
=>
for pathAndFilename in glob.iglob(os.path.join(current_dir, path_data, "*.jpg")):
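To keep a list-generation script from silently writing an empty train.txt, it can fail loudly when the glob matches nothing. Below is a minimal sketch of that idea, not the script referred to above: the function names, the `test_every` parameter, and the every-Nth-image split are assumptions made for illustration.

```python
import glob
import os


def split_image_list(image_dir, test_every=10):
    """Split the .jpg files in image_dir into (train, test) path lists.

    Every `test_every`-th image goes to the test list. Raises ValueError
    instead of returning an empty train list.
    """
    if test_every < 2:
        raise ValueError("test_every must be at least 2")
    pattern = os.path.join(image_dir, "*.jpg")
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise ValueError(f"no .jpg files matched {pattern}")
    train, test = [], []
    for index, path in enumerate(paths, start=1):
        (test if index % test_every == 0 else train).append(path)
    return train, test


def write_list(paths, out_file):
    """Write one image path per line."""
    with open(out_file, "w") as fh:
        fh.write("\n".join(paths) + "\n")


if __name__ == "__main__":
    # "data/obj" is a placeholder; point it at the directory holding the images.
    train, test = split_image_list(os.path.join(os.getcwd(), "data", "obj"))
    write_list(train, "train.txt")
    write_list(test, "test.txt")
```

With this guard, a wrong image directory shows up as a ValueError at list-generation time rather than as a crash later in training.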
Shashi630 commented on Jan 27, 2024
@maykulkarni
Demo
layer filters size input output
0 conv 32 3 x 3 / 1 544 x 544 x 3 -> 544 x 544 x 32 0.511 BFLOPs
1 max 2 x 2 / 2 544 x 544 x 32 -> 272 x 272 x 32
2 conv 64 3 x 3 / 1 272 x 272 x 32 -> 272 x 272 x 64 2.727 BFLOPs
3 max 2 x 2 / 2 272 x 272 x 64 -> 136 x 136 x 64
4 conv 128 3 x 3 / 1 136 x 136 x 64 -> 136 x 136 x 128 2.727 BFLOPs
5 conv 64 1 x 1 / 1 136 x 136 x 128 -> 136 x 136 x 64 0.303 BFLOPs
6 conv 128 3 x 3 / 1 136 x 136 x 64 -> 136 x 136 x 128 2.727 BFLOPs
7 max 2 x 2 / 2 136 x 136 x 128 -> 68 x 68 x 128
8 conv 256 3 x 3 / 1 68 x 68 x 128 -> 68 x 68 x 256 2.727 BFLOPs
9 conv 128 1 x 1 / 1 68 x 68 x 256 -> 68 x 68 x 128 0.303 BFLOPs
10 conv 256 3 x 3 / 1 68 x 68 x 128 -> 68 x 68 x 256 2.727 BFLOPs
11 max 2 x 2 / 2 68 x 68 x 256 -> 34 x 34 x 256
12 conv 512 3 x 3 / 1 34 x 34 x 256 -> 34 x 34 x 512 2.727 BFLOPs
13 conv 256 1 x 1 / 1 34 x 34 x 512 -> 34 x 34 x 256 0.303 BFLOPs
14 conv 512 3 x 3 / 1 34 x 34 x 256 -> 34 x 34 x 512 2.727 BFLOPs
15 conv 256 1 x 1 / 1 34 x 34 x 512 -> 34 x 34 x 256 0.303 BFLOPs
16 conv 512 3 x 3 / 1 34 x 34 x 256 -> 34 x 34 x 512 2.727 BFLOPs
17 max 2 x 2 / 2 34 x 34 x 512 -> 17 x 17 x 512
18 conv 1024 3 x 3 / 1 17 x 17 x 512 -> 17 x 17 x1024 2.727 BFLOPs
19 conv 512 1 x 1 / 1 17 x 17 x1024 -> 17 x 17 x 512 0.303 BFLOPs
20 conv 1024 3 x 3 / 1 17 x 17 x 512 -> 17 x 17 x1024 2.727 BFLOPs
21 conv 512 1 x 1 / 1 17 x 17 x1024 -> 17 x 17 x 512 0.303 BFLOPs
22 conv 1024 3 x 3 / 1 17 x 17 x 512 -> 17 x 17 x1024 2.727 BFLOPs
23 conv 28269 1 x 1 / 1 17 x 17 x1024 -> 17 x 17 x28269 16.732 BFLOPs
24 detection
mask_scale: Using default '1.000000'
Loading weights from ../yolo9000-weights/yolo9000.weights...Done!
[ WARN:0] global ../modules/videoio/src/cap_v4l.cpp (998) tryIoctl VIDEOIO(V4L2:/dev/video0): select() timeout.
Floating point exception
Please help me with this.