
Error: Floating point exception (core dumped) #1419

@ashnaeldho

Description

I am training Darknet YOLOv3 on a cat-dog dataset. When I run the training step, the following error occurs. Can someone help me? The error is:
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 256 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 256 0.089 BFLOPs
14 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
15 conv 21 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 21 0.004 BFLOPs
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
19 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
22 conv 21 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 21 0.007 BFLOPs
23 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
576
Floating point exception (core dumped)

How can I solve this?

Activity

Changed the title from "Error: STB Reason: can't fopen" to "Error: Floating point exception (core dumped)" on Feb 9, 2019
TacoMeatless commented on Feb 14, 2019

I am having the exact same problem, but I cannot even get to the first learning rate. I have gone through my cfg file line by line, too.

damnko commented on Apr 12, 2019

Same problem here, trying to train darknet on CPU.

kalyco commented on Apr 30, 2019

Same here. I migrated from CPU to a GPU GCloud instance but am still seeing the floating point issue. Wondering if the annotation text file conversion from BBox to YOLO format got messed up somewhere.

harshthakkar01 commented on May 14, 2019

I have the same problem, and I followed this tutorial.
I believe the floating point exception occurs because batch/subdivisions in the .cfg file is not an integer. I changed the values so the division yields an integer, and it started working.
@ashnaeldho, could you verify whether this helps?
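A quick way to sanity-check the explanation above is to confirm that batch is a positive integer multiple of subdivisions before launching training. This is only a minimal sketch with made-up values, not Darknet's actual parser, but it captures the arithmetic: integer division of batch by subdivisions that comes out to zero (or a division by zero) is one classic way a C program dies with SIGFPE ("Floating point exception").

```python
# Hypothetical sanity check for a Darknet .cfg: batch should be a
# positive integer multiple of subdivisions. A mismatch or a zero in
# the integer division can end in a divide-by-zero, i.e. SIGFPE.
batch = 64         # example values, not taken from the issue
subdivisions = 16

assert subdivisions > 0, "subdivisions must be positive"
assert batch % subdivisions == 0, "batch must be divisible by subdivisions"
print(batch // subdivisions)  # images per mini-batch -> 4
```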

kalyco commented on May 14, 2019

@harshthakkar01 that is the tutorial I followed too.
After fixing an incorrect path in my training set (per my comment here), I was still having issues with the build defaulting to CPU over GPU.
I ended up needing to stop using CMake, because it was improperly configuring my Makefile, which needs to include:

GPU=1
CUDNN=1
OPENCV=1
DEBUG=1

I've also seen this comment, which has helped other people, suggesting you add

export PATH=/usr/local/cuda-<YOUR_VERSION>/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-<YOUR_VERSION>/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

to your .bashrc, and set NVCC=/usr/local/cuda/bin/nvcc in the Makefile.

Ti-tanium commented on Jun 14, 2019

I came across the same problem.
It was caused by a small mistake in the .data file: it is supposed to point to the train.txt file like the following, but mine didn't.

classes = 20
train   = <path-to-voc>/train.txt
valid   = <path-to-voc>/2007_test.txt

Hope this helps.
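Mistakes like the one described above can be caught before training with a small pre-flight check that parses the .data file and verifies each referenced path exists. The sketch below is hypothetical (Darknet has no such checker built in); the key names follow the snippet above, and the example path is made up:

```python
import os

# Hypothetical pre-flight check for a Darknet .data file: every
# path-valued key should exist on disk before training starts,
# otherwise the loader can end up with zero images and crash later.
def check_data_file(text):
    missing = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        key, value = key.strip(), value.strip()
        if key in ("train", "valid", "names") and not os.path.exists(value):
            missing.append((key, value))
    return missing

example = "classes = 20\ntrain = /no/such/place/train.txt"
print(check_data_file(example))  # -> [('train', '/no/such/place/train.txt')]
```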

ixtiyoruz commented on Jan 10, 2020

In my case, the problem occurred when compiling darknet with CMake; I switched to plain make, and then it worked.

maykulkarni commented on Jan 12, 2020

In my case, it worked after increasing the subdivisions setting (to 4).

cloudy-sfu commented on Mar 3, 2020

"In my case, it worked after setting the subdivisions to a lower number (4)"

If batch_size is also a low number (4, for example), it still doesn't work. However, a higher batch_size with lower subdivisions often leads to the error "GPU out of memory".

maykulkarni commented on Mar 3, 2020

@cloudy-sfu subdivisions is basically how many mini-batches a batch of images is split into while passing it to the model. Meaning, a batch_size of 64 with subdivisions of 64 would mean only 1 image is passed at a time. I have corrected my previous message: what I meant is to increase the number of subdivisions, not decrease it.
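The arithmetic described above can be made concrete: the number of images loaded per forward/backward pass is batch divided by subdivisions, so raising subdivisions shrinks the per-step memory footprint without changing the effective batch. A worked sketch with hypothetical values:

```python
# Worked example of the batch/subdivisions trade-off described above:
# Darknet processes batch // subdivisions images per pass.
def images_per_pass(batch, subdivisions):
    return batch // subdivisions

print(images_per_pass(64, 64))  # -> 1 image at a time: tiny memory footprint
print(images_per_pass(64, 16))  # -> 4 images at a time
print(images_per_pass(64, 4))   # -> 16 at a time: may exhaust GPU memory
```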

fprotopapa commented on Jul 18, 2020

The error occurred due to an empty train.txt file created by an external script. Since I have encountered this script several times on the web, I'll add the workaround here. While populating the test and train files, the script searches the current directory, but the path to the data isn't added:

for pathAndFilename in glob.iglob(os.path.join(current_dir, "*.jpg")):
=>
for pathAndFilename in glob.iglob(os.path.join(current_dir, path_data, "*.jpg")):
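To see why the broken pattern yields an empty train.txt: without the data subdirectory (and with the "*" wildcard lost), glob matches nothing, so nothing is written. A self-contained demo of both patterns; the directory layout (path_data = "data/obj") and file names are made up for illustration:

```python
import glob
import os
import tempfile

# Demo: a glob pattern missing the data subdirectory matches nothing,
# so the generated train.txt comes out empty; including the
# subdirectory and the "*.jpg" wildcard finds the images.
current_dir = tempfile.mkdtemp()
path_data = "data/obj"          # hypothetical data subdirectory
os.makedirs(os.path.join(current_dir, path_data))
for name in ("cat1.jpg", "dog1.jpg"):
    open(os.path.join(current_dir, path_data, name), "w").close()

broken = glob.glob(os.path.join(current_dir, "*.jpg"))           # wrong dir
fixed = glob.glob(os.path.join(current_dir, path_data, "*.jpg"))

print(len(broken))  # -> 0, which is why train.txt ends up empty
print(len(fixed))   # -> 2
```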

Shashi630 commented on Jan 27, 2024

@maykulkarni
Demo
layer filters size input output
0 conv 32 3 x 3 / 1 544 x 544 x 3 -> 544 x 544 x 32 0.511 BFLOPs
1 max 2 x 2 / 2 544 x 544 x 32 -> 272 x 272 x 32
2 conv 64 3 x 3 / 1 272 x 272 x 32 -> 272 x 272 x 64 2.727 BFLOPs
3 max 2 x 2 / 2 272 x 272 x 64 -> 136 x 136 x 64
4 conv 128 3 x 3 / 1 136 x 136 x 64 -> 136 x 136 x 128 2.727 BFLOPs
5 conv 64 1 x 1 / 1 136 x 136 x 128 -> 136 x 136 x 64 0.303 BFLOPs
6 conv 128 3 x 3 / 1 136 x 136 x 64 -> 136 x 136 x 128 2.727 BFLOPs
7 max 2 x 2 / 2 136 x 136 x 128 -> 68 x 68 x 128
8 conv 256 3 x 3 / 1 68 x 68 x 128 -> 68 x 68 x 256 2.727 BFLOPs
9 conv 128 1 x 1 / 1 68 x 68 x 256 -> 68 x 68 x 128 0.303 BFLOPs
10 conv 256 3 x 3 / 1 68 x 68 x 128 -> 68 x 68 x 256 2.727 BFLOPs
11 max 2 x 2 / 2 68 x 68 x 256 -> 34 x 34 x 256
12 conv 512 3 x 3 / 1 34 x 34 x 256 -> 34 x 34 x 512 2.727 BFLOPs
13 conv 256 1 x 1 / 1 34 x 34 x 512 -> 34 x 34 x 256 0.303 BFLOPs
14 conv 512 3 x 3 / 1 34 x 34 x 256 -> 34 x 34 x 512 2.727 BFLOPs
15 conv 256 1 x 1 / 1 34 x 34 x 512 -> 34 x 34 x 256 0.303 BFLOPs
16 conv 512 3 x 3 / 1 34 x 34 x 256 -> 34 x 34 x 512 2.727 BFLOPs
17 max 2 x 2 / 2 34 x 34 x 512 -> 17 x 17 x 512
18 conv 1024 3 x 3 / 1 17 x 17 x 512 -> 17 x 17 x1024 2.727 BFLOPs
19 conv 512 1 x 1 / 1 17 x 17 x1024 -> 17 x 17 x 512 0.303 BFLOPs
20 conv 1024 3 x 3 / 1 17 x 17 x 512 -> 17 x 17 x1024 2.727 BFLOPs
21 conv 512 1 x 1 / 1 17 x 17 x1024 -> 17 x 17 x 512 0.303 BFLOPs
22 conv 1024 3 x 3 / 1 17 x 17 x 512 -> 17 x 17 x1024 2.727 BFLOPs
23 conv 28269 1 x 1 / 1 17 x 17 x1024 -> 17 x 17 x28269 16.732 BFLOPs
24 detection
mask_scale: Using default '1.000000'
Loading weights from ../yolo9000-weights/yolo9000.weights...Done!
[ WARN:0] global ../modules/videoio/src/cap_v4l.cpp (998) tryIoctl VIDEOIO(V4L2:/dev/video0): select() timeout.
Floating point exception

Please help me with this.
