Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. #24828
Comments
In the meantime I have tried cuDNN versions 7.1, 7.0.5, 7.3 and 7.4 with gcc 6, still with no luck. However, I don't get any of these issues when I install it from conda using conda install tensorflow-gpu.
I had the same issue with TensorFlow 1.12 on a system almost identical to yours. The solution is to downgrade TensorFlow to 1.8.0 using:
I also have the same error with TF 1.12 and 1.11, with CUDA 9.0 and cuDNN 7.3.1 / 7.4.2. Sometimes it works and sometimes it doesn't. What is causing this error? Did anyone solve it?
@gunan Can you please take a look or suggest someone? Apparently there is an incompatibility between CUDA 9.0 and cuDNN versions above 7.0. Thanks!
I cannot help much on this one. Maybe the TF GPU team can help?
This error may be related to installing TF with conda. A possible solution: if the result is not empty, as above, it means you installed TF with conda. When you install TF with conda, it installs all the dependencies, including CUDA and cuDNN, but that cuDNN version can be too old for TF, which causes a compatibility problem. In that case, uninstall the CUDA and cuDNN packages that conda installed, then run TF again and it should work.
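The conda check mentioned above can be approximated from Python. This sketch looks for cuDNN libraries bundled inside the active environment (`sys.prefix`). The directory layout it searches (`lib/` on Linux/macOS, `Library/bin/` on Windows) is an assumption about how conda lays out native packages, and `find_env_cudnn_libs` is a hypothetical helper, not part of TensorFlow or conda.

```python
import glob
import os
import sys


def find_env_cudnn_libs(prefix=None):
    """Return cuDNN shared libraries found inside a Python environment prefix.

    Assumption: conda installs native libraries under <prefix>/lib on
    Linux/macOS and <prefix>/Library/bin on Windows.
    """
    prefix = prefix or sys.prefix
    patterns = [
        os.path.join(prefix, "lib", "libcudnn*"),              # Linux / macOS
        os.path.join(prefix, "Library", "bin", "cudnn*.dll"),  # Windows
    ]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    libs = find_env_cudnn_libs()
    if libs:
        print("cuDNN bundled in this environment:")
        for path in libs:
            print("  " + path)
    else:
        print("No environment-local cuDNN found under", sys.prefix)
```

A non-empty list suggests the environment carries its own cuDNN, which TensorFlow may load instead of a system-wide install.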
@deepakrai9185720 Is this still an issue for you? Can you please try @Bahramudin's suggestion and confirm whether it solves the problem for you?
Maybe the same problem.
I think it is a version problem. Please tell us if @ifssk1991's solution works.
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!
I'm also facing this issue. Is there any workaround other than downgrading?
Same issue here!
Hi, I had the same error.
It is just a cuDNN version incompatibility. Downgrading to 1.8 also solved the problem for me, but there should be no need for that when a much newer version is available. I found that I had installed TF with conda, which had also installed its own CUDA and cuDNN, so Python was not picking up my system-wide CUDA and cuDNN; it was using the very old cuDNN that conda had installed. I removed the conda-installed CUDA and cuDNN, reinstalled TF with pip, and it worked.
I also have the same problem on Windows 10.
Had the same problem with CUDA 9.0 and cuDNN 7.0.5.15-1 on Ubuntu 16.04 with TensorFlow 1.12. Updating to cuDNN 7.4.2.24 fixed it for me!
Did you use NCCL? If so, which version?
Same issue here. I have an RTX 2070, CUDA 10, cuDNN 7.4.1 and TensorFlow 2.0 running on Ubuntu 18.04. I downgraded cuDNN to 7.3.0 but still get the same error. I see that downgrading TensorFlow helped some people, but I guess that's not an option for me. Any help is much appreciated.
OK, I was able to execute my CNN. I'm using tf-nightly-gpu-2.0-preview in an IPython notebook. I had to add this to my notebook:
    from tensorflow.compat.v1 import ConfigProto
    config = ConfigProto()
Here are some more details. Also, this issue is associated with #24496.
I'm having the same issue with CUDA 9.0 and cuDNN 7.4.2 and 7.0.5.
Did you try setting allow_growth = True? That resolved the problem for me.
Yes, it helps!
    config = tf.ConfigProto()
This solution worked for me. Just to add on: if you set
Yeah, it works! What is the concept behind enabling
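On the question of what allow_growth does: by default TensorFlow maps nearly all of the memory of every visible GPU when it starts; with allow_growth it starts small and grows the allocation as needed. A commonly cited explanation for this error is that, without it, cuDNN cannot get the memory it needs to initialize, although nobody in this thread confirms that. The sketch below shows the setting for Session-based code; `make_session_config` is a hypothetical helper, and its optional `tf` argument exists only so the function can be exercised without a GPU.

```python
def make_session_config(tf=None):
    """Return a TF1-style session config with on-demand GPU memory allocation.

    Uses tf.ConfigProto on TF 1.x and falls back to tf.compat.v1.ConfigProto
    on TF 2.x, where the top-level alias no longer exists.
    """
    if tf is None:
        import tensorflow as tf
    config_cls = getattr(tf, "ConfigProto", None) or tf.compat.v1.ConfigProto
    config = config_cls()
    # Grow the GPU allocation as needed instead of reserving almost all of it.
    config.gpu_options.allow_growth = True
    return config
```

In TF 1.x the config is passed to every session, e.g. `with tf.Session(config=make_session_config()) as sess:`. In a TF 2.x notebook running compat.v1 code, `tf.compat.v1.InteractiveSession(config=make_session_config())` plays the same role.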
I tried this solution but it doesn't work. My system specifications are:
Try this.
See, I made a solution.
If you want to run TensorFlow with eager execution, then
This lets TensorFlow >= 2.0.0 code run. Otherwise, if you want to stick with an older version (TensorFlow < 2.0.0), run as usual with tf.Session(), tf.placeholder, tf.Variable, and so on.
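When the same code has to run under both TF 1.x sessions and TF 2.x eager execution, one option that does not depend on the API version is the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which TensorFlow's GPU guide lists as another way to enable memory growth. This sketch assumes a TF release recent enough to read the variable; `force_gpu_allow_growth` is a hypothetical helper.

```python
import os


def force_gpu_allow_growth(environ=os.environ):
    """Enable on-demand GPU memory allocation through an environment variable.

    TensorFlow reads TF_FORCE_GPU_ALLOW_GROWTH when it initializes the GPU,
    so this must run before TensorFlow touches the GPU; the simplest rule is
    to call it before `import tensorflow`.
    """
    environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return environ["TF_FORCE_GPU_ALLOW_GROWTH"]


force_gpu_allow_growth()
# import tensorflow as tf  # import TensorFlow only after the variable is set
```

Setting the variable in the shell before launching Python (for a hypothetical script, `TF_FORCE_GPU_ALLOW_GROWTH=true python train.py`) should have the same effect.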
Confirming that I hit this error while training a model on an RTX 2060, and setting
Had a similar issue on a machine with two A100s. There was some device ambiguity, so looping through the devices and setting memory growth on each one manually worked in a TensorFlow 2.x environment.
(Taken from a different issue.)
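The per-device loop described above looks roughly like this in TF 2.x. It is a sketch: `enable_memory_growth` is a hypothetical name, `tf.config.list_physical_devices` assumes TF 2.1 or newer, and the optional `tf` argument is only there so the function can be exercised without a GPU.

```python
def enable_memory_growth(tf=None):
    """Turn on memory growth for every visible GPU and return the GPU list.

    Must run before any op is placed on a GPU; once the devices are
    initialized, TensorFlow raises RuntimeError instead.
    """
    if tf is None:
        import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # TF 2.x expects memory growth to be configured the same way on every
        # GPU, so set it on all of them rather than only the first one.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus
```

Call it once right after `import tensorflow as tf` and before building a model.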
What worked for me on Windows 10 using Anaconda (Python 3.5.x and an NVIDIA GTX 1650) was:
Hope this helps.
It is probably related to memory growth in TensorFlow, so try adding:
    try:
        import tensorflow as tf
Hi, I hope you are all doing well. I need to train my Mask R-CNN (mrcnn) model on an RTX 3070. The model loads onto the GPU but gets stuck when training starts, and no error appears. When I list the TensorFlow devices, the GPU shows up, but training never starts. Versions I am using:
I would be really thankful for your help. Thank you.
So one should reinstall using the NVIDIA installer?
This solved my problem. Try to match the versions.
So you managed to make it work with the following versions?
@q-55555 Yes.
Especially the cuDNN and TensorFlow versions: I downgraded TF to match my cuDNN version, and then it worked.
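To see which versions need to line up, a TF 2.x build can report the CUDA and cuDNN versions it was compiled against. This sketch assumes TF 2.3 or newer, where `tf.sysconfig.get_build_info()` exists; the version keys may be missing on CPU-only builds, hence the `.get` calls. `describe_gpu_setup` is a hypothetical helper.

```python
def describe_gpu_setup(tf=None):
    """Collect the version facts that must agree for cuDNN to initialize."""
    if tf is None:
        import tensorflow as tf
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # Versions this TensorFlow binary was compiled against, not what is installed.
        "cuda_built_against": build.get("cuda_version"),
        "cudnn_built_against": build.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }
```

Printing `describe_gpu_setup()` and comparing `cudnn_built_against` with the cuDNN actually installed on the machine is a quick way to spot the kind of mismatch discussed here.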
I had the same problem and solved it with this code:
It works!
I got the same issue.
Thanks, this also worked for me on an RTX 2070 Super, TF 2.2, CUDA 10.1 on Ubuntu 18.04.
Thanks!!!
Add the following code:
I had the same error with cudnn=7.6.5 and tensorflow-gpu=2.3.0.
Many years later, this worked for me. I'm using an RTX 4090 on Ubuntu 20.04.
Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
Describe the problem
I tried installing TensorFlow 1.12 both with pip install and by building from source. However, when I try to run a Faster R-CNN model, I get the following error message:
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
I only get this with TF 1.12 and Python 3.6; it works fine with Python 3.6.
Provide the exact sequence of commands / steps that you executed before running into the problem
Any other info / logs
Traceback (most recent call last):
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, FeatureExtractor/MobilenetV1/Conv2d_0/weights/read/_4__cf__7)]]
[[{{node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_21/Gather/GatherV2_2/_211}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7500_...GatherV2_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "detection_app.py", line 67, in worker
output_q.put(y.get_stats_and_detection(frame))
File "/home/user/faster_rcnn_inception_v2_coco_2018_01_28/base_model.py", line 142, in get_stats_and_detection
boxes, scores, classes, num = self.processFrame(img)
File "/home/user/faster_rcnn_inception_v2_coco_2018_01_28/base_model.py", line 76, in processFrame
feed_dict={self.image_tensor: image_np_expanded})
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D (defined at /home/user/faster_rcnn_inception_v2_coco_2018_01_28/base_model.py:36) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, FeatureExtractor/MobilenetV1/Conv2d_0/weights/read/_4__cf__7)]]
[[{{node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_21/Gather/GatherV2_2/_211}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7500_...GatherV2_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D', defined at:
File "detection_app.py", line 94, in <module>
pool = Pool(args.num_workers, worker, (input_q, output_q))
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/context.py", line 119, in Pool
context=self.get_context())
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/pool.py", line 174, in __init__
self._repopulate_pool()
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/pool.py", line 239, in _repopulate_pool
w.start()
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
code = process_obj._bootstrap()
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "detection_app.py", line 62, in worker
y = DetectorAPI()
File "/home/user/faster_rcnn_inception_v2_coco_2018_01_28/base_model.py", line 36, in __init__
tf.import_graph_def(od_graph_def, name='')
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/user/anaconda3/envs/tf_faust/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D (defined at /home/user/faster_rcnn_inception_v2_coco_2018_01_28/base_model.py:36) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, FeatureExtractor/MobilenetV1/Conv2d_0/weights/read/_4__cf__7)]]
[[{{node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_21/Gather/GatherV2_2/_211}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7500_...GatherV2_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
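In the traceback above, every `multiprocessing.Pool` worker builds its own detector graph and session on the same GPU. If each session keeps TensorFlow's default of mapping nearly all GPU memory, later workers may have little left when cuDNN initializes. This is a hypothesis the report does not confirm. One way to test it in TF 1.12 is to cap each worker's share with `per_process_gpu_memory_fraction`; the helper names and the 10% reserve below are assumptions for illustration.

```python
def per_worker_memory_fraction(num_workers, reserve=0.1):
    """Split GPU memory evenly across worker processes, keeping `reserve` free.

    The reserve (an arbitrary illustrative default) leaves room for each
    process's CUDA context and cuDNN workspaces.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not 0.0 <= reserve < 1.0:
        raise ValueError("reserve must be in [0, 1)")
    return (1.0 - reserve) / num_workers


def make_worker_config(num_workers, tf=None):
    """Session config for one of `num_workers` processes sharing a single GPU."""
    if tf is None:
        import tensorflow as tf
    # tf.ConfigProto on TF 1.x; tf.compat.v1.ConfigProto on TF 2.x.
    config_cls = getattr(tf, "ConfigProto", None) or tf.compat.v1.ConfigProto
    config = config_cls()
    config.gpu_options.per_process_gpu_memory_fraction = per_worker_memory_fraction(num_workers)
    return config
```

Each worker would then pass `config=make_worker_config(args.num_workers)` when it creates its `tf.Session`, presumably inside `DetectorAPI.__init__` in base_model.py, whose source is not shown here. Setting `allow_growth` in the same place is the simpler alternative many commenters above used.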