OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked! #45044
Comments
Can you check the CUDA/CUDNN versions in the image/container against this #43718 (comment)?
See #44832 (comment)
@AZdora Can you try to run a ResNet (https://keras.io/api/applications/resnet/) on your 3080 GPU with your working Docker container?
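A minimal smoke test along those lines might look like the following sketch; the model and input shape are the stock Keras defaults, not taken from this thread:

```python
import numpy as np
import tensorflow as tf

# Forward pass of an untrained ResNet50 on random data: if the conv
# kernels are broken on this GPU/CUDA combination, this is enough to
# trigger the "No algorithm worked!" error.
model = tf.keras.applications.ResNet50(weights=None)
x = np.random.rand(1, 224, 224, 3).astype("float32")
print(model.predict(x).shape)  # expected: (1, 1000)
```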
Try adding this just after importing everything.
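The snippet itself did not survive in this thread, but later comments identify it as the GPU memory-growth configuration; a minimal sketch of that workaround:

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of grabbing
# it all up front; this is the workaround later comments refer to.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```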
@king398 When doing that I got the following error:
You are using CUDA 11.0, which is not compatible with the RTX 30 series. Try installing CUDA 11.1. You can also try installing through pip instead of Docker; the warning says your CUDA software stack is old and should be upgraded. Also, please tell us your driver version.
@king398 I have a lot of issues trying to run it using pip.
I'm not sure what I'm missing. I've downloaded CUDA 11.1 and cuDNN. I find that using a Docker container is much better, since all of the dependencies are packaged by TensorFlow themselves. If there's an issue with the CUDA version provided through TensorFlow's Docker image, that should be looked into. This issue still exists with version rc3.
Same issue with nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 and an RTX 3080; using CUDA 11.1 causes the same error. Tried with rc0 through rc4.
Edit: fixed with the Docker image nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04. You need to change LD_LIBRARY_PATH and make a symlink so that libcusolver.so.10 is defined. If you have a cublas error, you can try the same approach.
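The exact commands were not preserved above; a minimal sketch of the libcusolver symlink workaround, assuming a default CUDA 11.1 install path (adjust lib_dir to your container's layout, and make sure it is on LD_LIBRARY_PATH):

```python
import os

# CUDA 11.1 ships libcusolver.so.11, but TF 2.4 loads libcusolver.so.10,
# so point the old soname at the new library. The path is an assumption;
# running this typically requires root inside the container.
lib_dir = "/usr/local/cuda-11.1/lib64"
src = os.path.join(lib_dir, "libcusolver.so.11")
dst = os.path.join(lib_dir, "libcusolver.so.10")
if not os.path.exists(dst):
    os.symlink(src, dst)
```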
I've found a temporary solution by using the software provided by Lambda Stack. It works on Ubuntu 20.04 for all RTX 30 series GPUs.
TF 2.4 is built & tested against CUDA 11.0, not 11.1.
I have the exact same problem trying to make TF work with my RTX 3070: CUDA 11.1 + cuDNN 8.0.5.39 + TF 2.4.0. Note: I had to use the symlink trick so TF could find libcusolver.so.10, which is obviously not available in the CUDA 11.1 package.
I experienced this issue on an MSI GL65 with an RTX 2070 on Ubuntu 20.04. Dynamic libraries are the following:
In [1]: import tensorflow
2021-01-28 16:05:15.891481: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
In [2]: tensorflow.__version__
Out[2]: '2.4.0'
In [3]: tensorflow.config.experimental.list_physical_devices('GPU')
2021-01-28 16:06:40.579904: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-28 16:06:40.588165: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-28 16:06:40.619240: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-28 16:06:40.619800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.455GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 327.88GiB/s
2021-01-28 16:06:40.619823: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-01-28 16:06:40.627330: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-01-28 16:06:40.627382: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-01-28 16:06:40.631550: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-28 16:06:40.633606: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-28 16:06:40.642000: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-28 16:06:40.644472: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-01-28 16:06:40.645649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-28 16:06:40.645749: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-28 16:06:40.646153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-28 16:06:40.646490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Out[3]: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Adding the lines indicated by @king398 solved my issue.
Adding the lines indicated by @king398 solved my issue too, on my GL65 with an RTX 2070 on Ubuntu 20.04.
If the error persists after setting the GPU memory growth configuration, as indicated by @king398, you might want to try dropping the batch size during training.
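For illustration, a self-contained sketch with a throwaway model and random data (none of it from the original issue); the only point is the smaller batch_size:

```python
import numpy as np
import tensorflow as tf

# Throwaway conv model and random data, purely to illustrate the knob:
# a smaller batch size shrinks the cuDNN workspace each conv step needs.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.rand(128, 64, 64, 3).astype('float32')
y = np.random.randint(0, 10, size=(128,))
model.fit(x, y, batch_size=8, epochs=1)  # e.g. reduced from 32
```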
One additional hint, since it took me some time to figure out: set_memory_growth() didn't take effect in my setup until I added os.environ['CUDA_VISIBLE_DEVICES'] = "0" (note I have only one GPU). BTW, this still looks like a workaround to me, and ideally this should be fixed (I didn't face this problem with the older versions of CUDA and cuDNN compatible with the RTX 20xx series).
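Putting the two workarounds from this comment together (my sketch, not the commenter's exact code); the environment variable has to be set before TensorFlow initializes CUDA:

```python
import os

# Must happen before TensorFlow touches the GPU; "0" assumes a
# single-GPU machine, as in the comment above.
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

import tensorflow as tf

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```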
@Harsh188
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
@Saduf2019 There are no issues with v2.6.0.
System information
Describe the problem
While training a custom ResNet-50 model I get the following build error:
I don't think the code has any issues. It works fine when training on CPU.
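One quick way to confirm that comparison (my addition, not from the original report) is to hide the GPU before building the model, so the identical code runs on CPU:

```python
import tensorflow as tf

# Hide all GPUs from TensorFlow; must run before any op touches a GPU.
tf.config.set_visible_devices([], 'GPU')
```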
Any other info / logs
nvidia-smi
nvcc --version
tf.test.is_gpu_available()
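For reference, tf.test.is_gpu_available() is deprecated in TF 2.x; an equivalent check (my phrasing, not from the original logs) is:

```python
import tensorflow as tf

# Returns a non-empty list when a GPU is visible to TensorFlow.
print(tf.config.list_physical_devices('GPU'))
```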
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.