Description
URL(s) with the issue:
https://www.tensorflow.org/guide/profiler#install_the_profiler_and_gpu_prerequisites
Description of issue (what needs changing):
The documentation says to run
ldconfig -p | grep libcupti
to check that CUPTI exists on the path, and to run
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
to fix it if it is not on the path. However, the documentation can be misleading when an old install of CUDA 10.0 has been replaced with 10.1 (at least on my installation).
My output when checking the path is as below:
tyler@lambda2:/usr/local/cuda/bin$ ldconfig -p | grep libcupti
libcupti.so.10.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so.10.0
libcupti.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so
Based on the documentation, this output suggested to me that I did have a version of libcupti on the path and that everything should work. However, when I trained my model with the profiler enabled, I saw the following errors in the console:
2020-05-13 15:49:23.364143: I tensorflow/core/profiler/lib/profiler_session.cc:163] Profiler session started.
2020-05-13 15:49:23.364212: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2020-05-13 15:49:23.364588: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-05-13 15:49:23.364606: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1415] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
After double-checking that I had CUDA 10.1 installed and not 10.2, I ran the following:
tyler@lambda2:~/$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
tyler@lambda2:~/$ echo $LD_LIBRARY_PATH
/usr/local/cuda-10.1/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64
This then allowed the profiler to load CUPTI:
2020-05-13 18:18:51.560268: I tensorflow/core/profiler/lib/profiler_session.cc:163] Profiler session started.
2020-05-13 18:18:51.560338: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2020-05-13 18:18:51.561266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcupti.so.10.1
However, rerunning the command from the documentation for checking that CUPTI is on the path gives the same output as before:
tyler@lambda2:~/$ ldconfig -p | grep libcupti
libcupti.so.10.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so.10.0
libcupti.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so
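A note on why the output does not change: ldconfig -p prints the contents of the linker cache (/etc/ld.so.cache), while LD_LIBRARY_PATH is read by the dynamic loader at run time and never modifies that cache. A check closer to what the profiler does is to ask the loader to open the library directly. The following is a minimal sketch, not an official TensorFlow check; the helper name can_load is hypothetical, and LD_LIBRARY_PATH must be exported before the Python process starts for it to take effect.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name in this process.

    Hypothetical helper: it uses the same search the loader performs for any
    program (LD_LIBRARY_PATH as set at process start, then the linker cache,
    then the default directories).
    """
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file, or one of its own dependencies, cannot be found.
        return False
    return True


if __name__ == "__main__":
    # The version suffixes below are the ones mentioned in this issue.
    for name in ("libcupti.so.10.1", "libcupti.so.10.0", "libcupti.so"):
        print(name, "loadable" if can_load(name) else "NOT loadable")
```

A False result means the loader could not open that exact file name, which may also happen when the library is present but one of its dependencies is missing.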
Desired fixes
After updating my path, I would expect
ldconfig -p | grep libcupti
to show that version 10.1 is available in /usr/local/cuda/extras/CUPTI/lib64.
Additionally, I believe the documentation should explicitly state that running
ldconfig -p | grep libcupti
should show libcupti.so.10.1 or greater.
Submit a pull request?
No, I'm not sure what the best way to check for 10.1 or 10.2 would be.
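One possible way to check for 10.1 or 10.2, offered only as a sketch and not as the documented method, is to list the versioned libcupti files in the directories the loader would be pointed at. The function names find_cupti_versions and search_dirs are hypothetical, and the fallback directory /usr/local/cuda/extras/CUPTI/lib64 is the one quoted from the guide above; other installations may use a different location.

```python
import os
import re

# Matches names such as libcupti.so.10.1 or libcupti.so.10.1.208.
_CUPTI_RE = re.compile(r"^libcupti\.so\.(\d+)\.(\d+)")


def find_cupti_versions(directories):
    """Return {(major, minor): path} for versioned libcupti files found."""
    found = {}
    for directory in directories:
        if not os.path.isdir(directory):
            continue  # Skip entries that do not exist on this machine.
        for entry in sorted(os.listdir(directory)):
            match = _CUPTI_RE.match(entry)
            if match:
                version = (int(match.group(1)), int(match.group(2)))
                # Keep the first file seen for each version.
                found.setdefault(version, os.path.join(directory, entry))
    return found


def search_dirs(env=None):
    """Directories from LD_LIBRARY_PATH plus the CUPTI path quoted in the guide."""
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    dirs.append("/usr/local/cuda/extras/CUPTI/lib64")  # assumed default location
    return dirs


if __name__ == "__main__":
    versions = find_cupti_versions(search_dirs())
    for version, path in sorted(versions.items()):
        print("libcupti %d.%d at %s" % (version[0], version[1], path))
    if not any(v >= (10, 1) for v in versions):
        print("No libcupti 10.1 or newer found in the searched directories.")
```

This only inspects file names, so it complements rather than replaces an actual load test.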
Activity
ethanyanjiali commented on May 15, 2020
I think a better instruction is just to check whether there is a libcupti.so.10.x in /usr/local/cuda/extras/CUPTI/lib64, because ldconfig won't show CUPTI library files in some cases even if CUPTI has been correctly installed.
animesh-wynk commented on Nov 18, 2020
How can this issue be fixed on Windows 10?
JaneWuNEU commented on Dec 4, 2020
I met the same problem:
ldconfig -p | grep libcupti
shows the link info of the old CUDA install instead of the latest one on my server. This is because I had not uninstalled the old version of CUDA completely. After following the commands from the official guidance https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#removing-cuda-tk-and-driver, I removed CUDA together with the Nvidia driver and installed CUDA 10.1 again, and the above problem was solved.
However, even after I confirmed that the CUPTI path had been added to LD_LIBRARY_PATH, the program still reported that it could not load libcupti.so.10.0. I then executed the following commands:
sudo cp /usr/local/cuda-10.1/extras/CUPTI/lib64/libcupti.so /usr/local/lib/libcupti.so && sudo ldconfig
sudo cp /usr/local/cuda-10.1/extras/CUPTI/lib64/libcupti.so.10.1 /usr/local/lib/libcupti.so.10.1 && sudo ldconfig
Amazingly, that solved it.
DachuanZhao commented on Dec 14, 2020
Any progress?
My environment is :
output is
output is :
amahendrakar commented on Feb 25, 2021
@TylerADavis,
The command to check if CUPTI exists on a particular path has been updated.
Could you please take a look at this link and let us know if this is still an issue? Thanks!