Documentation for ensuring CUPTI for Profiling is Misleading #39526

Closed

@TylerADavis (Contributor)

URL(s) with the issue:

https://www.tensorflow.org/guide/profiler#install_the_profiler_and_gpu_prerequisites

Description of issue (what needs changing):

The documentation says to run ldconfig -p | grep libcupti to check that CUPTI exists on the path, and to run export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH if it does not. However, this check can be misleading when an old install of CUDA 10.0 has been replaced with 10.1 (at least on my installation).

My output when checking the path is as below:

tyler@lambda2:/usr/local/cuda/bin$ ldconfig -p | grep libcupti
	libcupti.so.10.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so.10.0
	libcupti.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so

Reading the documentation, this suggested to me that I did have a version of libcupti on the path and that everything should work. However, when I trained my model with the profiler on, I saw the following errors in the console:

2020-05-13 15:49:23.364143: I tensorflow/core/profiler/lib/profiler_session.cc:163] Profiler session started.
2020-05-13 15:49:23.364212: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2020-05-13 15:49:23.364588: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2020-05-13 15:49:23.364606: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1415] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
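
The warning above names the exact file the loader asked for (libcupti.so.10.1), which is more specific than the unversioned grep in the guide. As a minimal sketch, assuming the log format shown in this issue, the requested names can be pulled out of a saved log as follows. The function name missing_libs_from_log is hypothetical and is not part of TensorFlow.

```shell
# Hypothetical helper: read TensorFlow console output on stdin and print
# each library name that appears in a "Could not load dynamic library"
# warning, one per line, without duplicates.
missing_libs_from_log() {
  sed -n "s/.*Could not load dynamic library '\([^']*\)'.*/\1/p" | sort -u
}

# Example: missing_libs_from_log < train.log
# (train.log is an assumed file holding the captured console output.)
```

The printed name is the one to look for on disk, rather than any file matching libcupti.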

After double-checking that I had CUDA 10.1 installed and not 10.2, I ran the following:

tyler@lambda2:~/$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
tyler@lambda2:~/$ echo $LD_LIBRARY_PATH
/usr/local/cuda-10.1/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/extras/CUPTI/lib64

This then allows the profiler to load CUPTI:

2020-05-13 18:18:51.560268: I tensorflow/core/profiler/lib/profiler_session.cc:163] Profiler session started.
2020-05-13 18:18:51.560338: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2020-05-13 18:18:51.561266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcupti.so.10.1

However, rerunning the command from the documentation for checking that CUPTI is on the path gives the same output as before:

tyler@lambda2:~/$ ldconfig -p | grep libcupti
	libcupti.so.10.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so.10.0
	libcupti.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcupti.so
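
One likely reason the output does not change: ldconfig -p prints the linker cache, which is built from /etc/ld.so.conf and the default system directories, and it does not consult LD_LIBRARY_PATH. A rough sketch of a check that does look at LD_LIBRARY_PATH is below. The function name find_cupti_in_path is hypothetical, and the sketch only lists files; it does not reproduce the loader's full search order.

```shell
# Hypothetical helper: list libcupti files found in each directory of a
# colon-separated search path (defaults to $LD_LIBRARY_PATH).
# Empty entries (as in "a::b") are skipped here, although the dynamic
# loader itself treats an empty entry as the current directory.
find_cupti_in_path() {
  local search_path="${1-${LD_LIBRARY_PATH-}}"
  local dir file
  local IFS=':'
  for dir in $search_path; do
    [ -n "$dir" ] || continue
    for file in "$dir"/libcupti.so*; do
      # An unmatched glob stays literal, so test that the file exists.
      if [ -e "$file" ]; then
        printf '%s\n' "$file"
      fi
    done
  done
  return 0
}

# Example: find_cupti_in_path
# Example: find_cupti_in_path "/usr/local/cuda-10.1/lib64:/usr/local/cuda/extras/CUPTI/lib64"
```

If this prints a libcupti.so.10.1 entry while ldconfig -p does not, the library is reachable only through LD_LIBRARY_PATH, which matches the behavior described above.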

Desired fixes

After updating my path, I would expect the output of ldconfig -p | grep libcupti to change to show that /usr/local/cuda/extras/CUPTI/lib64 with version 10.1 is available.

Additionally, I believe the documentation should explicitly state that running ldconfig -p | grep libcupti should show libcupti.so.10.1 or greater.
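
As an illustration of a stricter check, the sketch below tests ldconfig -p output for one exact library name instead of any line containing libcupti. It assumes the output format shown earlier in this issue (name in the first column). The function name cache_has_soname is hypothetical, and the sketch matches one exact version only; it does not implement "or greater".

```shell
# Hypothetical helper: read "ldconfig -p" output on stdin and succeed only
# if the first column of some line equals the requested library name.
cache_has_soname() {
  awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Example:
#   if ldconfig -p | cache_has_soname libcupti.so.10.1; then
#     echo "libcupti.so.10.1 is in the linker cache"
#   else
#     echo "libcupti.so.10.1 is not in the linker cache"
#   fi
```

With the output posted above, this check fails for libcupti.so.10.1 and succeeds for libcupti.so.10.0, which is the distinction the plain grep hides.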

Submit a pull request?

No, I'm not sure what the best way to check for 10.1 or 10.2 would be.

Activity

ethanyanjiali commented on May 15, 2020

I think a better instruction is just to check whether there is a libcupti.so.10.x in /usr/local/cuda/extras/CUPTI/lib64, because ldconfig won't show the CUPTI library files in some cases even if CUPTI has been installed correctly.
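
A minimal sketch of that directory check is below. The function name cupti_dir_has_version is hypothetical, and the directory and version in the example call are the ones mentioned in this thread; adjust both for other installs.

```shell
# Hypothetical helper: succeed if DIR contains libcupti.so.VERSION
# (a regular file or a symlink that resolves to one).
cupti_dir_has_version() {
  local dir="$1" version="$2"
  [ -e "$dir/libcupti.so.$version" ]
}

if cupti_dir_has_version /usr/local/cuda/extras/CUPTI/lib64 10.1; then
  echo "libcupti.so.10.1 found in the CUPTI directory"
else
  echo "libcupti.so.10.1 not found in the CUPTI directory"
fi
```

This only confirms the file exists; the directory still has to be visible to the loader, either through LD_LIBRARY_PATH or through the linker cache.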

animesh-wynk commented on Nov 18, 2020

How can this issue be fixed on Windows 10?

JaneWuNEU commented on Dec 4, 2020

I hit the same problem: ldconfig -p | grep libcupti showed the link info of the old CUDA install instead of the latest one on my server. This was because I had not uninstalled the old version of CUDA completely. After following the commands in the official guide https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#removing-cuda-tk-and-driver, I removed CUDA together with the NVIDIA driver and installed CUDA 10.1 again, which solved the above problem.
However, even after I confirmed that the CUPTI path had been added to LD_LIBRARY_PATH, the program still reported that it could not load libcupti.so.10.0. I then ran the following commands:
sudo cp /usr/local/cuda-10.1/extras/CUPTI/lib64/libcupti.so /usr/local/lib/libcupti.so && sudo ldconfig
sudo cp /usr/local/cuda-10.1/extras/CUPTI/lib64/libcupti.so.10.1 /usr/local/lib/libcupti.so.10.1 && sudo ldconfig
That solved it.
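
An alternative to copying the files, sketched here under the assumption that the system's /etc/ld.so.conf includes /etc/ld.so.conf.d/*.conf (common on Debian and Ubuntu, but not verified for every distribution), is to register the CUPTI directory with the linker cache. The function name register_lib_dir and the file name cupti.conf are hypothetical choices, not something the thread or the TensorFlow guide prescribes.

```shell
# Hypothetical helper: write LIB_DIR into CONF_FILE so that a later
# "ldconfig" run adds that directory to the linker cache.
# Fails without writing anything if LIB_DIR does not exist.
register_lib_dir() {
  local lib_dir="$1" conf_file="$2"
  if [ ! -d "$lib_dir" ]; then
    echo "no such directory: $lib_dir" >&2
    return 1
  fi
  printf '%s\n' "$lib_dir" > "$conf_file"
}

# Example, run as root:
#   register_lib_dir /usr/local/cuda-10.1/extras/CUPTI/lib64 /etc/ld.so.conf.d/cupti.conf
#   ldconfig
#   ldconfig -p | grep libcupti
```

Compared with copying, this keeps a single copy of the library, so a later CUDA reinstall in the same location does not leave a stale file in /usr/local/lib.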

DachuanZhao commented on Dec 14, 2020

Any progress?
My environment is:

$ echo $LD_LIBRARY_PATH

The output is:

/usr/local/cuda/lib64::/usr/local/cuda/extras/CUPTI/lib64

$ ll /usr/local/cuda/extras/CUPTI/lib64

The output is:

drwxr-xr-x 2 root root     4096 Oct 21 15:31 ./
drwxr-xr-x 6 root root     4096 Oct 21 15:31 ../
lrwxrwxrwx 1 root root       16 Oct 21 15:31 libcupti.so -> libcupti.so.10.1*
lrwxrwxrwx 1 root root       20 Oct 21 15:31 libcupti.so.10.1 -> libcupti.so.10.1.208*
-rwxr-xr-x 1 root root  5700176 Oct 21 15:31 libcupti.so.10.1.208*
-rw-r--r-- 1 root root 13516866 Oct 21 15:31 libcupti_static.a
-rwxr-xr-x 1 root root  9716376 Oct 21 15:31 libnvperf_host.so*
-rw-r--r-- 1 root root 14726370 Oct 21 15:31 libnvperf_host_static.a
-rwxr-xr-x 1 root root  2349848 Oct 21 15:31 libnvperf_target.so*

self-assigned this on Feb 25, 2021

amahendrakar (Contributor) commented on Feb 25, 2021

@TylerADavis,
The command to check if CUPTI exists on a particular path has been updated.

Could you please take a look at this link and let us know if this is still an issue? Thanks!

Labels: comp:gpu (GPU related issues), stale, stat:awaiting response, type:docs-bug

Participants: @lamberta, @anirudh161, @ethanyanjiali, @TylerADavis, @DachuanZhao