
Install error when compile the lib #214

Closed
Description

@Lausannen

Hi, when I try to build the newest version of apex, I get the following error:
" python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-hq7t6roo/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-ic8t29gs/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-hq7t6roo/ "
I made sure to follow the README.md, but the error persists. Can you give me some suggestions on how to handle it? Thank you very much!

Lausannen commented on Mar 21, 2019 (Author)

I think I have found the problem. It was caused by the wrong CUDA version being picked up, since my server has multiple CUDA versions installed. After I changed the CUDA path in .bashrc, apex compiled successfully.

mcarilli commented on Mar 21, 2019 (Contributor)

I'm currently adding logic to setup.py that will print a warning if the version of CUDA used to compile the extensions differs from the version of CUDA used to compile the PyTorch binaries on your system. That should help catch cases like this.
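A minimal sketch of the kind of check described above, written under two assumptions: the toolkit version is read from `nvcc -V` output in the format shown in the logs in this thread ("Cuda compilation tools, release 10.0, V10.0.130"), and PyTorch's CUDA version arrives as a string such as `torch.version.cuda`. The helper names are hypothetical; this is not apex's actual setup.py code.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc -V` text such as 'release 10.0, V10.0.130'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '10.0.130' or '10.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def cuda_versions_match(nvcc_output: str, torch_cuda_version: str) -> bool:
    """True if the nvcc toolkit and PyTorch's CUDA build share the same major.minor."""
    bare_metal = parse_nvcc_release(nvcc_output)
    if bare_metal is None:
        raise ValueError("could not find a CUDA release number in the nvcc output")
    return bare_metal == parse_torch_cuda(torch_cuda_version)


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    print(cuda_versions_match(sample, "10.0.130"))  # True
    print(cuda_versions_match(sample, "9.0.176"))   # False
```

On a real machine, the nvcc text could come from `subprocess.check_output([os.path.join(cuda_home, "bin", "nvcc"), "-V"], text=True)` (with `cuda_home` being the toolkit root) and the PyTorch side from `torch.version.cuda`. Comparing only major.minor is a design choice in this sketch; the patch number is ignored.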

moskomule commented on Mar 25, 2019

Hi, I probably have the same problem as you...

...
Installing collected packages: apex
  Running setup.py install for apex ... error
    Complete output from command /opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile:
    torch.__version__  =  1.0.1.post2

    Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    from /usr/local/cuda/bin

    Pytorch binaries were compiled with Cuda 10.0.130

    running install
    running build
    running build_py
    copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
    copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.7/apex/parallel
    copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/handle.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/utils.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/amp.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/amp/opt.py -> build/lib.linux-x86_64-3.7/apex/amp
    copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
    creating build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
    copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.7/apex/normalization
    copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/optimizers
    copying apex/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/optimizers
    copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
    running build_ext
    building 'amp_C' extension
    gcc -pthread -B /opt/.miniconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    /usr/local/cuda/bin/nvcc -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/opt/.miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/opt/.miniconda/include/python3.7m -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
    g++ -pthread -shared -B /opt/.miniconda/compiler_compat -L/opt/.miniconda/lib -Wl,-rpath=/opt/.miniconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    /opt/.miniconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: unable to initialize decompress status for section .debug_info
    build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o: file not recognized: file format not recognized
    collect2: error: ld returned 1 exit status
    error: command 'g++' failed with exit status 1

    ----------------------------------------
Command "/opt/.miniconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-b0pvvy97/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-4os3snlg/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-b0pvvy97/

moskomule commented on Mar 25, 2019

Update: without tmux, I was able to install apex.

It works with AMP, but emits this warning: Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ImportError('/opt/.miniconda/lib/python3.7/site-packages/amp_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration').
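The warning above names the compiled `amp_C` extension and the ImportError raised when loading it. A small standard-library sketch for checking, outside of AMP, whether a compiled extension module imports and capturing the error text when it does not; `try_import` is a hypothetical helper name, not part of apex.

```python
import importlib
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Attempt to import a module; return (success, error message or None)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # amp_C is the extension module named in the warning above.
    ok, error = try_import("amp_C")
    if ok:
        print("amp_C imported successfully")
    else:
        print(f"amp_C failed to import: {error}")
```

An error string mentioning an undefined symbol, as in the warning above, means the shared object was found but could not be loaded, which points back at the build and CUDA setup rather than at a missing install.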

Lausannen commented on Mar 25, 2019 (Author)

@moskomule I think installing apex with --cuda_ext --cpp_ext is necessary, and I suspect this problem is related to your CUDA setup. In my case, I first checked which CUDA was on my path using "nvcc -V": it reported CUDA 9.0, but the path set in ~/.bashrc turned out to be invalid. Maybe you should check this.
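A standard-library sketch of the check suggested above: report which `nvcc` is found first on `PATH` and whether the directory in `CUDA_HOME` actually exists, so a stale path in `~/.bashrc` shows up as a missing directory. Inspecting `CUDA_HOME` is an assumption; your shell config may set a different variable or only `PATH`. `describe_cuda_setup` is a hypothetical helper.

```python
import os
import shutil
from typing import Dict, Optional


def describe_cuda_setup(env: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Report which nvcc is first on PATH and whether CUDA_HOME points at a real directory."""
    env = dict(os.environ) if env is None else env
    # Search only the PATH from `env`; fall back to the OS default search path if unset.
    nvcc_path = shutil.which("nvcc", path=env.get("PATH", os.defpath))
    cuda_home = env.get("CUDA_HOME")  # assumption: your shell config exports CUDA_HOME
    return {
        "nvcc": nvcc_path,  # None when no nvcc is found on PATH
        "cuda_home": cuda_home,
        # An invalid path left in ~/.bashrc shows up here as False.
        "cuda_home_exists": bool(cuda_home) and os.path.isdir(cuda_home),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

If `nvcc` resolves to one toolkit while `CUDA_HOME` points at another (or at nothing), that is the multiple-CUDA situation described earlier in this thread.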

moskomule commented on Mar 25, 2019

Thanks. For the warning above, I did install with --cuda_ext --cpp_ext, and the installation itself appeared to finish successfully. The warning only appeared when running AMP.

mcarilli commented on Mar 25, 2019 (Contributor)

@moskomule You should make sure to use the pip install command

$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

instead of

$ python setup.py install --cpp_ext --cuda_ext

Also, before reinstalling Apex, you need to make sure any old conflicting installs are removed, and if you installed using the direct setup.py command, you also need to make sure stale apex/build and apex.egg-info are removed. Try

$ pip uninstall apex
$ pip uninstall apex (repeat until you're sure it's uninstalled...)
$ cd apex
$ rm -rf build
$ rm -rf apex.egg-info
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
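The `rm -rf` cleanup steps above can be scripted. A minimal sketch, assuming `apex_dir` is the path to the apex checkout; `clean_stale_build` is a hypothetical helper. It only removes the stale `build` and `apex.egg-info` directories; the `pip uninstall` and `pip install` steps are still run as shown.

```python
import shutil
from pathlib import Path
from typing import List


def clean_stale_build(apex_dir: str) -> List[str]:
    """Remove stale build artifacts left by a direct `setup.py install`; return what was removed."""
    removed = []
    for name in ("build", "apex.egg-info"):
        target = Path(apex_dir) / name
        if target.is_dir():
            shutil.rmtree(target)  # same effect as `rm -rf <target>` for a directory
            removed.append(name)
    return removed
```

For example, `clean_stale_build("apex")` from the parent directory, followed by the pip install command above from inside `apex`.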

moskomule commented on Apr 1, 2019

Thank you. So far, I've found that it fails to build on Ubuntu 18.04 but succeeds on Ubuntu 16.04.

DangerousY commented on Sep 26, 2019

Following @mcarilli's instructions above, I ran into this problem:
ERROR: Command errored out with exit status 1: /home/zyx/anaconda3/envs/maskrcnn_benchmark/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-i7tiph4m/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-i7tiph4m/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-4x7z98cw/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

ptrblck commented on Sep 26, 2019 (Collaborator)

@DangerousY Could you please post the complete stack trace so that we could have a look?

chccgiven commented on Sep 30, 2019

Hi, I ran @mcarilli's commands above, but pip reports the following error:
ERROR: You must give at least one requirement to install (see "pip help install")
I'm on Ubuntu 18.04. Can you help me? Thank you!

ptrblck commented on Sep 30, 2019 (Collaborator)

@chccgiven This error is usually thrown if you forget the folder location at the end of the pip install command (the trailing dot, or ./ alternatively).
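To illustrate the point about the trailing dot: the final positional argument is the requirement pip installs, so without it pip has nothing to install. A sketch that assembles the install command from this thread as an argument list and refuses an empty source path; `build_apex_install_cmd` is a hypothetical helper, and the flags are copied from the command quoted in this thread.

```python
import sys
from typing import List


def build_apex_install_cmd(source_dir: str = ".") -> List[str]:
    """Assemble the pip command from this thread; source_dir is the required final argument."""
    if not source_dir:
        raise ValueError("pip needs a requirement: pass the apex checkout path, e.g. '.'")
    return [
        sys.executable, "-m", "pip", "install",
        "-v", "--no-cache-dir",
        "--global-option=--cpp_ext",   # shell form: --global-option="--cpp_ext"
        "--global-option=--cuda_ext",  # shell form: --global-option="--cuda_ext"
        source_dir,  # the trailing "." (or "./") from the original command
    ]
```

Calling pip through `sys.executable -m pip` is a choice made in this sketch so that the current interpreter's pip is used; the thread's command calls `pip` directly. From inside the checkout, `subprocess.run(build_apex_install_cmd("."), check=True)` would run it.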

maschasap commented on Jun 5, 2020

@ptrblck Good afternoon! I'm trying to install apex, but I always get this error:

/tmp/pip-jp2_qt25-build/setup.py:51: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
from /usr/local/cuda/bin

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-jp2_qt25-build/setup.py", line 130, in <module>
    check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
  File "/tmp/pip-jp2_qt25-build/setup.py", line 85, in check_cuda_torch_binary_vs_bare_metal
    "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 10.2.
In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).

error
Cleaning up...
Removing source in /tmp/pip-jp2_qt25-build
Command "/raid/akim/myenv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-jp2_qt25-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-ylohle8g-record/install-record.txt --single-version-externally-managed --compile --install-headers /raid/akim/myenv/include/site/python3.6/apex" failed with error code 1 in /tmp/pip-jp2_qt25-build/
Exception information:
Traceback (most recent call last):
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/commands/install.py", line 360, in run
    prefix=options.prefix_path,
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/req/req_install.py", line 878, in install
    spinner=spinner,
  File "/raid/akim/myenv/lib/python3.6/site-packages/pip/utils/__init__.py", line 725, in call_subprocess
    % (command_desc, proc.returncode, cwd))
pip.exceptions.InstallationError: Command "/raid/akim/myenv/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-jp2_qt25-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-ylohle8g-record/install-record.txt --single-version-externally-managed --compile --install-headers /raid/akim/myenv/include/site/python3.6/apex" failed with error code 1 in /tmp/pip-jp2_qt25-build/

Do you know what the issue may be? Thanks in advance!
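The RuntimeError above compares the bare-metal toolkit (nvcc 10.1) with the CUDA version PyTorch was built with (10.2), and its message notes that a minor-version mismatch does not always cause later errors. A sketch of a comparison that separates the two cases, assuming version strings in the "major.minor[.patch]" form shown in the logs; `classify_cuda_mismatch` is a hypothetical name and not apex's actual setup.py code.

```python
def classify_cuda_mismatch(bare_metal: str, torch_binary: str) -> str:
    """Return 'match', 'minor', or 'major' for CUDA version strings like '10.1' and '10.2'."""
    bm_major, bm_minor = (int(p) for p in bare_metal.split(".")[:2])
    tb_major, tb_minor = (int(p) for p in torch_binary.split(".")[:2])
    if bm_major != tb_major:
        return "major"  # different major versions: treat as incompatible
    if bm_minor != tb_minor:
        return "minor"  # same major version, different minor version
    return "match"
```

With the versions from the log above, `classify_cuda_mismatch("10.1", "10.2")` returns `"minor"`. The setup.py message leaves bypassing the check at your own risk; consistent with the earlier comments in this thread, the safer route is to make the two versions agree, for example by pointing `PATH`/`CUDA_HOME` at a toolkit that matches the one PyTorch was built with.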

          Install error when compile the lib · Issue #214 · NVIDIA/apex