Failed to Build Horovod (Cmake trying to import tensorflow and torch while installing horovod[pytorch]) #3904
Replies: 9 comments
-
You need to ensure that
-
You can see in the error that TensorFlow is also required, even though I clearly wrote Even after I install PyTorch first,
This is really weird, because when I install it on another computer it works perfectly.
-
No, you can disregard that message if you don't want TensorFlow support. It only failed after the PyTorch check because you explicitly demanded Are you able to reproduce the problem in a clean virtual environment where you did nothing but
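To make the "is the framework actually visible?" question concrete, here is a minimal sketch that reports which frameworks the current interpreter can locate. It assumes the Horovod build looks the frameworks up from the same interpreter that runs pip. The helper name `framework_status` is hypothetical and not part of Horovod.

```python
import importlib.util
import sys


def framework_status(names=("torch", "tensorflow")):
    """Map each module name to True if this interpreter can locate it."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A broken or half-installed package can make the lookup itself fail.
            status[name] = False
    return status


if __name__ == "__main__":
    # Print the interpreter path so it can be compared with the one pip uses.
    print("interpreter:", sys.executable)
    for name, found in framework_status().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

Note that a module being locatable does not guarantee that importing it succeeds (for example, a shared library may fail to load), so running `import torch` in the same environment is still worth doing.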
-
Like I said before, after running
-
Here is the log from installing Horovod
-
This is different from your original posting. This time PyTorch is found during the Horovod build.
You didn't answer the question. Also, it looks like you are using conda. I have no experience with that personally, but I think other people have posted related issues here and found conda-specific help.
-
I literally already said that "Even after I install Pytorch first, And no. This is not a conda-specific problem. If it were, I could not install it on any of my servers. But I already installed it successfully on one of my servers.
-
You didn't answer whether you tried the following:
So most likely the environment on one of your servers differs in a way that breaks this specific setup there?
-
I already tried in a new "conda" environment and it still doesn't work. When I tried a new environment on the server where it already works, the installation succeeded. But when I tried again and again on the other server, it always failed. The only difference is the OS version: the working server runs Ubuntu 18.04, while the failing server runs Ubuntu 22.04. Could the OS version be the problem?
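Since the two servers are said to differ only in OS version, one way to check that claim is to capture the same snapshot on both machines and diff the output. The sketch below is an illustration only: it records the OS, the Python interpreter, and the first line of `--version` for the tools listed in the bug report template (`mpirun`, `nvcc`, `gcc`, `cmake`). The names `tool_version` and `snapshot` are hypothetical helpers.

```python
import json
import platform
import shutil
import subprocess
import sys

# Tools taken from the environment section of the bug report.
TOOLS = ("mpirun", "nvcc", "gcc", "cmake")


def tool_version(tool):
    """Return the first line of `<tool> --version`, or None if unavailable."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version on stderr instead of stdout.
    lines = (result.stdout or result.stderr).strip().splitlines()
    return lines[0] if lines else None


def snapshot(tools=TOOLS):
    """Collect a small, diff-friendly description of the build environment."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "tools": {tool: tool_version(tool) for tool in tools},
    }


if __name__ == "__main__":
    print(json.dumps(snapshot(), indent=2, sort_keys=True))
```

Saving the output to a file on each server and comparing the two files shows whether compiler, CMake, CUDA, or MPI versions differ along with the Ubuntu release.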
-
Environment:
mpirun --version
nvcc --version
gcc --version
cmake --version
Checklist:
Bug report:
I'm installing Horovod using this command:
HOROVOD_WITH_PYTORCH=1 HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod[pytorch]
This error shows up when building Horovod: