Missing DLL cudnn_ops_infer64_8.dll does not raise a Python error #20605

Open
martinResearch opened this issue May 8, 2024 · 3 comments
Labels
ep:CUDA issues related to the CUDA execution provider

Comments

@martinResearch

Describe the issue

When trying to create a session with onnx_sess = InferenceSession(model, providers=["CUDAExecutionProvider"]) while the DLL cudnn_ops_infer64_8.dll is missing from the path (simply renaming this file is enough to reproduce it), the message "Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!" is printed to the log and the code stops executing, but no Python error is raised.

Why is it a problem?
Because this error message is not an actual Python error, it is not displayed in the log when using pytest, for example, which makes it harder to investigate why a test failed when this DLL is missing.
Digging into the Python code, execution stops on this line:

sess.initialize_session(providers, provider_options, disabled_optimizers)

The pybind binding should raise a Python error instead of just stopping execution.
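Until the binding raises a proper exception, one possible workaround (a sketch, not an ONNX Runtime feature) is to create the session in a child process. Assuming the failed DLL load terminates that process with a non-zero exit code, the parent can turn the exit code and the captured output into a regular Python exception that pytest reports. The helper `run_isolated`, the `SESSION_SNIPPET` string and the path "model.onnx" are hypothetical placeholders.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 300.0) -> str:
    """Run a Python snippet in a fresh interpreter and return its stdout.

    If the child exits with a non-zero code (for example because a native
    library terminated the process), raise RuntimeError carrying the child's
    stdout and stderr so the loader message appears in the parent's traceback.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect both streams instead of inheriting them
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"child process exited with code {result.returncode}\n"
            f"--- stdout ---\n{result.stdout}\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result.stdout


# Hypothetical snippet: "model.onnx" is a placeholder path.
SESSION_SNIPPET = """
from onnxruntime import InferenceSession
sess = InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
print(sess.get_providers())
"""
```

In a test, calling run_isolated(SESSION_SNIPPET) would then fail with a RuntimeError whose message includes the "Could not locate ..." text, at the cost of starting an extra interpreter.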

To reproduce

  • Rename cudnn_ops_infer64_8.dll to cudnn_ops_infer64_8_renamed.dll.
  • Run any code that calls onnx_sess = InferenceSession(model, providers=["CUDAExecutionProvider"]).

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.16.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider platform:windows issues related to the Windows platform labels May 8, 2024
@yuslepukhin
Member

DLL loading, especially of indirect dependencies, is handled by the OS. The message you are seeing comes from the system loader.
Neither ORT nor Python has any control over that.

@martinResearch
Author

martinResearch commented May 8, 2024

I understand from your response that neither ORT nor Python can change the error message the OS generates when it tries to load the DLL. But I am not sure I understand why that would mean ORT has no way to detect that the OS failed to load the library and then raise an error in that case. It seems to me that if the DLL fails to load, we would get out_module == nullptr on this line https://github.com/microsoft/onnxruntime/blob/58d7b1220550f87ad58a195dc5605fa8c23fe98f/winml/lib/Api.Ort/OnnxruntimeEnvironment.cpp#L43C1-L45C4, and we should then be able to throw an error that gets propagated to Python. Am I missing something?
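The point that a failed load is observable by the caller can be illustrated from the Python side: ctypes.CDLL asks the OS loader (the Windows loader, or dlopen on Linux and macOS) for a library and raises OSError when the load fails. As a stopgap, the sketch below uses this to check for the cuDNN DLL before creating a session and to raise a normal Python exception. The helper `require_library` is hypothetical, and the check is only a heuristic: ctypes may search different directories than the ones used when cuDNN loads its sub-libraries (on Windows with Python 3.8+, a bare DLL name is not looked up on PATH unless `winmode=0` is passed), so a passing check does not guarantee the session will initialize.

```python
import ctypes


def require_library(name: str) -> None:
    """Raise RuntimeError if the OS loader cannot load the shared library `name`."""
    try:
        # CDLL raises OSError when the underlying loader call fails.
        ctypes.CDLL(name)
    except OSError as exc:
        raise RuntimeError(
            f"Could not load {name}; make sure it is on the library search path ({exc})"
        ) from exc


# Hypothetical usage on Windows, before creating a CUDA session:
# require_library("cudnn_ops_infer64_8.dll")
```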

@yuslepukhin
Member

@sophies927 sophies927 removed the platform:windows issues related to the Windows platform label May 16, 2024