[Bug] CUBLAS_STATUS_EXECUTION_FAILED error for BasicVSR_PP for tasks with resolutions >~1700x1080 #2124

jacob-stein · 2024-03-09T01:21:51Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version (main) or latest version (0.x).

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmagic

Environment

[2024-03-09 01:01:02,325] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
sys.platform: linux
Python: 3.11.7 (main, Dec  8 2023, 18:56:58) [GCC 11.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.1+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.16.1+cu121
OpenCV: 4.9.0
MMEngine: 0.10.3
MMCV: 2.1.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 12.1
MMagic: 1.2.0+0a560bb

Reproduces the problem - code sample

        # mmagic/models/editors/basicvsr_plusplus_net/basicvsr_plusplus_net.py, line 416 (forward)
        return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
                                       self.stride, self.padding,
                                       self.dilation, self.groups,
                                       self.deform_groups)

The error is raised inside this modulated_deform_conv2d call (mmcv's modulated deformable convolution op).
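To check whether the failure comes from the mmcv op itself rather than from the mmagic inference pipeline, the op can be called in isolation at the failing size. The sketch below is hypothetical: deform_conv_shapes and run_isolated are illustrative names, not part of mmagic or mmcv, and the layer sizes (128 input channels, 64 output channels, 3x3 kernel, 16 deform groups, stride 1, padding 1, features at input resolution) are assumed to match BasicVSR++'s default alignment module rather than taken from this report. Random weights and zero offsets are used because only the tensor sizes should matter for a size-dependent failure.

```python
def deform_conv_shapes(width, height, in_channels=128, out_channels=64,
                       kernel=3, deform_groups=16):
    """Assumed tensor shapes for one modulated deform conv call (batch 1, stride 1, padding 1)."""
    k2 = kernel * kernel
    return {
        "input": (1, in_channels, height, width),
        # 2 offsets (dy, dx) per kernel tap per deform group
        "offset": (1, deform_groups * 2 * k2, height, width),
        # 1 modulation scalar per kernel tap per deform group
        "mask": (1, deform_groups * k2, height, width),
        # groups=1, so weight is (C_out, C_in, k, k)
        "weight": (out_channels, in_channels, kernel, kernel),
    }


def run_isolated(width, height, deform_groups=16):
    """Call mmcv's op directly on random data; requires a CUDA build of mmcv."""
    # Imported lazily so the shape helper above works without a GPU stack.
    import torch
    from mmcv.ops import modulated_deform_conv2d

    if not torch.cuda.is_available():
        raise RuntimeError("a CUDA device is required")
    s = deform_conv_shapes(width, height, deform_groups=deform_groups)
    dev = "cuda"
    with torch.no_grad():
        x = torch.randn(s["input"], device=dev)
        offset = torch.zeros(s["offset"], device=dev)  # zero offsets: regular sampling grid
        mask = torch.ones(s["mask"], device=dev)       # full modulation weight
        weight = torch.randn(s["weight"], device=dev)
        bias = torch.zeros(s["weight"][0], device=dev)
        # Positional args after bias: stride, padding, dilation, groups, deform_groups
        out = modulated_deform_conv2d(x, offset, mask, weight, bias,
                                      1, 1, 1, 1, deform_groups)
        # Surface any asynchronous CUDA error at this point
        torch.cuda.synchronize()
    return tuple(out.shape)
```

Comparing run_isolated(1646, 1080) with run_isolated(1755, 1080) on the same GPU would show whether the resolution threshold reproduces outside the inference demo.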

Reproduces the problem - command or script

python demo/mmagic_inference_demo.py --model-name basicvsr_pp --video /home/paperspace/BasicVSR_PlusPlus/demo/input/full1.mov --result-out-dir ./resources/output/video_restoration/demo_video_restoration_basicvsr_res.mp4 --extra-parameters max_seq_len=5

Causes the error; full1.mov is 1920 × 1080.

python demo/mmagic_inference_demo.py --model-name basicvsr_pp --video /home/paperspace/BasicVSR_PlusPlus/demo/input/partial3.mov --result-out-dir ./resources/output/video_restoration/demo_video_restoration_basicvsr_res.mp4 --extra-parameters max_seq_len=5
python demo/mmagic_inference_demo.py --model-name basicvsr_pp --video /home/paperspace/BasicVSR_PlusPlus/demo/input/partial3.mov --result-out-dir ./resources/output/video_restoration/demo_video_restoration_basicvsr_res.mp4 --extra-parameters max_seq_len=2

Both cause the error; partial3.mov is 1755 × 1080.

python demo/mmagic_inference_demo.py --model-name basicvsr_pp --video /home/paperspace/BasicVSR_PlusPlus/demo/input/partial3.mov --result-out-dir ./resources/output/video_restoration/demo_video_restoration_basicvsr_res.mp4 --extra-parameters max_seq_len=1

Does not cause the error: same video, but with max_seq_len=1 the network never needs to propagate features between frames, so the deform_align call seen in the traceback is never reached.

python demo/mmagic_inference_demo.py --model-name basicvsr_pp --video /home/paperspace/BasicVSR_PlusPlus/demo/input/partial4.mov --result-out-dir ./resources/output/video_restoration/demo_video_restoration_basicvsr_res.mp4 --extra-parameters max_seq_len=5

Does not cause the error; partial4.mov is 1646 × 1080.

Reproduces the problem - error message

Traceback (most recent call last):
  File "/home/paperspace/mmagic/demo/mmagic_inference_demo.py", line 142, in <module>
    main()
  File "/home/paperspace/mmagic/demo/mmagic_inference_demo.py", line 138, in main
    editor.infer(**user_defined)
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/apis/mmagic_inferencer.py", line 231, in infer
    return self.inferencer(
           ^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/apis/inferencers/__init__.py", line 110, in __call__
    return self.inferencer(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 139, in __call__
    results = self.base_call(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/apis/inferencers/base_mmagic_inferencer.py", line 165, in base_call
    preds = self.forward(data, **forward_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/apis/inferencers/video_restoration_inferencer.py", line 134, in forward
    self.model(
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/models/base_models/base_edit_model.py", line 109, in forward
    return self.forward_tensor(inputs, data_samples, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/models/base_models/base_edit_model.py", line 167, in forward_tensor
    feats = self.generator(inputs, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/models/editors/basicvsr_plusplus_net/basicvsr_plusplus_net.py", line 348, in forward
    feats = self.propagate(feats, flows, module)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/models/editors/basicvsr_plusplus_net/basicvsr_plusplus_net.py", line 218, in propagate
    feat_prop = self.deform_align[module_name](feat_prop, cond,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmagic/models/editors/basicvsr_plusplus_net/basicvsr_plusplus_net.py", line 416, in forward
    return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/paperspace/.local/lib/python3.11/site-packages/mmcv/ops/modulated_deform_conv.py", line 149, in forward
    ext_module.modulated_deform_conv_forward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Additional information

I keep running into the error above when running BasicVSR_PP on videos of 1746x1080 or larger. The largest resolution confirmed to work is 1646x1080. The error occurs during feature propagation (inside the deform_align call in propagate, per the traceback).
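One possible explanation for a sharp width threshold despite ample free memory is a 32-bit indexing limit. This is a hypothesis, not a confirmed root cause: it assumes that mmcv's CUDA implementation unrolls each frame into an im2col buffer of (C_in × k × k) × (H × W) elements, and that the alignment layer sees 128 input channels with a 3x3 kernel at the input resolution. Under those assumptions, the buffer size can be compared against 2^31 - 1 (im2col_elements and max_safe_width are illustrative helpers):

```python
# Hypothesis check only; the channel count, kernel size and buffer layout are assumptions.
INT32_MAX = 2**31 - 1


def im2col_elements(width, height, in_channels=128, kernel=3):
    """Element count of the assumed per-frame im2col buffer."""
    return in_channels * kernel * kernel * width * height


def max_safe_width(height, in_channels=128, kernel=3):
    """Largest width whose assumed buffer still fits in a signed 32-bit index."""
    return INT32_MAX // (in_channels * kernel * kernel * height)


for w in (1646, 1746, 1755, 1920):
    n = im2col_elements(w, 1080)
    print(f"{w}x1080: {n:,} elements, exceeds int32: {n > INT32_MAX}")

print("max safe width at 1080 rows:", max_safe_width(1080))
```

Under these assumptions the limit at 1080 rows is a width of 1726, which falls between the last working width (1646) and the first failing one (1746). Because the buffer would be allocated per frame, this would also be consistent with max_seq_len=2 failing while total memory use stays low.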

I've tested this with multiple versions of PyTorch and mmcv/mmcv-full, and they all fail in a similar way.

The GPU has plenty of free memory: with max_seq_len=2 it uses only ~25 GB of 80 GB. Is there any workaround that does not resort to methods like tiling the video?


jacob-stein added the kind/bug something isn't working label Mar 9, 2024

mm-assistant bot assigned zengyh1900 Mar 9, 2024
