Converging loss curves for all AutoGPTQ Quantized Linear Kernels
Expected Behavior
Converging loss curves for all AutoGPTQ Quantized Linear Kernels
Description
CUDA, EXLLAMA, MARLIN do not work
The loss curves for the Cuda/Exllama kernels do not converge, and the Marlin loss collapses to zero mid-training.

When I set the `self.kernel_switch_threshold` attribute to `False` in the CUDA QLinear to force use of the kernel, the iteration time increases dramatically and the loss does not converge at all.

My input dimensions (BS * seqlen) are, however, always larger than the default `self.kernel_switch_threshold` value, so in normal settings the condition `x.shape[0] < self.kernel_switch_threshold` that selects the kernel is never met; the layer always falls back to a standard `torch.matmul`, and the resulting loss convergence looks decent without the kernel. See this link for reference:
https://github.com/AutoGPTQ/AutoGPTQ/blob/866b4c8c2cbb893f1156cb6c114625bba2e4d7c5/auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py#L348C1-L350C42
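To make the dispatch described above concrete, here is a minimal, runnable sketch of that branching (hypothetical names; the real logic lives in `qlinear_cuda_old.py` at the link above, and the fused quantized kernel is faked here with a callable so the sketch runs on CPU). Inputs whose leading dimension is at or above the threshold never enter the kernel branch and always take the dequantize-plus-`torch.matmul` path:

```python
import torch

# Assumed default threshold, for illustration only; the real value is an
# attribute on the QLinear module.
KERNEL_SWITCH_THRESHOLD = 128

def qlinear_forward(x, dequant_weight, cuda_kernel=None):
    """Sketch of the CUDA QLinear dispatch discussed in this issue.

    `cuda_kernel` stands in for the fused quantized-matmul kernel.
    """
    if cuda_kernel is not None and x.shape[0] < KERNEL_SWITCH_THRESHOLD:
        # Small input: the fused quantized kernel path is taken.
        return cuda_kernel(x)
    # Large BS * seqlen: falls back to dequantized weight + standard
    # torch.matmul, an ordinary autograd op, so training behaves normally.
    return torch.matmul(x, dequant_weight)

w = torch.randn(16, 32)              # stands in for the dequantized weight
x_small = torch.randn(8, 16)         # 8 < threshold  -> kernel path
x_large = torch.randn(4096, 16)      # 4096 >= threshold -> matmul path

taken = []
kernel = lambda t: taken.append("kernel") or t @ w  # records each kernel call
qlinear_forward(x_small, w, cuda_kernel=kernel)
qlinear_forward(x_large, w, cuda_kernel=kernel)
assert taken == ["kernel"]  # only the small input exercised the kernel
```

This is why, with typical training shapes, the CUDA kernel is effectively never used unless the threshold check is disabled by hand.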
TritonV2 Works
The TritonV2 QLinear kernel does show acceptable loss convergence.

I suspect this behaviour comes down to the autograd backward function (or the lack of one in the Cuda, Exllama, and Marlin kernels). In TritonV2, a backward function is defined on the autograd function in the `QuantLinearFunction` class (see below); no such backward is apparent in the other three kernels.
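To illustrate the suspected cause: a `torch.autograd.Function` that defines `backward`, as TritonV2's `QuantLinearFunction` does, propagates gradients to its inputs, whereas a forward-only custom op cannot. The sketch below is a simplified stand-in, not AutoGPTQ's actual kernel code; the quantized matmul is faked with a plain weight matrix:

```python
import torch

class QuantLinearFunction(torch.autograd.Function):
    """Simplified stand-in for TritonV2's autograd function: the forward
    pass plays the role of the quantized matmul, and backward computes
    the gradient with respect to the input."""

    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(weight)
        # In AutoGPTQ this would call the fused quantized-matmul kernel.
        return x @ weight

    @staticmethod
    def backward(ctx, grad_out):
        (weight,) = ctx.saved_tensors
        # Without a backward like this, the kernel's output cannot feed
        # gradients back to the activations, and the loss cannot converge.
        return grad_out @ weight.t(), None

x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 3)
QuantLinearFunction.apply(x, w).sum().backward()
assert x.grad is not None and x.grad.shape == x.shape
```

This matches the observed symptom: the TritonV2 path trains, while the kernels lacking a backward either fall back to `torch.matmul` (and train) or, when forced, fail to converge.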
This problem has also been briefly discussed in issue #530, as well as in Unsloth.
Hardware details
Software Version
Python Version = 3.10.8
requirements.txt
Reproduce