
Removed fallback for lm_head op #1482

Open
wants to merge 5 commits into main
Conversation

PenghuiCheng
Collaborator

Type of Change

feature
No API changed

Description

Removed fallback of lm_head op for WOQ

Expected Behavior & Potential Risk

Don't fall back the lm_head op to full precision when using weight-only quantization.
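To illustrate what "fallback" means here, the sketch below uses hypothetical names (not the actual intel_extension_for_transformers API): during weight-only quantization (WOQ), modules on a fallback list are left in full precision, and this PR's behavior corresponds to no longer placing lm_head on that list, so it is quantized along with the rest of the model.

```python
def select_modules_to_quantize(module_names, fallback=()):
    """Return the module names that will be weight-only quantized.

    Modules on the `fallback` list are skipped, i.e. kept in
    full precision. (Illustrative helper, not the real API.)
    """
    return [name for name in module_names if name not in fallback]


modules = ["model.layers.0.mlp", "model.layers.0.attn", "lm_head"]

# Old behavior: lm_head falls back to full precision.
old = select_modules_to_quantize(modules, fallback=("lm_head",))

# New behavior: no fallback, so lm_head is quantized as well.
new = select_modules_to_quantize(modules)

print(old)  # ['model.layers.0.mlp', 'model.layers.0.attn']
print(new)  # ['model.layers.0.mlp', 'model.layers.0.attn', 'lm_head']
```

Quantizing lm_head as well trades a small amount of output-projection precision for lower memory use, which is the usual motivation for dropping such a fallback.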

How has this PR been tested?

Tested locally.

Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>

github-actions bot commented Apr 15, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Format Scan Tests workflow
Check ID Status Error details
format-scan (pylint) success
format-scan (bandit) success
format-scan (cloc) success
format-scan (cpplint) success

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🔴 Optimize Unit Test workflow
Check ID Status Error details
optimize-unit-test-baseline success
optimize-unit-test-PR-test failure download
Genreate-OptimizeUT-Report skipped


🟢 NeuralChat Unit Test
Check ID Status Error details
neuralchat-unit-test-baseline success
neuralchat-unit-test-PR-test success
Generate-NeuralChat-Report success


🟢 Engine Unit Test workflow
Check ID Status Error details
engine-unit-test-baseline success
engine-unit-test-PR-test success
Genreate-Engine-Report success


🟡 Chat Bot Test workflow
Check ID Status Error details
call-inference-llama-2-7b-chat-hf / inference test queued
call-inference-mpt-7b-chat / inference test queued



Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

PenghuiCheng and others added 4 commits April 15, 2024 07:04
Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
Signed-off-by: Meng, Hengyu <hengyu.meng@intel.com>
Signed-off-by: Meng, Hengyu <hengyu.meng@intel.com>
Labels: none yet
Projects: none yet
Development: Successfully merging this pull request may close these issues: none yet

2 participants