
Add support for DBRX #623

Draft · wants to merge 3 commits into main
Conversation

LaaZa (Contributor) commented Mar 28, 2024

Adds support for databricks/dbrx.

NOT tested — the model is far too big for me to test, and the tiny dummy model yujiepan/dbrx-tiny-random has in_features that are too small.

If possible, please test and report here.

Requires trust_remote_code=True and the tiktoken package.

Requires transformers>=4.40.0.

Closes #621
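For anyone testing, here is a hedged usage sketch of how one might drive the quantization with AutoGPTQ once this PR is applied. The model id, calibration text, output directory, and the 4-bit/128-group settings are illustrative choices, not taken from the PR itself:

```python
# Hedged sketch of quantizing DBRX with AutoGPTQ + this PR.
# The model id, calibration text, output dir, and quantize settings
# below are illustrative, not prescribed by the PR.
quantize_config = {"bits": 4, "group_size": 128, "desc_act": False}

def quantize_dbrx(model_id="databricks/dbrx-instruct",
                  out_dir="dbrx-instruct-4bit-gptq"):
    # Heavy path: needs the full checkpoint plus auto_gptq and tiktoken
    # installed, so the imports live inside the function.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoGPTQForCausalLM.from_pretrained(
        model_id,
        BaseQuantizeConfig(**quantize_config),
        trust_remote_code=True,  # required for DBRX, per the PR description
    )
    examples = [tokenizer("A short calibration sample for GPTQ.")]
    model.quantize(examples)
    model.save_quantized(out_dir)
```

In practice you would pass a few hundred calibration examples, not one; the single sample is only to keep the sketch short.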

Qubitium (Contributor) commented:

@LaaZa I will test this.

Qubitium (Contributor) commented Mar 29, 2024

@LaaZa The quantize process is not working. The end result is 243 GB, which is not the expected pre-quant size divided by 8. The quantize time of ~20 minutes is also too fast for a model of this size.
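For scale, a back-of-the-envelope check (my arithmetic, not from the thread; the ~132B total parameter count is DBRX's published figure, and the estimate ignores group-scale overhead and any layers left unquantized):

```python
# Back-of-the-envelope size check for DBRX (~132B total parameters).
# Ignores group-scale/zero-point overhead and unquantized modules,
# so real GPTQ output would be somewhat above the 4-bit figure.
PARAMS = 132e9

fp16_gb = PARAMS * 2 / 1e9    # 2 bytes per weight in fp16  -> ~264 GB
int4_gb = PARAMS * 0.5 / 1e9  # 0.5 bytes per weight at 4-bit -> ~66 GB

observed_gb = 243  # the size Qubitium reported above
```

The 243 GB result is several times the expected 4-bit footprint, consistent with the expert weights not actually being quantized.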

Qubitium (Contributor) commented Mar 29, 2024

Output of print(model.modules) for dbrx-instruct:

<bound method Module.modules of DbrxForCausalLM(
  (transformer): DbrxModel(
    (wte): Embedding(100352, 6144)
    (blocks): ModuleList(
      (0-39): 40 x DbrxBlock(
        (norm_attn_norm): DbrxNormAttentionNorm(
          (norm_1): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
          (attn): DbrxAttention(
            (Wqkv): Linear(in_features=6144, out_features=8192, bias=False)
            (out_proj): Linear(in_features=6144, out_features=6144, bias=False)
            (rotary_emb): DbrxRotaryEmbedding()
          )
          (norm_2): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
        )
        (ffn): DbrxFFN(
          (router): DbrxRouter(
            (layer): Linear(in_features=6144, out_features=16, bias=False)
          )
          (experts): DbrxExperts(
            (mlp): DbrxExpertGLU()
          )
        )
      )
    )
    (norm_f): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=6144, out_features=100352, bias=False)
)>
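Reading that dump, the only per-block Linear layers GPTQ could target might be listed in AutoGPTQ's inside_layer_modules convention roughly like this (a sketch based on the printed names, not code from this PR; note that the fused DbrxExpertGLU exposes no Linear layers at all):

```python
# Sketch: per-DbrxBlock Linear layers visible in the dump above, written
# in the nested-list style AutoGPTQ model definitions use for
# inside_layer_modules. Names are copied from the printed module tree.
# DbrxExpertGLU holds its expert weights as fused plain parameters, so
# it contributes no Linear entries here — hence the quantization failure.
inside_layer_modules = [
    ["norm_attn_norm.attn.Wqkv"],
    ["norm_attn_norm.attn.out_proj"],
    ["ffn.router.layer"],
]
```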

maziyarpanahi commented:

Hi @LaaZa
I have started the quantization based on this PR. It's at 27/40 now. Shall I also test #625, or will this PR have the latest changes?

LaaZa (Contributor, Author) commented Mar 29, 2024

> Hi @LaaZa
> I have started the quantization based on this PR. It's at 27/40 now. Shall I also test #625, or will this PR have the latest changes?

Due to the unusual architecture of the MoE implementation, the original model seems not to quantize. There are ways to convert it to something that is more likely to work, but that requires a slightly different version than what this PR currently has. I'll keep this updated according to whatever is the most beneficial way forward, but I would suggest waiting for support in transformers, so we have a more standardized target.
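To make the "convert it to something that more likely works" idea concrete, here is a minimal sketch of the core step: splitting a fused expert weight matrix into per-expert blocks that could back ordinary Linear layers. Plain Python lists stand in for weight tensors, and all shapes and names are toy values, not DBRX's real dimensions:

```python
# Why the fused MoE blocks GPTQ: DbrxExpertGLU keeps every expert's
# weights concatenated in one parameter, so there are no per-expert
# Linear modules for the quantizer to find. Slicing the fused matrix
# row-wise recovers one weight block per expert (toy shapes below).
num_experts, ffn_dim, hidden = 4, 3, 2

# fused weight: (num_experts * ffn_dim) rows of `hidden` columns
fused = [[float(r * hidden + c) for c in range(hidden)]
         for r in range(num_experts * ffn_dim)]

def split_experts(fused, num_experts, ffn_dim):
    """Slice the fused matrix into one (ffn_dim x hidden) block per expert."""
    return [fused[e * ffn_dim:(e + 1) * ffn_dim] for e in range(num_experts)]

per_expert = split_experts(fused, num_experts, ffn_dim)
```

Each block could then initialize a separate Linear module, giving GPTQ something to quantize; converting back after quantization is the part that needs a different code version than this PR.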

maziyarpanahi commented:

> Due to the unusual architecture of the MoE implementation, the original model seems not to quantize. There are ways to convert it to something that is more likely to work, but that requires a slightly different version than what this PR currently has. I'll keep this updated according to whatever is the most beneficial way forward, but I would suggest waiting for support in transformers, so we have a more standardized target.

Perfect, thanks. Please let me know if I can still test something.

@LaaZa LaaZa marked this pull request as draft March 31, 2024 14:53
maziyarpanahi commented Apr 4, 2024

Hi @Qubitium
I like both models; unfortunately, I cannot say which one was better.
Thank you for your hard work, and please let me know if I can test any other model.

PS: I made a short doc for anyone else who wanted to quickly test these models: https://gist.github.com/maziyarpanahi/e2e005addef6bb9ca97f9ff6c4c4f0d5

# Conflicts:
#	auto_gptq/modeling/__init__.py
#	auto_gptq/modeling/_const.py
#	auto_gptq/modeling/auto.py
LaaZa (Contributor, Author) commented Apr 19, 2024

I think this is ready for retesting with transformers 4.40.0.
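Before retesting, a quick guard can confirm the environment meets the transformers>=4.40.0 floor this PR requires. The helper below is an illustrative sketch of mine, not part of AutoGPTQ, and it deliberately ignores dev/rc suffixes:

```python
# Minimal version-floor check for the transformers>=4.40.0 requirement
# stated in this PR. `meets_minimum` is an illustrative helper, not an
# AutoGPTQ API; it compares only the first three numeric components and
# ignores dev/rc suffixes for simplicity.
def meets_minimum(installed: str, minimum: str = "4.40.0") -> bool:
    def as_tuple(version: str):
        parts = []
        for piece in version.split(".")[:3]:
            digits = "".join(ch for ch in piece if ch.isdigit())
            parts.append(int(digits or 0))
        return tuple(parts)
    return as_tuple(installed) >= as_tuple(minimum)
```

Usage: pass `transformers.__version__` and fail fast with a clear message instead of a confusing load-time error.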

Successfully merging this pull request may close these issues.

[FEATURE] ADD Support DBRX
3 participants