[BUG] reranker.py passes the wrong input length when loading the model #4016

Open
blizzardwj opened this issue May 14, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@blizzardwj

image

The hardcoded 1024 should be changed to max_length; otherwise the rerank model fails when predict is called on overly long text.

blizzardwj added the bug label May 14, 2024

dosubot bot commented May 14, 2024

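Hello, @blizzardwj! Nice to meet you. I'm Dosu, a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to help while you wait for a human maintainer. I've noted the issue you raised and will give you a full answer shortly.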

@web3aipro

I made the change and it still throws an error.

@web3aipro

web3aipro commented May 14, 2024

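The error:
image
The setting in the config file:
image
The setting in the rerank file:
image
Even after these changes, I still get the error shown in image 1.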

dosubot bot commented May 14, 2024

To resolve the issue in reranker.py, modify the initialization of the CrossEncoder in the LangchainReranker class: replace the hardcoded max_length=1024 with the max_length parameter that is passed to the LangchainReranker constructor. Note that max_length is measured in tokens, not characters. Here's the corrected line of code:

self._model = CrossEncoder(model_name=model_name_or_path, max_length=max_length, device=device)

This adjustment allows the CrossEncoder to dynamically use the max_length value provided upon the LangchainReranker instance creation, accommodating texts of various lengths more flexibly.

Keep in mind, when setting the max_length, that transformer models have a maximum sequence length limit and processing very long sequences can be resource-intensive. This means choosing a max_length value requires balancing between the need to process longer texts and the available computational resources, as well as the potential impact on processing speed and memory usage.
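A minimal sketch of the pass-through described above, not the project's actual class. `RerankerSketch` and the `encoder_factory` parameter are hypothetical names introduced for illustration; the only assumption about the real library is that `CrossEncoder` takes the model name as its first argument and accepts `max_length` and `device` keywords.

```python
from typing import Any, Callable, Optional


class RerankerSketch:
    """Hypothetical stand-in for LangchainReranker (illustration only)."""

    def __init__(
        self,
        model_name_or_path: str,
        max_length: int = 512,
        device: str = "cpu",
        encoder_factory: Optional[Callable[..., Any]] = None,
    ) -> None:
        if encoder_factory is None:
            # Imported lazily so the sketch can be exercised without the library.
            from sentence_transformers import CrossEncoder

            encoder_factory = CrossEncoder
        self.max_length = max_length
        # Forward the caller's value instead of a hardcoded 1024.
        self._model = encoder_factory(
            model_name_or_path, max_length=max_length, device=device
        )
```

Injecting the factory is only a convenience for checking that the configured value actually reaches the encoder; in normal use the argument would be omitted.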


@309283175

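This is a limitation of the reranker model itself. Your model supports a maximum length of 514, so just switch to a reranker model that supports longer inputs.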
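The 514 figure and a usable limit of 512 tokens can both be right. The thread does not name the model, so the following is only a sketch under the assumption that it is a RoBERTa-style encoder, where position ids start at `padding_idx + 1` and the first two slots of the position table are therefore never used for real tokens. `usable_positions` is a hypothetical helper, and the values 514 and 1 are assumed config values, not read from the reporter's model.

```python
def usable_positions(max_position_embeddings: int, padding_idx: int) -> int:
    """Tokens that fit when position ids start at padding_idx + 1 (RoBERTa-style)."""
    return max_position_embeddings - (padding_idx + 1)


# Assumed values for a RoBERTa-style reranker config.
print(usable_positions(max_position_embeddings=514, padding_idx=1))  # 512
```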

@blizzardwj
Author

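> The error: image. The setting in the config file: image. The setting in the rerank file: image. Even after these changes, I still get the error shown in image 1.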

Set RERANKER_MAX_LENGTH = 512. That is the model's input limit, and 1024 exceeds it.
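One way to guard against this kind of mismatch is to clamp the configured value to the model's limit before building the encoder. This is a sketch, not project code: `effective_max_length` is a hypothetical helper, and the limit of 512 is taken from this comment. With Hugging Face tokenizers the limit can often be read from `tokenizer.model_max_length`, though that attribute may be a very large placeholder when the model does not declare one.

```python
import warnings


def effective_max_length(requested: int, model_limit: int) -> int:
    """Return the largest max_length that is safe to pass to the model."""
    if requested < 1:
        raise ValueError("max_length must be a positive integer")
    if requested > model_limit:
        # Warn rather than fail, so a too-large config value is still usable.
        warnings.warn(
            f"max_length={requested} exceeds the model limit {model_limit}; "
            f"using {model_limit} instead."
        )
        return model_limit
    return requested


RERANKER_MAX_LENGTH = effective_max_length(requested=1024, model_limit=512)
print(RERANKER_MAX_LENGTH)  # 512
```

Clamping keeps a misconfigured deployment running with truncated inputs; raising an error instead would be the stricter choice if silent truncation is unacceptable.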
