Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: zhipuai相关:GLM-4V支持? #1691

Open
wl223600 opened this issue Apr 8, 2024 · 2 comments
Open

[Feature]: zhipuai相关:GLM-4V支持? #1691

wl223600 opened this issue Apr 8, 2024 · 2 comments

Comments

@wl223600
Copy link

wl223600 commented Apr 8, 2024

Class | 类型

大语言模型

Feature Request | 功能请求

现状

  • 目前GPT Academic已具备完善的智谱AI的glm-4glm-3-turbo支持,上述模型没有直接解读图片的能力。
  • 根据GLM-4文档显示,glm-4v可以解读图片(虽然glm-4v的上下文长度仅有2k)。
  • 目前GPT Academic对glm-4v尚无完整支持。

bridge_zhipu.py (Line 77)

    if llm_kwargs["llm_model"] in ["glm-4v"]:
        have_recent_file, image_paths = have_any_recent_upload_image_files(chatbot)
        if not have_recent_file:
            chatbot.append((inputs, "没有检测到任何近期上传的图像文件,请上传jpg格式的图片,此外,请注意拓展名需要小写"))
            yield from update_ui(chatbot=chatbot, history=history, msg="等待图片") # 刷新界面
            return
        if have_recent_file:
            inputs = make_media_input(inputs, image_paths)
            chatbot[-1] = [inputs, ""]
            yield from update_ui(chatbot=chatbot, history=history)

bridge_all.py

(暂无glm-4v相关代码)

com_zhipuglm.py

(似乎也没有呢)

参考

GLM-4模型文档节选(完整文档

模型编码 描述 上下文长度
glm-4 最新的 GLM-4 、最大支持 128k 上下文、支持 Function Call 、Retreival。 128k tokens
glm-4v 实现了视觉语言特征的深度融合,支持视觉问答、图像字幕、视觉定位、复杂目标检测等各类多模态理解任务。 2k tokens

另:GLM-4V完整文档

@binary-husky
Copy link
Owner

glm-4v的代码尚未通过测试,这部分我们需要帮助

@Menghuan1918
Copy link
Contributor

#1700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants