Hello, and thanks to the community for the great examples. I noticed that the Gradio demo and the vLLM distributed-inference demo are shown in two separate places. If I want to use Gradio as the access interface for my large model, and at the same time use vLLM to accelerate the deployed model, how should I approach this? My idea is to launch the two services separately and then call the vLLM service's API from the Gradio handler function. Is this the right approach, and what is the standard paradigm for combining the two?
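The two-service approach described above can be sketched roughly as follows. This is a minimal, hypothetical example, assuming a vLLM OpenAI-compatible server has already been launched separately (e.g. via `vllm serve`) and is reachable at `http://localhost:8000/v1`; the model name `my-model` is a placeholder. The Gradio handler simply forwards the chat history to the vLLM endpoint:

```python
import json
import urllib.request

# Assumed address of a separately launched vLLM OpenAI-compatible server.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(message, history, model="my-model"):
    """Convert Gradio-style (user, bot) history pairs into an
    OpenAI-style chat-completions request body."""
    messages = []
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})
    return {"model": model, "messages": messages}


def chat(message, history):
    """Gradio handler: POST the conversation to vLLM and return the reply."""
    data = json.dumps(build_payload(message, history)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # gradio is imported lazily so the helpers above stay importable without it.
    import gradio as gr

    gr.ChatInterface(chat).launch()
```

This mirrors the pattern in the question: vLLM owns inference (and any distributed/tensor-parallel setup), while Gradio is a thin client over its HTTP API, so the two services can be scaled and restarted independently.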