-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Huggingface agent #2599
base: main
Are you sure you want to change the base?
Huggingface agent #2599
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2599 +/- ##
===========================================
+ Coverage 33.60% 44.73% +11.13%
===========================================
Files 87 89 +2
Lines 9336 9641 +305
Branches 1987 2211 +224
===========================================
+ Hits 3137 4313 +1176
+ Misses 5933 4959 -974
- Partials 266 369 +103
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@whiskyboy thanks for the PR! I had a couple of design questions and wanted your opinion on them. Autogen has an image generation capability, which allows anyone to add text-to-image capabilities to any LLM.
What do you think about implementing a new custom
For image-to-text, we also have a capability called
|
@WaelKarkoub Thanks for your comment!
|
@whiskyboy This is very cool and I appreciate your efforts! Your reasoning fits well with what I think now. Both approaches could be beneficial to the autogen community and could coexist. We can have standalone huggingface conversible agents as well as huggingface image generators, audio generators, etc. I look at Autogen as a lego world where users can mix and match different useful tools (lego pieces), and the tools you've developed are valuable and versatile enough to be applicable across many areas (e.g., agent capabilities). For a concrete example, what do you think about breaking down the text-to-image functionality and implementing it as an One last question, is the image-to-image capability the same as image editing? If so, I'm considering improving the image generator capability to allow for this. |
@WaelKarkoub It's glad to know we are working towards the same goal!
Sounds like a versatile lego block that could be utilized by both standalone agents and agent capabilities? I think it's a good idea! As it could enhance the function reusability, and make the code more readable and maintainable.
Yes, some typical user scenarios include style transfer, image inpainting, etc. For instance, the |
@WaelKarkoub @BeibinLi minding take a review of this PR? I'll add the documentation and tests once you approve the design. |
@self._user_proxy.register_for_execution() | ||
@self._assistant.register_for_llm( | ||
name=HuggingFaceCapability.TEXT_TO_IMAGE.name, | ||
description="Generates images from input text.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the idea behind using function registration instead of using the text analyzer agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically I need the agent to identify which capability (text-to-image, image-to-text, etc.) should be called to complete the task, and extract the arguments for the call. Will text analyzer agent be better for this ask?
self._assistant = AssistantAgent( | ||
self.name + "_inner_assistant", | ||
system_message=system_message, | ||
llm_config=inner_llm_config, | ||
is_termination_msg=lambda x: False, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have to expose these two agents to the public by initializing them in the constructor for a couple of reasons:
- Users can apply transform messages capability to limit token count by either truncation or compression.
- Expose to the users that we'll be making extra API calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum... It's a bit odd for me to explicitly pass two agents to the constructor here. Do you have an example code?
BTW, I was following the design pattern in WebSurferAgent
and ImageGeneration
, both have inner agents that are not exposed.
from autogen.agentchat.contrib import img_utils | ||
|
||
|
||
class HuggingFaceClient: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be a model client?
Line 64 in 19de99e
class ModelClient(Protocol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It did not implement the model client protocol yet. But it's a good suggestion. I'll make the change.
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
10493810 | Triggered | Generic Password | d422c63 | notebook/agentchat_pgvector_RetrieveChat.ipynb | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
Why are these changes needed?
Introducing a new agent named
HuggingFaceAgent
which can connect to models in HuggingFace Hub to achieve several multimodal capabilities.This agent essentially consists of a pairing between an assistant and a user-proxy agent, both are registered with the huggingface-hub models capabilities. Users could seamlessly access this agent to leverage its multimodal capabilities, without the need for manual registration of toolkits for execution.
Some key changes:
HuggingFaceClient
class inautogen/agentchat/contrib/huggingface_utils.py
: this class simplifies calling HuggingFace models locally or remotely.HuggingFaceAgent
class inautogen/agentchat/contrib/huggingface_agent.py
: this agent utilizesHuggingFaceClient
to achieve multimodal capabilities.HuggingFaceImageGenerator
class inautogen/agentchat/contrib/capabilities/generate_images.py
: this class enables text-based LLMs to generate images usingHuggingFaceClient
.Related issue number
The second approach mentioned in #2577
Checks