
Discussion on the call process for training individual-portrait LoRA models in AIGC #88

Open
alexsunxl opened this issue Oct 27, 2023 · 3 comments

@alexsunxl
Contributor

Goal

AIOS integrates an AI portrait workflow, including personal ID photos, artistic photos, and pictures of hairstyle changes, clothing changes, scene changes, etc.

Basic process

  • Connect to a personal Stable Diffusion instance.
  • Verify the models and extensions of the Stable Diffusion installation.
  • Encapsulate the return information for the different states of Stable Diffusion, so users can perceive each step.
  • The AIOS kernel loads the local directory, reads 5 to 10 individual photos, and initiates training of the LoRA model.
  • The LoRA training process may take 10 to 20 minutes, depending on the GPU.
  • Once training is complete, notify the user and await a new input message.
  • The LLM parses the message into a prompt and sends a request to the AIGC model, which can then generate ID photos, artistic photos, pictures of hairstyle changes, clothing changes, etc.
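The "encapsulate the return information of different states" step above could be sketched as a small translation layer. The field names below mirror the `/sdapi/v1/progress` response of the AUTOMATIC1111 Stable Diffusion WebUI; this is only an assumed backend, so adjust the keys to whatever SD server AIOS actually connects to.

```python
# Sketch: turn a raw Stable Diffusion progress payload into a
# user-facing status line the AIOS kernel can surface.
# Assumes AUTOMATIC1111-WebUI-style fields ("progress", "eta_relative").

def describe_progress(resp: dict) -> str:
    """Map a progress response to a human-readable step description."""
    progress = resp.get("progress", 0.0)
    eta = resp.get("eta_relative", 0.0)
    if progress <= 0.0:
        return "idle: waiting for a new job"
    if progress >= 1.0:
        return "done: training/generation complete"
    return f"running: {progress:.0%} complete, ~{eta:.0f}s remaining"
```

The kernel would poll the endpoint on a timer and push `describe_progress(...)` strings to the user, so long-running LoRA training (the 10-20 minute step) stays observable.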

Other

More consideration should be given to using the LLM to parse requirements and invoke commands, reducing the number of commands users need to type, so that users only need to state their requirements.
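One way to realize "users only state their requirements" is function calling: hand the LLM a tool schema, let it pick the portrait style, and have the kernel build the actual SD command from the parsed arguments. Everything below (the tool name, the style list, the LoRA weight) is illustrative, not part of AIOS.

```python
# Sketch: an OpenAI-style function-calling schema plus the command
# builder the kernel would run on the LLM's parsed arguments.
# All names here are hypothetical.

PORTRAIT_TOOL = {
    "name": "generate_portrait",
    "description": "Generate a portrait using the user's trained LoRA model",
    "parameters": {
        "type": "object",
        "properties": {
            "style": {
                "type": "string",
                "enum": ["id_photo", "artistic", "hairstyle", "clothing"],
            },
            "extra_prompt": {"type": "string"},
        },
        "required": ["style"],
    },
}

STYLE_PROMPTS = {
    "id_photo": "passport-style ID photo, plain background",
    "artistic": "artistic studio portrait, dramatic lighting",
    "hairstyle": "portrait with a new hairstyle",
    "clothing": "portrait in different clothing",
}

def generate_command(style: str, extra_prompt: str = "",
                     lora: str = "user_lora") -> str:
    """Build the SD prompt string from parsed function-call arguments."""
    prompt = f"<lora:{lora}:0.8> {STYLE_PROMPTS[style]}"
    return f"{prompt}, {extra_prompt}" if extra_prompt else prompt
```

With this shape, the user types "I need an ID photo for my passport" and the LLM emits `{"style": "id_photo"}`; no SD-specific commands are exposed to the user.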

@waterflier @lurenpluto

@waterflier
Collaborator

Good idea~

Using an Agent to replace complex SD operations is a good idea. I thought about the process; is it like this: the user first passes a template (the kind usually seen on social networks) to the Agent, then the Agent guides the user through a dialogue to gather enough information to generate prompts, and automatically completes the SD model download, generation, and selection based on the existing LoRA model (this may rely on the GPT-V API).

@alexsunxl
Contributor Author

The individual-portraits part is not easy to handle in terms of model detection and initialization, so I have put it on hold for now.

On the other hand, regarding AGI or AIGC: GPT-4's API has many new features, including DALL·E 3, vision, and text-to-speech. You could consider integrating those first. Is it better to place these function calls in the agent or in the function?
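One possible answer to "agent or function" is neither directly: register each new capability on the node (e.g. openai_node) and let agents request capabilities by name. The registry below is a hypothetical sketch; the model names are the ones OpenAI shipped around this time, but the structure and `resolve_capability` helper are purely illustrative.

```python
# Sketch: a capability registry for the new GPT-4-era features, so an
# agent asks for "tts" or "image_gen" rather than hard-coding an API
# call. Registry layout and function name are assumptions, not AIOS API.

CAPABILITIES = {
    "image_gen": {"model": "dall-e-3", "endpoint": "images.generate"},
    "vision": {"model": "gpt-4-vision-preview", "endpoint": "chat.completions"},
    "tts": {"model": "tts-1", "endpoint": "audio.speech"},
}

def resolve_capability(name: str) -> dict:
    """Return the model/endpooint mapping an agent should use."""
    if name not in CAPABILITIES:
        raise KeyError(f"unknown capability: {name}")
    return CAPABILITIES[name]
```

This keeps the function-call plumbing in the node (one place to update when models change) while agents stay declarative about what they need.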

@waterflier

@waterflier
Collaborator

I am focused on the new Agent working cycle right now~ If you could add these new features to the openai_node, I would be very pleased, but make sure to follow the pattern we currently use for enhancing AI capabilities.

You could create some new issues about those features~~ Thank you!
