Some Feature Requests, STT, TTS, custom commands #49

Open
mwnu opened this issue Apr 24, 2024 · 5 comments

Comments

@mwnu

mwnu commented Apr 24, 2024

Thank you, @hibobmaster, for developing this incredible project.
Would it be possible to add a few features?
Text-to-speech, speech-to-text, and custom commands that use different prompts and agents would be ideal.

@hibobmaster
Owner

hibobmaster commented Apr 24, 2024

STT: https://github.com/hibobmaster/matrix-stt-bot

It's not perfect, but it may meet your needs.
Different prompts and agents:
https://github.com/hibobmaster/matrix_chatgpt_bot/wiki/Langchain-(flowise)
related: #36

@mwnu
Author

mwnu commented Apr 24, 2024

matrix-stt-bot

The matrix-stt-bot is great, but it can only transcribe and does not support voice dialogue. Flowise is a bit complex, and unfortunately, it also does not support voice features.

@hibobmaster
Owner

Voice dialogue: do you mean a TTS function?

For custom commands, this is the entrypoint:

async def message_callback(self, room: MatrixRoom, event: RoomMessageText) -> None:

It's hard to maintain new commands at runtime.

So which custom commands do you need?

@mwnu
Author

mwnu commented Apr 25, 2024

voice dialogue: you mean TTS function?

For custom commands, this is the entrypoint:

async def message_callback(self, room: MatrixRoom, event: RoomMessageText) -> None:

It's hard to maintain new commands at runtime.
So which custom commands do you need?

Voice dialogue combines speech-to-text (STT) and text-to-speech (TTS). The user speaks a message and the bot answers in voice: it first generates a text reply, which TTS then converts to speech. Another common approach is to show a widget on each message that plays the text as audio when clicked. However, Matrix does not offer this (Matrix spec 1.4 includes MSCs for widgets, but it seems no service has implemented them yet). Asking the bot to generate voice the way it handles images, by quoting a message and tagging the bot, would be inconvenient, because voice dialogue is usually used when typing is not feasible. Sending the voice reply directly in the conversation therefore seems more appropriate. Posting two or three messages might make it clearer: one with the user's speech converted to text, one with the AI-generated text, and one with the TTS audio.
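
The three-message flow described here could be sketched roughly as follows. This is only an illustration under assumptions: `voice_turn`, `OutgoingMessage`, and the three callables (`transcribe`, `chat`, `synthesize`) are hypothetical names, not part of this project, and the actual STT, LLM, and TTS backends are left as pluggable functions. Only the `m.text` and `m.audio` message types come from the Matrix spec.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class OutgoingMessage:
    msgtype: str                 # Matrix message type: "m.text" or "m.audio"
    content: Union[str, bytes]   # text body, or raw audio bytes to upload


def voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],   # hypothetical STT backend
    chat: Callable[[str], str],           # hypothetical LLM backend
    synthesize: Callable[[str], bytes],   # hypothetical TTS backend
) -> List[OutgoingMessage]:
    """Turn one voice message into the three replies described above."""
    transcript = transcribe(audio_in)     # user's speech as text
    reply_text = chat(transcript)         # AI-generated text answer
    reply_audio = synthesize(reply_text)  # same answer as speech
    return [
        OutgoingMessage("m.text", transcript),
        OutgoingMessage("m.text", reply_text),
        OutgoingMessage("m.audio", reply_audio),
    ]
```

Keeping the backends as plain callables means the same flow would work with a local model or a hosted API; sending the messages to the room is deliberately left out.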

One can also imagine a voice-call scenario, similar to how the ChatGPT and Copilot mobile apps work, with no text interaction at all. The program automatically detects pauses in the user's speech (some third-party clients, such as LobeChat, have implemented this) and then responds with voice.
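
As a rough illustration of pause detection, the sketch below treats the utterance as finished once a run of low-energy audio frames follows some speech. It is a simple energy threshold, not how any of the clients named above actually do it; `end_of_utterance`, its parameters, and the default values are all assumptions chosen for the example, and a real system would more likely use a proper voice activity detector.

```python
from typing import Sequence


def end_of_utterance(
    frame_energies: Sequence[float],
    silence_threshold: float = 0.01,  # assumed: energy below this counts as silence
    min_silent_frames: int = 25,      # assumed: length of pause that ends a turn
) -> int:
    """Return the frame index where the speaker is judged to have stopped, or -1."""
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Speech frame: remember that the user has started talking
            # and reset the pause counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence only counts once speech has been heard, so leading
            # silence never triggers a response.
            silent_run += 1
            if silent_run >= min_silent_frames:
                return i
    return -1
```

With frames of about 20 ms, 25 silent frames would correspond to a pause of roughly half a second; both numbers would need tuning in practice.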

Of course, this would involve extensive coding work. I am eager to participate in this project, but unfortunately, I am not familiar with Python, which makes it difficult for me to understand the entire project. Maybe when I have time, I will study it more thoroughly.

@mwnu
Author

mwnu commented Apr 25, 2024

which custom commands do you need?

Here's another idea: a default dialogue model that can be switched temporarily to other models with custom commands, such as !g35 (gpt-3.5) or !c3g (claude-3-opus-20240229). Does this project support models from providers other than OpenAI? With a base URL proxy (e.g., one-api), it should be possible to support models from multiple vendors on a single platform, though I haven't tested this yet.
The same could apply to temporarily switching to other agents, such as !ss for web searches or !rag to consult one's own knowledge base.
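
A minimal sketch of how such one-off commands might be routed, assuming a static table rather than commands defined at runtime. The command prefixes are the ones suggested above; `COMMANDS`, `Route`, `route_message`, and the agent labels are hypothetical and do not exist in this project.

```python
from typing import NamedTuple

# Hypothetical command table. Prefixes come from the suggestion above;
# the "kind" and agent names are illustrative only.
COMMANDS = {
    "!g35": ("model", "gpt-3.5"),
    "!c3g": ("model", "claude-3-opus-20240229"),
    "!ss": ("agent", "web-search"),
    "!rag": ("agent", "knowledge-base"),
}
DEFAULT_ROUTE = ("model", "default")


class Route(NamedTuple):
    kind: str     # "model" or "agent"
    target: str   # which model or agent handles this one message
    prompt: str   # message text with the command prefix removed


def route_message(body: str) -> Route:
    """Use a leading command to pick a model or agent for this message only."""
    text = body.strip()
    parts = text.split(maxsplit=1)
    if parts and parts[0] in COMMANDS:
        kind, target = COMMANDS[parts[0]]
        prompt = parts[1] if len(parts) > 1 else ""
        return Route(kind, target, prompt)
    # No known command: fall back to the default dialogue model.
    return Route(*DEFAULT_ROUTE, text)
```

Because the route is computed per message, the default model stays in effect for the next message without any state to reset. Something like this could be called from `message_callback` with the event body, though how it would fit the existing handler is not shown here.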
