Some Feature Requests, STT, TTS, custom commands #49

mwnu · 2024-04-24T13:44:29Z

Thank you, @hibobmaster, for developing this incredible project.
Would it be possible to add some features?
Text-to-speech, speech-to-text, and custom commands that use different prompts and agents would be perfect.

hibobmaster · 2024-04-24T13:51:22Z

STT: https://github.com/hibobmaster/matrix-stt-bot

It's not perfect, but it may cover your needs.
Different prompts and agents:
https://github.com/hibobmaster/matrix_chatgpt_bot/wiki/Langchain-(flowise)
related: #36

mwnu · 2024-04-24T14:00:41Z

matrix-stt-bot

The matrix-stt-bot is great, but it can only transcribe and does not support voice dialogue. Flowise is a bit complex, and unfortunately, it also does not support voice features.

hibobmaster · 2024-04-24T16:35:31Z

Voice dialogue: do you mean a TTS function?

For custom commands, this is the entrypoint:

matrix_chatgpt_bot/src/bot.py

Line 241 in 81543d5

    
           async def message_callback(self, room: MatrixRoom, event: RoomMessageText) -> None:

It's hard to maintain new commands at runtime.

So which custom commands do you need?
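One way to keep new commands out of the callback body is a small registry that the callback consults. The sketch below is only an illustration under that assumption: `COMMANDS`, `command`, `parse_command`, and `dispatch` are hypothetical names, not part of matrix_chatgpt_bot, and the handler receives only the argument text rather than the Matrix room and event.

```python
import asyncio
import re
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical handler type: takes the text after the command, returns a reply.
Handler = Callable[[str], Awaitable[str]]

# Hypothetical registry mapping a command name (without "!") to its handler.
COMMANDS: Dict[str, Handler] = {}

# "!name rest of message"; DOTALL lets the argument text span several lines.
COMMAND_RE = re.compile(r"^\s*!(\w+)(?:\s+(.*))?$", re.DOTALL)


def command(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler under the given command name."""
    def register(func: Handler) -> Handler:
        COMMANDS[name] = func
        return func
    return register


def parse_command(body: str) -> Optional[Tuple[str, str]]:
    """Split a message body into (command, args), or None if it is not a command."""
    match = COMMAND_RE.match(body)
    if match is None:
        return None
    return match.group(1), (match.group(2) or "").strip()


@command("echo")
async def echo(args: str) -> str:
    # Example handler: replies with whatever followed the command.
    return args


async def dispatch(body: str) -> Optional[str]:
    """Run the matching handler; return None for plain text or unknown commands."""
    parsed = parse_command(body)
    if parsed is None:
        return None
    name, args = parsed
    handler = COMMANDS.get(name)
    if handler is None:
        return None
    return await handler(args)


if __name__ == "__main__":
    print(asyncio.run(dispatch("!echo hello")))
```

With this shape, a message callback would only need to call `dispatch(event.body)` and fall back to the normal chat path when it returns `None`; adding a command means adding one decorated function.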

mwnu · 2024-04-25T05:35:01Z


Voice dialogue combines speech-to-text (STT) and text-to-speech (TTS). The user speaks a message; the bot generates a text reply, and TTS then converts that reply to speech. Another common approach is to show a widget on each message that reads the text aloud when clicked. Matrix does not offer this, however (Matrix spec 1.4 includes MSCs for widgets, but it seems no service has implemented them yet). Asking the bot for audio the way it currently handles images, by quoting a message and tagging the bot, would be inconvenient, because voice dialogue is usually used when typing is not feasible. So outputting voice directly in the conversation seems appropriate. Showing two or three messages together might be clearest: one with the user's speech converted to text, one with the AI-generated text, and one with the TTS audio.
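The three-message flow described above could be sketched roughly as follows. This is a minimal illustration, assuming three pluggable async backends; `handle_voice_message`, `OutgoingMessage`, and the `fake_*` stubs are hypothetical names that do not exist in matrix_chatgpt_bot. A real bot would replace the stubs with an STT service, the chat model, and a TTS service, and would upload the audio to the room.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Union

# Hypothetical backend signatures; real services would sit behind these.
Transcriber = Callable[[bytes], Awaitable[str]]
ChatModel = Callable[[str], Awaitable[str]]
Synthesizer = Callable[[str], Awaitable[bytes]]


@dataclass
class OutgoingMessage:
    kind: str                   # "text" or "audio"
    payload: Union[str, bytes]  # message text, or encoded audio bytes


async def handle_voice_message(
    audio: bytes,
    transcribe: Transcriber,
    chat: ChatModel,
    synthesize: Synthesizer,
) -> List[OutgoingMessage]:
    """Turn one incoming voice message into the three replies to post."""
    transcript = await transcribe(audio)  # STT: what the user said
    reply = await chat(transcript)        # text answer from the model
    speech = await synthesize(reply)      # TTS: the answer as audio
    return [
        OutgoingMessage("text", transcript),
        OutgoingMessage("text", reply),
        OutgoingMessage("audio", speech),
    ]


# Stub backends so the sketch runs without any external service.
async def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


async def fake_chat(prompt: str) -> str:
    return "You said: " + prompt


async def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    messages = asyncio.run(
        handle_voice_message(b"hello bot", fake_transcribe, fake_chat, fake_synthesize)
    )
    for message in messages:
        print(message.kind, message.payload)
```

Passing the backends in as parameters keeps the flow independent of any particular STT or TTS provider, which is a design choice of this sketch rather than something the project prescribes.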

One can also imagine a voice-call scenario, similar to how the ChatGPT and Copilot mobile apps work, with no text interaction at all. The program automatically detects pauses in the user's speech (some third-party clients, such as LobeChat, have implemented this) and then responds with voice.
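Pause detection of this kind can be approximated with a simple energy threshold. The sketch below is an assumption-laden illustration, not how LobeChat or any particular client does it: `rms`, `find_end_of_utterance`, the threshold, and the frame counts are all hypothetical, and production systems typically use a trained voice activity detector instead.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def find_end_of_utterance(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,     # assumed silence level; needs tuning per microphone
    min_silent_frames: int = 3,  # how many quiet frames in a row count as a pause
) -> Optional[int]:
    """Return the index of the frame that completes the first pause after speech.

    Returns None if no speech was heard or the speaker never paused long enough.
    """
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # any loud frame resets the pause counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silent_frames:
                return index
    return None


if __name__ == "__main__":
    loud = [0.5] * 160
    quiet = [0.0] * 160
    print(find_end_of_utterance([quiet, loud, loud, quiet, quiet, quiet, loud]))
```

Leading silence is ignored on purpose, so the bot does not respond before the user has said anything.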

Of course, this would involve extensive coding work. I am eager to participate in this project, but unfortunately, I am not familiar with Python, which makes it difficult for me to understand the entire project. Maybe when I have time, I will study it more thoroughly.

mwnu · 2024-04-25T05:42:32Z

which custom commands do you need?

Here's another idea: I envision a default dialogue model that can be switched temporarily to other models with custom commands, such as !g35 (gpt-3.5) or !c3g (claude-3-opus-20240229). Does this project support models from providers other than OpenAI? With a base URL proxy (e.g., one-api), it should be possible to serve models from multiple vendors through a single platform, though I haven't tested this yet.
This could also cover temporarily switching to other agents, such as using !ss for web searches or !rag to consult one's own knowledge base.
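The per-message switching idea could look roughly like this. It is a sketch under stated assumptions: the alias tables, `DEFAULT_MODEL`, the agent names, `Route`, and `resolve_route` are all hypothetical, and "gpt-3.5" is assumed here to mean the `gpt-3.5-turbo` model. The project itself does not define any of these.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DEFAULT_MODEL = "gpt-4"  # assumption: whatever model the bot is configured with

# Hypothetical alias tables, using the command names proposed in the comment.
MODEL_ALIASES: Dict[str, str] = {
    "g35": "gpt-3.5-turbo",  # assumed meaning of "gpt-3.5"
    "c3g": "claude-3-opus-20240229",
}
AGENT_ALIASES: Dict[str, str] = {
    "ss": "web_search",       # hypothetical agent name
    "rag": "knowledge_base",  # hypothetical agent name
}


@dataclass(frozen=True)
class Route:
    model: str            # model to use for this one message
    agent: Optional[str]  # agent to use, or None for plain chat
    prompt: str           # message text with the command prefix removed


def resolve_route(body: str) -> Route:
    """Pick the model or agent for a single message; the default never changes."""
    text = body.strip()
    if text.startswith("!"):
        parts = text[1:].split(None, 1)
        head = parts[0] if parts else ""
        rest = parts[1].strip() if len(parts) > 1 else ""
        if head in MODEL_ALIASES:
            return Route(MODEL_ALIASES[head], None, rest)
        if head in AGENT_ALIASES:
            return Route(DEFAULT_MODEL, AGENT_ALIASES[head], rest)
    # Plain text and unknown commands go to the default model unchanged.
    return Route(DEFAULT_MODEL, None, text)


if __name__ == "__main__":
    print(resolve_route("!c3g Summarize this thread"))
    print(resolve_route("hello"))
```

Because the route is computed per message, the switch is temporary by construction: the next message without a prefix goes back to the default model.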
