GitHub - agent87/RW-DEEPSPEECH-API: An end-to-end deep speech REST API containing speech-to-text and text-to-speech services for Kinyarwanda.

RW DEEPSPEECH API

An end-to-end deep speech API for Kinyarwanda, with speech-to-text and text-to-speech services.

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact
Acknowledgments

About The Project

Welcome to the Kinyarwanda DeepSpeech API repository! This comprehensive guide provides an in-depth exploration of this powerful end-to-end solution for speech processing in Kinyarwanda. With our DeepSpeech API, you can effortlessly convert spoken Kinyarwanda into text and transform text into natural-sounding Kinyarwanda speech.

Introduction

In today's digital age, seamless communication across diverse languages is crucial. Our DeepSpeech API for Kinyarwanda bridges language barriers by offering robust speech-to-text and text-to-speech capabilities tailored specifically for the Kinyarwanda language. Whether you are building interactive voice applications, transcribing audio content, or enhancing accessibility features, our API empowers you to achieve your goals with ease.

Key Features

Accurate Speech-to-Text Conversion: Transcribe spoken Kinyarwanda into written text with a deep learning model trained on around 2000 hours of Kinyarwanda speech data.

Natural Text-to-Speech Synthesis: Generate lifelike Kinyarwanda speech from textual input. Our text-to-speech engine produces natural intonation, rhythm, and pronunciation, creating a seamless and engaging user experience.

End-to-End Processing: Perform both speech-to-text and text-to-speech operations within a single API, streamlining your workflow and saving development time.

Customization: Fine-tune our models to adapt them to specific accents, dialects, or domains, ensuring optimal performance for your unique use case.

Scalability: Our API is designed to handle a high volume of requests, making it suitable for applications ranging from small-scale projects to large-scale enterprise solutions.

Speech to Text Model by Nvidia

This model transcribes speech into the lowercase Latin alphabet, including spaces and apostrophes, and was trained by Nvidia on around 2000 hours of Kinyarwanda speech data. It is a non-autoregressive "large" variant of Conformer, with around 120 million parameters. See the model architecture and NeMo documentation for complete architecture details.
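Because the model emits only lowercase Latin letters, spaces, and apostrophes, reference text usually has to be brought into the same alphabet before it can be compared with a transcript. The following is a minimal sketch of such a normalization step, not part of this repository; `normalize_transcript` is a hypothetical helper name, and the exact character set is an assumption based on the description above.

```python
import re
import unicodedata

# Assumption: the output alphabet is a-z, space, and apostrophe,
# as described in the model summary above.
_DISALLOWED = re.compile(r"[^a-z' ]+")


def normalize_transcript(text: str) -> str:
    """Map free-form text onto lowercase Latin letters, spaces, and apostrophes."""
    # Decompose accented characters, then drop the combining marks so that
    # only the base letter remains (e.g. "É" becomes "E").
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and convert the curly apostrophe to a straight one.
    lowered = stripped.lower().replace("\u2019", "'")
    # Replace every run of disallowed characters with a single space.
    cleaned = _DISALLOWED.sub(" ", lowered)
    # Collapse leftover whitespace and trim both ends.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(normalize_transcript("Muraho, N'amahoro?"))
```

Replacing disallowed characters with a space, instead of deleting them, keeps words separated when punctuation appears without surrounding whitespace.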

Text to Speech Model by Digital Umuganda

This model is an end-to-end deep-learning-based Kinyarwanda Text-to-Speech (TTS) model developed by Digital Umuganda. Thanks to its zero-shot learning capabilities, new voices can be introduced with one minute of speech. The model was trained using Coqui's TTS library and the YourTTS[1] architecture, on 67 hours of Kinyarwanda Bible data, for 100 epochs.


Built With


Getting Started

This is a simple implementation that requires only a few lines of code to run.

Prerequisites

It is highly recommended to run the application in a Docker container to avoid dependency errors, but it is also possible to run it without Docker. The minimum specifications are:

With Docker:
- DISK SPACE >= 10GB
- RAM >= 2GB
Without Docker:
- RAM >= 2GB free/spare

Setup SSL Certificates on Server

Installation with docker

Follow the steps below to set up the project on a server or machine running Docker.

Clone the repo

git clone https://github.com/agent87/RW-DEEPSPEECH-API.git

Pull the large files with Git LFS. Make sure you have Git LFS installed, or refer to the Git LFS site for installation instructions.
```
git lfs pull
```

Create an environment file named ".env" (for example with "touch .env") in the root directory of the project, and paste in the following variables:

MONGO_INITDB_ROOT_USERNAME="admin"
MONGO_INITDB_ROOT_PASSWORD="Bingo123"
MONGO_HOST="mongo"
MONGO_PORT=27017
MONGO_INITDB_DATABASE="Inference"
MONGO_STT_COLLECTION="STT_INFERENCE_LOGS"
MONGO_TTS_COLLECTION="TTS_INFERENCE_LOGS"
MAX_SPEECH_AUDIO_FILE_SIZE=1000
TTS_MAX_TXT_LEN=1000
LOG_LEVEL="INFO"
PYTHONUNBUFFERED=1
DOMAIN=<Replace your DOMAIN here>
SERVER_IP_ADDRESS=<Replace your SERVER_IP_ADDRESS here>

NOTE: For security purposes, make sure to change the variables above!
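Before building, it can be useful to confirm that the ".env" file is complete and no longer contains the example values. The following is a minimal sketch of such a check, not part of this repository. The key names come from the list above; `parse_env` and `check_env` are hypothetical helper names, and the parser handles only plain KEY=VALUE lines, which is an assumption about how the file is written.

```python
from pathlib import Path

# Keys taken from the variable list above.
REQUIRED_KEYS = [
    "MONGO_INITDB_ROOT_USERNAME",
    "MONGO_INITDB_ROOT_PASSWORD",
    "MONGO_HOST",
    "MONGO_PORT",
    "MONGO_INITDB_DATABASE",
    "MONGO_STT_COLLECTION",
    "MONGO_TTS_COLLECTION",
    "MAX_SPEECH_AUDIO_FILE_SIZE",
    "TTS_MAX_TXT_LEN",
    "LOG_LEVEL",
    "PYTHONUNBUFFERED",
    "DOMAIN",
    "SERVER_IP_ADDRESS",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional double quotes.
        values[key.strip()] = value.strip().strip('"')
    return values


def check_env(values: dict) -> list:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    for key, value in values.items():
        # Placeholders in the example above are written as <...>.
        if value.startswith("<") and value.endswith(">"):
            problems.append(f"{key} still holds a placeholder")
    if values.get("MONGO_INITDB_ROOT_PASSWORD") == "Bingo123":
        problems.append("MONGO_INITDB_ROOT_PASSWORD is still the example value")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for problem in check_env(parse_env(env_path.read_text())):
            print(problem)
```

Run it from the project root; no output means that none of the checks above found a problem.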

Build the Docker image
```
docker compose build
```
Note: if you have an earlier Docker version, use "docker-compose build" instead.

Start the Docker containers and let the magic begin
```
docker compose up
```
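The containers can take a while to start accepting connections, so a script that calls the API right after "docker compose up" may need to wait. The following is a minimal sketch of a readiness check that polls a TCP port, not part of this repository. `wait_for_port` is a hypothetical helper, and the README does not state the port the API listens on, so the port in the usage comment is a placeholder.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False


# Example usage (the port is a placeholder; use the one your deployment exposes):
# if not wait_for_port("localhost", 8000):
#     raise SystemExit("API did not become reachable in time")
```

A successful TCP connection only shows that the port is open; it does not guarantee that the models have finished loading.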


Usage

If you do not have specialized hardware (GPU), you can run the application on Google Colab. Use the following link to open the notebook, then follow the instructions in the notebook to run the application.

Speech to Text (STT) usage

curl -X POST "http://server_url/stt" -H  "accept: application/json" -H  "Content-Type: multipart/form-data" -F "file=@/path/to/audio/file"
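The same upload can be made from Python. The following is a minimal sketch using only the standard library, mirroring the curl command above: a multipart/form-data POST to /stt with a single "file" field. `build_stt_request` and `transcribe` are hypothetical helper names, and decoding the response as JSON is an assumption based on the "accept: application/json" header; the response schema is not documented here.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_stt_request(server_url: str, audio_path: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for /stt with one 'file' field."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    # Part header for the file field, followed by the raw audio bytes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + path.read_bytes() + tail
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/stt",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(server_url: str, audio_path: str):
    """Send the request and decode the response as JSON (assumed format)."""
    with urllib.request.urlopen(build_stt_request(server_url, audio_path)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (server_url is a placeholder, as in the curl command):
# print(transcribe("http://server_url", "/path/to/audio/file"))
```

Building the request separately from sending it makes it possible to inspect the headers and body without a running server.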

Text to Speech (TTS) usage

curl -X POST "http://server_url/tts" -H  "accept: application/json" -H  "Content-Type: application/json" -d "{\"text\":\"string\"}"
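The TTS call can be sketched in Python in the same way, again using only the standard library and mirroring the curl command above: a JSON body with a single "text" field posted to /tts. `build_tts_request` and `synthesize` are hypothetical helper names. The response format is not documented here, so the sketch returns the raw response bytes and leaves interpretation to the caller.

```python
import json
import urllib.request


def build_tts_request(server_url: str, text: str) -> urllib.request.Request:
    """Build a JSON POST for /tts carrying the text to synthesize."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/tts",
        data=payload,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


def synthesize(server_url: str, text: str) -> bytes:
    """Send the request and return the raw response body (format is an assumption-free passthrough)."""
    with urllib.request.urlopen(build_tts_request(server_url, text)) as resp:
        return resp.read()


# Example usage (server_url is a placeholder, as in the curl command):
# data = synthesize("http://server_url", "Muraho")
```

Note that the ".env" example sets TTS_MAX_TXT_LEN=1000, so very long inputs may be rejected by the server.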


Roadmap

See the open issues for a full list of proposed features (and known issues).


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request


License

Distributed under the GNU General Public License. See the LICENSE file for more information.


Contact

Arnaud Kayonga - @kayarn - arnauldkayonga1@gmail.com

Project Link: https://github.com/agent87/RW-DEEPSPEECH-API


Acknowledgments

Use this space to list resources you find helpful and would like to give credit to. I've included a few of my favorites to kick things off!
