An end-to-end DeepSpeech REST API providing speech-to-text and text-to-speech services for Kinyarwanda.


agent87/RW-DEEPSPEECH-API



RW DEEPSPEECH API

A Kinyarwanda-based end-to-end DeepSpeech API with speech-to-text and text-to-speech services!

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

Welcome to the Kinyarwanda DeepSpeech API repository! This guide explores an end-to-end solution for speech processing in Kinyarwanda. With the DeepSpeech API, you can convert spoken Kinyarwanda into text and transform text into natural-sounding Kinyarwanda speech.

Introduction

In today's digital age, seamless communication across diverse languages is crucial. Our DeepSpeech API for Kinyarwanda helps bridge language barriers by offering speech-to-text and text-to-speech capabilities tailored specifically to the Kinyarwanda language. Whether you are building interactive voice applications, transcribing audio content, or enhancing accessibility features, the API is designed to support these use cases.

Key Features

Accurate Speech-to-Text Conversion: Transcribe spoken Kinyarwanda into written text with a deep learning model trained on around 2000 hours of Kinyarwanda speech data.

Natural Text-to-Speech Synthesis: Generate lifelike Kinyarwanda speech from text input. The text-to-speech engine aims for natural intonation, rhythm, and pronunciation, creating a seamless and engaging user experience.

End-to-End Processing: Perform both speech-to-text and text-to-speech operations within a single API, streamlining your workflow and saving development time.

Customization: Fine-tune our models to adapt them to specific accents, dialects, or domains, ensuring optimal performance for your unique use case.

Scalability: Our API is designed to handle a high volume of requests, making it suitable for applications ranging from small-scale projects to large-scale enterprise solutions.

The speech-to-text model transcribes speech into the lowercase Latin alphabet, including spaces and apostrophes, and was trained by NVIDIA on around 2000 hours of Kinyarwanda speech data. It is a non-autoregressive "large" variant of Conformer with around 120 million parameters. See the model architecture and NeMo documentation for complete architecture details.
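Since the model only emits lowercase Latin letters, spaces, and apostrophes, reference transcripts need to be mapped to that same alphabet before they are compared with its output (for example, when measuring word error rate). The sketch below shows one way to do this in Python. `normalize_transcript` is a hypothetical helper that is not part of this repository, and it assumes that accented letters may be reduced to their base letter and that any other character (digits, punctuation, line breaks) can be treated as a word separator.

```python
import re
import unicodedata

# Characters the STT model can emit: lowercase a-z, the apostrophe, and space.
_DISALLOWED = re.compile(r"[^a-z' ]+")
_SPACES = re.compile(r" +")


def normalize_transcript(text: str) -> str:
    """Map text onto the model's output alphabet (hypothetical helper).

    Assumptions: accented letters are reduced to their base letter via
    Unicode NFKD decomposition, and every other disallowed character is
    replaced by a space.
    """
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat typographic apostrophes as the plain ASCII apostrophe.
    stripped = stripped.replace("\u2019", "'").replace("\u2018", "'")
    lowered = stripped.lower()
    # Replace runs of disallowed characters with a space, then collapse spaces.
    cleaned = _DISALLOWED.sub(" ", lowered)
    return _SPACES.sub(" ", cleaned).strip()
```

For instance, `normalize_transcript("Muraho, amakuru y’ejo?")` returns `"muraho amakuru y'ejo"`. Digits are dropped rather than spelled out, so transcripts containing numbers would need a separate number-to-words step.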

The text-to-speech model is an end-to-end, deep-learning-based Kinyarwanda Text-to-Speech (TTS) model developed by Digital Umuganda. Thanks to its zero-shot learning capabilities, a new voice can be introduced with about one minute of speech. The model was trained with Coqui's TTS library using the YourTTS[1] architecture, on 67 hours of Kinyarwanda Bible data for 100 epochs.

Built With

  • Python
  • FastAPI
  • WebSockets
  • Transformers
  • TTS
  • Uvicorn
  • NeMo

Getting Started

This is a simple implementation that requires only a few commands to run.

Prerequisites

It is highly recommended to run the application in a Docker container to avoid dependency errors, but it is also possible to run it without Docker. The minimum specifications are:

  • With Docker:
    • DISK SPACE >= 10GB
    • RAM >= 2GB
  • Without Docker:
    • RAM >= 2GB free/spare
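The minimum specifications above can be checked before installing. The following sketch uses only the Python standard library. `check_prerequisites` is a hypothetical helper, it treats "GB" as GiB (an assumption), and free RAM can only be measured on Linux here, so `ram_ok` is `None` on other platforms. When using Docker, pass Docker's data directory (often `/var/lib/docker` on Linux) as `path`, since that is where images are stored.

```python
import os
import shutil

GIB = 1024 ** 3  # assumption: the README's "GB" is read as GiB


def free_ram_bytes():
    """Return free physical memory in bytes, or None if it cannot be read.

    os.sysconf exists only on Unix-like systems, and SC_AVPHYS_PAGES is a
    Linux/glibc extension (missing on macOS). It counts *free* pages only,
    so memory held by the page cache is excluded and usable RAM may be
    understated.
    """
    try:
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


def check_prerequisites(path=".", min_disk_gb=10, min_ram_gb=2):
    """Compare free disk space at `path` and free RAM with the minimums."""
    free_disk = shutil.disk_usage(path).free
    free_ram = free_ram_bytes()
    return {
        "disk_ok": free_disk >= min_disk_gb * GIB,
        # None means free RAM could not be measured on this platform.
        "ram_ok": None if free_ram is None else free_ram >= min_ram_gb * GIB,
    }
```

Calling `check_prerequisites()` with the defaults mirrors the Docker requirements; for a setup without Docker, only `ram_ok` is relevant.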

Set Up SSL Certificates on the Server

Installation with docker

Follow the steps below to set up the project on a server or machine running Docker.

  1. Clone the repo and change into its directory
    git clone https://github.com/agent87/RW-DEEPSPEECH-API.git
    cd RW-DEEPSPEECH-API
  2. Pull the large files with Git LFS. Make sure Git LFS is installed; refer to the Git LFS documentation for installation instructions
    git lfs pull
  3. Create an environment file named ".env" in the root directory of the project (for example with "touch .env") and paste in the following variables
    MONGO_INITDB_ROOT_USERNAME="admin"
    MONGO_INITDB_ROOT_PASSWORD="Bingo123"
    MONGO_HOST="mongo"
    MONGO_PORT=27017
    MONGO_INITDB_DATABASE="Inference"
    MONGO_STT_COLLECTION="STT_INFERENCE_LOGS"
    MONGO_TTS_COLLECTION="TTS_INFERENCE_LOGS"
    MAX_SPEECH_AUDIO_FILE_SIZE=1000
    TTS_MAX_TXT_LEN=1000
    LOG_LEVEL="INFO"
    PYTHONUNBUFFERED=1
    DOMAIN=<Replace your DOMAIN here>
    SERVER_IP_ADDRESS=<Replace your SERVER_IP_ADDRESS here>
    NOTE: For security purposes, change the values above (in particular the example MongoDB credentials) before deploying!
  4. Build the Docker image
    docker compose build
    Note: if you have an older Docker version without the Compose plugin, use "docker-compose build" instead
  5. Start the Docker containers and let the magic begin
    docker compose up
    Note: on older Docker versions, use "docker-compose up" instead
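Because a forgotten placeholder or the example password is easy to miss, it can help to validate the ".env" file before starting the containers. The sketch below is hypothetical and not part of the repository: `parse_env` handles only simple `KEY=VALUE` lines (not the full syntax Docker Compose accepts), and `validate_env` checks for the keys listed in step 3, unreplaced `<Replace ...>` placeholders, and the example credentials.

```python
# Keys listed in the README's example .env file.
REQUIRED_KEYS = {
    "MONGO_INITDB_ROOT_USERNAME", "MONGO_INITDB_ROOT_PASSWORD", "MONGO_HOST",
    "MONGO_PORT", "MONGO_INITDB_DATABASE", "MONGO_STT_COLLECTION",
    "MONGO_TTS_COLLECTION", "MAX_SPEECH_AUDIO_FILE_SIZE", "TTS_MAX_TXT_LEN",
    "LOG_LEVEL", "PYTHONUNBUFFERED", "DOMAIN", "SERVER_IP_ADDRESS",
}
# Example values from the README that should be changed before deploying.
EXAMPLE_CREDENTIALS = {
    "MONGO_INITDB_ROOT_USERNAME": "admin",
    "MONGO_INITDB_ROOT_PASSWORD": "Bingo123",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines (no escapes or multi-line values)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def validate_env(env):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"missing {key}" for key in sorted(REQUIRED_KEYS - set(env))]
    for key, value in env.items():
        if value.startswith("<Replace"):
            problems.append(f"{key} still contains a placeholder")
        if EXAMPLE_CREDENTIALS.get(key) == value:
            problems.append(f"{key} still uses the example value")
    return problems
```

A typical call would be `validate_env(parse_env(open(".env").read()))`, printing each returned problem before running "docker compose up".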

Usage

If you do not have specialized hardware (a GPU), you can run the application on Google Colab. Open the notebook via the "Open In Colab" link and follow the instructions in the notebook to run the application. Open In Colab

Speech to Text (STT) usage

curl -X POST "http://server_url/stt" -H  "accept: application/json" -H  "Content-Type: multipart/form-data" -F "file=@/path/to/audio/file"
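The same request can be made from Python without third-party packages. This sketch builds the multipart/form-data body by hand for `urllib.request`. `build_stt_request` is a hypothetical helper, and "server_url" remains a placeholder as in the curl command. As in the curl example, the audio goes in a form field named "file". The structure of the JSON response is not documented here, so the sketch only constructs the request.

```python
import urllib.request
import uuid


def build_stt_request(server_url: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build a POST to <server_url>/stt carrying `audio` as form field "file".

    Hypothetical helper mirroring the curl example; it does not send anything.
    """
    # A random boundary separates the parts of the multipart body.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/stt",
        data=head + audio + tail,
        method="POST",
        headers={
            "accept": "application/json",
            # The header must name the same boundary that the body uses.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, something like `urllib.request.urlopen(build_stt_request("http://server_url", "clip.wav", open("clip.wav", "rb").read()))` would work. The filename is inserted into the header verbatim, so it should not contain double quotes.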

Text to Speech (TTS) usage

curl -X POST "http://server_url/tts" -H  "accept: application/json" -H  "Content-Type: application/json" -d "{\"text\":\"string\"}"
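A corresponding Python sketch for the text-to-speech endpoint follows. `build_tts_request` is hypothetical. It assumes that `TTS_MAX_TXT_LEN` from the ".env" file limits the input length in characters, and it checks that limit on the client so that overlong text fails early. The response format is not described in this README, so only the request is built.

```python
import json
import urllib.request

# Assumption: mirrors TTS_MAX_TXT_LEN from the example .env, counted in characters.
TTS_MAX_TXT_LEN = 1000


def build_tts_request(server_url: str, text: str, max_len: int = TTS_MAX_TXT_LEN) -> urllib.request.Request:
    """Build a JSON POST to <server_url>/tts, mirroring the curl example."""
    if len(text) > max_len:
        raise ValueError(f"text has {len(text)} characters; the limit is {max_len}")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/tts",
        data=body,
        method="POST",
        headers={"accept": "application/json", "Content-Type": "application/json"},
    )
```

The request can then be sent with `urllib.request.urlopen(build_tts_request("http://server_url", "Muraho"))`.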

Roadmap

  • Add database
  • Add Authentication
  • Testing
  • CI/CD Setup tutorial
  • Automated audio conversion
  • OpenAPI documentation / Swagger
  • Incorporate usage feedback into the README.md

See the open issues for a full list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the GNU General Public License. See LICENSE.txt for more information.

Contact

Arnaud Kayonga - @kayarn - arnauldkayonga1@gmail.com

Project Link: https://github.com/agent87/RW-DEEPSPEECH-API

Acknowledgments

Use this space to list resources you find helpful and would like to give credit to. I've included a few of my favorites to kick things off!
