Awesome Audio Encoding

Papers and Projects

  • ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers, arXiv:2404.19441

    Yuzhe Gu, Enmao Diao · (efficient-speech-codec - yzGuu830)

  • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound, arXiv:2405.00233

    Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley · (haoheliu.github)

  • PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders, arXiv:2404.02702

    Yu Pan, Lei Ma, Jianjun Zhao

  • Amphion - open-mmlab

    Speech Codec with Attribute Factorization used for NaturalSpeech 3

  • Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models, arXiv:2402.12208

    Shengpeng Ji, Minghui Fang, Ziyue Jiang, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao · (languagecodec - jishengpeng) · (languagecodec.github)

  • funcodec - alibaba-damo-academy

  • sonar - facebookresearch

    SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

  • High-Fidelity Audio Compression with Improved RVQGAN, arXiv:2306.06546

    Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar · (descript-audio-codec - descriptinc)

  • SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models, arXiv:2308.16692

    Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu · (speechtokenizer - zhangxinfd)

  • SoundStorm: Efficient Parallel Audio Generation, arXiv:2305.09636

    Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi

  • DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning, arXiv:2305.10005

    Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass

  • High Fidelity Neural Audio Compression, arXiv:2210.13438

    Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

  • EnCodec

  • SoundStream: An End-to-End Neural Audio Codec, arXiv:2107.03312

    Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi

  • HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, arXiv:2106.07447

    Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, arXiv:2006.11477

    Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

References