
[Help]: What is the difference between TTA and TTM? #190

Open
rainbowjack opened this issue Apr 24, 2024 · 2 comments
@rainbowjack

Is TTA included in TTM?

@viewfinder-annn viewfinder-annn self-assigned this Apr 25, 2024
@viewfinder-annn
Collaborator

viewfinder-annn commented Apr 25, 2024

Hi @rainbowjack, nice question!

tl;dr: text-to-audio (TTA) includes text-to-music (TTM). You can train a TTA model on text-music pairs, which effectively turns it into a TTM model, but this may require large amounts of data because of the internal structure (tempo, harmony, melody, etc.) of music pieces.

Theoretically, audio includes music. AudioLDM [1] describes audio as "sound effects, music, or speech"; AudioLM [2] speaks of "audio signals, be they speech, music or environmental"; and AudioGen [3] treats audio as ranging from soundscapes to music or speech. So it is not that "TTA is included in TTM"; rather, TTM is a subclass of TTA, where music is a more structured form of audio, carrying additional internal structure (tempo, harmony, melody, etc.).

Currently, the Amphion framework includes a TTA model based on latent diffusion. If you obtain text-music data pairs, you can use them directly to train a TTM model. However, it is important to note that music generation models generally require vast amounts of data (340k hours for Noise2Music [4], 280k hours for MusicLM [5], 20k hours for MusicGen [6], 46k hours for SingSong [7]), so if the results are unsatisfactory, the cause is more likely the data than the model itself.
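
As a minimal sketch of preparing such text-music pairs, here is how you might write them out as a JSONL manifest, one pair per line. The field names (`"text"`, `"audio"`) and file paths are illustrative assumptions, not Amphion's actual schema; adapt them to whatever format the recipe you use expects.

```python
import json
from pathlib import Path

def build_manifest(pairs, out_path):
    """Write (caption, audio_path) pairs as a JSONL manifest.

    Each line is one text-music pair. The keys "text" and "audio"
    are hypothetical placeholders, not a specific framework's schema.
    """
    out_path = Path(out_path)
    with out_path.open("w", encoding="utf-8") as f:
        for caption, audio in pairs:
            f.write(json.dumps({"text": caption, "audio": str(audio)}) + "\n")
    return out_path

# Example usage with placeholder captions and paths:
pairs = [
    ("upbeat jazz trio with walking bass", "data/clip_0001.wav"),
    ("slow ambient pad in a minor key", "data/clip_0002.wav"),
]
manifest = build_manifest(pairs, "train_manifest.jsonl")
```

A flat manifest like this keeps the caption next to its audio path, which makes it easy to filter or deduplicate pairs before training.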

Furthermore, we are also developing music generation frameworks, so stay tuned if you are interested :)

[1] AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
[2] AudioLM: a Language Modeling Approach to Audio Generation
[3] AudioGen: Textually Guided Audio Generation
[4] Noise2Music: Text-conditioned Music Generation with Diffusion Models
[5] MusicLM: Generating Music From Text
[6] Simple and Controllable Music Generation
[7] SingSong: Generating musical accompaniments from singing

@rainbowjack
Author

Thank you very much. I just need to do some research on music synthesis and genres.
