Title (as last edited by a897456, Mar 30, 2024):
[BUG]: ns2_dataset.py does not provide two fields, phones and num_frames, which are required by ns2_trainer.py
You can simply comment out the parts that use frame counts, because they are only used for dynamic batching. Also set "use_dynamic_batchsize": false in exp_config.json.
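To illustrate why frame counts matter only for dynamic batching, here is a minimal sketch assuming a simple greedy scheme. batch_by_frames and fixed_batches are hypothetical helpers, not functions from Amphion, and Amphion's real sampler may group samples differently. Dynamic batching needs each sample's length to fill a per-batch frame budget, while fixed-size batching needs no lengths at all.

```python
from typing import List


def batch_by_frames(num_frames: List[int], max_frames: int) -> List[List[int]]:
    """Group sample indices so each batch's total frame count stays <= max_frames.

    Hypothetical helper: a simplified illustration of frame-based dynamic
    batching, not Amphion's actual implementation. A single sample longer
    than max_frames still gets its own batch.
    """
    # Sort indices by length so samples of similar length share a batch.
    order = sorted(range(len(num_frames)), key=lambda i: num_frames[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for idx in order:
        n = num_frames[idx]
        # Close the current batch if adding this sample would exceed the budget.
        if current and current_total + n > max_frames:
            batches.append(current)
            current, current_total = [], 0
        current.append(idx)
        current_total += n
    if current:
        batches.append(current)
    return batches


def fixed_batches(num_samples: int, batch_size: int) -> List[List[int]]:
    """Fixed-size batching (the non-dynamic case): no frame counts needed."""
    return [list(range(i, min(i + batch_size, num_samples)))
            for i in range(0, num_samples, batch_size)]


print(batch_by_frames([300, 120, 500, 80, 250], max_frames=600))
print(fixed_batches(5, batch_size=2))
```

With dynamic batching disabled, only the sample count is needed, which is why the frame-count code can be commented out in that case.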
Does the number of frames mean the number of phones in the phone sequence?
Hi @shreeshailgan, according to the NS2 paper: "As shown in Figure 2, our neural audio codec consists of an audio encoder, a residual vector-quantizer (RVQ), and an audio decoder: 1) The audio encoder consists of several convolutional blocks with a total downsampling rate of 200 for 16KHz audio, i.e., each frame corresponds to a 12.5ms speech segment." So a frame here is a codec frame (200 samples of 16 kHz audio), not a phone. See https://arxiv.org/pdf/2304.09116.pdf for more details.
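The arithmetic in the quoted passage can be checked directly: 200 samples at 16,000 samples per second is 12.5 ms. The sketch below derives a frame count from a waveform length under that downsampling rate. num_codec_frames is a hypothetical helper, and rounding a partial trailing window up (ceil) is an assumption; the actual codec may pad or truncate differently.

```python
import math

SAMPLE_RATE = 16_000  # Hz, from the quoted NS2 passage
HOP = 200             # total downsampling rate of the codec encoder


def frame_duration_ms(sample_rate: int = SAMPLE_RATE, hop: int = HOP) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 * hop / sample_rate


def num_codec_frames(num_samples: int, hop: int = HOP) -> int:
    """Hypothetical helper: codec frames for a waveform of num_samples samples.

    Assumption: a partial trailing window counts as one frame (ceil).
    """
    return math.ceil(num_samples / hop)


print(frame_duration_ms())      # 12.5
print(num_codec_frames(48000))  # 3 s of 16 kHz audio -> 240
```

Note that the count depends only on audio length, which is another way to see that it is unrelated to the number of phones.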
Amphion/models/tts/naturalspeech2/ns2_dataset.py, line 121 (commit 5cb75d8)
Amphion/models/tts/naturalspeech2/ns2_dataset.py, line 131 (commit 5cb75d8)
Amphion/models/tts/naturalspeech2/ns2_trainer.py, line 269 (commit 5cb75d8)
These two fields (phones and num_frames) are not included in train.json, which is used by ns2_trainer.py.
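A quick way to confirm which entries lack these fields is to scan the metadata before training. This is a minimal sketch that assumes train.json is a JSON list of per-utterance dictionaries; that schema, and the helpers find_missing_fields and check_metadata_file, are hypothetical and not taken from Amphion.

```python
import json
from typing import Dict, List, Sequence

# Fields the trainer expects, per this issue.
REQUIRED_KEYS = ("phones", "num_frames")


def find_missing_fields(entries: List[Dict],
                        required: Sequence[str] = REQUIRED_KEYS) -> Dict[int, List[str]]:
    """Map each entry index to the required keys absent from that entry."""
    missing: Dict[int, List[str]] = {}
    for i, entry in enumerate(entries):
        absent = [k for k in required if k not in entry]
        if absent:
            missing[i] = absent
    return missing


def check_metadata_file(path: str) -> Dict[int, List[str]]:
    """Load a metadata file (assumed: JSON list of dicts) and report gaps."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return find_missing_fields(entries)


# Inline example with made-up entries instead of a real train.json.
print(find_missing_fields([{"phones": "p", "num_frames": 240}, {}]))
```

An empty result means every entry carries both fields; otherwise the reported indices show which utterances need them added, or dynamic batching disabled as suggested above.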