[BUG]: ns2_dataset.py does not have these two parts, phones and num_frames, which are needed in ns2_trainer.py #171

Open
a897456 opened this issue Mar 30, 2024 · 4 comments

@a897456

a897456 commented Mar 30, 2024

self.utt2phone[utt] = utt_info["phones"]

self.utt2len[utt] = utt_info["num_frames"]

train_dataset.num_frame_indices,

These two fields, phones and num_frames, are not included in train.json, which ns2_trainer.py relies on.
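
One way to confirm the report on a local copy is to scan the metadata for entries that lack the two keys. This is a minimal sketch, assuming train.json holds a JSON list of per-utterance dictionaries; the helper names and the "Dataset" and "Uid" keys in the sample are illustrative, not taken from the repository.

```python
import json

# Keys that the dataset code in this issue reads from each metadata entry.
REQUIRED_KEYS = ("phones", "num_frames")


def load_metadata(path):
    """Load a metadata file, assumed to be a JSON list of dicts."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def find_missing_fields(metadata, required=REQUIRED_KEYS):
    """Return {entry index: [missing keys]} for entries lacking a required key."""
    missing = {}
    for i, utt_info in enumerate(metadata):
        absent = [key for key in required if key not in utt_info]
        if absent:
            missing[i] = absent
    return missing


# Hypothetical sample standing in for load_metadata("train.json").
sample = [
    {"Dataset": "demo", "Uid": "a", "phones": "HH AH", "num_frames": 80},
    {"Dataset": "demo", "Uid": "b"},
]
print(find_missing_fields(sample))  # {1: ['phones', 'num_frames']}
```

An empty result would mean every entry already carries both fields.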

@a897456 a897456 added the bug Something isn't working label Mar 30, 2024
@shreeshailgan

I am also facing the same problem. You can work around this problem temporarily:

self.utt2phone[utt] = utt_info["phones"]

You can replace the above line with

with open(os.path.join(self.phone_dir, uid + ".phone"), "r") as f:
    self.utt2phone[utt] = f.read().strip()

while setting

self.phone_dir = os.path.join(processed_data_dir, 'phones')

in the __init__ of NS2Dataset

You can simply comment out the parts that use frame counts, because they are only used for dynamic batching. Also, set "use_dynamic_batchsize": false in exp_config.json.
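
The file-based workaround above can also be tried outside the dataset class. The following is a minimal sketch, assuming one `<uid>.phone` text file per utterance in the `phones` directory; `load_phones` is a hypothetical helper, and it keys the result by uid, whereas the dataset code keys by `utt`.

```python
import os


def load_phones(phone_dir, uids):
    """Read <uid>.phone from phone_dir for each uid and return {uid: phones}."""
    utt2phone = {}
    for uid in uids:
        path = os.path.join(phone_dir, uid + ".phone")
        with open(path, "r", encoding="utf-8") as f:
            # Strip the trailing newline so the phone string is clean.
            utt2phone[uid] = f.read().strip()
    return utt2phone
```

Calling `load_phones(os.path.join(processed_data_dir, "phones"), uids)` raises `FileNotFoundError` for any utterance without a phone file, which makes gaps in preprocessing visible early.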

@HeCheng0625
Collaborator

Hi, you need to generate the phone sequence and record the number of frames for each sample.
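
Recording those two values could look like the sketch below. It assumes each metadata entry has a "Uid" key and that the phone sequences and frame counts were already produced by a separate preprocessing step (not shown); the function name and sample values are hypothetical.

```python
import json


def add_phones_and_frames(metadata, utt2phone, utt2frames):
    """Return a copy of metadata with "phones" and "num_frames" set per Uid."""
    enriched = []
    for utt_info in metadata:
        uid = utt_info["Uid"]  # assumed key name
        entry = dict(utt_info)  # copy so the input list is left unchanged
        entry["phones"] = utt2phone[uid]
        entry["num_frames"] = utt2frames[uid]
        enriched.append(entry)
    return enriched


# Hypothetical values for illustration only.
metadata = [{"Dataset": "demo", "Uid": "u1"}]
enriched = add_phones_and_frames(metadata, {"u1": "HH AH L OW"}, {"u1": 80})
print(json.dumps(enriched, indent=2))
```

The enriched list can then be written back with `json.dump` to the metadata file that the dataset class reads.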

@shreeshailgan

does number of frames mean the number of phones in the phone sequence?

@HarryHe11
Collaborator

> does number of frames mean the number of phones in the phone sequence?

Hi @shreeshailgan , according to the NS2 paper, "As shown in Figure 2, our neural audio codec consists of an audio encoder, a residual vector-quantizer (RVQ), and an audio decoder: 1) The audio encoder consists of several convolutional blocks with a total downsampling rate of 200 for 16KHz audio, i.e., each frame corresponds to a 12.5ms speech segment." You could refer to https://arxiv.org/pdf/2304.09116.pdf for more details.
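
Read this way, the frame count depends on the audio length rather than on the number of phones. The arithmetic in the quoted passage can be checked with a short sketch; the function names are hypothetical, and the floor division is an assumption, since the codec's actual padding may change the count by one frame.

```python
SAMPLE_RATE = 16_000  # Hz, from the quoted passage
HOP_SIZE = 200        # total downsampling rate of the audio encoder


def frame_duration_ms(sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Duration of one codec frame in milliseconds."""
    return 1000.0 * hop_size / sample_rate


def estimate_num_frames(num_samples, hop_size=HOP_SIZE):
    """Approximate frame count for a waveform (assumes no extra padding)."""
    return num_samples // hop_size


print(frame_duration_ms())                   # 12.5
print(estimate_num_frames(3 * SAMPLE_RATE))  # 240 frames for 3 s of audio
```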

@HarryHe11 HarryHe11 self-assigned this Apr 6, 2024