TSE with Librimix: mismatch in number of speakers #5728
Thanks for raising the issue.

@AntoineBlanot Could you paste the content of your run script and training config?
Sure! Here they are:

```bash
#!/usr/bin/env bash
# Set bash to 'debug' mode; it will exit on:
# -e 'error', -u 'undefined variable', -o pipefail 'error in pipeline'
set -e
set -u
set -o pipefail

sample_rate=16k   # 8k or 16k
min_or_max=min    # "min" or "max". This determines how the mixtures are generated in local/data.sh.

train_set="train"
valid_set="dev"
test_sets="test "

CUDA_VISIBLE_DEVICES=0,1 ./enh.sh \
    --is_tse_task true \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" \
    --fs "${sample_rate}" \
    --ref_num 2 \
    --local_data_opts "--sample_rate ${sample_rate} --min_or_max ${min_or_max}" \
    --lang en \
    --ngpu 2 \
    --enh_config ./conf/train.yaml \
    "$@"
```
```yaml
optim: adam
max_epoch: 100
batch_type: folded
batch_size: 16
iterator_type: chunk
chunk_length: 48000
# exclude keys "enroll_ref", "enroll_ref1", "enroll_ref2", ...
# from the length consistency check in ChunkIterFactory
chunk_excluded_key_prefixes:
    - "enroll_ref"
num_workers: 4
optim_conf:
    lr: 1.0e-03
    eps: 1.0e-08
    weight_decay: 0
unused_parameters: true
patience: 20
accum_grad: 1
grad_clip: 5.0
val_scheduler_criterion:
    - valid
    - loss
best_model_criterion:
-   - valid
    - snr
    - max
-   - valid
    - loss
    - min
keep_nbest_models: 1
scheduler: reducelronplateau
scheduler_conf:
    mode: min
    factor: 0.7
    patience: 3
model_conf:
    num_spk: 2
    share_encoder: true
train_spk2enroll: data/train-100/spk2enroll.json
enroll_segment: 48000
load_spk_embedding: false
load_all_speakers: false
encoder: conv
encoder_conf:
    channel: 256
    kernel_size: 32
    stride: 16
decoder: conv
decoder_conf:
    channel: 256
    kernel_size: 32
    stride: 16
extractor: td_speakerbeam
extractor_conf:
    layer: 8
    stack: 4
    bottleneck_dim: 256
    hidden_dim: 512
    skip_dim: 256
    kernel: 3
    causal: False
    norm_type: gLN
    pre_nonlinear: prelu
    nonlinear: relu
    # enrollment related
    i_adapt_layer: 7
    adapt_layer_type: mul
    adapt_enroll_dim: 256
    use_spk_emb: false

# A list of criterions
# The overall loss in the multi-task learning will be:
#   loss = weight_1 * loss_1 + ... + weight_N * loss_N
# The default `weight` for each sub-loss is 1.0
criterions:
    # The first criterion
    - name: snr
      conf:
          eps: 1.0e-7
      wrapper: fixed_order
      wrapper_conf:
          weight: 1.0
```
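As a side note on the config above: `chunk_excluded_key_prefixes` appears to match data keys by string prefix, so `enroll_ref`, `enroll_ref1`, `enroll_ref2`, ... would all be skipped in the chunk-length consistency check. A minimal sketch of that matching (an assumption for illustration, not ESPnet's actual implementation):

```python
def is_excluded(key: str, prefixes: list[str]) -> bool:
    """Return True if `key` starts with any configured excluded prefix."""
    return any(key.startswith(p) for p in prefixes)

# With the config's setting, all enrollment keys are excluded,
# while the speech references are still length-checked.
prefixes = ["enroll_ref"]
print(is_excluded("enroll_ref1", prefixes))   # True
print(is_excluded("enroll_ref2", prefixes))   # True
print(is_excluded("speech_ref1", prefixes))   # False
```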
Thank you! I think the error is caused by the default value of the argument. To avoid this error, you could modify its value to `True`. Sorry about this mistake; I will also make a PR to update the related files.
Describe the bug
There is a mismatch in the number of speech references and the number of speakers (which is 2 for the Librimix dataset).
Because of this issue, we cannot run the recipe training.
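To illustrate the failure mode, the following is a sketch of the kind of consistency check that trips here (an assumed simplification, not ESPnet's actual code): the number of loaded speech references must equal the configured number of speakers, which is 2 for LibriMix.

```python
def refs_match_speakers(speech_refs: list[str], num_spk: int = 2) -> bool:
    """Return True when the number of speech references equals num_spk."""
    return len(speech_refs) == num_spk

# Two references for a two-speaker mixture: consistent.
print(refs_match_speakers(["spk1_ref.wav", "spk2_ref.wav"]))  # True
# Only one reference loaded: the mismatch reported in this issue.
print(refs_match_speakers(["spk1_ref.wav"]))                  # False
```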
Basic environments:
- Python: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
- espnet: 202402
- pytorch: 2.0.1
- Git hash: 3858d84051d6bed263cefb968bb1727452012cf2
- Commit date: Thu Mar 28 13:55:11 2024 +0000

Environments from `torch.utils.collect_env`:

Task information:
To Reproduce
Steps to reproduce the behavior:
`cd egs2/librimix/tse1`
Error logs