"Couldn't find appropriate audio backend to handle URI" when training with WSJ03_mix Sepformer model #2287
Comments
It appears the solution is to install sox_io using: apt-get install sox libsox-dev
RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 1439552 but got size 1382016 for tensor number 1 in the list.
Can you provide the backtrace to this error?
Thank you for responding! My professor and I think it might have to do with the "duration" set in my .csv files; we don't know if it's in seconds or audio samples. Here is the backtrace: ) karson@Le-Ubuntu-Laptop:
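One way to check the seconds-versus-samples question is to measure a file directly and compare the result with the "duration" value in the .csv. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM .wav file; the helper name `wav_duration` and the example path are hypothetical. Which unit the recipe actually expects is not confirmed in this thread.

```python
import wave


def wav_duration(path):
    """Return (num_samples, sample_rate, seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        num_samples = wav_file.getnframes()    # frames per channel
        sample_rate = wav_file.getframerate()  # in Hz
    return num_samples, sample_rate, num_samples / sample_rate


# Hypothetical usage: compare both values with the "duration" column.
# num_samples, sample_rate, seconds = wav_duration("/path/to/mixture.wav")
# print(num_samples, sample_rate, seconds)
```

If the .csv value matches `num_samples`, the column is in samples; if it matches `seconds`, it is in seconds.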
I am not too familiar with this task and dataset, but it looks like the code is written with the assumption that all segments in a batch are of the same length. I can't find any code that tries to work around this issue, so I'm assuming the segments in the original dataset are already of a fixed length...? Either way, as a workaround, I think you could try truncating the different inputs within the batch to the shortest one.
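A minimal sketch of that truncation workaround, written with plain Python lists so it runs without any dependencies. The helper name `truncate_to_shortest` is hypothetical and is not part of SpeechBrain, and where such a step would belong in the recipe is not shown in this thread. The lengths are the two sizes from the RuntimeError quoted in this thread.

```python
def truncate_to_shortest(signals):
    """Crop every signal in the list to the length of the shortest one."""
    min_len = min(len(s) for s in signals)
    return [s[:min_len] for s in signals]


# Stand-in signals with the two lengths from the RuntimeError.
mixture = [0.0] * 1439552
source = [0.0] * 1382016

mixture, source = truncate_to_shortest([mixture, source])
print(len(mixture), len(source))  # 1382016 1382016
```

1-D PyTorch tensors support the same `len()` and slicing, so the same idea should carry over, though multi-dimensional batches would need to be sliced along the time dimension instead.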
Hey! What is the length of the signal? To answer @asumagic: no, the signals in WSJ0Mix are of variable length, so I think there is something else going on here. Does the same signal work if you try it with the pretrained SepFormer? (You can try that using the model on Hugging Face.)
The same signals work both on Hugging Face and natively, using the code snippet provided on Hugging Face. Could it be the formatting of my .csv files? This is for a project in which we are trying to improve the model with additional signals made by my group. I appreciate all of y'all's help through this.
Actually, thinking about it, it might be a length issue. We do support variable length signals, but this might be due to the positional embeddings that we are adding. Could you put a breakpoint here, to see if it's here?
Also, could you print the shape of the tensors in
Describe the bug
I am trying to "fine tune" this separation model using some audio files I have prepared. I have checked the correct .csv file naming and ordering scheme, but it still can't read my .wav files correctly.
When running the "test" code for the model (not train) by using the audio provided I got a similar error that went away once I installed pysoundfile, but the issue still persists when training.
Expected behaviour
I am expecting it to run through all the required epochs and complete the training. I have all the necessary libraries, and I have ensured that CUDA is installed and up to date.
To Reproduce
sudo python3 train.py hparams/sepformer.yaml --data_folder=/home/karson/honors/Dataset\
The "Dataset" contains the correct folders the model needs to look at.
Environment Details
I am using miniconda to set up my environment and have installed the latest version of speechbrain.
Relevant Log Output
Additional Context
I am doing a project where I am looking into "fine-tuning" a speech separation model for a professor at my university. I have learned quite a bit about AI, and any help someone could provide would be much appreciated!