inverse transform from logscale to linear scale stft #7

Open
adrienchaton opened this issue Nov 26, 2019 · 5 comments
Labels
question Further information is requested

Comments

@adrienchaton

Hi !

Your repo is a pretty awesome find; I am especially interested in using the STFT with a log-frequency scale.
I was already doing mel operations myself using torch.stft and librosa filterbanks, but the more tools, the better to experiment with.

May I ask: is there any way to transform an STFT computed on a log-frequency scale back to a linear frequency scale?

The use case I have in mind is converting waveforms into log-frequency spectrograms, filtering them, converting back to linear frequency, and then using the inverse STFT to return to the time domain.

Thanks!

@KinWaiCheuk
Owner

I need to double-check first. Unfortunately, it seems the log-frequency scale is not mathematically invertible. Only the STFT with the original frequency-bin (k) spacing is invertible; the moment you change the spacing of the frequency bins, the Fourier basis vectors are no longer orthogonal. That is why I think it would not be invertible.
But I need to investigate further and discuss it with people who are more familiar with the topic before I can say whether it is doable or not.
It is a cool suggestion; I hope I can implement it if it is possible.
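To make the orthogonality point concrete, here is a small NumPy check. It is an illustrative sketch: the 64-sample frame and the log-spaced grid from 1/n to 0.5 cycles/sample are arbitrary assumptions. It builds complex-exponential basis rows at the linear bin spacing k/n and at log-spaced frequencies, then measures the largest inner product between two different rows.

```python
import numpy as np

def fourier_basis(freqs, n):
    """Rows are complex exponentials exp(-2j*pi*f*t), t = 0..n-1; f in cycles/sample."""
    t = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(freqs, t))

def max_off_diagonal(basis):
    """Largest |<row_i, row_j>| over i != j (0 means the rows are orthogonal)."""
    gram = basis @ basis.conj().T
    off_diag = gram - np.diag(np.diag(gram))
    return np.abs(off_diag).max()

n = 64
linear = fourier_basis(np.arange(n) / n, n)               # DFT spacing f_k = k / n
logscale = fourier_basis(np.geomspace(1 / n, 0.5, n), n)  # hypothetical log-spaced grid

print(max_off_diagonal(linear))    # round-off level only: orthogonal
print(max_off_diagonal(logscale))  # large: neighbouring rows overlap strongly
```

This only tests orthogonality: a non-orthogonal basis can still have full rank, but its conjugate transpose is then no longer its inverse.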

@adrienchaton
Author

Thank you for the discussion!

By invertible, I meant from STFT to STFT, across frequency scales.
That is, being able to take a log-frequency STFT, map it back to the linear scale, and then use the linear-scale inverse transform to get back to the time domain.

I had in mind some kind of transpose (like when using a mel filterbank), possibly as an approximation.

But I am not an expert on that either, so I am interested to hear about anything you find!
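The transpose idea can be sketched as follows, assuming a magnitude spectrogram and a hypothetical triangular filterbank with log-spaced edges. The helper names `log_triangular_filterbank` and `approx_log_to_linear` are made up for illustration and are not nnAudio API. Dividing by the total filter weight on each linear bin turns the transpose into a weighted average of band values, so the result is approximate: detail narrower than a filter is lost.

```python
import numpy as np

def log_triangular_filterbank(n_lin, n_log, f_min_bin=1.0):
    """Hypothetical filterbank: n_log triangles with geometrically spaced edges
    between linear bin f_min_bin and n_lin - 1. Rows are normalized to sum to 1."""
    edges = np.geomspace(f_min_bin, n_lin - 1, n_log + 2)
    bins = np.arange(n_lin)
    W = np.zeros((n_log, n_lin))
    for i in range(n_log):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (center - lo)
        falling = (hi - bins) / (hi - center)
        W[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    # Very narrow low-frequency triangles may catch no bin; leave those rows at zero.
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def approx_log_to_linear(W, log_spec):
    """Map (n_log, frames) back to (n_lin, frames) with a normalized transpose of W."""
    col_weight = W.sum(axis=0)[:, None]    # total filter weight on each linear bin
    back = W.T @ log_spec
    return np.divide(back, col_weight, out=np.zeros_like(back), where=col_weight > 0)

n_lin, n_log, n_frames = 257, 48, 10
W = log_triangular_filterbank(n_lin, n_log)
lin_mag = np.ones((n_lin, n_frames))       # a flat magnitude spectrogram
log_mag = W @ lin_mag                      # forward: linear -> log frequency
lin_back = approx_log_to_linear(W, log_mag)
```

A flat spectrum comes back flat on every bin covered by some filter; a spectrum with sharp peaks would come back smeared over each filter's bandwidth.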

@mpariente

It is not invertible, but the pseudo-inverse of the forward transform matrix is the way to go. See PyTorch's back-propagable pseudo-inverse, torch.pinverse.
The same goes for the mel-spectrogram, by the way.
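A minimal PyTorch sketch of the pseudo-inverse approach for a mel filterbank. The filterbank is built by hand so the example does not depend on librosa; the HTK-style mel formula, 40 filters and a 512-point FFT at 16 kHz are assumptions. `torch.linalg.pinv` is the current function name; `torch.pinverse` computes the same thing under an older name.

```python
import math
import torch

def mel_filterbank(n_fft=512, n_mels=40, sr=16000):
    """Triangular HTK-style mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    n_freqs = n_fft // 2 + 1
    fft_hz = torch.linspace(0.0, sr / 2, n_freqs, dtype=torch.float64)
    mel_max = 2595.0 * math.log10(1.0 + (sr / 2) / 700.0)
    mel_pts = torch.linspace(0.0, mel_max, n_mels + 2, dtype=torch.float64)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    lo, center, hi = hz_pts[:-2, None], hz_pts[1:-1, None], hz_pts[2:, None]
    rising = (fft_hz - lo) / (center - lo)
    falling = (hi - fft_hz) / (hi - center)
    return torch.clamp(torch.minimum(rising, falling), min=0.0)

torch.manual_seed(0)
M = mel_filterbank()                                # (40, 257)
M_pinv = torch.linalg.pinv(M)                       # (257, 40)
power = torch.rand(257, 100, dtype=torch.float64)   # stand-in power spectrogram (freq, time)
mel = M @ power                                     # forward: linear bins -> mel bands
power_hat = M_pinv @ mel                            # minimum-norm least-squares estimate
```

Because there are far fewer mel bands than linear bins, `power_hat` reproduces the mel values exactly but not the original spectrum, and it can contain negative values that you may want to clamp.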

@adrienchaton
Author

Thank you for pointing out the torch.pinverse operator; I didn't know about it!

It seems straightforward for inverting the mel-spectrogram. However, since nnAudio computes the STFT with 1D convolution kernels, I am not sure whether the same applies to mapping the log scale back to the linear scale. Or should I compute the log scale through a matrix multiplication, as for mels, and use the pseudo-inverse of that matrix?
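One way to see that the pseudo-inverse still applies: with fixed kernels, a strided 1D convolution computes the same numbers as multiplying each frame of the signal by a kernel matrix. The sketch below checks this with hypothetical log-spaced, Hann-windowed cosine/sine kernels; nnAudio's actual kernel construction, padding and normalization may differ.

```python
import math
import torch
import torch.nn.functional as F

n_fft, hop, n_bins, sr = 256, 64, 84, 16000
# Hypothetical log-spaced centre frequencies from 50 Hz to Nyquist.
freqs = torch.logspace(math.log10(50.0), math.log10(sr / 2), n_bins, dtype=torch.float64)
t = torch.arange(n_fft, dtype=torch.float64)
window = torch.hann_window(n_fft, dtype=torch.float64)
phase = 2 * math.pi * freqs[:, None] * t[None, :] / sr
# Real and imaginary parts stacked as rows: (2 * n_bins, n_fft)
kernels = torch.cat([window * torch.cos(phase), -window * torch.sin(phase)], dim=0)

torch.manual_seed(0)
x = torch.randn(1, 1, 4000, dtype=torch.float64)          # (batch, channel, time)

# 1) Log-frequency "STFT" as a strided 1D convolution (one output channel per kernel row).
conv_out = F.conv1d(x, kernels.unsqueeze(1), stride=hop)  # (1, 2 * n_bins, n_frames)

# 2) The same numbers as a matrix product on explicitly framed audio.
frames = x[0, 0].unfold(0, n_fft, hop)                    # (n_frames, n_fft)
matmul_out = kernels @ frames.T                           # (2 * n_bins, n_frames)

print(torch.allclose(conv_out[0], matmul_out))            # True
```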

@mpariente

Ah, actually it's a bit different from the mel-spectrogram case, you're right.
I'd suggest taking the pseudo-inverse of the filterbank to invert the log-scaled transform directly, without going through the linear-scale STFT.
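A sketch of inverting the log-scaled transform directly, assuming the frame-wise matrix view of a strided convolution. The kernels are hypothetical: a rectangular window and 160 log-spaced frequencies stacked as cosine and sine rows, so each frame yields more coefficients than samples. Each frame is recovered as the least-squares solution `K_pinv @ Y`, then frames are overlap-added and averaged. The printed error depends on how well conditioned `K` is, so treat the output as an estimate rather than a guaranteed perfect inverse.

```python
import math
import torch

def log_freq_kernels(n_fft=256, n_bins=160, sr=16000, f_min=50.0):
    """Hypothetical analysis matrix: cos and -sin rows at geometrically spaced
    frequencies, rectangular window. Shape (2 * n_bins, n_fft)."""
    freqs = torch.logspace(math.log10(f_min), math.log10(sr / 2), n_bins, dtype=torch.float64)
    t = torch.arange(n_fft, dtype=torch.float64)
    phase = 2 * math.pi * freqs[:, None] * t[None, :] / sr
    return torch.cat([torch.cos(phase), -torch.sin(phase)], dim=0)

def analyze(x, K, hop):
    frames = x.unfold(0, K.shape[1], hop)        # (n_frames, n_fft)
    return K @ frames.T                          # (2 * n_bins, n_frames)

def synthesize(Y, K_pinv, hop, length):
    """Least-squares frame recovery, then overlap-add with averaging."""
    n_fft = K_pinv.shape[0]
    frames = (K_pinv @ Y).T                      # (n_frames, n_fft)
    out = torch.zeros(length, dtype=Y.dtype)
    count = torch.zeros(length, dtype=Y.dtype)
    for m, frame in enumerate(frames):
        start = m * hop
        out[start:start + n_fft] += frame
        count[start:start + n_fft] += 1
    return out / count.clamp(min=1)              # samples no frame covers stay 0

torch.manual_seed(0)
x = torch.randn(4000, dtype=torch.float64)
hop, n_fft = 64, 256
K = log_freq_kernels(n_fft=n_fft)
K_pinv = torch.linalg.pinv(K, rtol=1e-8)         # drop singular values < 1e-8 * largest (assumption)
Y = analyze(x, K, hop)                           # log-frequency coefficients
x_hat = synthesize(Y, K_pinv, hop, len(x))

covered = (len(x) - n_fft) // hop * hop + n_fft  # samples touched by at least one frame
rel_err = (x_hat[:covered] - x[:covered]).norm() / x[:covered].norm()
print(f"relative reconstruction error: {rel_err.item():.2e}")
```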

You can find an implementation of this in asteroid, where the pseudo-inverse can also be computed on the fly at each forward pass if you want learnable filters.
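The on-the-fly idea can be sketched with a hypothetical PyTorch module (`PinvEncoderDecoder` is not asteroid's API). The synthesis filters are recomputed as `torch.linalg.pinv` of the learnable analysis filters in every forward pass, so a loss on the output back-propagates into the analysis filters. The optional `mask` stands in for the filtering step from the original use case.

```python
import torch
import torch.nn as nn

class PinvEncoderDecoder(nn.Module):
    """Hypothetical sketch: learnable analysis filters, synthesis = their pseudo-inverse."""

    def __init__(self, n_filters=64, kernel_size=32):
        super().__init__()
        self.filters = nn.Parameter(torch.randn(n_filters, kernel_size) / kernel_size ** 0.5)

    def forward(self, frames, mask=None):
        # frames: (batch, n_frames, kernel_size)
        coeffs = frames @ self.filters.T            # analysis: (batch, n_frames, n_filters)
        if mask is not None:
            coeffs = coeffs * mask                  # e.g. filtering in the transform domain
        synth = torch.linalg.pinv(self.filters)     # (kernel_size, n_filters), recomputed every call
        return coeffs @ synth.T                     # synthesis: (batch, n_frames, kernel_size)

torch.manual_seed(0)
model = PinvEncoderDecoder()
frames = torch.randn(2, 10, 32)
recon = model(frames)                                # unmasked: matches frames up to round-off
mask = torch.rand(2, 10, 64)
loss = (model(frames, mask) - frames).pow(2).mean()
loss.backward()                                      # gradient flows through torch.linalg.pinv
```

Without a mask, and with at least as many filters as samples per frame, the output equals the input whatever the filter values are, so the gradient only becomes informative once something (like the mask) acts on the coefficients.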

KinWaiCheuk added the "question" label (Further information is requested) on Oct 20, 2020

3 participants