
When transcribing Chinese audio, using whisper_full_get_segment_text can return the correct text, but using whisper_full_get_token_text might result in NULL. #2114

Open
ppcfan opened this issue May 1, 2024 · 0 comments


I ran into a problem while transcribing Chinese audio. After running whisper_full(...) on a segment of Chinese audio, whisper_full_get_segment_text returns the correct Chinese text. However, when I iterate over the tokens of that segment and call whisper_full_get_token_text on each one, some tokens return NULL. I suspect this happens because a single Chinese character can be split across multiple tokens. If that is the case, how does whisper_full_get_segment_text map multiple tokens to a single Chinese character? Is there a way to merge tokens so that I get the correct text for each merged group? Thank you.
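To illustrate the suspicion above, here is a minimal sketch of one possible merging approach. It assumes that each token carries a raw byte fragment, that a Chinese character (three bytes in UTF-8) can be split across adjacent tokens, and that concatenating the fragments in order yields valid UTF-8. The function name `merge_tokens` and the sample byte strings are hypothetical; they stand in for bytes that would be read per token and are not real whisper.cpp output.

```python
# Sketch only: the byte fragments below are made-up stand-ins for per-token
# bytes; they are NOT real output of whisper_full_get_token_text.

def merge_tokens(token_bytes):
    """Group consecutive token byte fragments until they decode as UTF-8.

    Returns a list of (text, token_indices) pairs.
    """
    groups = []
    buf = b""
    start = 0
    for i, raw in enumerate(token_bytes):
        buf += raw
        try:
            text = buf.decode("utf-8")
        except UnicodeDecodeError:
            # The buffer does not decode yet (for example, it ends in the
            # middle of a character), so keep accumulating.
            continue
        groups.append((text, list(range(start, i + 1))))
        buf = b""
        start = i + 1
    if buf:
        # Leftover bytes that never decoded: emit them with U+FFFD markers.
        leftover = buf.decode("utf-8", errors="replace")
        groups.append((leftover, list(range(start, len(token_bytes)))))
    return groups


# "你好" is E4 BD A0 E5 A5 BD in UTF-8; the first character is split
# across two hypothetical tokens here.
sample = [b"\xe4\xbd", b"\xa0", b"\xe5\xa5\xbd"]
for text, indices in merge_tokens(sample):
    print(indices, text)
```

Each group keeps the indices of the tokens it was built from, so per-token data such as timestamps could be combined for the merged text. Whether this mirrors what whisper_full_get_segment_text does internally is not confirmed here.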
