When transcribing Chinese audio, using whisper_full_get_segment_text can return the correct text, but using whisper_full_get_token_text might result in NULL. #2114

ppcfan · 2024-05-01T00:20:13Z

I encountered an issue while transcribing Chinese audio. After running whisper_full(...) on a segment of Chinese audio, whisper_full_get_segment_text returns the correct Chinese text. However, when I iterate over the tokens and call whisper_full_get_token_text on each one, some tokens return NULL. I suspect this happens because a single Chinese character can be split across multiple tokens. If that is the case, how does whisper_full_get_segment_text map multiple tokens to a single Chinese character? Is there a way to merge tokens so that I can output the correct text for each one? Thank you.
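To illustrate the kind of merging I have in mind, here is a minimal sketch. It assumes (I have not confirmed this) that each token's text is a raw fragment of a UTF-8 byte sequence, so a three-byte Chinese character could arrive split across two tokens and only becomes valid text once the fragments are joined. The names `utf8_merger`, `utf8_merger_push`, and `utf8_incomplete_tail` are hypothetical helpers of my own, not part of whisper.cpp; in real use the fragments would presumably come from whisper_full_get_token_text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many bytes at the end of buf[0..len) belong to a UTF-8
 * character that is still incomplete (0 if buf ends on a boundary). */
static size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Walk backwards over continuation bytes (10xxxxxx) to the lead byte. */
    while (i > 0 && back < 4) {
        unsigned char c = buf[--i];
        back++;
        if ((c & 0xC0) != 0x80) {
            size_t need;
            if (c < 0x80)                need = 1; /* ASCII */
            else if ((c & 0xE0) == 0xC0) need = 2;
            else if ((c & 0xF0) == 0xE0) need = 3;
            else if ((c & 0xF8) == 0xF0) need = 4;
            else return 0; /* invalid lead byte: do not hold anything back */
            return back < need ? back : 0;
        }
    }
    return 0; /* empty input, or only stray continuation bytes */
}

/* Hypothetical helper: carries an unfinished character between tokens. */
typedef struct {
    unsigned char pending[4];
    size_t n_pending;
} utf8_merger;

/* Appends one token fragment. Writes every character that is now complete
 * to out (NUL-terminated) and returns the number of bytes written. Bytes of
 * a still-incomplete character are kept for the next call. */
static size_t utf8_merger_push(utf8_merger *m, const char *frag,
                               char *out, size_t out_cap)
{
    size_t frag_len = frag ? strlen(frag) : 0; /* tolerate a NULL fragment */
    size_t total = m->n_pending + frag_len;
    assert(out_cap > total); /* room for the bytes plus the NUL */

    memcpy(out, m->pending, m->n_pending);
    if (frag_len > 0) {
        memcpy(out + m->n_pending, frag, frag_len);
    }

    size_t tail = utf8_incomplete_tail((const unsigned char *)out, total);
    size_t done = total - tail;

    memcpy(m->pending, out + done, tail); /* tail is at most 3 bytes */
    m->n_pending = tail;
    out[done] = '\0';
    return done;
}
```

The idea would be to initialise one `utf8_merger` per segment, push each token's text in order, and print `out` whenever the returned length is non-zero. This only helps if the C API really does hand back the partial bytes; if the NULL comes from a language binding that rejects invalid UTF-8 when building a string, the merging would have to happen on the raw bytes before that conversion.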
