Resolve output characters garbled #1422

fireyanci · 2024-05-15T13:24:33Z

Hello. I want my model to handle Chinese, but full-parameter training on Chinese requires huge amounts of language data and training resources, so I used a Chinese Llama model trained by another open-source project. Because I find the litgpt project very convenient, I converted that model to a lit model. The converted lit model can produce Chinese, but some of the output text is garbled. How can I fix the garbled characters? I look forward to your reply. Thank you.
Appendix:
Chinese open-source model, GitHub address: https://github.com/LlamaFamily/Llama-Chinese?tab=readme-ov-file
Chinese open-source model files, Hugging Face address: https://huggingface.co/FlagAlpha/Llama3-Chinese-8B-Instruct/tree/main

fireyanci · 2024-05-15T13:58:33Z

It understands my question and gives a relevant answer, but some characters in the output are garbled.

rasbt · 2024-05-20T22:25:31Z

I wonder if it is related to the tokenizer. It could also be a limitation of the terminal when printing certain characters. Unfortunately, I am not very familiar with working with those characters. One thing you could try is adding a `print(<expected special characters>)` to the script to see whether this is an issue with the terminal output.
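The terminal check suggested above could look roughly like the following. This is only a sketch: `can_encode` and the `sample` string are hypothetical names chosen for illustration, and the sample should be replaced with characters that actually come out garbled.

```python
import sys


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


# Hypothetical sample; substitute the characters that appear garbled.
sample = "你好，世界"

# sys.stdout.encoding can be None when output is redirected, so fall back to ASCII.
stdout_encoding = sys.stdout.encoding or "ascii"
print("stdout encoding:", stdout_encoding)

if can_encode(sample, stdout_encoding):
    # If this line displays correctly, the terminal is probably not the cause.
    print(sample)
else:
    print("stdout cannot encode the sample; the terminal encoding is a likely suspect")
```

If the sample prints correctly here but the model output is still garbled, the problem is more likely in how the generated tokens are decoded than in the terminal.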

fireyanci · 2024-05-21T11:43:45Z

Thank you very much for your reply

fireyanci · 2024-05-23T13:20:16Z

I've been too busy lately, so I only got to try this method now. I can print the characters that come out garbled from the terminal, and I'm not sure whether this is related to the tokenizer. To give the model Chinese language ability, the developers of the Chinese Llama repository expanded the tokenizer.
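One tokenizer-related way such garbling can arise is sketched below. It assumes a byte-level tokenizer whose tokens can end in the middle of a multi-byte UTF-8 character, and it assumes the output is decoded piece by piece. Whether this is what happens in this case is not established in the thread. The `chunks` boundaries are hypothetical stand-ins for token boundaries, and only the Python standard library is used.

```python
import codecs

text = "你好"
data = text.encode("utf-8")  # each of these characters takes 3 bytes in UTF-8

# Hypothetical token boundaries that cut through the middle of characters.
chunks = [data[:2], data[2:4], data[4:]]

# Decoding each chunk on its own turns the incomplete byte sequences
# into U+FFFD replacement characters.
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# An incremental decoder buffers incomplete sequences until the
# remaining bytes arrive, so the original text is recovered.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
buffered = "".join(decoder.decode(chunk) for chunk in chunks)
buffered += decoder.decode(b"", final=True)

print("naive contains U+FFFD:", "\ufffd" in naive)
print("buffered equals original:", buffered == text)
```

If the garbled output consists of replacement characters like these, comparing a single decode of the full token sequence with the piece-by-piece output would help narrow down whether the decoding step or the converted tokenizer is responsible.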

fireyanci changed the title ~~Resolve output garbled characters~~ Resolve output characters garbled May 15, 2024
