
How can ansj segment English words that are run together? #802

Open
wanggenAi opened this issue Nov 2, 2023 · 1 comment

Comments

@wanggenAi

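How does ansj segment English text? Take this term as an example: iwantto
I would like it to be split into: i/custom POS tag want/custom POS tag to/custom POS tag
How should this be configured? Do I need to change the code?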

@wanggenAi wanggenAi changed the title How does ansj segment English text? How can ansj segment English words that are run together? Nov 2, 2023
@shi-yuan
Member

shi-yuan commented Feb 25, 2024

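You can run the normal segmentation first; the glued term comes out with the part-of-speech tag en. Then build a custom dictionary, extend SmartGetWord, override getAllWords and getFrontWords, and deal with the parent class SmartGetWord's checkNumberOrEnglish. After that, write a custom Recognition. Inside the Recognition implementation, you can get the result with the following code: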

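// Requires: import java.util.Arrays;
// MyGetWord is your own SmartGetWord subclass; myforest is your custom dictionary.
MyGetWord getWord = new MyGetWord(myforest, "iwantto".toCharArray());
String word;
while ((word = getWord.getFrontWords()) != null) {
    // the matched word
    System.out.println(word);

    // part of speech, weight, ...
    String[] param = getWord.getParam();
    System.out.println(Arrays.toString(param));
}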
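
To make the idea concrete without depending on ansj internals, here is a minimal, library-free sketch of dictionary-driven forward longest matching, which is roughly what a getFrontWords-style lookup over a custom dictionary is assumed to do. Everything in it (the class GluedEnglishSplitter, the tag my_pos, the plain Map standing in for the dictionary) is hypothetical and is not part of the ansj or nlp-lang API. Characters with no dictionary match fall back to the en tag, one character at a time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not ansj code: splits glued English text with a word -> tag map.
class GluedEnglishSplitter {

    // Sample custom dictionary; "my_pos" is a made-up custom part-of-speech tag.
    static Map<String, String> sampleDictionary() {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("i", "my_pos");
        dict.put("want", "my_pos");
        dict.put("to", "my_pos");
        return dict;
    }

    // Forward longest match: at each position, take the longest dictionary word that fits.
    static List<String> split(String text, Map<String, String> dict) {
        int maxLen = 0;
        for (String w : dict.keySet()) {
            maxLen = Math.max(maxLen, w.length());
        }
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxLen);
            String match = null;
            // Try candidates from longest to shortest.
            for (int e = end; e > pos; e--) {
                String cand = text.substring(pos, e);
                if (dict.containsKey(cand)) {
                    match = cand;
                    break;
                }
            }
            if (match != null) {
                out.add(match + "/" + dict.get(match));
                pos += match.length();
            } else {
                // No dictionary word starts here: emit one character with the default tag.
                out.add(text.charAt(pos) + "/en");
                pos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [i/my_pos, want/my_pos, to/my_pos]
        System.out.println(split("iwantto", sampleDictionary()));
    }
}
```

Greedy forward matching can pick a wrong split when dictionary words overlap (for example, a dictionary that contains both "to" and "top"), so a real Recognition would probably need the weights returned by getParam to choose between candidates.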
