Data format question #772

Open
sevenandseven opened this issue May 13, 2024 · 2 comments

Comments

@sevenandseven

Hello, when evaluating MS MARCO metrics, should the content data be converted to this format:
{"content": "A is ...", "B is ...", "C is ..."}

That is, is each content followed by multiple candidate passages?

@staoxiao
Collaborator

The data format is:

{"content": "A is ..."}
{"content": "B is ..."}
{"content": "C is ..."}
{"content": "Panda is ..."}
{"content": "... is A"}

Each line is a dict containing a single text, not a list of texts.

You can refer to our example data: https://github.com/FlagOpen/FlagEmbedding/blob/master/examples/finetune/toy_evaluation_data/toy_corpus.json
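
As a rough illustration of the one-dict-per-line layout described above, here is a minimal sketch that writes and re-reads such a file using only the Python standard library. The helper names `write_corpus` and `read_corpus` and the file name `corpus.json` are hypothetical and are not part of FlagEmbedding; only the `"content"` key comes from the format shown in this thread.

```python
import json
import os
import tempfile


def write_corpus(passages, path):
    """Write one {"content": ...} JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in passages:
            # Each passage gets its own line and its own dict.
            f.write(json.dumps({"content": text}, ensure_ascii=False) + "\n")


def read_corpus(path):
    """Read the file back, checking that every line is a dict with a string "content"."""
    texts = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if not isinstance(record, dict) or not isinstance(record.get("content"), str):
                raise ValueError(f"line {line_no}: expected a dict with a string 'content'")
            texts.append(record["content"])
    return texts


# Placeholder passages taken from the example in this thread.
passages = ["A is ...", "B is ...", "Panda is ..."]
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.json")  # assumed file name
write_corpus(passages, corpus_path)
loaded = read_corpus(corpus_path)
```

Running the check on your own file before evaluation should catch lines that hold a list of passages instead of a single text.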

@sevenandseven
Author

Thank you for your reply. I have got it working.
