Reproducing test results (LLaMA Portability bug fixed, results pending correction) #227
Comments
Thank you for taking the time to reproduce the reliability, generalization, and locality results. EasyEdit's metrics have gone through several iterations, which may explain the inconsistency in the portability measurement. Please give me some time; I will evaluate the corresponding metrics and get back to you promptly. 🥹
EasyEdit appreciates your feedback. I reviewed the experimental results and, using LLaMA, reproduced your results. I think the cause is a special token in the LLaMA tokenizer that the original version did not handle specially, which led to poor results. There is no need to be confused; you can rely on the metrics you have now reproduced. I will rerun the experiments on LLaMA and update the README and the EasyEdit paper promptly. We apologize for the oversight. 😔
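The comment above does not say exactly how the special token distorted the score. As a minimal sketch of one plausible mechanism, assume the target answer is tokenized on its own and picks up LLaMA's leading BOS token (`<s>`, id 1), while the model's predicted continuation has none, so a position-wise comparison is shifted by one. The helper names and the toy token ids below are hypothetical and are not taken from EasyEdit.

```python
BOS_ID = 1  # id of LLaMA's <s> token, which the tokenizer prepends by default


def token_accuracy(pred_ids, target_ids):
    """Fraction of target positions where the predicted id matches."""
    if not target_ids:
        return 0.0
    matches = sum(p == t for p, t in zip(pred_ids, target_ids))
    return matches / len(target_ids)


def strip_bos(ids, bos_id=BOS_ID):
    """Drop a leading BOS id if present (hypothetical helper)."""
    return ids[1:] if ids and ids[0] == bos_id else ids


# Toy ids for illustration only, not real tokenizer output.
target_with_bos = [BOS_ID, 3681, 310]  # target tokenized alone: BOS comes first
predicted = [3681, 310]                # model continuation: no BOS

naive = token_accuracy(predicted, target_with_bos)             # misaligned by one
fixed = token_accuracy(predicted, strip_bos(target_with_bos))  # aligned
print(naive, fixed)
```

In this toy case the misaligned comparison scores a fully correct prediction as wrong, which is the kind of effect that special handling of the token would remove. Whether this is the actual bug that was fixed is an assumption.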
Okay, thank you so much! 😀
Hello, the results of the portability metrics for various editing methods on LLaMA-2-7B have been updated in the README. The corresponding EasyEdit paper will also be updated soon. Thank you very much for pointing out this issue. I will acknowledge you in the paper. May I have your name, please?
If all your issues have been resolved, please help close this issue.
Thank you for your positive reply. I have no further issues.
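Hello, I ran into a problem while reproducing the metrics you report for EasyEdit on LLaMA-2-7B. The figure below shows the test results given in your README
I tried to reproduce the FT result, using the command
The contents of llama-7b.yaml are
The model is from https://huggingface.co/meta-llama/Llama-2-7b-hf
The test dataset is the first 100 entries of zsre_mend_eval_portability_gpt4.json
I added four lines of code to compute the final results:
The Reliability, Generalization, and Locality results are close to yours, but Portability reached 54.05
I also tried passing summary_metrics=True to the edit function to invoke your metric-summary feature, and the portability one-hop-acc it reported was the same value.
I also tested MEND and found that its Portability was likewise much higher, at 53.75.
I am confused. Is my testing method wrong?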
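The code the reporter added to aggregate the results is not preserved in this thread. As a rough sketch of how per-edit scores might be averaged into the single percentages quoted above, the snippet below assumes a hypothetical record layout in which each edit stores lists of per-prompt accuracies under nested keys. The layout, the key names, and the `average_metric` helper are illustrative assumptions, not EasyEdit's confirmed output format, and the numbers are toy values.

```python
from statistics import mean


def average_metric(records, path):
    """Follow `path` into each record and average the scores, as a percentage."""
    values = []
    for record in records:
        node = record
        for key in path:
            node = node[key]
        values.append(mean(node))  # each leaf is a list of per-prompt accuracies
    return 100 * mean(values)


# Toy records in the assumed layout; not real experimental results.
toy_records = [
    {"post": {"rewrite_acc": [1.0], "portability": {"one_hop_acc": [0.5]}}},
    {"post": {"rewrite_acc": [0.5], "portability": {"one_hop_acc": [1.0]}}},
]

reliability = average_metric(toy_records, ("post", "rewrite_acc"))
portability = average_metric(toy_records, ("post", "portability", "one_hop_acc"))
print(reliability, portability)
```

Averaging per edit first and then across edits weights every edit equally, even when edits have different numbers of prompts. A different aggregation order could give slightly different figures.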