Hi, thank you for the interesting work!
I am trying to reproduce the results for LLaMA-2-7B on LIMAEval for the discard method. I ran the evaluation script after generating with the released model `KCA_Llama_2_7B_Discarding_Tuning`, using the default setting, which calls `gpt-4`. My results were slightly different from the score reported in the paper (30.95).
I initially thought this was caused by an API model update. However, according to the OpenAI documentation, the `gpt-4` alias points to the `gpt-4-0613` snapshot. Do you have a guess at why this might be happening?
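
In case it matters, would pinning the snapshot explicitly change anything? Here is a minimal sketch of what I mean, assuming the standard `openai` Python client (v1 interface); the `judge` helper is hypothetical, and the repo's evaluation script may wrap the call differently:

```python
# Minimal sketch: pin the judge to a fixed snapshot rather than the
# floating "gpt-4" alias. Assumes the openai>=1.0 Python client; the
# judge() helper is for illustration only, not part of the KCA repo.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-0613",  # fixed snapshot instead of the "gpt-4" alias
        temperature=0,       # reduces, but does not eliminate, nondeterminism
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Even with the snapshot pinned and temperature set to 0, GPT-4 completions are not guaranteed to be identical across calls, so perhaps some small score fluctuation is expected?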
For the record, the ROUGE scores on MS MARCO are also slightly off compared to Table 2.
Thanks!