I'm trying to understand the attention FLOP calculation as defined here, but I'm confused by this specific section, which handles the FLOP calculation when `causal` is set to True. It seems that the FLOP count is incorrect when the query's length is different from the key-value's length.
It seems that the FLOP count is incorrect when the query's length is different from the key-value's length.
Yes indeed, you are right.
I guess we also need to distinguish between top-left and bottom-right causal masks when num_kv != num_q. That alignment is not passed in the API at the moment.
Out of curiosity, what are you using this function for?
I'm trying to calculate MFU (model FLOPs utilization) and understand how the FLOP count is computed. Many papers describe their system's efficiency using MFU, but few explain how to calculate the FLOPs.
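For reference, MFU is usually taken to be the achieved model FLOPs per second divided by the hardware's peak FLOPs per second. A minimal sketch of that ratio follows; the function name `mfu` and its parameters are hypothetical, and the FLOP count per step has to come from a separate estimate of the model's work.

```python
def mfu(flops_per_step: float, step_time_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: achieved FLOPs/s over peak FLOPs/s.

    flops_per_step:   estimated model FLOPs for one step (an input, not measured here)
    step_time_s:      measured wall-clock time of that step, in seconds
    peak_flops_per_s: theoretical peak throughput of the hardware
    """
    if step_time_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("step_time_s and peak_flops_per_s must be positive")
    achieved_flops_per_s = flops_per_step / step_time_s
    return achieved_flops_per_s / peak_flops_per_s


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(mfu(flops_per_step=1e12, step_time_s=0.5, peak_flops_per_s=4e12))
```

The result is only as good as the FLOP estimate in the numerator, which is why the causal-mask case matters: overcounting masked-out work inflates the reported MFU.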