Cannot do inference in float32 #595

borisdayma · 2024-04-16T04:19:50Z

If we try to perform inference in float32, we get the error:

AssertionError: Key and Value Dtypes should match

This error comes from this line.

The cause is that the cache dtype is chosen by the expression jnp.int8 if quantize_kvcache else jnp.bfloat16, so it is never jnp.float32.
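
To make the mismatch concrete, here is a minimal sketch of the selection logic described above next to one possible fix. Both function names and their signatures are hypothetical and not the project's actual API. Dtypes are plain strings, as in the config file, so the snippet runs without JAX installed.

```python
def current_kv_cache_dtype(quantize_kvcache: bool) -> str:
    """Selection as described in this issue: float32 can never be returned."""
    return "int8" if quantize_kvcache else "bfloat16"


def proposed_kv_cache_dtype(activation_dtype: str, quantize_kvcache: bool) -> str:
    """Hypothetical fix: follow the configured dtype unless the cache is quantized."""
    if quantize_kvcache:
        return "int8"
    # Assumption: the cache should use the same dtype as the activations.
    return activation_dtype


if __name__ == "__main__":
    configured = "float32"  # the value of `dtype` in the config
    print(current_kv_cache_dtype(quantize_kvcache=False))
    print(proposed_kv_cache_dtype(configured, quantize_kvcache=False))
```

With dtype set to "float32" and quantization off, the first function still yields "bfloat16", which is the mismatch this issue reports. The second one yields "float32". Whether the quantized path needs further changes is not covered here.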

rwitten · 2024-04-18T05:55:16Z

What are you setting that triggers this? (Activations to float32?)

borisdayma · 2024-04-19T04:19:13Z

Yes it's the dtype:

Line 61 in f52e6f7

dtype: "bfloat16"
