Default matrix-multiplication precision on TPUs

Currently, matrix multiplication on TPUs (with float32 dtypes) defaults to bfloat16 multiplication with float32 accumulation. This allows for [really fantastic performance for neural nets](https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus).

Using higher precision requires explicitly setting it, e.g., `jax.numpy.matmul(x, y, precision=jax.lax.Precision.HIGHEST)`. (We could conceivably shorten this by supporting strings, e.g., `jax.numpy.matmul(x, y, precision='highest')`)

Is this the right default behavior for functions like `jax.numpy.matmul` and the `@` matmul operator? Or should we switch `precision` to default to full float32 precision, at the price of extra matrix-multiplication passes? This would probably be a little more annoying for neural network users, but it is arguably less surprising, especially for users who are using matrix-multiplication for other uses.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Default matrix-multiplication precision on TPUs #1856

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Default matrix-multiplication precision on TPUs #1856

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions