verbose cache entries for gemm tunings #126221

nmacchioni · 2024-05-14T22:34:12Z

Add an option to switch the Triton hash key to a more verbose format that helps with performance debugging. With the option enabled, the hash key includes Triton template configs such as BLOCK_M, BLOCK_N, BLOCK_K, num_stages, and num_warps.

New cache entries look like:
"ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=3, num_warps=2"

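For illustration, a string of this shape can be produced by sorting the template keyword arguments and joining `name=repr(value)` pairs. This is a minimal sketch, not the code in this PR: `format_verbose_key` is a hypothetical helper, and folding `num_stages` and `num_warps` into the same dict as the template arguments is an assumption.

```python
def format_verbose_key(template_kwargs: dict) -> str:
    """Join sorted name=repr(value) pairs into one readable key string.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    return ", ".join(
        f"{name}={value!r}" for name, value in sorted(template_kwargs.items())
    )


# Example values copied from the cache entry shown above.
example_kwargs = {
    "BLOCK_M": 16,
    "BLOCK_N": 32,
    "BLOCK_K": 16,
    "GROUP_M": 8,
    "EVEN_K": True,
    "ALLOW_TF32": True,
    "ACC_TYPE": "tl.float32",
    "B_PROLOGUE_CAST_TYPE": None,
    "num_stages": 3,  # assumption: merged into the same dict
    "num_warps": 2,   # assumption: merged into the same dict
}

verbose_key = format_verbose_key(example_kwargs)
```

Because Python sorts uppercase names before lowercase ones, `num_stages` and `num_warps` land at the end, which matches the ordering in the entry above.
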
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-05-14T22:34:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126221

✅ You can merge normally! (9 Unrelated Failures)

As of commit c9db2cf with merge base b522e65:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 1, 3, linux.8xlarge.nvidia.gpu) (gh) (trunk failure)
distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs
pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 2, 3, linux.8xlarge.nvidia.gpu) (gh) (trunk failure)
distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_parity_powerSGD

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

inductor / cuda12.1-py3.10-gcc9-sm86 / test (aot_inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#120841)
hf_BigBird
inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_timm, 2, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#120841)
sebotnet33ts_256
inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#120841)
hf_BigBird
inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#120841)
hf_BigBird

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eellison

Would you mind posting what the new output is?

nmacchioni · 2024-05-17T19:49:10Z

> Would you mind posting what the new output is?

Updated the summary.

eellison

It's for debugging, so maybe we could add these as log entries? Ideally we centralize the debugging APIs under TORCH_LOGS.

eellison · 2024-05-21T01:18:44Z

torch/_inductor/select_algorithm.py

-                self.name.rsplit("_", 1)[0],
-                self.bmreq.module_cache_key,
-            ]
+        return (


Hmm, would it make more sense to log the correspondence between the full key and the key as a log entry? It could be a bit of a footgun that being more verbose here causes local/global cache misses.

What if we added a super clear "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE DOING" warning to the config? In my mind, this is meant for a very small subset of people (us) who want to do things like pruning/heuristic analysis/etc. for which they need the template arguments.
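The logging alternative raised in this thread could look roughly like the sketch below: the compact key stays the cache key, so cache hits are unaffected, and the verbose form is only emitted as a debug log line. It uses the standard `logging` module rather than PyTorch's `TORCH_LOGS` machinery, and both the logger name and `key_with_debug_log` are hypothetical.

```python
import logging

# Hypothetical logger name, chosen for this example only.
log = logging.getLogger("example.gemm_tuning_keys")


def key_with_debug_log(short_key: str, verbose_key: str) -> str:
    """Return the compact key unchanged and log its verbose counterpart."""
    log.debug("cache key %r corresponds to %r", short_key, verbose_key)
    return short_key


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Placeholder keys; real keys would come from the autotuning code.
    key_with_debug_log("abc123", "BLOCK_K=16, BLOCK_M=16, num_warps=2")
```

The trade-off discussed here is that this keeps caching behavior stable but pushes the template arguments into log output, which then has to be parsed separately.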

nmacchioni · 2024-05-21T01:32:20Z

> It's for debugging, maybe we could add these as log entries? Ideally we centralize the debugging APIs under TORCH_LOGS.

Generally I would agree, but I think parsing these values from the log could be an absolutely awful mess, especially if we use this to debug/investigate super expansive tunings.
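As a rough illustration of why the verbose key is convenient to consume directly, the sketch below parses one entry back into a dict. `parse_verbose_key` is a hypothetical helper; it assumes entries are `name=value` pairs separated by `, ` with Python-literal values, and it would break on values that themselves contain `, `.

```python
import ast


def parse_verbose_key(key: str) -> dict:
    """Parse 'NAME=value, NAME=value, ...' into a dict of Python values.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    parsed = {}
    for item in key.split(", "):
        name, _, raw_value = item.partition("=")
        # literal_eval handles ints, bools, None, and quoted strings safely.
        parsed[name] = ast.literal_eval(raw_value)
    return parsed


entry = (
    "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=16, "
    "BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, "
    "num_stages=3, num_warps=2"
)
config = parse_verbose_key(entry)
```

With entries in this form, pruning or heuristic analysis can filter on fields such as `config["BLOCK_M"]` without scraping log files.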

verbose cache entries for gemm tunings

77f869b

pytorch-bot bot added ciflow/inductor module: inductor labels May 14, 2024

changes to select_algorithm

663aeda

nmacchioni marked this pull request as ready for review May 14, 2024 22:36

nmacchioni requested a review from eellison May 15, 2024 01:24

nmacchioni added 3 commits May 16, 2024 15:25

fix lint for config.py

8ff7e12

fix lint for select_algorithm.py

a9a897e

fix lint for config.py, second try

c9db2cf

eellison reviewed May 17, 2024

nmacchioni requested a review from eellison May 18, 2024 03:30

eellison reviewed May 21, 2024
