
verbose cache entries for gemm tunings #126221

Open · wants to merge 5 commits into base: main

Conversation

Contributor

@nmacchioni nmacchioni commented May 14, 2024

Add an option to switch the Triton hash key to a more verbose output that can help with performance debugging; the hash key then includes Triton template configs such as BLOCK_M, BLOCK_N, BLOCK_K, num_stages, num_warps, etc.

new cache entries look like:
"ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=3, num_warps=2"
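For offline analysis, an entry in this `name=value` format can be parsed back into a dict. A minimal sketch, assuming the comma-separated format shown above; the `parse_cache_key` helper is hypothetical and not part of this PR:

```python
import ast

def parse_cache_key(key: str) -> dict:
    """Parse a verbose cache entry like "BLOCK_K=16, num_warps=2" into a dict.

    Hypothetical helper for offline analysis: values are evaluated as Python
    literals where possible, otherwise kept as raw strings.
    """
    params = {}
    for pair in key.split(", "):
        name, _, value = pair.partition("=")
        try:
            params[name] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            params[name] = value
    return params

entry = ("ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=16, "
         "BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, "
         "num_stages=3, num_warps=2")
params = parse_cache_key(entry)
print(params["BLOCK_M"], params["num_warps"])  # 16 2
```

This only works while individual values contain no ", " themselves, which holds for the flat scalar configs in the example entry.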

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang


pytorch-bot bot commented May 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126221

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (9 Unrelated Failures)

As of commit c9db2cf with merge base b522e65 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@nmacchioni nmacchioni marked this pull request as ready for review May 14, 2024 22:36
@nmacchioni nmacchioni requested a review from eellison May 15, 2024 01:24
Contributor

@eellison eellison left a comment


Would you mind posting what the new output is?

@nmacchioni
Contributor Author

Would you mind posting what the new output is?

updated the summary

@nmacchioni nmacchioni requested a review from eellison May 18, 2024 03:30
Contributor

@eellison eellison left a comment


It's for debugging; maybe we could add these as log entries? Ideally we'd centralize the debugging APIs under TORCH_LOGS.

self.name.rsplit("_", 1)[0],
self.bmreq.module_cache_key,
]
return (


Hmm, would it make more sense to log the correspondence of the full key and the key as a log entry? It could be a bit of a footgun that being more verbose here causes local/global cache misses.
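The alternative suggested here would keep the cache key itself stable and only log the correspondence between the verbose form and the hashed key. A rough sketch of that idea; the logger name, `make_key` function, and hashing scheme are illustrative, not the actual Inductor code:

```python
import hashlib
import logging

# Illustrative logger name; not an actual registered TORCH_LOGS component.
autotuning_log = logging.getLogger("torch._inductor.autotuning")

def make_key(template_params: dict) -> str:
    # Verbose, human-readable form of the template configuration.
    full_key = ", ".join(f"{k}={v!r}" for k, v in sorted(template_params.items()))
    # Short stable key used for the actual cache lookup, so local/global
    # cache hit rates are unaffected by the verbose form.
    short_key = hashlib.sha256(full_key.encode()).hexdigest()[:16]
    # Record the correspondence so the verbose form can be recovered later.
    autotuning_log.debug("autotune cache key %s <- %s", short_key, full_key)
    return short_key

key = make_key({"BLOCK_M": 16, "BLOCK_N": 32, "num_warps": 2})
print(key)
```

Sorting the params makes the key deterministic regardless of the order in which the template configs were assembled.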

Contributor Author


What if we added a super clear "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE DOING" warning to the config? In my mind, this is meant for a very small subset of people (us) who want to do things like pruning/heuristic analysis/etc. for which they need the template arguments.
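The guard described here could be a loudly documented opt-in config flag. A hypothetical sketch of what that might look like; the flag name, helper, and wording are illustrative and not the actual torch._inductor.config source:

```python
# Hypothetical sketch; flag name and helper are illustrative, not the
# actual torch._inductor.config code.

# DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE DOING: enabling verbose
# cache keys changes every autotuning cache key, so existing local and
# global cache entries will stop hitting.
verbose_autotune_cache_keys: bool = False

def cache_key(template_params: dict, hashed_key: str) -> str:
    """Use the verbose key form only when explicitly opted in."""
    if verbose_autotune_cache_keys:
        return ", ".join(f"{k}={v!r}" for k, v in sorted(template_params.items()))
    return hashed_key

print(cache_key({"BLOCK_M": 16, "num_warps": 2}, "deadbeef"))  # deadbeef
```

With the flag off (the default), behavior and cache hit rates are unchanged; only the small group doing pruning or heuristic analysis would flip it on.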

@nmacchioni
Contributor Author

It's for debugging; maybe we could add these as log entries? Ideally we'd centralize the debugging APIs under TORCH_LOGS.

Generally I would agree, but I think parsing these values from the log could be an absolutely awful mess, especially if we use this to debug/investigate very expansive tunings.
