verbose cache entries for gemm tunings #126221
base: main
Add an option to switch the Triton hash key to a more verbose form that can help with performance debugging. With the option enabled, the hash key includes the Triton template configs, such as BLOCK_M, BLOCK_N, BLOCK_K, num_stages, and num_warps.
Would you mind posting what the new output looks like?
Updated the summary.
Since this is for debugging, maybe we could add these as log entries? Ideally we centralize the debugging APIs under TORCH_LOGS.
self.name.rsplit("_", 1)[0],
self.bmreq.module_cache_key,
]
return (
Hmm, would it make more sense to log the correspondence between the full key and the key as a log entry? It could be a bit of a footgun if being more verbose here causes local/global cache misses.
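The logging alternative suggested here could look roughly like the sketch below: the cache key stays compact, and the mapping to the verbose template arguments goes to a debug log instead. This is only an illustration under assumptions. The logger name, the function names, and the SHA-256 truncation are hypothetical and are not taken from the PR or from Inductor.

```python
import hashlib
import logging

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("autotune_key_debug")


def compact_key(parts: list[str]) -> str:
    """Hash the key parts into a short, stable cache key."""
    joined = "-".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def cache_key_with_debug_log(parts: list[str], template_config: dict) -> str:
    """Return the compact key and log which template arguments it stands for.

    The template arguments never enter the key itself, so turning on
    debug logging cannot cause additional cache misses.
    """
    key = compact_key(parts)
    verbose = ", ".join(
        f"{name}={value!r}" for name, value in sorted(template_config.items())
    )
    log.debug("cache key %s <-> %s", key, verbose)
    return key
```

With this shape, two calls that share the same key parts return the same key regardless of what is logged, which is the property the comment above is concerned about.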
What if we added a super clear "DO NOT USE THIS UNLESS YOU KNOW WHAT YOU ARE DOING" warning to the config? In my mind, this is meant for a very small subset of people (us) who want to do things like pruning/heuristic analysis/etc. for which they need the template arguments.
Generally I would agree, but I think parsing these values from the log could be an absolutely awful mess, especially if we use this to debug or investigate super expansive tunings.
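The opt-in approach argued for here might be sketched as follows. Everything in this block is hypothetical: the `DebugConfig` class, the `verbose_autotune_keys` flag name, and the free function `hash_key` are stand-ins for illustration, and joining the compact key parts with "-" is an assumption, since the diff fragment above does not show how the list is combined.

```python
from dataclasses import dataclass


@dataclass
class DebugConfig:
    # DO NOT ENABLE THIS UNLESS YOU KNOW WHAT YOU ARE DOING.
    # Verbose keys differ from the default keys, so enabling this
    # will miss any existing local/global cache entries.
    verbose_autotune_keys: bool = False  # hypothetical flag name


def hash_key(
    name: str,
    module_cache_key: str,
    template_config: dict,
    config: DebugConfig,
) -> str:
    """Return the compact key by default, or the verbose key when opted in."""
    if config.verbose_autotune_keys:
        # Verbose form: sorted "name=value" pairs of the template arguments.
        return ", ".join(
            f"{arg}={value!r}" for arg, value in sorted(template_config.items())
        )
    # Compact form: name without its trailing "_<suffix>", plus the module key.
    return "-".join([name.rsplit("_", 1)[0], module_cache_key])
```

Defaulting the flag to off keeps the existing cache behavior for everyone who does not explicitly ask for the verbose entries.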
New cache entries look like:
"ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=16, BLOCK_M=16, BLOCK_N=32, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, num_stages=3, num_warps=2"
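An entry in this format is straightforward to produce and to read back programmatically, which is the main argument for putting the values in the key rather than in the logs. The sketch below is a guess at the formatting (sorted `name=repr(value)` pairs), chosen because it reproduces the example entry above. The function names are hypothetical and this is not the PR's implementation.

```python
import ast


def format_verbose_key(template_config: dict) -> str:
    """Render template arguments as sorted, comma-separated name=repr(value) pairs."""
    return ", ".join(
        f"{name}={value!r}" for name, value in sorted(template_config.items())
    )


def parse_verbose_key(key: str) -> dict:
    """Recover the template arguments from a verbose key.

    The key is parsed as the keyword arguments of a call expression, and
    each value is evaluated with ast.literal_eval, so only Python literals
    (strings, numbers, booleans, None) are accepted.
    """
    call = ast.parse(f"dict({key})", mode="eval").body
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


example = {
    "ACC_TYPE": "tl.float32",
    "ALLOW_TF32": True,
    "BLOCK_K": 16,
    "BLOCK_M": 16,
    "BLOCK_N": 32,
    "B_PROLOGUE_CAST_TYPE": None,
    "EVEN_K": True,
    "GROUP_M": 8,
    "num_stages": 3,
    "num_warps": 2,
}
entry = format_verbose_key(example)
recovered = parse_verbose_key(entry)
```

Here `recovered` equals `example`, so pruning or heuristic analysis can work on plain dictionaries instead of scraping log lines.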
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang