New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce the size of commit info file #125091
base: master
Are you sure you want to change the base?
Conversation
rustbot has assigned @Mark-Simulacrum. Use |
} | ||
|
||
impl Info { | ||
pub fn sha_short(&self) -> &str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is not actually 100% equivalent to rev-parse --short
, because that will return a unique prefix with at least 9
characters (https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---shortltlengthgt). Not sure if we ever hit the case where a 9 character SHA prefix wouldn't be unique though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I happen to have an old blog post exploring git abbreviations:
https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
(Maybe I should rewrite that in Rust!)
That script on the Rust repo gives me:
2581360 objects
4: 2581360 / 65536
5: 2360776 / 738816
6: 368483 / 179538
7: 24480 / 12224
8: 1488 / 744
9: 102 / 51
10: 6 / 3
11: 0 / 0
130c811393499854d9ed66edb4060797ea3a8b81 blob 11235
130c8113935c7402fc5fc1eab0d778c54c1aa302 tree 142
31c7d4e8041d065214c8098090b9c7d34d2c5268 tree 236
31c7d4e804ee40192f72a79237ddd6636c251575 blob 7811
afb64da8b61738966ee1171e5b2f96c8e004c054 blob 140593
afb64da8b6f34d1df1ecc1eb4ae289e115cb9dfe blob 8706
If you drop the initial --objects
in the script, you'll get only commits:
259050 objects
4: 254083 / 59335
5: 57089 / 27377
6: 3847 / 1919
7: 210 / 105
8: 8 / 4
9: 0 / 0
60cc6b931996b65a053699973bb08487e31bf9f0 commit 1084
60cc6b93f6490eab960e90664215c989515d2f06 commit 1504
dcbb27aa60f5105a49d5e416490d6549385c325d commit 310
dcbb27aab60eac01a666a2d12a64683a8f237e8b commit 707
eb4860c77c62450f721cdefab22a2bde3ae62ec0 commit 994
eb4860c7e1e4109d94e84adc2794f720120604e3 commit 1647
faa280ceb9cd440f462349cd5bbaaff83c269213 commit 250
faa280cee69a34462e43b1a5df7c06faf97e56be commit 287
So 9 characters are currently enough to uniquely identify commits, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is not actually 100% equivalent to
rev-parse --short
, because that will return a unique prefix with at least9
characters (https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---shortltlengthgt). Not sure if we ever hit the case where a 9 character SHA prefix wouldn't be unique though.
I did not know that, thanks!
So 9 characters are currently enough to uniquely identify commits, at least.
Taking the 10 characters should be fairly enough seems like.
816acff
to
1df0ab2
Compare
Signed-off-by: onur-ozkan <work@onurozkan.dev>
I'm not terribly happy with just truncating to 10 characters, but I guess it's true that we likely have considerable headroom, at least without a malicious attacker. What do we use the short format for today? Just displaying a change log? Or are there other places it shows up, such as rustc -V output or similar? |
It's widely used for many purposes (e.g., it's passed as |
As we already have the complete version of the commit hash, we don't need to keep its short version in the commit file and memory (during bootstrap).