Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce the size of commit info file #125091

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

onur-ozkan
Copy link
Member

As we already have the complete version of the commit hash, we don't need to keep its short version in the commit file and memory (during bootstrap).

@rustbot
Copy link
Collaborator

rustbot commented May 13, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels May 13, 2024
}

impl Info {
pub fn sha_short(&self) -> &str {
Copy link
Contributor

@Kobzol Kobzol May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is not actually 100% equivalent to rev-parse --short, because that will return a unique prefix with at least 9 characters (https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---shortltlengthgt). Not sure if we ever hit the case where a 9 character SHA prefix wouldn't be unique though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happen to have an old blog post exploring git abbreviations:
https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/
(Maybe I should rewrite that in Rust!)

That script on the Rust repo gives me:

2581360 objects
 4: 2581360 / 65536
 5: 2360776 / 738816
 6: 368483 / 179538
 7: 24480 / 12224
 8: 1488 / 744
 9: 102 / 51
10: 6 / 3
11: 0 / 0
130c811393499854d9ed66edb4060797ea3a8b81 blob 11235
130c8113935c7402fc5fc1eab0d778c54c1aa302 tree 142
31c7d4e8041d065214c8098090b9c7d34d2c5268 tree 236
31c7d4e804ee40192f72a79237ddd6636c251575 blob 7811
afb64da8b61738966ee1171e5b2f96c8e004c054 blob 140593
afb64da8b6f34d1df1ecc1eb4ae289e115cb9dfe blob 8706

If you drop the initial --objects in the script, you'll get only commits:

259050 objects
 4: 254083 / 59335
 5: 57089 / 27377
 6: 3847 / 1919
 7: 210 / 105
 8: 8 / 4
 9: 0 / 0
60cc6b931996b65a053699973bb08487e31bf9f0 commit 1084
60cc6b93f6490eab960e90664215c989515d2f06 commit 1504
dcbb27aa60f5105a49d5e416490d6549385c325d commit 310
dcbb27aab60eac01a666a2d12a64683a8f237e8b commit 707
eb4860c77c62450f721cdefab22a2bde3ae62ec0 commit 994
eb4860c7e1e4109d94e84adc2794f720120604e3 commit 1647
faa280ceb9cd440f462349cd5bbaaff83c269213 commit 250
faa280cee69a34462e43b1a5df7c06faf97e56be commit 287

So 9 characters are currently enough to uniquely identify commits, at least.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is not actually 100% equivalent to rev-parse --short, because that will return a unique prefix with at least 9 characters (https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt---shortltlengthgt). Not sure if we ever hit the case where a 9 character SHA prefix wouldn't be unique though.

I did not know that, thanks!

So 9 characters are currently enough to uniquely identify commits, at least.

Taking the 10 characters should be fairly enough seems like.

Signed-off-by: onur-ozkan <work@onurozkan.dev>
@Mark-Simulacrum
Copy link
Member

I'm not terribly happy with just truncating to 10 characters, but I guess it's true that we likely have considerable headroom, at least without a malicious attacker.

What do we use the short format for today? Just displaying a change log? Or are there other places it shows up, such as rustc -V output or similar?

@onur-ozkan
Copy link
Member Author

What do we use the short format for today? Just displaying a change log?

It's widely used for many purposes (e.g., it's passed as CFG_SHORT_COMMIT_HASH env to cargo).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants