[Bug][gitlab] The data in the commits table is different from the data in GitLab #7480

Open
2 of 3 tasks
Shikanor opened this issue May 17, 2024 · 4 comments
Assignees
Labels
needs-triage An issue that hasn't had any proper look severity/p1 This bug affects functionality or significantly affect ux type/bug This issue is a bug

Comments

@Shikanor

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

While attempting to use the commits table for a code volume analysis, I encountered an unexpected scenario: a portion of the additions in the commits table did not match the additions displayed on the GitLab page, especially following merge operations. To confirm whether this was an issue with the token, I used the same token to write a Python script for verification. The results showed that indeed, there are occasional discrepancies between the data in the commits table and the data on GitLab.
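
A comparison of this kind can be sketched as follows. This is not the script used in the report, only a minimal illustration. It assumes the GitLab REST endpoint `GET /api/v4/projects/:id/repository/commits/:sha`, whose response carries a `stats` object with `additions` and `deletions`, and it assumes the rows exported from the commits table are dictionaries with `sha`, `additions`, and `deletions` keys. The function names and the example host are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def gitlab_commit_url(base_url, project_id, sha):
    """Build the single-commit URL; a project path such as group/repo must be URL-encoded."""
    project = urllib.parse.quote(str(project_id), safe="")
    return f"{base_url.rstrip('/')}/api/v4/projects/{project}/repository/commits/{sha}"


def fetch_gitlab_stats(base_url, project_id, sha, token):
    """Fetch the stats object for one commit (performs a network call)."""
    request = urllib.request.Request(
        gitlab_commit_url(base_url, project_id, sha),
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["stats"]


def find_mismatches(devlake_rows, gitlab_stats):
    """Return one record per commit whose additions or deletions differ.

    devlake_rows: iterable of dicts with sha, additions, deletions (assumed shape).
    gitlab_stats: dict mapping sha to the GitLab stats object.
    """
    mismatches = []
    for row in devlake_rows:
        stats = gitlab_stats.get(row["sha"])
        if stats is None:
            continue  # commit was not fetched from GitLab, nothing to compare
        devlake = (row["additions"], row["deletions"])
        gitlab = (stats["additions"], stats["deletions"])
        if devlake != gitlab:
            mismatches.append({"sha": row["sha"], "devlake": devlake, "gitlab": gitlab})
    return mismatches
```

Listing the mismatching SHAs together with both pairs of counts makes it easier to check whether the differences cluster around merge commits.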


What do you expect to happen

I would like these discrepancies to be minimized as far as possible, because I cannot rule out that they will eventually distort an individual's overall code volume statistics.

How to reproduce

First, create a feature branch from the main branch, then commit some code on the feature branch. After that, switch back to the main branch and commit changes to a different file (ensuring there are no merge conflicts). Finally, merge the feature branch back into the main branch. After the merge, click the "Collect Data" button to retrieve the commits data from the commits table and compare it with the data on GitLab.

Anything else

No response

Version

v1.0.0-beta7@2cc08c7

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Shikanor Shikanor added the type/bug This issue is a bug label May 17, 2024
@dosubot dosubot bot added needs-triage An issue that hasn't had any proper look severity/p1 This bug affects functionality or significantly affect ux labels May 17, 2024

dosubot bot commented May 17, 2024

To address the discrepancies between the commits table and GitLab data, especially after merge operations, it's crucial to dive deeper into the specifics of your situation. Here's how we can proceed:

  1. Share the Python Script: The script you used for verification could provide insights into how data is being fetched and compared. This will help in understanding the methodology behind your findings.

  2. Identify Patterns: Knowing if these discrepancies follow any specific patterns, such as occurring more with larger commits or certain file types, can help in pinpointing the cause.

  3. Provide Examples: Examples of commit hashes with their discrepancies in additions count between the commits table and GitLab would be invaluable for a detailed investigation.

  4. Scope of the Issue: Clarifying whether this issue is isolated to specific repositories or is more widespread will aid in diagnosing the problem more effectively.

This information will be instrumental in uncovering the root cause of the discrepancies you're observing.


@Startrekzky
Contributor

Thanks for reporting.

@klesh
Contributor

klesh commented May 20, 2024

It appears the libgit2 library we're using has a different diff algorithm compared to GitLab. Unfortunately, this is likely an expected behavior rather than a bug and wouldn't be easily configurable.
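
To illustrate how two diff algorithms can report different counts for the same change, here is a minimal sketch. It uses neither libgit2 nor GitLab's implementation; it only compares Python's `difflib.SequenceMatcher` (which greedily anchors on the longest contiguous matching block) with a longest-common-subsequence diff (which produces a minimal edit script). The sample file contents and function names are made up for the illustration.

```python
import difflib


def count_with_difflib(old, new):
    """Count (additions, deletions) from difflib's opcodes."""
    additions = deletions = 0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deletions += i2 - i1
        if tag in ("replace", "insert"):
            additions += j2 - j1
    return additions, deletions


def count_with_lcs(old, new):
    """Count (additions, deletions) for a minimal diff based on the LCS length."""
    previous = [0] * (len(new) + 1)
    for old_line in old:
        current = [0]
        for j, new_line in enumerate(new, start=1):
            if old_line == new_line:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    common = previous[-1]
    return len(new) - common, len(old) - common


# Hypothetical file contents: the import block moves from the top to the bottom
# and a separator line is inserted between the two functions.
OLD = ["import os", "import sys", "import json",
       "def load():", "    return 1",
       "def save():", "    return 2"]
NEW = ["def load():", "    return 1", "# separator",
       "def save():", "    return 2",
       "import os", "import sys", "import json"]

print("difflib:", count_with_difflib(OLD, NEW))  # (5, 4)
print("lcs:    ", count_with_lcs(OLD, NEW))      # (4, 3)
```

Both results describe a valid way to turn the old file into the new one, yet the additions differ by one line. Which algorithms libgit2 and GitLab actually use is not stated in this thread, so this only shows that such a gap is possible in principle.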

@Startrekzky
Contributor

Startrekzky commented May 21, 2024

Hi, we checked the logic. The additions reported by the GitLab APIs and by DevLake differ because:

  • DevLake collects commits with the gitextractor plugin, not with the GitLab APIs.
  • The gitextractor plugin uses the libgit2 library to calculate each commit's additions and deletions.
  • The logic for calculating additions in gogit differs slightly from that of the GitLab APIs.

Thus, this is a problem that cannot be addressed in DevLake for now. We can't switch gitextractor to the GitLab, GitHub, Bitbucket, or Azure DevOps APIs to collect commit (Git) data, as that would increase the collection time roughly 10x.
