Hash aliasing support #16

Open
unreadablewxy opened this issue Jan 5, 2021 · 0 comments
Labels
planned Want, but not sure how

Comments

@unreadablewxy
Owner

There have been cases where crawlers pull in files that look the same, and definitely are the same, but hash differently, either because of re-encoding or some other transformative process.

  • We should be able to easily build up a list of known hash aliases by creating hard links in, say, collection/by-id-alias, where files are named alias_size.alias_hash.extension
  • Would require a new conflict resolution option: file = alias HASH|GROUP+INDEX
  • Would require a new test at import time against known aliases
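
A rough sketch of how the hard-link alias directory and the import-time test could fit together. This is only an illustration, not the project's implementation: the function names (file_identity, alias_name, register_alias, find_alias) are hypothetical, SHA-256 is an assumed hash since the issue does not name one, and the conflict resolution option itself is not modelled here.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional, Tuple


def file_identity(path: Path) -> Tuple[int, str]:
    """Return (size in bytes, hex digest) for a file. SHA-256 is an assumption."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return path.stat().st_size, digest.hexdigest()


def alias_name(size: int, digest: str, extension: str) -> str:
    # Naming scheme from the issue: alias_size.alias_hash.extension
    return f"{size}.{digest}.{extension.lstrip('.')}"


def register_alias(alias_dir: Path, canonical: Path, variant: Path) -> Path:
    """Record `variant` as a known alias of `canonical`.

    The link is named after the variant's size and hash but points at the
    canonical file, so a later lookup by the variant's identity resolves to
    the file already in the collection.
    """
    size, digest = file_identity(variant)
    alias_dir.mkdir(parents=True, exist_ok=True)
    link = alias_dir / alias_name(size, digest, variant.suffix)
    if not link.exists():
        # Hard links only work within a single filesystem.
        os.link(canonical, link)
    return link


def find_alias(alias_dir: Path, incoming: Path) -> Optional[Path]:
    """Import-time test: return the alias link if `incoming` is a known alias."""
    size, digest = file_identity(incoming)
    candidate = alias_dir / alias_name(size, digest, incoming.suffix)
    return candidate if candidate.exists() else None
```

With this layout the import-time check is a single path lookup, and removing an alias is just deleting its link. The canonical file's data is unaffected for as long as other links to it remain.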
@unreadablewxy unreadablewxy added the planned Want, but not sure how label Jul 11, 2021