[BUG] deeplake.util.exceptions.ReadSampleFromChunkError #2741
Comments
There seems to be an issue with the [...] That exists on your private S3 bucket and was previously created with different code, right? Did it work originally and then start throwing an exception, or has it been failing since the first load?
Thank you very much for your response. The same dataset, which lives in MinIO (a private S3 bucket), can be read without problems from a single Python thread, so my data is not corrupted.
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
The above exception was the direct cause of the following exception:
The attachment contains the code that builds the torch.utils.data.Dataset (LoadDeeplakeImagesAndLabels) and the dataloader (create_dataloader) on top of deeplake.
I found that the above problem can be solved by calling deeplake.load() with the access_method parameter set to 'local:4', but this way there is no guarantee that the most recent version of the dataset is used every time.
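For readers who want to try the workaround above, here is a minimal sketch. It assumes the Deep Lake 3.x behaviour as I recall it from the documentation: 'local' reuses an existing local copy, 'download' fetches the dataset again, and both need the DEEPLAKE_DOWNLOAD_PATH environment variable. Please verify this against the version you run. The helper names build_access_method and load_local_copy are hypothetical and not part of deeplake.

```python
def build_access_method(refresh, num_workers=4):
    """Return an access_method string such as 'local:4' or 'download:4'.

    Assumption: 'download' re-fetches the dataset while 'local' reuses an
    existing local copy (Deep Lake 3.x docs, to be verified).
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    mode = "download" if refresh else "local"
    return f"{mode}:{num_workers}"


def load_local_copy(path, refresh=False, num_workers=4):
    """Hypothetical wrapper around deeplake.load() for the workaround."""
    import deeplake  # imported here so the helper above works without it

    return deeplake.load(
        path,
        access_method=build_access_method(refresh, num_workers),
    )
```

Calling load_local_copy(path, refresh=True) once at the start of a training job and refresh=False afterwards might address the staleness concern, at the cost of a full download per job.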
Severity
P0 - Critical breaking issue or missing functionality
Current Behavior
I am using torch.utils.data.distributed.DistributedSampler(dataset, shuffle=shuffle) to build the dataloader, where the dataset has to be read from deeplake, and I load the deeplake dataset in the __init__() method of the dataset class. When I iterate over the dataloader, I get the following error: deeplake.util.exceptions.ReadSampleFromChunkError: Unable to read sample at index 97 from chunk 'images/chunks/bc4c02f9eec3464e' in tensor images.
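To see which rank would request a given index such as 97, the sampler's partitioning can be reproduced in plain Python. This is a sketch of the documented DistributedSampler behaviour for shuffle=False and drop_last=False only; shard_indices is a hypothetical helper, not a torch function.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Indices one rank receives with shuffle=False and drop_last=False."""
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around so every rank gets the same number of samples.
    while len(indices) < total_size:
        indices += indices[: total_size - len(indices)]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]
```

With 10 samples and 4 ranks, rank 2 gets indices 2, 6 and the wrapped-around 0, so a failing index is normally requested by one rank only (padding aside).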
Steps to Reproduce
If I load the deeplake dataset in the __init__() method of the dataset class and then access it in the __getitem__() method, I run into the problem.
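One variant that may be worth trying, though nothing in this thread confirms it fixes the error, is to open the dataset lazily on first access so that every worker process gets its own handle instead of inheriting the one created in __init__(). The sketch below is an assumption-laden illustration: LazyDeeplakeDataset and its opener argument are hypothetical, the tensor name "labels" is guessed from the class name in the attachment, and the default opener assumes deeplake.load(path, read_only=True).

```python
import os


class LazyDeeplakeDataset:
    """Map-style dataset that opens its Deep Lake handle on first use."""

    def __init__(self, path, opener=None):
        self.path = path
        # `opener` exists only to make the sketch testable without deeplake.
        self._opener = opener or self._default_opener
        self._ds = None
        self._pid = None

    @staticmethod
    def _default_opener(path):
        import deeplake  # lazy import: building the object needs no deeplake

        return deeplake.load(path, read_only=True)

    def _handle(self):
        # Reopen when running in a different process than the one that
        # opened the handle, for example a forked DataLoader worker.
        pid = os.getpid()
        if self._ds is None or self._pid != pid:
            self._ds = self._opener(self.path)
            self._pid = pid
        return self._ds

    def __getstate__(self):
        # Never ship an open handle to a spawned worker process.
        state = self.__dict__.copy()
        state["_ds"] = None
        state["_pid"] = None
        return state

    def __len__(self):
        return len(self._handle())

    def __getitem__(self, idx):
        ds = self._handle()
        # Tensor names "images" and "labels" are assumptions.
        return ds.images[idx].numpy(), ds.labels[idx].numpy()
```

An instance of this class can be passed to DistributedSampler and DataLoader like any other map-style dataset, since it only relies on __len__() and __getitem__().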
Expected/Desired Behavior
Customize a dataloader that reads data from deeplake and supports distributed training.
Python Version
Python 3.10.9 (main, Mar 8 2023, 10:47:38) [GCC 11.2.0] on linux
OS
No response
IDE
No response
Packages
No response
Additional Context
No response
Possible Solution
No response
Are you willing to submit a PR?