ERROR: Unexpected segmentation fault encountered in worker. #370

Open
howardgriffin opened this issue May 6, 2024 · 2 comments
Labels
question (Further information is requested), stale

Comments

howardgriffin commented May 6, 2024

When running the v1.1 training (using buckets), I encountered the error below. Any suggestions?

Traceback (most recent call last):
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1133, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/root/miniconda3/envs/opensora/lib/python3.10/queue.py", line 180, in get
    self.not_empty.wait(remaining)
  File "/root/miniconda3/envs/opensora/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 1871778) is killed by signal: Segmentation fault.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Open-Sora/scripts/train.py", line 330, in <module>
    main()
  File "/Open-Sora/scripts/train.py", line 239, in main
    for step, batch in pbar:
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/tqdm/std.py", line 1169, in __iter__
    for obj in iterable:
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1329, in _next_data
    idx, data = self._get_data()
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1285, in _get_data
    success, data = self._try_get_data()
  File "/root/miniconda3/envs/opensora/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1146, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 1871778) exited unexpectedly

zhengzangw (Collaborator) commented

Could you provide a few rows from your CSV file, along with the command you ran? The CSV must be preprocessed so that each video entry includes its height, width, and related metadata.
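As a quick way to check the CSV before posting rows, something like the sketch below might help. It is not part of the Open-Sora tooling; it only uses Python's standard `csv` module. The function name `find_bad_rows` and the column names `path`, `height`, `width`, and `num_frames` are assumptions for illustration, so replace them with whatever your preprocessing step actually writes. It also assumes the numeric fields are stored as positive integers.

```python
import csv

# Hypothetical column names (an assumption, not confirmed by the project):
# adjust to match the header your preprocessing step produces.
REQUIRED_COLUMNS = ("path", "height", "width", "num_frames")


def _is_positive_int(value):
    """Return True if value parses as an integer greater than zero."""
    try:
        return int(value) > 0
    except ValueError:
        return False


def find_bad_rows(csv_path, required=REQUIRED_COLUMNS):
    """Return (line_number, reason) pairs for rows lacking usable metadata."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in required if c not in header]
        if missing:
            # Without the columns there is nothing to check row by row.
            return [(1, "missing columns: " + ", ".join(missing))]
        for row in reader:
            for col in required:
                # Short rows yield None for absent fields; treat as empty.
                value = (row[col] or "").strip()
                if not value:
                    problems.append((reader.line_num, f"empty {col}"))
                elif col != "path" and not _is_positive_int(value):
                    problems.append((reader.line_num, f"invalid {col}: {value!r}"))
    return problems
```

Calling `find_bad_rows("your_metadata.csv")` (the file name is a placeholder) returns an empty list when every row has the assumed fields filled in. A clean result does not rule out a corrupt video file as the cause of the segmentation fault; it only confirms the metadata columns are present.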

zhengzangw added the question label May 9, 2024

This issue is stale because it has been open for 7 days with no activity.

github-actions bot added the stale label May 17, 2024