I'm a computer novice. When fine-tuning on the google/civil_comments dataset, I implemented a preprocessing function modeled after get_preprocessed_samsum, but I keep running into problems.
Can you help me figure out what's wrong? Here is my code and the error:
Traceback (most recent call last):
File "examples/finetuning.py", line 8, in
fire.Fire(main)
...
File "/data/llama-recipes/src/llama_recipes/data/concatenator.py", line 24, in
buffer = {k: v + sample[k] for k,v in buffer.items()}
KeyError: 'input_ids'
[2024-02-28 23:34:18,646] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 60594) of binary: /data/miniconda3/envs/dachuang/bin/python
Traceback (most recent call last):
File "/data/miniconda3/envs/dachuang/bin/torchrun", line 8, in
sys.exit(main())
...
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
examples/finetuning.py FAILED
...
Root Cause (first observed failure):
[0]:
time : 2024-02-28_23:34:18
host : amax
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 60594)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
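The KeyError: 'input_ids' raised in concatenator.py usually means the mapped dataset never produced tokenized fields: the concatenator's buffer expects every sample to carry "input_ids", "attention_mask", and "labels". A common cause is a custom preprocessing function that returns the wrong keys, or a dataset.map call that leaves the original text columns in place. Below is a minimal sketch of a civil_comments preprocessing function patterned after get_preprocessed_samsum. The prompt template, the function names, and the use of the "text" and "toxicity" columns are assumptions for illustration, not the exact code from this issue:

```python
def tokenize_add_label(sample, tokenizer, prompt):
    """Turn one civil_comments row into the keys the concatenator expects.

    The KeyError: 'input_ids' fires when mapped samples are missing
    exactly these keys, so every sample must return all three lists.
    """
    prompt_ids = tokenizer.encode(
        tokenizer.bos_token + prompt.format(text=sample["text"]),
        add_special_tokens=False,
    )
    label_ids = tokenizer.encode(
        str(sample["toxicity"]) + tokenizer.eos_token,
        add_special_tokens=False,
    )
    return {
        "input_ids": prompt_ids + label_ids,
        "attention_mask": [1] * (len(prompt_ids) + len(label_ids)),
        # Mask the prompt tokens with -100 so loss is computed on the label only.
        "labels": [-100] * len(prompt_ids) + label_ids,
    }


def get_preprocessed_civil_comments(dataset_config, tokenizer, split):
    # Deferred import; requires the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset("google/civil_comments", split=split)
    # Hypothetical prompt template for illustration.
    prompt = "Classify the toxicity of this comment:\n{text}\n---\nToxicity: "
    # remove_columns matters here: leftover raw columns can leave the
    # concatenator buffer with keys that tokenized samples don't carry.
    return dataset.map(
        lambda s: tokenize_add_label(s, tokenizer, prompt),
        remove_columns=list(dataset.features),
    )
```

It is worth printing one element of the returned dataset (e.g. `next(iter(dataset))`) before training to confirm the three keys are present with equal lengths.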