Model training does not work on CPU #99

Open
saurabhhssaurabh opened this issue Jan 27, 2021 · 1 comment
@saurabhhssaurabh

I have cloned the code from the dev branch and am executing the following command to fine-tune the model on CPU:
python run_ner.py --cache_dir=path_to_cache --data_dir=path_to_data --bert_model=bert-base-uncased --task_name=ner --output_dir=path_to_output --no_cuda --do_train --do_eval --warmup_proportion=0.1

But I am facing the following error:
Traceback (most recent call last):
  File "run_ner.py", line 611, in <module>
    main()
  File "run_ner.py", line 503, in main
    loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "run_ner.py", line 43, in forward
    logits = self.classifier(sequence_output)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

I don't understand: if I am passing the CPU flag, why is it expecting a tensor to be on the GPU?

@brandonrobertz

--no_cuda is broken for the NER task, because the device can still end up being the GPU here:

class Ner(BertForTokenClassification):

    def forward(self, input_ids,
                token_type_ids=None,
                attention_mask=None,
                labels=None,
                valid_ids=None,
                attention_mask_label=None):
        # ... skipping to line 47
        # the device is picked from torch.cuda.is_available(), which ignores
        # the --no_cuda flag, so this tensor is created on the GPU anyway
        valid_output = torch.zeros(batch_size,
                                   max_len,
                                   feat_dim,
                                   dtype=torch.float32,
                                   device='cuda' if torch.cuda.is_available() else 'cpu')

I changed the device arg to 'cpu' when I wasn't using CUDA, and everything worked as expected.
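
For reference, a minimal sketch of that kind of workaround (not the repo's actual patch): derive the device from a tensor that forward already receives, e.g. input_ids, instead of hard-coding it; batch_size, max_len, and feat_dim are the same names as in the snippet above.

        # inside Ner.forward, replacing the hard-coded device:
        valid_output = torch.zeros(batch_size,
                                   max_len,
                                   feat_dim,
                                   dtype=torch.float32,
                                   # input_ids sits on whichever device run_ner.py
                                   # moved the batch to, so CPU-only runs stay on CPU
                                   device=input_ids.device)

This way the zeros tensor follows --no_cuda automatically, without needing to thread a separate device argument through the model.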
