(Again, but different) AssertionError: assert model_dim % head_count == 0 #2571

Open · James-Decatur opened this issue Mar 13, 2024 · 2 comments

@James-Decatur:

Hello,

I'm a graduate student at Indiana University trying to run OpenNMT on one of our supercomputers. I keep getting the same error reported in #952, but I have already made the changes suggested there. Any idea what the issue could be?

The one change I made was switching to a single GPU (the same setup runs fine on Google Colab).

Beforehand, I got an error message along the lines of 'A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.' I do not remember what I did to correct this problem.
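
(Side note for anyone hitting the same capability message: a minimal check like the sketch below can confirm whether the installed PyTorch build was compiled for the GPU's architecture. The device index is an assumption; an A100 reports compute capability (8, 0), i.e. sm_80.)

```python
# Sketch: check whether this PyTorch build supports the local GPU.
# Assumes a single CUDA device at index 0.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: sm_{major}{minor}")
print(f"Architectures in this PyTorch build: {torch.cuda.get_arch_list()}")
# If sm_{major}{minor} is missing from the list, install a PyTorch wheel
# built against a CUDA version that targets that architecture.
```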


new_no.yaml:

```yaml
# Data configurations
save_data: drive/MyDrive/MT_DATA/
src_vocab: drive/MyDrive/MT_DATA/vocab.src
tgt_vocab: drive/MyDrive/MT_DATA/vocab.tgt
save_model: drive/MyDrive/MT_DATA/
overwrite: True
data:
    corpus_1:
        path_src: drive/MyDrive/MT_DATA/train_set_11_char.txt
        path_tgt: drive/MyDrive/MT_DATA/train_set_2_char.txt
    valid:
        path_src: drive/MyDrive/MT_DATA/dev_set_11_char.txt
        path_tgt: drive/MyDrive/MT_DATA/dev_set_2_char.txt

# Training settings
save_checkpoint_steps: 10000
valid_steps: 10000
train_steps: 200000

# Batching
bucket_size: 262144
world_size: 1  # since only one GPU is available
gpu_ranks: [0]  # adjusted for single GPU
num_workers: 2
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 2048
accum_count: [4]
accum_steps: [0]

# Optimization
model_dtype: "fp16"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model architecture
encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
```
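
(A quick sanity check on the posted values, just a sketch rather than OpenNMT code; the file name is taken from above and PyYAML is assumed to be available:)

```python
# Sketch: confirm the head count divides the model dimension in the config above.
import yaml

with open("new_no.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["hidden_size"], cfg["heads"], cfg["hidden_size"] % cfg["heads"])
# Expected for the values above: 512 8 0 -> the assertion should pass.
```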


@vince62s (Member) commented Mar 13, 2024:

If the model dimension is evenly divisible by the head count, the assertion model_dim % head_count == 0 cannot fail, regardless of the machine you run on. Since this is the only check, getting this error means the config actually being used is wrong — verify the paths and the configuration file you are pointing the training command at.
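
(For context, the check exists because multi-head attention splits the model dimension evenly across heads. A minimal sketch, with illustrative names rather than OpenNMT-py's exact code:)

```python
# Why the assertion exists: the attention reshape only works when
# head_count divides model_dim exactly.
import torch

model_dim, head_count = 512, 8
assert model_dim % head_count == 0  # the check that fires in the traceback
dim_per_head = model_dim // head_count  # 64

x = torch.randn(2, 10, model_dim)  # (batch, seq_len, model_dim)
heads = x.view(2, 10, head_count, dim_per_head)
print(heads.shape)  # torch.Size([2, 10, 8, 64])
```

With the posted config (hidden_size 512, heads 8) the division is exact, so a failing assertion suggests the values in this YAML are not the ones actually being loaded at train time.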

@James-Decatur (Author):

Hello Vincent,

Could you elaborate more? I don't quite understand what you are trying to say.

Thank you,
Jim
