(Again, but different) AssertionError: assert model_dim % head_count == 0 #2571

Open · James-Decatur opened this issue Mar 13, 2024 · 2 comments

@James-Decatur:

Hello,

I'm a graduate student at Indiana University trying to run OpenNMT on one of our supercomputers. I keep getting the same error reported in #952, but I have already made the changes suggested there. Any idea what the issue could be?

The one change I made was switching to a single GPU (the same setup runs fine on Google Colab).

Beforehand, I got an error message along the lines of 'A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.' I do not remember what I did to correct this problem.
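
(Side note for anyone hitting the same capability message: a minimal check like the sketch below can confirm whether the installed PyTorch build was compiled for the GPU's architecture. The device index is an assumption; an A100 reports compute capability (8, 0), i.e. sm_80.)

```python
# Sketch: check whether this PyTorch build supports the local GPU.
# Assumes a single CUDA device at index 0.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU compute capability: sm_{major}{minor}")
print(f"Architectures in this PyTorch build: {torch.cuda.get_arch_list()}")
# If sm_{major}{minor} is missing from the list, install a PyTorch wheel
# built against a CUDA version that targets that architecture.
```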


new_no.yaml:

```yaml
# Data configurations
save_data: drive/MyDrive/MT_DATA/
src_vocab: drive/MyDrive/MT_DATA/vocab.src
tgt_vocab: drive/MyDrive/MT_DATA/vocab.tgt
save_model: drive/MyDrive/MT_DATA/
overwrite: True
data:
    corpus_1:
        path_src: drive/MyDrive/MT_DATA/train_set_11_char.txt
        path_tgt: drive/MyDrive/MT_DATA/train_set_2_char.txt
    valid:
        path_src: drive/MyDrive/MT_DATA/dev_set_11_char.txt
        path_tgt: drive/MyDrive/MT_DATA/dev_set_2_char.txt

# Training settings
save_checkpoint_steps: 10000
valid_steps: 10000
train_steps: 200000

# Batching
bucket_size: 262144
world_size: 1  # since only one GPU is available
gpu_ranks: [0]  # adjusted for single GPU
num_workers: 2
batch_type: "tokens"
batch_size: 4096
valid_batch_size: 2048
accum_count: [4]
accum_steps: [0]

# Optimization
model_dtype: "fp16"
optim: "adam"
learning_rate: 2
warmup_steps: 8000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 0
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model architecture
encoder_type: transformer
decoder_type: transformer
position_encoding: true
enc_layers: 6
dec_layers: 6
heads: 8
hidden_size: 512
word_vec_size: 512
transformer_ff: 2048
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]
```
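
(A quick sanity check on the posted values, just a sketch rather than OpenNMT code; the file name is taken from above and PyYAML is assumed to be available:)

```python
# Sketch: confirm the head count divides the model dimension in the config above.
import yaml

with open("new_no.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["hidden_size"], cfg["heads"], cfg["hidden_size"] % cfg["heads"])
# Expected for the values above: 512 8 0 -> the assertion should pass.
```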


@vince62s (Member) commented Mar 13, 2024:

If the model dimension is evenly divisible by the head count, the assertion model_dim % head_count == 0 cannot fail, regardless of the machine you run on. Since this is the only check, getting this error means the config actually being used is wrong — verify the paths and the configuration file you are pointing the training command at.
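
(For context, the check exists because multi-head attention splits the model dimension evenly across heads. A minimal sketch, with illustrative names rather than OpenNMT-py's exact code:)

```python
# Why the assertion exists: the attention reshape only works when
# head_count divides model_dim exactly.
import torch

model_dim, head_count = 512, 8
assert model_dim % head_count == 0  # the check that fires in the traceback
dim_per_head = model_dim // head_count  # 64

x = torch.randn(2, 10, model_dim)  # (batch, seq_len, model_dim)
heads = x.view(2, 10, head_count, dim_per_head)
print(heads.shape)  # torch.Size([2, 10, 8, 64])
```

With the posted config (hidden_size 512, heads 8) the division is exact, so a failing assertion suggests the values in this YAML are not the ones actually being loaded at train time.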

@James-Decatur (Author):

Hello Vincent,

Could you elaborate more? I don't quite understand what you are trying to say.

Thank you,
Jim
