
OverflowError: can't convert negative int to unsigned[finetuning XLNet] #30817

Open

ZHAOFEGNSHUN opened this issue May 15, 2024 · 1 comment

ZHAOFEGNSHUN commented May 15, 2024

System Info

File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 592, in tokenize
return self._first_module().tokenize(texts, **kwargs)
File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/sentence_transformers/models/Transformer.py", line 146, in tokenize
self.tokenizer(
File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2858, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2944, in _call_one
return self.batch_encode_plus(
File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3135, in batch_encode_plus
return self._batch_encode_plus(
File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 496, in _batch_encode_plus
self.set_truncation_and_padding(
File "/home/luban/.conda/envs/my/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 451, in set_truncation_and_padding
self._tokenizer.enable_truncation(**target)
OverflowError: can't convert negative int to unsigned
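
The error is raised inside the Rust-backed fast tokenizer, whose enable_truncation() expects a non-negative max_length. Below is a minimal sketch that appears to reproduce the same failure by passing a negative truncation length; using xlnet-base-cased here just matches the checkpoint family above, and that a negative value is what reaches enable_truncation is my reading of the traceback, not a confirmed fact:

from transformers import AutoTokenizer

# Rust-backed fast tokenizer for the same checkpoint family as above
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased", use_fast=True)

# A negative max_length is forwarded to tokenizers' enable_truncation(), whose
# Rust signature takes an unsigned integer, so the binding raises
# "OverflowError: can't convert negative int to unsigned"
tokenizer(["a sentence", "another sentence"], truncation=True, max_length=-1)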

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from torch.utils.data import DataLoader
import csv
import logging
import math
import sys
from datetime import datetime

from sentence_transformers import SentenceTransformer, losses, models
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from sentence_transformers.readers import InputExample

logging.getLogger().setLevel(logging.INFO)


# Path to a local xlnet-base-cased checkpoint (or pass one as the first CLI argument)
model_name = sys.argv[1] if len(sys.argv) > 1 else "/nfs/XLNet/XLNet/xlnet-base-cased"

train_batch_size = 4
num_epochs = 4
model_save_path = (
    "output/training_stsbenchmark_" + model_name.replace("/", "-") + "-" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
)

# XLNet word-embedding model; the matching tokenizer is loaded alongside it
word_embedding_model = models.Transformer(model_name)

# Mean pooling over token embeddings gives a fixed-size sentence embedding
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
    pooling_mode_cls_token=False,
    pooling_mode_max_tokens=False,
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

train_samples = []
dev_samples = []

# Read sentence pairs with 0-5 similarity scores and rescale the scores to [0, 1]
with open("/nfs/XLNet/XLNet/al.csv", "r", encoding="utf-8") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        score = float(row["score"]) / 5.0
        sentence1 = row["sentence1"]
        sentence2 = row["sentence2"]
        inp_example = InputExample(texts=[sentence1, sentence2], label=score)
        train_samples.append(inp_example)
        dev_samples.append(inp_example)

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=train_batch_size)

train_loss = losses.CosineSimilarityLoss(model=model)

evaluator = EmbeddingSimilarityEvaluator.from_input_examples(dev_samples, name="sts-dev")

# Warm up the learning rate over the first 10% of training steps
warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)
print("Warmup-steps: {}".format(warmup_steps))

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=evaluator,  # needed so save_best_model can pick the best checkpoint
    epochs=num_epochs,
    warmup_steps=warmup_steps,
    output_path=model_save_path,
    save_best_model=True,
)
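
I suspect the negative value comes from how sentence-transformers picks the sequence length: models.Transformer appears to take min(config.max_position_embeddings, tokenizer.model_max_length), and XLNet's config reports max_position_embeddings as -1 because the model has no hard length limit, so -1 ends up as the tokenizer's max_length. That diagnosis is an assumption from reading the code, not confirmed. A sketch of a workaround that sets the length explicitly (512 is an arbitrary choice):

# Sketch of a workaround, assuming the -1 comes from XLNet reporting
# max_position_embeddings = -1: pass an explicit max_seq_length so a
# positive value reaches the fast tokenizer.
word_embedding_model = models.Transformer(model_name, max_seq_length=512)

# Equivalently, after construction:
# word_embedding_model.max_seq_length = 512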

Expected behavior

I expect the script to fine-tune xlnet-base-cased on my CSV data without raising this OverflowError.

ZHAOFEGNSHUN changed the title from "OverflowError: can't convert negative int to unsigned(for XLNet)" to "OverflowError: can't convert negative int to unsigned[finetuning XLNet]" on May 15, 2024
@amyeroberts
Collaborator

Hi @ZHAOFEGNSHUN, thanks for raising an issue!

This is a question best placed in our forums. We try to reserve the github issues for feature requests and bug reports.
