fix maximum input token count exceeds limit of 4096 #244

Open · wants to merge 2 commits into base: main
2 changes: 1 addition & 1 deletion 03_Model_customization/01_fine-tuning-titan-lite.ipynb
@@ -191,7 +191,7 @@
"\n",
"Amazon Titan text model customization hyperparameters: \n",
"- `epochs`: The number of iterations through the entire training dataset and can take up any integer values in the range of 1-10, with a default value of 5.\n",
"- `batchSize`: The number of samples processed before updating model parametersand can take up any integer values in the range of 1-64, with a default value of 1.\n",
"- `batchSize`: The number of samples processed before updating model parameters and can take up any integer values in the range of 1-64, with a default value of 1.\n",
"- `learningRate`:\tThe rate at which model parameters are updated after each batch\twhich can take up a float value betweek 0.0-1.0 with a default value set to\t1.00E-5.\n",
"- `learningRateWarmupSteps`: The number of iterations over which the learning rate is gradually increased to the specified rate and can take any integer value between 0-250 with a default value of 5.\n",
"\n",
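For context on the hyperparameters documented in the hunk above, here is a minimal sketch (not part of this PR) of how they are typically passed to a Bedrock model customization job with boto3. The job name, custom model name, role ARN, and S3 URIs are placeholders, and the exact hyperparameter key names and base model identifier should be checked against the current Bedrock API before use.

```python
import boto3

# Placeholder names/ARNs for illustration only; substitute your own resources.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="titan-lite-finetune-demo",            # hypothetical job name
    customModelName="titan-lite-custom-demo",      # hypothetical model name
    roleArn="arn:aws:iam::111122223333:role/BedrockFineTuneRole",  # placeholder role
    baseModelIdentifier="amazon.titan-text-lite-v1",               # verify current model ID
    trainingDataConfig={"s3Uri": "s3://your-bucket/train.jsonl"},  # placeholder S3 path
    outputDataConfig={"s3Uri": "s3://your-bucket/output/"},        # placeholder S3 path
    hyperParameters={
        # Values are passed as strings; key names mirror the list above but may
        # differ from the exact keys the service expects (e.g. the epoch setting
        # is exposed as "epochCount" in some SDK versions).
        "epochCount": "5",
        "batchSize": "1",
        "learningRate": "0.00001",
        "learningRateWarmupSteps": "5",
    },
)
print(response["jobArn"])
```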
@@ -268,8 +268,8 @@
"# - in our testing Character split works better with this PDF data set\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
" # Set a really small chunk size, just to show.\n",
" chunk_size = 20000, # 4096 tokens * 6 chars per token = 24,576 \n",
" chunk_overlap = 2000, # overlap for continuity across chunks\n",
" chunk_size = 4000, # when set to 20000, got error "Maximum input token count 4919 exceeds limit of 4096". Orginal comment was 4096 tokens * 6 chars per token = 24,576 \n",
" chunk_overlap = 1000, # overlap for continuity across chunks\n",
")\n",
"\n",
"docs = text_splitter.split_documents(document)"
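As a rough sanity check on the new chunk size (not part of the PR), the error quoted in the comment above implies roughly 4 characters per token for this PDF (20,000 chars ≈ 4,919 tokens), not the 6 chars per token assumed originally. A sketch like the following, assuming the `docs` list produced by the cell above, can flag chunks that are likely to exceed the 4096-token limit; the chars-per-token ratio is an estimate, not an exact Titan tokenizer.

```python
# Rough sanity check on chunk sizes (sketch; assumes `docs` from the cell above).
# ~4 chars per token is inferred from the error above (20,000 chars -> 4,919 tokens).
CHARS_PER_TOKEN = 4   # rough estimate, not an exact tokenizer
TOKEN_LIMIT = 4096

for i, doc in enumerate(docs):
    est_tokens = len(doc.page_content) / CHARS_PER_TOKEN
    if est_tokens > TOKEN_LIMIT:
        print(f"chunk {i}: ~{est_tokens:.0f} tokens - likely over the {TOKEN_LIMIT} limit")
```

With `chunk_size = 4000` characters, the estimate works out to roughly 1,000 tokens per chunk, which leaves comfortable headroom under the 4096-token limit.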