Training Model Struggling with Character Recognition in Custom Sports Fonts #1253
Hi, I want to ask how long it took you to fine-tune the model with this number of iterations and dataset size. I only used 3,000 iterations and 20K images, and it is taking a very long time.
200K images with 40,000 iterations took about 1 hour.
Thanks for the reply. Also, can you tell me your GPU specification? I have a 3050 12G and it is still taking more than 4 hours.
I use a 4060 Ti 16GB. What learning rate and batch size did you choose?
I am using the following config:
I am working on an OCR project aimed at accurately reading player numbers and names from sports images. These images feature 10 different custom fonts, predominantly thick and bold, which cater to a sports aesthetic. The primary challenge is the model's ability to distinguish between similar characters, particularly under the constraints of these stylized fonts.
Fonts: 10 custom sports fonts (English A-Z, a-z, 0-9).
Training Data: Generated dataset of ~200K images, including mixed-case strings of 3-10 characters (with and without stroke outlines) and numbers (00-99) for each font.
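A minimal sketch of how labels for such a dataset could be generated (the actual generator is not shown in this issue; the function names and charset below are assumptions based on the description above):

```python
import random
import string

# Charset matching the fonts described above: A-Z, a-z, 0-9 (assumption).
CHARSET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def random_name_label(rng: random.Random) -> str:
    """Mixed-case string of 3-10 characters, as described in the issue."""
    length = rng.randint(3, 10)
    return "".join(rng.choice(CHARSET) for _ in range(length))


def number_labels() -> list[str]:
    """Player numbers 00-99, rendered once per font in the full dataset."""
    return [f"{n:02d}" for n in range(100)]


rng = random.Random(0)
print([random_name_label(rng) for _ in range(5)])
print(number_labels()[:5])
```

Each label would then be rendered with one of the 10 fonts (e.g. via PIL's `ImageDraw.text`) to produce the training images.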
After training with num_iter: 750000, the training loss is 0.00126 and the validation loss is 0.15052.
Problems Encountered:
Request for Help:
I am seeking advice on improving my model's performance in differentiating similar-looking characters. Any suggestions on training strategies, network adjustments, or data preprocessing techniques would be greatly appreciated.
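One way to pin down which characters are actually being confused is to tally character-level substitutions between predictions and ground truth; a stdlib-only sketch (the sample prediction pairs here are made up for illustration):

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_pairs(pairs: list[tuple[str, str]]) -> Counter:
    """Count (ground_truth_char, predicted_char) substitutions."""
    counts: Counter = Counter()
    for truth, pred in pairs:
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, truth, pred).get_opcodes():
            # Only count same-length replacements as 1:1 character confusions.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for t, p in zip(truth[i1:i2], pred[j1:j2]):
                    counts[(t, p)] += 1
    return counts


# Hypothetical (truth, prediction) pairs showing typical 0/O and 1/I mix-ups.
samples = [("PLAYER10", "PLAYER1O"), ("SMITH01", "SMITHO1"), ("KIT1", "KITI")]
print(confusion_pairs(samples).most_common())
```

Running this over a held-out set makes it easy to see whether the errors concentrate on a few pairs (0/O, 1/I/l, 8/B), which can then be targeted with extra training samples or post-processing rules.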
Here are actual images I want to predict:
Here are generated images I used as the training set:
Here is the config I use:
```yaml
batch_size: 32
FT: False
optim: False
lr: 1
beta1: 0.9
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: 0.0
sensitive: True
PAD: True
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: greedy
```
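For reference, a stdlib-only sketch of reading such a flat `key: value` config into typed Python values (real training scripts typically use PyYAML; this mini-parser only handles the flat format shown above and is an illustration, not the project's loader):

```python
def parse_flat_config(text: str) -> dict:
    """Parse flat 'key: value' lines, coercing bools, ints, floats, and None."""
    config = {}
    for line in text.strip().splitlines():
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw in ("True", "False"):
            value = raw == "True"
        elif raw == "None":
            value = None
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw  # leave as string (e.g. VGG, BiLSTM, greedy)
        config[key.strip()] = value
    return config


cfg = parse_flat_config("""
batch_size: 32
lr: 1
total_data_usage_ratio: 1.0
Transformation: None
sensitive: True
Prediction: CTC
""")
print(cfg)
```

Typed values matter here: `lr: 1` becomes the integer 1 (the Adadelta-style default in many OCR trainers), while `total_data_usage_ratio: 1.0` stays a float.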
THANK YOU IN ADVANCE!!