
Training Model Struggling with Character Recognition in Custom Sports Fonts #1253

Open
Suriyapongmax opened this issue May 13, 2024 · 5 comments


Suriyapongmax commented May 13, 2024

I am working on an OCR project aimed at accurately reading player numbers and names from sports images. The images feature 10 different custom fonts, predominantly thick and bold, in keeping with a sports aesthetic. The primary challenge is getting the model to distinguish similar-looking characters in these stylized fonts.

Fonts: 10 custom sports fonts (English A-Z, a-z, 0-9).
Training Data: a generated dataset of ~200K images, containing mixed-case strings of 3-10 characters (with and without stroke) and numbers (00-99) for each font.
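
For reference, a minimal sketch of how a set like this can be rendered with PIL (the font paths, canvas size, and layout values below are placeholders, not my exact generator):

```python
# Minimal synthetic-rendering sketch (placeholder paths/values, not the exact pipeline).
import random
import string
from PIL import Image, ImageDraw, ImageFont

FONT_PATHS = ["fonts/AdihausBold.ttf", "fonts/Alexandria.ttf"]  # placeholders

def render_sample(out_path):
    """Render one random 3-10 char string (or a 00-99 number) and return its label."""
    if random.random() < 0.2:
        text = f"{random.randint(0, 99):02d}"             # player numbers 00-99
    else:
        k = random.randint(3, 10)
        text = "".join(random.choices(string.ascii_letters, k=k))

    font = ImageFont.truetype(random.choice(FONT_PATHS), size=48)
    stroke = random.choice([0, 2])                        # non-stroke vs stroke
    img = Image.new("L", (600, 64), color=255)            # grayscale, imgW x imgH
    ImageDraw.Draw(img).text((10, 8), text, font=font, fill=0,
                             stroke_width=stroke, stroke_fill=128)
    img.save(out_path)
    return text                                           # ground-truth label
```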

After training with num_iter: 750000, the training loss is 0.00126 and the validation loss is 0.15052.

Problems Encountered:

  • Train/validation accuracy gap in the wrong direction: accuracy on the validation set is around 94%, but on the training set it is only 85-90%. After 750K iterations I would expect training accuracy to be higher than validation accuracy, not lower.
  • Confusion between similar characters: the model often mixes up characters that look alike. For instance, it reads 'NATALIE' as 'NATALlE', confusing 'I' and 'l'. It also confuses 'Q' with 'O' and 'Z' with 'z'. I suspect this happens because the training set mixes upper and lower case randomly. If I switch to words in all upper case like 'AAAA', all lower case like 'aaaa', or capitalized like 'Aaaa', could that help? (See also the diagnostic sketch after this list.)
  • Random incorrect predictions with no apparent pattern (e.g., 'EGH' predicted as 'DF').
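
One way to see how much of this is pure glyph ambiguity rather than model error is to fold the lookalike classes before comparing predictions to ground truth (a diagnostic sketch only; the fold set below is illustrative):

```python
# Diagnostic only: fold lookalike classes before comparing, to separate true
# recognition errors from glyph ambiguity (fold set is illustrative).
FOLD = str.maketrans({"l": "i", "1": "i", "q": "o", "0": "o"})

def folded(s: str) -> str:
    return s.lower().translate(FOLD)   # lower() merges Z/z; table merges I/l, Q/O

def folded_match(pred: str, gt: str) -> bool:
    return folded(pred) == folded(gt)

assert folded_match("NATALlE", "NATALIE")   # I/l confusion no longer counts
assert not folded_match("EGH", "DF")        # a genuine error still counts
```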

Request for Help:
I am seeking advice on improving my model's performance in differentiating similar-looking characters. Any suggestions on training strategies, network adjustments, or data preprocessing techniques would be greatly appreciated.

Here are actual images I want to predict:
Adihaus Bold_nostroke (7)
Adihaus Bold_nostroke123 (12)
Adihaus Bold_stroke123 (10)
Alexandria_nostroke123 (29)

Here are generated images I used as the training set:
nbv5231665978
oc210030322606
oc220080106966
oc220080146883
oc556143399804

Here is the config I use:
```yaml
batch_size: 32
FT: False
optim: False
lr: 1
beta1: 0.9
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: 0.0
sensitive: True
PAD: True
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: greedy
```
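
For context, `Prediction: CTC` with `decode: greedy` just takes the per-timestep argmax, collapses repeats, and drops the blank; a minimal sketch, assuming the CTC blank is at index 0 as in the trainer's CTCLabelConverter:

```python
# What "decode: greedy" does for a CTC head: per-step argmax, collapse repeats,
# drop the blank. Sketch assumes blank index 0.
import torch

def ctc_greedy_decode(logits: torch.Tensor, charset: str) -> str:
    """logits: (T, num_classes); charset maps class index i+1 -> charset[i]."""
    ids = logits.argmax(dim=1).tolist()
    out, prev = [], 0
    for i in ids:
        if i != 0 and i != prev:       # skip blanks and collapsed repeats
            out.append(charset[i - 1])
        prev = i
    return "".join(out)
```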

THANK YOU IN ADVANCE !!

@alikhalil98771

Hi, I want to ask you how long it took you to fine-tune the model with this number of iterations and dataset size, as I only used 3000 iterations and 20K images, and it is taking a very long time.

@Suriyapongmax (Author)

> Hi, I want to ask you how long it took you to fine-tune the model with this number of iterations and dataset size, as I only used 3000 iterations and 20K images, and it is taking a very long time.

200K images with 40,000 iterations takes about 1 hour.

@alikhalil98771

> 200K images with 40,000 iterations takes about 1 hour.

Thanks for the reply. Can you also tell me your GPU specification? I have a 3050 12G and it has been running for more than 4 hours.

@Suriyapongmax (Author)

> Thanks for the reply. Can you also tell me your GPU specification? I have a 3050 12G and it has been running for more than 4 hours.

I use a 4060 Ti 16GB. What learning rate and batch size did you choose?

@alikhalil98771

> I use a 4060 Ti 16GB. What learning rate and batch size did you choose?

I am using the following config:

```yaml
manualSeed: 1111
workers: 4
batch_size: 128 #32
num_iter: 3000
valInterval: 200
FT: False
optim: False # default is Adadelta
lr: 1.
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5
# Data processing
select_data: 'e' # this is dataset folder in train_data
batch_ratio: '1'
total_data_usage_ratio: 1.0
batch_max_length: 35
imgH: 64
imgW: 600
rgb: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False
# Model Architecture
Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
```
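
As a back-of-envelope check on the timing numbers in this thread (assuming the batch_size: 32 from the first config applies to the 40,000-iteration run), the two setups differ by roughly an order of magnitude in throughput, which the ResNet/512 backbone on a slower GPU would largely explain:

```python
# images/second = batch_size * iterations / seconds (numbers from this thread)
a = 32 * 40_000 / 3600            # ~356 img/s: 4060 Ti, VGG, output_channel 256
b = 128 * 3_000 / (4 * 3600)      # ~27 img/s: 3050, ResNet, output_channel 512
print(f"{a:.0f} vs {b:.0f} images/s -> roughly a {a / b:.0f}x throughput gap")
```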
