Training Model Struggling with Character Recognition in Custom Sports Fonts #1253
Hi, I want to ask how long it took you to fine-tune the model with this number of iterations and dataset size. I only used 3,000 iterations and 20K images, and it is taking a very long time.
200K images with 40,000 iterations took about 1 hour.
Thanks for the reply. Also, can you tell me your GPU specification? I have a 3050 12G and it is still taking more than 4 hours.
I use a 4060 Ti 16GB. What learning rate and batch size did you choose?
I am using the following config:
I am working on an OCR project aimed at accurately reading player numbers and names from sports images. These images feature 10 different custom fonts, predominantly thick and bold, which cater to a sports aesthetic. The primary challenge is the model's ability to distinguish between similar characters, particularly under the constraints of these stylized fonts.
Fonts: 10 custom sports fonts (English A-Z, a-z, 0-9).
Training Data: Generated dataset of ~200K images, including mixed-case strings of 3-10 characters (with and without stroke outlines) and numbers (00-99) for each font.
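A minimal sketch of how labels for such a dataset could be generated (the actual generator is not shown in this issue; the function names and charset below are assumptions based on the description above):

```python
import random
import string

# Charset matching the fonts described above: A-Z, a-z, 0-9 (assumption).
CHARSET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def random_name_label(rng: random.Random) -> str:
    """Mixed-case string of 3-10 characters, as described in the issue."""
    length = rng.randint(3, 10)
    return "".join(rng.choice(CHARSET) for _ in range(length))


def number_labels() -> list[str]:
    """Player numbers 00-99, rendered once per font in the full dataset."""
    return [f"{n:02d}" for n in range(100)]


rng = random.Random(0)
print([random_name_label(rng) for _ in range(5)])
print(number_labels()[:5])
```

Each label would then be rendered with one of the 10 fonts (e.g. via PIL's `ImageDraw.text`) to produce the training images.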
After training with num_iter: 750000, the training loss is 0.00126 and the validation loss is 0.15052.
Problems Encountered:
Request for Help:
I am seeking advice on improving my model's performance in differentiating similar-looking characters. Any suggestions on training strategies, network adjustments, or data preprocessing techniques would be greatly appreciated.
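One way to pin down which characters are actually being confused is to tally character-level substitutions between predictions and ground truth; a stdlib-only sketch (the sample prediction pairs here are made up for illustration):

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_pairs(pairs: list[tuple[str, str]]) -> Counter:
    """Count (ground_truth_char, predicted_char) substitutions."""
    counts: Counter = Counter()
    for truth, pred in pairs:
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, truth, pred).get_opcodes():
            # Only count same-length replacements as 1:1 character confusions.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for t, p in zip(truth[i1:i2], pred[j1:j2]):
                    counts[(t, p)] += 1
    return counts


# Hypothetical (truth, prediction) pairs showing typical 0/O and 1/I mix-ups.
samples = [("PLAYER10", "PLAYER1O"), ("SMITH01", "SMITHO1"), ("KIT1", "KITI")]
print(confusion_pairs(samples).most_common())
```

Running this over a held-out set makes it easy to see whether the errors concentrate on a few pairs (0/O, 1/I/l, 8/B), which can then be targeted with extra training samples or post-processing rules.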
Here are actual images I want to predict:
Here are generated images I used as the training set:
Here is the config I use:
```yaml
batch_size: 32
FT: False
optim: False
lr: 1
beta1: 0.9
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: 0.0
sensitive: True
PAD: True
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: greedy
```
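For reference, a stdlib-only sketch of reading such a flat `key: value` config into typed Python values (real training scripts typically use PyYAML; this mini-parser only handles the flat format shown above and is an illustration, not the project's loader):

```python
def parse_flat_config(text: str) -> dict:
    """Parse flat 'key: value' lines, coercing bools, ints, floats, and None."""
    config = {}
    for line in text.strip().splitlines():
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw in ("True", "False"):
            value = raw == "True"
        elif raw == "None":
            value = None
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw  # leave as string (e.g. VGG, BiLSTM, greedy)
        config[key.strip()] = value
    return config


cfg = parse_flat_config("""
batch_size: 32
lr: 1
total_data_usage_ratio: 1.0
Transformation: None
sensitive: True
Prediction: CTC
""")
print(cfg)
```

Typed values matter here: `lr: 1` becomes the integer 1 (the Adadelta-style default in many OCR trainers), while `total_data_usage_ratio: 1.0` stays a float.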
THANK YOU IN ADVANCE!!