# Computer Vision, Audio, & Multimodal Projects

This repository houses projects built on semi-structured and unstructured data that were not completed with Spark and are not Natural Language Processing (NLP) projects.

## Binary Image Classification (Computer Vision)

| Project Name | Accuracy | F1-Score | Precision | Recall |
| --- | --- | --- | --- | --- |
| Bart vs Homer | 0.9863 | 0.9841 | 0.9688 | 1.0 |
| Brain Tumor MRI Images | 0.9216 | 0.9375 | 0.8824 | 1.0 |
| COVID19 Lung CT Scans | 0.94 | 0.9379 | 0.9855 | 0.8947 |
| Car or Motorcycle | 0.9938 | 0.9939 | 0.9951 | 0.9927 |
| Dogs or Cats Image Classification | 0.99 | 0.9897 | 0.9885 | 0.9909 |
| Male or Female Eyes | 0.9727 | 0.9741 | 0.9818 | 0.9666 |
| Breast Histopathology Image Classification | 0.8202 | 0.8151 | 0.8141 | 0.8202 |
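
All four columns come straight from a confusion matrix, so they are easy to reproduce from saved predictions. A minimal sketch using scikit-learn; the label arrays are illustrative, not taken from any notebook in this repository:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative binary predictions (1 = positive class); real projects
# would use the labels produced by the trained model on the test split.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"F1-Score:  {f1_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")
```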
## Multiclass & Multilabel Image Classification

### Multiclass Image Classification

| Project Name | Accuracy | Macro F1-Score | Macro Precision | Macro Recall | Best Algorithm |
| --- | --- | --- | --- | --- | --- |
| Brain Tumors Image Classification[^1] | 0.8198 | 0.8054 | 0.8769 | 0.8149 | Vision Transformer (ViT) |
| Diagnoses from Colonoscopy Images | 0.9375 | 0.9365 | 0.9455 | 0.9375 | - |
| Human Activity Recognition | 0.8381 | 0.8394 | 0.8424 | 0.839 | - |
| Intel Image Classification | 0.9487 | 0.9497 | 0.9496 | 0.95 | - |
| Landscape Recognition | 0.8687 | 0.8694 | 0.8714 | 0.8687 | - |
| Lung & Colon Cancer | 0.9994 | 0.9994 | 0.9994 | 0.9994 | - |
| Mango Leaf Disease Dataset | 1.0 | 1.0 | 1.0 | 1.0 | - |
| Simpsons Family Images | 0.953 | 0.9521 | 0.9601 | 0.9531 | - |
| Vegetable Image Classification | 1.0 | 1.0 | 1.0 | 1.0 | - |
| Weather Images | 0.934 | 0.9372 | 0.9398 | 0.9354 | - |
| Hyper Kvasir Labeled Image Classification | 0.8756 | 0.5778 | 0.5823 | 0.5746 | - |
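
Macro averaging scores each class equally regardless of how many samples it has, which is why Hyper Kvasir's macro scores can sit well below its accuracy (likely a sign of class imbalance). A hedged sketch of the computation; the label arrays are placeholders:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative multiclass labels; real values come from the test split
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# average="macro" computes the metric per class, then takes an unweighted mean
print(f"Accuracy:        {accuracy_score(y_true, y_pred):.4f}")
print(f"Macro F1-Score:  {f1_score(y_true, y_pred, average='macro'):.4f}")
print(f"Macro Precision: {precision_score(y_true, y_pred, average='macro'):.4f}")
print(f"Macro Recall:    {recall_score(y_true, y_pred, average='macro'):.4f}")
```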

### Multilabel Image Classification

| Project Name | Subset Accuracy | F1 Score | ROC AUC |
| --- | --- | --- | --- |
| Futurama - ML Image CLF | 0.9672 | 0.9818 | 0.9842 |
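
Subset accuracy is the strictest of the three: a prediction counts only when every label for an image is correct. The table does not state which averaging mode the F1 and ROC AUC columns use, so `micro` below is an assumption, and the matrices are invented:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Illustrative multilabel data: rows = images, columns = labels
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.4], [0.8, 0.6, 0.3]])
y_pred = (y_score >= 0.5).astype(int)  # assumed 0.5 decision threshold

# On a multilabel indicator matrix, accuracy_score is subset (exact-match) accuracy
print(f"Subset Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"F1 Score:        {f1_score(y_true, y_pred, average='micro'):.4f}")
print(f"ROC AUC:         {roc_auc_score(y_true, y_score, average='micro'):.4f}")
```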
## Object Detection (Computer Vision)

| Project Name | Avg. Precision[^2] | Avg. Recall[^3] |
| --- | --- | --- |
| License Plate Object Detection | 0.513 | 0.617 |
| Pedestrian Object Detection | 0.560 | 0.745 |
| ACL X-Rays | 0.09 | 0.308 |
| Abdomen MRIs | 0.453 | 0.715 |
| Axial MRIs | 0.284 | 0.566 |
| Blood Cell Object Detection | 0.344 | 0.448 |
| Brain Tumors | 0.185 | 0.407 |
| Cell Tower Object Detection | 0.287 | 0.492 |
| Stomata Cells | 0.340 | 0.547 |
| Excavator Object Detection | 0.386 | 0.748 |
| Forklift Object Detection | 0.136 | 0.340 |
| Hard Hat Object Detection | 0.346 | 0.558 |
| Liver Disease Object Detection | 0.254 | 0.552 |
- There are other Object Detection projects posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but resource constraints would make training them fully take an unreasonably long time, so their metrics fall short of the standard set by the projects above.
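
The AP and AR columns follow the COCO convention spelled out in footnotes 2 and 3: averaged over IoU thresholds from 0.50 to 0.95, all object areas, and up to 100 detections per image. `torchmetrics` implements the same evaluator; a minimal sketch with made-up boxes in `(x1, y1, x2, y2)` format:

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# One illustrative image: a single prediction against a single ground-truth box
preds = [{
    "boxes": torch.tensor([[50.0, 50.0, 150.0, 150.0]]),
    "scores": torch.tensor([0.85]),
    "labels": torch.tensor([1]),
}]
targets = [{
    "boxes": torch.tensor([[55.0, 55.0, 145.0, 145.0]]),
    "labels": torch.tensor([1]),
}]

metric = MeanAveragePrecision()  # defaults: bbox IoU, xyxy box format
metric.update(preds, targets)
results = metric.compute()
# "map" is AP@[IoU=0.50:0.95]; "mar_100" is AR with maxDets=100
print(results["map"].item(), results["mar_100"].item())
```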
## Image Segmentation (Computer Vision)

| Project Name | Mean IoU | Mean Accuracy | Overall Accuracy | Use PEFT? |
| --- | --- | --- | --- | --- |
| Carvana Image Modeling | 0.9917 | 0.9962 | 0.9972 | Yes |
| Dominoes | 0.9198 | 0.9515 | 0.9778 | Yes |
| CMP Facade (V2) | 0.3102 | 0.4144 | 0.6267 | Yes |
- There are other Image Segmentation projects posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but resource constraints would make training them fully take an unreasonably long time, so their metrics fall short of the standard set by the projects above.
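
The three metric columns match the keys returned by the `mean_iou` metric in Hugging Face's `evaluate` library, which is one plausible way numbers like these are produced. A toy sketch; the masks, label count, and ignore index are assumptions:

```python
import numpy as np
import evaluate

mean_iou = evaluate.load("mean_iou")

# Toy 2x2 segmentation masks with 2 classes; real masks come from the model
predictions = [np.array([[0, 1], [1, 1]], dtype=np.int64)]
references = [np.array([[0, 1], [0, 1]], dtype=np.int64)]

results = mean_iou.compute(
    predictions=predictions,
    references=references,
    num_labels=2,
    ignore_index=255,  # assumed; 255 is a common "ignore" label in segmentation
)
print(results["mean_iou"], results["mean_accuracy"], results["overall_accuracy"])
```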
## Document AI Projects

### Multiclass Classification

| Project Name | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
| --- | --- | --- | --- | --- |
| Document Classification - Desafio_1 | 0.9865 | 0.9863 | 0.9870 | 0.9861 |
| Document Classification RVL-CDIP | 0.9767 | 0.9154 | 0.9314 | 0.9019 |
| Real World Documents Collections | 0.767 | 0.7704 | 0.7767 | 0.7707 |
| Real World Documents Collections_v2 | 0.826 | 0.8242 | 0.8293 | 0.8237 |
| Tobacco-Related Documents | 0.7532 | 0.722 | - | - |
| Tobacco-Related Documents_v2 | 0.8666 | 0.8308 | - | - |
| Tobacco-Related Documents_v3 | 0.9419 | 0.9278 | - | - |
## Audio Projects

| Project Name | Project Type |
| --- | --- |
| Vinyl Scratched or Not | Binary Audio Classification |
| Audio-Drum Kit Sounds | Multiclass Audio Classification |
| Speech Emotion Detection | Emotion Detection |
| Toronto Emotional Speech Set (TESS) | Emotion Detection |
| ASR Speech Recognition Dataset | Automatic Speech Recognition |
## Optical Character Recognition Projects

| Project Name | CER[^4] |
| --- | --- |
| 20,000 Synthetic Samples Dataset | 0.0029 |
| Captcha | 0.0075 |
| Handwriting Recognition (v1) | 0.0533 |
| Handwriting Recognition (v2) | 0.0360 |
| OCR License Plate Text Recognition | 0.0368 |
| Tesseract E13B | 0.0036 |
| Tesseract CMC7 | 0.0050 |
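
Character Error Rate (footnote 4) is the character-level edit distance (substitutions, insertions, and deletions) divided by the reference length, so a CER of 0.0029 means roughly 3 wrong characters per 1,000. The `jiwer` package computes it directly; the strings below are invented:

```python
import jiwer

# Illustrative OCR output vs. ground truth
reference = "ABC 1234"
hypothesis = "ABC 1294"  # one substituted character

print(f"CER: {jiwer.cer(reference, hypothesis):.4f}")  # 1 error / 8 chars = 0.1250
```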

[^1]: This project is part of a transformer comparison.

[^2]: Average Precision (AP) @[IoU=0.50:0.95 | area=all | maxDets=100]

[^3]: Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=100]

[^4]: CER stands for Character Error Rate.