
Performance improvement to improve training time #11718

Open · wants to merge 52 commits into main

Conversation

edkazcarlson

@edkazcarlson edkazcarlson commented May 7, 2024

In this PR I clean up some existing code to help improve training times.
The main speedup comes from not running transforms that are passed a 0% chance of applying, but I also introduced jit-compiled methods with the numba library to handle some operations that are run frequently.
If the team doesn't want to use numba (code clutter, licensing, etc.), I'll remove it; while the jit-compiled methods did introduce some speedup, the majority of the gain came from the changes to the image transformations.
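To illustrate the numba pattern, here is a hypothetical sketch (not the PR's actual code): the helper mirrors the shape of the `xywh2xyxy` utility in `ultralytics/utils/ops.py`, and the import fallback is my own addition so the code stays runnable when numba is absent.

```python
import numpy as np

try:
    from numba import njit  # optional: jit-compile hot numeric loops
except ImportError:  # fall back to plain Python if numba is not installed
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit(cache=True)
def xywh2xyxy(boxes):
    # Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2).
    out = np.empty_like(boxes)
    out[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    out[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    out[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    out[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    return out
```

The first call pays a one-time compilation cost; subsequent calls run as native code, which is where the per-batch savings come from.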

Tested with the following code:

```python
import time

from ultralytics.models.yolo.detect import DetectionTrainer

start_time = time.time()

modelName = 'yolov8n.yaml'
overrides = {'epochs': 6, 'imgsz': 640, 'data': 'coco.yaml', 'model': modelName, 'batch': 8, 'close_mosaic': 3}
trainer = DetectionTrainer(overrides=overrides)
trainer.train()

end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")
```

[timing screenshot]
Overall, with my changes it took 9754 seconds to train for 6 epochs (3 with mosaic, 3 without), versus 10531 seconds without them.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Optimizations and improvements in YOLOv8 model processing and data augmentation techniques.

📊 Key Changes

  • Modified bounding box scaling to be more flexible and specific.
  • Implemented enhancements in data augmentation through numba for faster execution.
  • Streamlined and optimized various operations (flipping, translation, etc.) using numba.
  • Adjustments in dataset handling for more efficient processing and augmentation application.

🎯 Purpose & Impact

  • Enhanced Performance: The use of numba for just-in-time compilation significantly speeds up data processing, especially in augmentation tasks, leading to reduced training times.
  • Improved Accuracy: By refining how bounding boxes and images are scaled and transformed, the model can potentially achieve better training accuracy.
  • Flexible Data Handling: Changes in the way datasets and images are augmented allow for more complex and varied transformations, which can help the model generalize better over diverse data sets.
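As a rough illustration of the vectorized flipping mentioned above (a sketch with an assumed helper name, `fliplr_boxes`, not the PR's actual function), flipping a batch of xyxy boxes needs no per-box Python loop:

```python
import numpy as np

def fliplr_boxes(boxes, img_w):
    """Horizontally flip an (N, 4) array of xyxy boxes for an image of width img_w."""
    flipped = boxes.copy()
    flipped[:, 0] = img_w - boxes[:, 2]  # new x1 = width - old x2
    flipped[:, 2] = img_w - boxes[:, 0]  # new x2 = width - old x1
    return flipped
```

Operating on whole arrays like this lets NumPy do the work in C, which is the same effect the numba-decorated versions aim for.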

@glenn-jocher
Member

Thanks for this pull request and your efforts to enhance the training performance of the YOLOv8 models! 🚀 The optimizations to the image transformations seem particularly promising as they effectively cut down operational redundancy. Using numba for jit compilation is an intriguing choice, and while it does add external dependency, the speed improvements you've observed might justify its use.

Could you provide some benchmarks comparing the performance with and without the use of numba? This data could help in making a more informed decision about integrating numba more broadly within the codebase.

Here’s a slight modification to the test snippet to include timing for both scenarios:

```python
import time

from ultralytics.models.yolo.detect import DetectionTrainer

def train_model(use_numba):
    # NOTE: `use_numba` only labels the output; toggle your numba code paths
    # (e.g. via a module-level flag) before each call.
    start_time = time.time()

    overrides = {'epochs': 6, 'imgsz': 640, 'data': 'coco.yaml', 'model': 'yolov8n.yaml', 'batch': 8, 'close_mosaic': 3}
    trainer = DetectionTrainer(overrides=overrides)
    trainer.train()

    execution_time = time.time() - start_time
    print(f"Execution time with{'' if use_numba else 'out'} numba: {execution_time} seconds")

train_model(use_numba=True)
train_model(use_numba=False)
```

This modification helps to directly compare the impact of numba on the overall training process. Looking forward to your findings!

@glenn-jocher added the TODO label May 8, 2024
@glenn-jocher
Member

@Laughing-q interesting training speedup PR. I think we'd like to merge this without the numba addition as it may be hardware specific and we'd strongly prefer to avoid adding additional dependencies.


codecov bot commented May 9, 2024

Codecov Report

Attention: Patch coverage is 78.43137%, with 11 lines in your changes missing coverage. Please review.

Project coverage is 70.54%. Comparing base (b87ea6a) to head (4f1f18c).

❗ Current head 4f1f18c differs from pull request most recent head 92c63fe. Consider uploading reports for the commit 92c63fe to get more accurate results

Files Patch % Lines
ultralytics/utils/instance.py 63.63% 8 Missing ⚠️
ultralytics/data/augment.py 94.73% 1 Missing ⚠️
ultralytics/utils/loss.py 0.00% 1 Missing ⚠️
ultralytics/utils/ops.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #11718       +/-   ##
===========================================
+ Coverage   37.29%   70.54%   +33.24%     
===========================================
  Files         122      122               
  Lines       15635    15660       +25     
===========================================
+ Hits         5831    11047     +5216     
+ Misses       9804     4613     -5191     
Flag Coverage Δ
Benchmarks 35.31% <39.21%> (?)
GPU 37.27% <31.37%> (-0.03%) ⬇️
Tests 66.73% <78.43%> (?)

Flags with carried forward coverage won't be shown.

@edkazcarlson
Author

[timing screenshot]
Slightly slower than before when removing numba, but still faster than the current state of main.

@Burhan-Q added the enhancement label May 10, 2024
@glenn-jocher
Member

Thanks for the update! It's great to hear that the performance improved even without numba. If you can share the specific metrics or any additional insights from your latest tests, that would be helpful for finalizing the merge. Let's aim for the best balance of dependency minimization and performance enhancement. 🚀

@edkazcarlson
Author

> Thanks for the update! It's great to hear that the performance improved even without numba. If you can share the specific metrics or any additional insights from your latest tests, that would be helpful for finalizing the merge. Let's aim for the best balance of dependency minimization and performance enhancement. 🚀

In short my main changes currently are:

  1. Taking better advantage of vectorized methods in order to boost perf
  2. Not applying transforms that are passed a 0% probability of running, saving the cost of roughly (# of transforms × images × epochs) calls to random.random()

I don't have any specific metrics outside of wallclock time for the epochs. Is there something specific the team would want?
Thanks :)
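The second point can be sketched roughly as below. This is a hypothetical `Compose`; the real classes in `ultralytics/data/augment.py` differ in detail. Filtering zero-probability transforms out once, at construction time, avoids a per-image `random.random()` call for each of them:

```python
import random

class Compose:
    """Apply a list of (transform, probability) pairs to a sample."""

    def __init__(self, transforms):
        # Drop transforms with p == 0 up front: they can never fire, so we
        # avoid one random.random() call per dropped transform per image.
        self.transforms = [(t, p) for t, p in transforms if p > 0]

    def __call__(self, sample):
        for t, p in self.transforms:
            if p >= 1 or random.random() < p:
                sample = t(sample)
        return sample
```

The filtering happens once per dataloader construction, so the savings scale with images × epochs.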

@glenn-jocher
Member

Thanks for detailing the changes! The approach sounds solid, particularly your method to skip transformations when their probability is zero—definitely a smart optimization. 😊

For metrics, if you could provide us with a comparison in wall-clock time between the main branch and your changes (i.e., how much total time each epoch takes on average) across a few runs, that would be ideal. This will help us quantitatively assess the impact of your improvements.

Keep up the fantastic work! Looking forward to integrating these enhancements. 🌟
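A minimal harness for that comparison might look like this (a generic sketch; `run_epoch` is a stand-in for one training epoch, not an Ultralytics API):

```python
import statistics
import time

def time_epochs(run_epoch, n_epochs=3):
    """Return per-epoch wall-clock times and their mean for one configuration."""
    times = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        run_epoch()
        times.append(time.perf_counter() - start)
    return times, statistics.mean(times)
```

Running it once on main and once on the PR branch, with identical overrides, gives the per-epoch averages requested above.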

@Laughing-q
Member

Laughing-q commented May 13, 2024

@edkazcarlson Hi, thanks for the PR!
I tested your changes locally, but the results I got are almost the same as the main branch. Here are my results:
on main branch: [timing screenshot]
on current PR: [timing screenshot]
And my testing command:

```shell
yolo train detect data=coco.yaml model=yolov8m.yaml batch=64 epochs=4 close_mosaic=2 device=0,1,2,3
```

@edkazcarlson
Author

edkazcarlson commented May 15, 2024

@Laughing-q Thank you for the tests on your hardware. Could you try running this through Python itself? I'm not sure where exactly the yolo command is pointing; could it be resolving to the install you have through pip rather than my branch? (Can you confirm with which?)
I haven't had much of a chance between work and other things to do in-depth tests, but from an initial comparison of my old tests against some new perf tests, the new main seems slower: checking out an old commit (ex: 1365fe9) appears faster than this branch merged with main.
I plan to keep investigating this slowdown in the merged commits, but should we merge this into main for the time being so the branch doesn't go stale? I don't see this PR hurting perf in any way (I can change the title just for better record keeping).
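A quick way to check which install is actually in use from Python (nothing here is Ultralytics-specific; it only inspects the import system and PATH):

```python
import importlib.util
import shutil

# Where would `import ultralytics` resolve from? A site-packages path means
# the pip install, not a local checkout of this branch.
spec = importlib.util.find_spec("ultralytics")
print(spec.origin if spec else "ultralytics not importable")

# Which `yolo` executable is first on PATH (i.e. the one the shell runs)?
print(shutil.which("yolo"))
```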

@glenn-jocher removed the TODO label May 15, 2024
@edkazcarlson changed the title from "Performance improvement to improve training time by up to ~14%" to "Performance improvement to improve training time" May 15, 2024
@edkazcarlson
Author

Also, a side question: I know numba got declined due to hardware-compatibility reasons, but does the team accept Cython improvements?

@glenn-jocher
Member

Hi there! Thanks for your continued contributions and for checking in about Cython. Yes, we're open to considering Cython improvements as they can be a great way to enhance performance while maintaining compatibility across various hardware setups. If you have specific optimizations in mind using Cython, feel free to share them or open a PR. We'd love to take a look! 🚀
