Performance improvement to improve training time #11718
Conversation
Thanks for this pull request and your efforts to enhance the training performance of the YOLOv8 models! 🚀 The optimizations to the image transformations seem particularly promising as they effectively cut down operational redundancy. Using numba for jit compilation is an intriguing choice, and while it does add an external dependency, the speed improvements you've observed might justify its use. Could you provide some benchmarks comparing the performance with and without the use of numba? This data could help in making a more informed decision about integrating numba more broadly within the codebase. Here's a slight modification to the test snippet to include timing for both scenarios:

```python
import time

from ultralytics.models.yolo.detect import DetectionTrainer

def train_model(use_numba):
    # Switch branches (with/without numba) between runs; the flag only labels the output.
    start_time = time.time()
    modelName = 'yolov8n.yaml'
    overrides = {'epochs': 6, 'imgsz': 640, 'data': 'coco.yaml', 'model': modelName, 'batch': 8, 'close_mosaic': 3}
    trainer = DetectionTrainer(overrides=overrides)
    trainer.train()
    execution_time = time.time() - start_time
    print(f"Execution time with{'out' if not use_numba else ''} numba: {execution_time} seconds")

train_model(use_numba=True)
train_model(use_numba=False)
```

This modification helps to directly compare the impact of numba on the overall training process. Looking forward to your findings!
@Laughing-q interesting training speedup PR. I think we'd like to merge this without the numba dependency.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main   #11718     +/-   ##
===========================================
+ Coverage   37.29%   70.54%   +33.24%
===========================================
  Files         122      122
  Lines       15635    15660      +25
===========================================
+ Hits         5831    11047    +5216
+ Misses       9804     4613    -5191
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Thanks for the update! It's great to hear that the performance improved even without numba. If you can share the specific metrics or any additional insights from your latest tests, that would be helpful for finalizing the merge. Let's aim for the best balance of dependency minimization and performance enhancement. 🚀
In short, my main changes currently are skipping transforms that are configured with a zero probability of applying, plus a few numba-jitted helper methods.
I don't have any specific metrics beyond wall-clock time for the epochs. Is there something specific the team would want?
Thanks for detailing the changes! The approach sounds solid, particularly your method to skip transformations when their probability is zero—definitely a smart optimization. 😊 For metrics, if you could provide us with a comparison in wall-clock time between the main branch and your changes (i.e., how much total time each epoch takes on average) across a few runs, that would be ideal. This will help us quantitatively assess the impact of your improvements. Keep up the fantastic work! Looking forward to integrating these enhancements. 🌟
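The zero-probability skip discussed above can be sketched independently of the ultralytics internals. The class names and the `p` attribute convention below are illustrative, not the actual PR code:

```python
import random

class RandomFlip:
    """Toy transform applied with probability p (illustrative, not the real ultralytics class)."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, labels):
        if random.random() < self.p:
            labels["flipped"] = True
        return labels

class Compose:
    """Minimal Compose that drops zero-probability transforms at build time."""
    def __init__(self, transforms):
        # A transform with p == 0 can never fire, so filter it out once here
        # instead of invoking it for every image in every epoch.
        self.transforms = [t for t in transforms if getattr(t, "p", 1.0) > 0.0]

    def __call__(self, labels):
        for t in self.transforms:
            labels = t(labels)
        return labels

pipeline = Compose([RandomFlip(p=0.0), RandomFlip(p=1.0)])
print(len(pipeline.transforms))  # prints 1: the p=0 transform was filtered out
```

The per-call cost of a no-op transform is small, but over millions of images per training run the saved function calls and label copies add up.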
@edkazcarlson Hi, thanks for the PR!

```
yolo train detect data=coco.yaml model=yolov8m.yaml batch=64 epochs=4 close_mosaic=2 device=0,1,2,3
```
@Laughing-q Thank you for the tests on your hardware. Could you try doing this through Python itself? I'm not sure where exactly the `yolo` command is pointing; could it be that it's pointing at the install you have through pip and not my branch (can you confirm through using …)?
Also, a side question: I know numba was turned down for hardware compatibility reasons, but does the team accept Cython improvements?
Hi there! Thanks for your continued contributions and for checking in about Cython. Yes, we're open to considering Cython improvements as they can be a great way to enhance performance while maintaining compatibility across various hardware setups. If you have specific optimizations in mind using Cython, feel free to share them or open a PR. We'd love to take a look! 🚀
In this PR I am cleaning up some existing code to help improve training times.
The main speedups come from not running transforms that are passed with a 0% chance of applying, but I also introduced jit-compiled methods with the numba library to handle some operations that are run frequently.
If the team doesn't want to use numba (code clutter, licensing, etc.), I'll remove it; while it did introduce some speedup, the majority of the gain came from the changes to the image transformations.
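One way to keep the numba speedup without making it a hard dependency is a no-op fallback decorator, so the same code runs (just slower) when numba is absent. This is a general sketch of that pattern, not the actual PR code; the box-conversion helper is a hypothetical example of a hot-path function:

```python
import numpy as np

try:
    from numba import njit  # optional: jit-compile hot paths when available
except ImportError:
    def njit(*args, **kwargs):
        # Fallback: behave as a pass-through decorator when numba is missing.
        if args and callable(args[0]):
            return args[0]
        return lambda f: f

@njit
def xywh2xyxy(x):
    """Convert boxes from (cx, cy, w, h) to (x1, y1, x2, y2) corner format."""
    y = np.empty_like(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # x1 = cx - w/2
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # y1 = cy - h/2
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # x2 = cx + w/2
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # y2 = cy + h/2
    return y

boxes = np.array([[10.0, 10.0, 4.0, 6.0]])
print(xywh2xyxy(boxes))  # [[ 8.  7. 12. 13.]]
```

The try/except keeps `numba` out of the required dependencies while still letting users who install it benefit from the compiled path.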
Tested with the following code:
Overall, with my changes it took 9754 seconds to train for 6 epochs (3 with mosaic, 3 without), versus 10531 seconds without my changes.
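For context, those wall-clock numbers work out to roughly a 7% reduction in total training time:

```python
baseline = 10531  # seconds for 6 epochs on main (from the PR description)
patched = 9754    # seconds for the same run with this PR's changes

speedup_pct = (baseline - patched) / baseline * 100
print(f"{speedup_pct:.1f}% faster")  # prints "7.4% faster"
```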
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Optimizations and improvements in YOLOv8 model processing and data augmentation techniques.
📊 Key Changes
🎯 Purpose & Impact