
NEW - YOLOv8 🚀 Multi-Object Tracking #1429

Open
glenn-jocher opened this issue Mar 14, 2023 · 170 comments

@glenn-jocher (Member)

YOLOv8 Multi-Object Tracking

Object tracking is the task of identifying the location and class of objects in a video stream and then assigning a unique ID to each detection.

The output of the tracker is the same as detection output, with an object ID added.

Available Trackers

The following tracking algorithms have been implemented and can be enabled by passing tracker=tracker_type.yaml: BoT-SORT (botsort.yaml) and ByteTrack (bytetrack.yaml).

The default tracker is BoT-SORT.

Tracking

Use a trained YOLOv8n/YOLOv8n-seg model to run the tracker on video streams.

Python

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load an official detection model
model = YOLO("yolov8n-seg.pt")  # load an official segmentation model
model = YOLO("path/to/best.pt")  # load a custom model

# Track with the model
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", show=True)  # default BoT-SORT tracker
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", show=True, tracker="bytetrack.yaml")  # ByteTrack tracker

CLI

yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc"  # official detection model
yolo track model=yolov8n-seg.pt source=...   # official segmentation model
yolo track model=path/to/best.pt source=...  # custom model
yolo track model=path/to/best.pt  tracker="bytetrack.yaml" # bytetrack tracker

As shown in the usage above, both detection and segmentation models are supported for tracking; the only thing you need to do is load the corresponding (detection or segmentation) model.

Configuration

Tracking

Tracking shares its configuration with predict mode, e.g. conf, iou, show. For more configuration options, please refer to the predict page.

Python

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", conf=0.3, iou=0.5, show=True) 

CLI

yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" conf=0.3 iou=0.5 show

Tracker

We also support using a modified tracker config file: just copy a config file, e.g. custom_tracker.yaml, from ultralytics/tracker/cfg and modify any configuration (except the tracker_type) you need to.

Python

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model.track(source="https://youtu.be/Zgi9g1ksQHc", tracker='custom_tracker.yaml') 

CLI

yolo track model=yolov8n.pt source="https://youtu.be/Zgi9g1ksQHc" tracker='custom_tracker.yaml'

Please refer to the ultralytics/tracker/cfg page for the full set of tracker configuration options.
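For illustration, a custom_tracker.yaml derived from the shipped bytetrack.yaml might look like the sketch below; the field names follow the shipped config, but the exact fields can vary between ultralytics versions:

tracker_type: bytetrack  # must remain one of the supported tracker types
track_high_thresh: 0.5   # threshold for the first association pass
track_low_thresh: 0.1    # threshold for the second association pass
new_track_thresh: 0.6    # confidence needed to start a new track
track_buffer: 30         # frames to keep lost tracks alive
match_thresh: 0.8        # matching threshold for association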

@akashAD98 (Contributor)

@glenn-jocher does tracking support .onnx, .trt, or OpenVINO .xml weights instead of only .pt weights?

@Laughing-q (Member)

@akashAD98 I haven't tested it yet, but technically track mode supports whatever formats predict mode supports. So yes, it supports .onnx, .trt, and other formats.

@glenn-jocher (Member Author)

@akashAD98 @Laughing-q yes that's right! Tracking supports any predict or segment models in any of the following formats (TF.js is not supported for inference, but all other formats are).

Available YOLOv8 export formats are in the table below. You can predict, track or val directly on exported models, e.g. yolo track model=yolov8n.onnx.

Format          format Argument   Model
PyTorch         -                 yolov8n.pt
TorchScript     torchscript       yolov8n.torchscript
ONNX            onnx              yolov8n.onnx
OpenVINO        openvino          yolov8n_openvino_model/
TensorRT        engine            yolov8n.engine
CoreML          coreml            yolov8n.mlmodel
TF SavedModel   saved_model       yolov8n_saved_model/
TF GraphDef     pb                yolov8n.pb
TF Lite         tflite            yolov8n.tflite
TF Edge TPU     edgetpu           yolov8n_edgetpu.tflite
TF.js           tfjs              yolov8n_web_model/
PaddlePaddle    paddle            yolov8n_paddle_model/
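For example, you can export a model and then track with it directly (standard export usage):

yolo export model=yolov8n.pt format=onnx  # produces yolov8n.onnx
yolo track model=yolov8n.onnx source="https://youtu.be/Zgi9g1ksQHc"  # track on the exported model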

@glenn-jocher (Member Author) commented Mar 15, 2023

@zldrobit I've managed to include metadata additions into all YOLOv8 model formats above except for TF *.pb models. Do you know if this is possible for this format? The metadata is a dictionary here. For directory exports I simply place a metadata.yaml inside the directory, for file formats like ONNX I've found methods to embed the metadata dict inside the file.

self.metadata = {
    'description': description,
    'author': 'Ultralytics',
    'license': 'GPL-3.0 https://ultralytics.com/license',
    'version': __version__,
    'stride': int(max(model.stride)),
    'task': model.task,
    'batch': self.args.batch,
    'imgsz': self.imgsz,
    'names': model.names}  # model metadata

@akashAD98 (Contributor) commented Mar 24, 2023

(quoting @glenn-jocher's export-format reply above)

If I pass OpenVINO weights, it's not supported:

model = YOLO("yolov8n.xml")

Do I need to do different processing? @glenn-jocher

@glenn-jocher (Member Author)

@akashAD98 your OpenVINO usage is not aligned with the usage example we've shown (and that you've pasted); per the table above, an OpenVINO export is loaded from its exported directory, e.g. YOLO("yolov8n_openvino_model/").

@akashAD98 (Contributor)

Yes, got it. Thanks!

@akashAD98 (Contributor)

@glenn-jocher the tracker is not working for custom-trained models. I trained a model with 20 classes and simply passed it to the tracker, and it gives the error below, while the default weights work fine.

video 1/1 (1/1517) /home/ak/s/v8/notebook_office.mp4: 384x640 1 blackboard, 66.3ms
Traceback (most recent call last):
  File "yolov8_tracker.py", line 4, in <module>
    results = model.track(data='custom_data.yaml',source="notebook_office.mp4", conf=0.5, iou=0.5, show=False,device='CPU',project='anushka_yolov8Nano50conf',save=True)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/yolo/engine/model.py", line 236, in track
    return self.predict(source=source, stream=stream, **kwargs)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/yolo/engine/model.py", line 227, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/yolo/engine/predictor.py", line 114, in __call__
    return list(self.stream_inference(source, model))  # merge list of Result into one
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 64, in generator_context
    response = gen.send(request)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/yolo/engine/predictor.py", line 177, in stream_inference
    self.run_callbacks('on_predict_postprocess_end')
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/yolo/engine/predictor.py", line 267, in run_callbacks
    callback(self)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/tracker/track.py", line 35, in on_predict_postprocess_end
    tracks = predictor.trackers[i].update(det, im0s[i])
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/tracker/trackers/byte_tracker.py", line 212, in update
    warp = self.gmc.apply(img, dets)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/tracker/utils/gmc.py", line 78, in apply
    return self.applySparseOptFlow(raw_frame, detections)
  File "/home/refx/DS/yolov7_tracker_yono/openvino_env/lib/python3.8/site-packages/ultralytics/tracker/utils/gmc.py", line 272, in applySparseOptFlow
    matchedKeypoints, status, err = cv2.calcOpticalFlowPyrLK(self.prevFrame, frame, self.prevKeyPoints, None)
cv2.error: OpenCV(4.7.0) /io/opencv/modules/video/src/lkpyramid.cpp:1260: error: (-215:Assertion failed) (npoints = prevPtsMat.checkVector(2, CV_32F, true)) >= 0 in function 'calc'

The command I'm using:

from ultralytics import YOLO

model = YOLO("best_custom20_yolov8s.pt")
results = model.track(data='custom_data.yaml',source="notebook_office.mp4", conf=0.5, iou=0.5, show=False,device='CPU',project='opvidetracker',save=True)

@Laughing-q (Member)

@akashAD98 I just tested with a custom-trained model that detects human heads and it works fine for me, no errors:

from ultralytics import YOLO

model = YOLO("best.pt")
results = model.track(
    source="test.mp4",
    conf=0.5,
    iou=0.5,
    show=False,
    device="CPU",
    save=True,
)


Your error looks like a cv2 issue related to the GMC module in the tracker; it's probably related to your detected boxes and the original frame.

@akashAD98 (Contributor)

@Laughing-q I installed ultralytics, which uses OpenCV by default. I also tested my weights on Google Colab and I'm still getting the same error: for yolov8s and yolov8m it works fine, but for the custom model I get the cv2 error.

Which OpenCV version do I need to install?

error: OpenCV(4.7.0) /io/opencv/modules/video/src/lkpyramid.cpp:1260: error: (-215:Assertion failed) (npoints = prevPtsMat.checkVector(2, CV_32F, true)) >= 0 in function 'calc'

@Laughing-q (Member)

@akashAD98 I suppose the 4.7.0 in your error log is the version, right? I'm also using this version.
We use the BoT-SORT tracker by default, which includes the GMC module; ByteTrack does not have it. Since I'm not able to reproduce the error, I suggest you use ByteTrack for now.
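A minimal sketch of that workaround, reusing the model and video from the earlier comment:

from ultralytics import YOLO

model = YOLO("best_custom20_yolov8s.pt")  # custom model from the comment above
results = model.track(source="notebook_office.mp4", tracker="bytetrack.yaml", save=True)  # ByteTrack skips the GMC step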

@akashAD98 (Contributor) commented Mar 25, 2023

@Laughing-q thanks, it's working fine with ByteTrack; with the BoT-SORT tracker I'm still getting that error.

I also have one question: by default it takes all config files from the installed ultralytics package. If I want to modify any file, should I copy that file and pass its path directly? Is that correct?

For a custom model I need to pass a data.yaml file that contains the class names; can I do the same for the tracker? My data.yaml has my custom class names:

model.track(data='data.yaml', tracker='bytetrack.yaml')

@zldrobit (Contributor)

@glenn-jocher (replying to the metadata question above)

Sorry for the late reply. To the best of my knowledge, a GraphDef *.pb file does not include any meta information; it contains only the computation graph (network structure) and the name/weights of each node. There's no official TensorFlow tutorial on adding metadata to GraphDef *.pb files. However, it is possible to use the protobuf API pb.MergeFrom(metadata) to merge metadata into a pb (https://googleapis.dev/python/protobuf/latest/google/protobuf/any_pb2.html#google.protobuf.any_pb2.Any.MergeFrom).
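Another possible workaround (a sketch only, not necessarily what was adopted here) is to stash the metadata as an extra, unused constant node inside the GraphDef itself:

import json

import tensorflow as tf

# Load an existing frozen graph (file name is illustrative)
graph_def = tf.compat.v1.GraphDef()
with open('yolov8n.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Re-import the graph and append a constant node carrying the metadata
with tf.Graph().as_default() as g:
    tf.compat.v1.import_graph_def(graph_def, name='')
    tf.constant(json.dumps({'stride': 32, 'task': 'detect'}), name='metadata')

# Serialize the augmented graph back to disk
with open('yolov8n_meta.pb', 'wb') as f:
    f.write(g.as_graph_def().SerializeToString())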

@Laughing-q (Member)

@akashAD98 you don't need to pass data to get the names when tracking; the names are saved as an attribute of the model during training and are read from the model when you run tracking.
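You can verify this with a quick check, e.g. with the custom checkpoint from the earlier comments:

from ultralytics import YOLO

model = YOLO("best.pt")
print(model.names)  # class names saved in the checkpoint during training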

@mohamedamine99 commented Mar 29, 2023

Greetings,

I have a question since this is my first time working with object trackers in general : Can YOLOv8 builtin trackers be used for multi-object tracking on video frames read by OpenCV?

I am trying to use the YOLOv8 Builtin Tracker for multi-object tracking, but I am unsure if it is possible to use it on video frames read one-by-one from OpenCV using cap.read() instead of a pre-existing full video or video stream.

I have searched the documentation and the GitHub repository for YOLOv8, but I could not find any information on this topic. I would appreciate it if you could clarify whether it is possible to use the YOLOv8 Builtin Tracker for multi-object tracking on video frames read by OpenCV.

Thank you for your time and attention. I look forward to your responses.

@glenn-jocher (Member Author)

Hello! Yes, you can use the YOLOv8 Builtin Tracker for multi-object tracking on video frames read by OpenCV. The tracker can be initialized on a single frame and then updated on subsequent frames.

Here is a brief overview of how you can do it:

  1. Initialize the detector and the tracker
  2. Read a frame from OpenCV using cap.read()
  3. Pass the frame through the detector and get the detections
  4. Pass the detections to the tracker and update the tracks
  5. Repeat steps 2-4 for each frame

Here is some sample code to get you started:

import cv2
from yolov5 import Detector
from yolov5.utils.bbox import xyxy2xywh
from yolov5.utils.tracker import Tracker

# Initialize the detector and the tracker
detector = Detector(weights='yolov5s.pt')
tracker = Tracker(threshold=0.5)

# Open the video capture
cap = cv2.VideoCapture('test.mp4')
while cap.isOpened():
    # Read a frame
    ret, frame = cap.read()
    if not ret:
        break

    # Pass the frame through the detector and get the detections
    detections = detector.detect(frame)

    # Pass the detections to the tracker and update the tracks
    tracker.update(xyxy2xywh(detections))

    # Draw the tracks on the frame (attribute names here are illustrative)
    for track in tracker.tracks:
        x1, y1, x2, y2 = map(int, track.bbox)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

@mohamedamine99

(quoting @glenn-jocher's YOLOv5 sample code above)

Hi Glenn, thank you for your quick response, I appreciate your help. However, I noticed that the code you provided is for YOLOv5, while I am specifically looking to use the YOLOv8 builtin tracker. Is there an example with YOLOv8 trackers using the track() method, as in model.track(...)?

@glenn-jocher (Member Author)

@mohamedamine99 yes, apologies for the confusion please see https://docs.ultralytics.com/modes/track for Python tracker usage :)

@mohamedamine99

Thanks, I'll check it out.

@glenn-jocher (Member Author)

You're welcome! Let us know if you have any further questions or concerns. We're always here to help.

@akashAD98 (Contributor)

@glenn-jocher is it possible to use the tracker with cv2 frames instead of calling model.track() directly on a video, as already mentioned in the docs?

I want to print and get the details of each bounding box, score, and label from the tracker using the cv2 method. Thanks.

@akashAD98 (Contributor)

@glenn-jocher thanks, that's exactly what I want to use with a YOLOv8 model and its BoT-SORT and ByteTrack trackers. If I replace the YOLOv5 weights with YOLOv8 weights, will it work?

@glenn-jocher (Member Author)

Yes, if you replace the YOLOv5 weights with YOLOv8 weights, the model architecture should remain the same and the model should work as expected with the YOLOv8 weights. However, keep in mind that the performance and accuracy of the model may differ when using different weights. Also, make sure that the input image size and other hyperparameters are adjusted accordingly when switching between models.

@glenn-jocher (Member Author)

@RwGrid ah it's simpler than that, you can use model.track(), and if you want to specify a tracker you can use the tracker arg:

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
results = model.track(source='video.mp4', tracker='botsort.yaml', stream=True)  # config file is botsort.yaml

for result in results:
    # process results, e.g. result.boxes.id for track IDs
    ...

@glenn-jocher (Member Author)

@luan1412167 sure, I'm happy to explain!

YOLO Nano, like other YOLO versions, comprises two main components: the backbone and the head.

The backbone is responsible for feature extraction from the input image. In YOLO Nano, the backbone architecture is smaller and more efficient compared to the larger versions like YOLOv8. It contains fewer convolutional layers to make the model lightweight, which is particularly suitable for edge devices that have lower computational resources.

On the other hand, the head, which takes the feature maps from the backbone and makes the actual predictions, undergoes a series of transformations. It involves several more convolutional layers, upsampling, and concatenation operations before finally converting those high-dimensional feature maps into the final output tensors that represent object bounding boxes, class scores, and objectness scores.

The specific layers that are responsible for the predictions are generally the last layers in the head. These layers transform the feature maps into the desired output shape and apply activation functions to ensure the output values represent valid probabilities and bounding box coordinates.

Unfortunately, without referring to the specific implementation or code of the YOLO Nano version, I cannot provide the exact number of layers in the backbone and the head. It can vary based on many factors including the specific YOLO Nano variation, feature extractor used, and any specific model optimizations applied.

I hope this gives you a clearer understanding of the architecture of YOLO Nano and its operations! Let me know if you need further clarification.

@luan1412167

(quoting the YOLO Nano reply above)

Thanks for your quick support, but that doesn't answer my question; I was actually asking about ByteTrack with multi-class tracking.
#1429 (comment)

@glenn-jocher (Member Author)

Hi @luan1412167,

Thank you for your question! I'd be happy to provide some insights on the YOLOv8 Nano version.

The YOLOv8 Nano version, as the name suggests, is a smaller, lightweight model setup compared to its larger counterparts. The goal is to achieve decent performance on object detection tasks while being computationally efficient and hence suitable for edge devices or environments with limited computational resources.

In terms of architecture, YOLOv8 Nano usually consists of a backbone for feature extraction and a head for detecting objects and making predictions. The backbone is typically shallower compared to larger models due to the focus on computational efficiency.

The backbone usually consists of a smaller number of convolution layers, often coupled with attention mechanisms or other structure optimizations to maintain a balance between efficiency and performance. The exact count and configuration of the layers depends on the specific setup of YOLOv8 Nano.

The head of the network, responsible for object detection, performs the prediction task. The head typically comprises additional layers that take features extracted from the backbone and generate object bounding box coordinates and class probabilities. Again, as part of the Nano setup, the design aims to be lightweight while preserving the model's predictive performance.

Unfortunately, I don't have the exact count of the layers in backbone and head of YOLOv8 Nano as it tends to vary slightly depending on the specific configuration.

I hope this general explanation gives you some idea about the architecture of YOLOv8 Nano version. If you have further questions, feel free to ask.

@phipsi369

@glenn-jocher Quite new to GitHub, so I hope I'm right here :)
I am using YOLOv8l to track cars in a video. However, no matter how low I adjust my track_buffer, an ID is always reassigned after a car leaves the picture and another car enters it (say, even 5 seconds later). Since they are on the same road (going in opposite directions), their tracks are nearly the same and the ID gets reused, meaning I count one car instead of two. Would it be advisable to lower new_track_thresh close to 0 (increasing the risk that a car gets recognized 5 or more times) and make match_thresh as close to 1 as possible?
Thanks for all your help, guys.

@glenn-jocher (Member Author)

Hi @phipsi369,

Thanks for your question, I'd be happy to provide more details about the structure of YOLOv8 Nano.

Similar to other versions, YOLOv8 Nano consists of a backbone and a head. The backbone is used for feature extraction and the head is responsible for making predictions, including bounding boxes for object detection and class probabilities.

For the Nano version, the backbone is designed to be simpler and lighter to run efficiently on devices with lower computational power. Instead of using a Darknet-53 architecture (which is used in larger YOLO versions), it uses a much smaller network. Exact number of layers in the backbone can vary depending on the specific implementation and configuration of YOLOv8 Nano, but it is designed to be minimal.

The head of YOLOv8 Nano is responsible for predicting the bounding boxes and class probabilities, similar to other YOLO versions. It is composed of a few layers that take the output from the backbone, process it, and provide the final object detection results.

The model's prediction is based on the output from the head, which is processed from the features it received from the backbone. The backbone's purpose is to convert the input image into a rich set of features, and the head's purpose is to translate these features into detectable objects and their attributes.

The architecture of YOLOv8 Nano is designed to strike a good balance between inference speed and accuracy, allowing it to perform object detection reasonably well even on low-power devices.

I hope this provides some insight into the structure of YOLOv8 Nano. Please feel free to ask if there's anything else you'd like to know!

@phipsi369

Thanks @glenn-jocher, it somewhat helps. However, I have no problems with detections. I am using the 8L model and tracking the movement and direction of cars; for that I am using ByteTrack so far. The detection works just fine, with some little tweaks to adjust. My problem lies in the correct allocation of IDs by the tracker. It often results in 2 or more cars being given the same ID after they move through the picture one after the other. Since I am evaluating each car based on its ID, it produces counting errors when an ID gets reassigned. However, I can't find a good adjustment for the bytetrack.yaml and was wondering if you had an idea what to change. I already lowered track_buffer, but I think the problem lies in the track matching: since every car is on the same road, their movement in the picture is nearly identical. Any idea how to work around this? So far it's giving me nearly 20% errors.

@glenn-jocher (Member Author)

Hi @phipsi369,

The YOLOv8 Nano is a more compact version of the YOLOv8 model designed to be computationally efficient while maintaining a good balance between speed and accuracy.

The backbone of YOLOv8 Nano typically consists of a modified version of the Darknet-53 architecture. However, it uses fewer layers to reduce the computational complexity and memory usage. The number of layers in the backbone can vary depending on the specific configuration, but it's generally significantly less than the 53 layers used in the full Darknet-53 architecture.

On the other hand, the head of the YOLOv8 Nano model, like the original YOLOv8 model, is responsible for predicting the object bounding boxes and class probabilities. It consists of convolutional, upsampling, and linear layers. It also utilizes multiple detection layers at different scales to increase the accuracy of detection for various object sizes.

The model's prediction layers are usually located within the head. These layers use the feature maps produced by the backbone and apply a set of transformations to generate the bounding box coordinates and class probability scores.

Please note that the precise model structure, including the number and type of layers in the backbone and head, can vary depending on the specific implementation and configuration of the YOLOv8 Nano model.

Thank you for your understanding and let me know if you need more information.

@20157m commented Oct 10, 2023

Hello, I am a student in Japan. I am a beginner and my knowledge is limited, so sorry if my English is not correct.
I am interested in using YOLOv8 to detect and track fruit ripeness. I have prepared original data, trained a model, and successfully detected fruits using the trained model.
Training

from ultralytics import YOLO
model = YOLO("yolov8l.pt")
model.train(data="dataset.yaml", epochs=135, batch=20, degrees=90.0)
results = model.val()

Detection

from ultralytics import YOLO

model = YOLO('yolov8l.pt')
model = YOLO('last.pt')  # my model

results = model.predict(source = 'mv_path', save=True, imgsz=1280, rect=True, save_txt = True)

However, the tracking does not work. I am running the following source code and the detection is done successfully, but the tracking is not. I would like to know how to solve this problem. Thank you in advance for your help.

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model = YOLO('last.pt') #my model 

results = model.track(source= 'mv_path', tracker=r'ultralytics\ultralytics\cfg\trackers\bytetrack.yaml', save=True, imgsz=1280, rect=True, save_txt = True)  # Tracking with ByteTrack tracker

I respect you guys for being able to create such a great system.
The version of python is 3.10.11
The version of ultralytics is 8.0.191

@breadrone

@20157m can you show the error message?

@20157m commented Oct 10, 2023

Thank you for your reply.
I am not getting any errors. What I am having trouble with is that even when I run the source code that does the tracking, it does not track, and I get the same result as when I run normal object detection.

Tracking: results = model.track()
Normal: results = model.predict()

I ask because I am not sure if there is a mistake in the source code or in my preparation for tracking. I have found many examples using YOLOv8 where object detection is done with a custom model, but I could not find any examples where tracking is done with a custom model. I prepared about 500 images and trained them with YOLOv8.

@phipsi369

I have ID assignment errors, meaning more than one object gets the same ID (it never happens with both objects in the frame). Let's say Object 1 is in the picture and is assigned ID 1. Then Object 1 leaves the picture, and about 5 seconds later Object 2 enters the frame and is also assigned ID 1. I tried lowering track_buffer in botsort.yaml as well as lowering new_track_thresh, but it doesn't change this behavior. At one point there are three objects in the picture within 20 seconds, all assigned the same ID. Someone please help :(

@glenn-jocher (Member Author)

Hi @20157m,

Happy to provide some insight into the structure of the YOLOv8 Nano version!

The exact architecture can depend on the particular configuration you're using, but generally, YOLO Nano has a lightweight and efficient structure suitable for deployments on systems with limited computational resources, like mobile or edge devices.

The backbone, or feature extractor, in YOLO Nano typically has a significantly reduced number of layers compared to the larger YOLO versions. This can sometimes be a combination of convolutional and shortcut (residual) layers, similar to a miniaturized version of Darknet but with fewer layers.

The head, which is responsible for making the final object detections, consists of a few additional layers. These layers would include a series of convolutional layers, up-sampling and concatenation operations to combine feature maps, and final detection layers. These final layers are where the network outputs bounding box coordinates, objectness scores, and class probabilities.

Unfortunately, without an explicit model configuration at hand, it's not feasible to provide specific details about the exact count of layers, or which layers are specifically responsible for predictions.

Remember that the exact structure will depend on the specific configuration of YOLOv8 Nano that you are using, and I would recommend reviewing the configuration file for your model to understand its structure in detail.

I hope this gives you some general understanding of the architecture of YOLO Nano! Let me know if you have more questions.

@20157m commented Oct 10, 2023

Hi @glenn-jocher,

Thank you for your time.
It is quite difficult to understand the content. It seems that I still need more training.

I will explain my recent situation; sorry if I say anything strange.
First, I created a directory with this structure:

fruit/
    ├ ripeness.yaml
    ├ images/
    │   ├ train/  image.png (350 images: five ripeness levels, 70 images each)
    │   └ val/    image.png (150 images: five ripeness levels, 30 images each)
    └ label/
        ├ train/  label.txt (350 annotation .txt files created with labelImg)
        └ val/    label.txt (150 annotation .txt files created with labelImg)

Using these data, I trained the yolov8l.pt model with the same code as in my previous question (#1429 (comment)). I got last.pt as a result and tried to use it to extract the fruit in the video. The result was not very accurate, but it extracted the fruits well and judged the degree of ripeness on 5 levels.

Next, following #1429 (comment), I wanted to track with my own model. Then I ran into a problem.

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model = YOLO('last.pt') #my model 

results = model.track(source= 'mv_path', tracker=r'ultralytics\ultralytics\cfg\trackers\bytetrack.yaml', save=True, imgsz=1280, rect=True, save_txt = True)  # Tracking with ByteTrack tracker

If I run the code as above, nothing is tracked and the result is saved with only labels and no IDs, just as with model.predict(). Is my approach to tracking wrong? Must other training methods be used for tracking?

I apologize if my understanding has not been up to par and I have not conveyed the necessary information. I am concerned that I have failed to present a clear model.

@glenn-jocher (Member Author)

Hi @20157m,

Great question! The YOLO Nano version is a smaller and more efficient version of the YOLO model with fewer layers, which was developed keeping edge devices with lower computational power in mind.

The backbone of YOLO Nano is much smaller than the standard model. It is a modified version of the DarkNet architecture. Instead of using the DarkNet-53 like the original YOLO, YOLO Nano has fewer layers to keep the model more lightweight, while still effectively extracting features from the input image.

The head of the YOLO Nano includes the layers that are responsible for object detection, predicting the bounding boxes and their respective class probabilities. Similar to the backbone, the head of YOLO Nano is also smaller and efficient compared to larger YOLO models.

As for the specific layers responsible for predictions, it isn't very straightforward because the entire model (including both backbone and head) works together to make predictions. Higher layers in the network handle broader features, while lower layers handle details. Each part of the structure contributes to the prediction process — the backbone extracts features from the images, which the head then uses to detect objects and calculate bounding boxes and confidences.

Remember that the exact number of the layers in the backbone and head can vary depending on the specific version and configuration of YOLO Nano you're using.

I hope this helps clarify your understanding of the YOLO Nano architecture! Let me know if you have any other questions.

@glenn-jocher (Member Author)

Hi @breadrone,

I'm glad you found the previous explanation helpful. Let me provide some details about the YOLO Nano version.

The YOLO Nano is a lightweight, efficient version of the YOLO model, specifically designed for edge computing devices with limited computational resources.

Similar to its counterparts, YOLO Nano also has a backbone and a head. The backbone is responsible for extracting features from the input image, and the head utilizes these features to make the final object detection predictions.

In terms of structure, the backbone of YOLO Nano is generally composed of several convolutional layers. It's important to note that the exact number of layers can vary depending on the specific implementation of YOLO Nano you are using. Typically, YOLO Nano employs a lot fewer layers in its backbone compared to the standard YOLO models in order to reduce computational complexity.

The head of YOLO Nano is responsible for predicting the bounding boxes and class probabilities. It contains additional layers that process the features extracted by the backbone to generate the final detections. These layers typically include a combination of convolutional, upsampling, and linear layers.

Unfortunately, without the specific implementation details, I cannot provide the exact count of layers and their functionalities. I would recommend taking a look at the architecture diagram of the YOLO Nano version you are using for a thorough understanding of its structure and specific layer responsibilities.

I hope this provides a clearer picture of the YOLO Nano structure! If you have any more questions, feel free to ask.

@glenn-jocher (Member Author)

Hello @phipsi369,

I'm glad to hear that you found the previous explanation helpful. Let's dive a bit into the YOLOv8 Nano version.

The YOLOv8 Nano model has a simplified and lightweight design to support computer vision tasks on devices with limited computational resources. As with the standard YOLOv8, the YOLOv8 Nano includes a backbone and a head.

The backbone is responsible for feature extraction and is generally simpler than the one used in the full-sized model. Although the exact number of layers can vary, the backbone usually comprises a series of reduced convolutional layers.

For the head of the YOLOv8 Nano model, it is designed to detect and classify objects in the input. Similar to the backbone, the head is streamlined and smaller in comparison to the full-sized model.

About the specific layers responsible for predictions, the output layer at the very end of the network produces the final bounding boxes and class probabilities. Each bounding box includes coordinates (x, y, width, height), an objectness score, and class probabilities.

The exact configuration of layers in both the backbone and head can vary depending on different factors, including the characteristics and requirements of your specific application.

Remember that while the YOLOv8 Nano is smaller and faster, it generally won't perform as well as larger models in terms of accuracy due to its reduced complexity. However, it may provide a more efficient solution for scenarios where resources are constrained or where speed is of greater importance.

I hope that helps! Let me know if you need more information or have any follow-up questions.

@glenn-jocher (Member Author)

Hello @20157m,

YOLOv8 Nano, like all YOLO models, is a variant of the YOLO family optimised for smaller devices, hence the "Nano" specification. The architecture of YOLOv8 Nano, similar to other YOLO models, can be broken down into two main components: the backbone and the head.

The backbone is responsible for feature extraction. It consists of multiple convolutional layers aimed at extracting features from the input images at various spatial resolutions. While exact numbers can vary depending upon the specific implementation or customization, typically, the backbone could consist of tens of layers, bringing the total in the range of 45-53 in some cases.

The head is the part responsible for prediction. It processes the features extracted by the backbone to perform the final object detection. The head includes additional layers that use the convolutional features to predict the bounding boxes and the class probabilities for every detected object in the image. There might be fewer layers in the head as compared to the backbone, and also includes upsampling layers for feature map resolution recovery, and further convolutional layers directly responsible for bounding box and class predictions.

The specific functionality of layers could include standard convolutional layers for feature extraction, pooling layers for downsampling the feature maps, normalization layers such as batch normalization for accelerating training and reducing overfitting, and the final detection layers which make the predictions based on these extracted features.

That said, the specifics of the layers within the model architecture can range depending on factors related to the exact implementation of YOLOv8 Nano or even potential training requirements and constraints.

I hope this provides a good high-level overview of the YOLOv8 Nano's architecture! Feel free to ask if you have more questions or need further clarification on any points.

@bharath5673

(quoting @Laughing-q's custom-model test above)

lol lisa ))) <3

@glenn-jocher (Member Author)

Hi @bharath5673,

Thank you for your question! The YOLOv8 Nano, like its counterparts, follows a similar architectural layout, comprising a backbone and a detection head.

The backbone of YOLOv8 Nano is responsible for feature extraction. It consists of fewer layers compared to other YOLOv8 versions, making it highly suitable for resource-limited devices. The number of layers can vary depending on the specific implementation of YOLOv8 Nano, but typically it has fewer convolutional layers compared to other versions like YOLOv8 or YOLOv8-large.

The detection head of YOLOv8 Nano is where the actual prediction happens. It takes feature maps from the backbone, and through its layers, generates the bounding boxes, objectness scores, and class predictions. As with the backbone, the number of layers depends on the specific implementation but typically would include convolution layers and upsampling layers.

To visualize the exact architecture and to see the specific layers that are involved in prediction, you can check the configuration (.cfg) file for YOLOv8 Nano. This file provides a detailed layer-by-layer structure of the model, so you can see the specifics of the backbone and the detection head, including types of layers, their order, and their hyperparameters.

I hope this gives you a better understanding of YOLOv8 Nano's structure. Don't hesitate to reach out if you have more questions on this!

@ArmandoFuentes99

Hello, I'm kinda new to training computer vision models.
I'd like to know if there's any place where I can find a step-by-step tutorial on how to custom-train a YOLOv8 model for multi-object tracking.
I'd also like to know how the labels and attributes should look for the training data I want the model to train on for my custom case.

@glenn-jocher (Member Author)

Hi @ArmandoFuentes99,

Thank you for your question. YOLO Nano is a smaller and more efficient version of the YOLO models, specifically designed for edge devices with limited computational resources.

The YOLO Nano architecture consists of a backbone and a head, similar to other YOLO models.

The backbone is responsible for feature extraction. It's a smaller and streamlined version compared to larger YOLO models and is designed to be very efficient. The exact number of layers can vary, but it typically contains fewer convolutional layers to reduce the model's complexity and computational requirements.

The head of the model is responsible for detecting objects and predicting bounding boxes and class probabilities based on the feature maps produced by the backbone. The head also includes several layers, however, it's less complex than the larger YOLO models. The layers in the head include convolutional layers for bounding box regression and class prediction, as well as additional layers for other tasks like anchor box assignment and non-maximum suppression.

Regarding the specific layers responsible for predictions, it's typically the final layers in the head. These layers take the feature maps from the backbone and generate the final predictions for the bounding box coordinates and class probabilities.

The YOLO Nano architecture is designed to strike a balance between accuracy and efficiency, making it suitable for edge devices and real-time applications where resource constraints are a major consideration.

I hope this provides a basic understanding of the YOLO Nano model structure. Do let me know if you need further clarification.

@Azer2401

Hi,
How can I implement object tracking using DeepSORT to count fruits and deploy it to Android? I have previously created a YOLOv8s model for fruit detection and deployed it to Android using ONNX > NCNN.

@karimda commented Jan 1, 2024

(quoting @glenn-jocher's YOLOv5 sample code above)
Please tell me how to adapt this code to YOLOv8; I have tried, but in vain. Can you help me, please? Thank you.
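A hedged sketch of the YOLOv8 equivalent, using the built-in track() with stream=True (the frame-by-frame persist=True loop sketched earlier in this thread also works):

import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')

# stream=True yields one Results object per video frame
for result in model.track(source='test.mp4', stream=True):
    annotated = result.plot()  # draws boxes, labels and track IDs
    cv2.imshow('tracking', annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()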

@hidetsuji

Hi,
I am tracking fish movements with YOLOv8 tracking. However, although detection is occurring, the fish are not being tracked with unique IDs. This might be because there's not much overlap between frames due to the fast movements of the fish. Would increasing the size of the bounding boxes increase the overlap between frames and improve tracking? If so, how should I enlarge the bounding boxes?

@Crear12 commented Feb 27, 2024

Hi,
I'm using yolo track with save_txt. I expect the tracking results to be written into one file; however, it generates one file per frame... Is there any way to change this behavior?

Thanks

@martin0496

(quoting @glenn-jocher's metadata comment above)

Hi @glenn-jocher,
in the case of the engine format, how can one access the metadata?

Thank you
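For reference, a hedged sketch of one way to read it back: recent ultralytics versions prepend a length-prefixed JSON metadata blob to the serialized engine, so it can be parsed without loading TensorRT (the on-disk layout is an assumption and may differ between versions):

import json

# assumption: 4-byte little-endian length prefix, then a JSON metadata blob,
# then the serialized TensorRT engine itself
with open('yolov8n.engine', 'rb') as f:
    meta_len = int.from_bytes(f.read(4), byteorder='little')  # metadata length
    metadata = json.loads(f.read(meta_len).decode('utf-8'))   # metadata dict
print(metadata.get('names'), metadata.get('imgsz'))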

@AliasChenYi

I have reproduced the results of yolov8s-pose, and my experimental data is very different from yours, with mAP50 dropping by 20 percentage points. My code and data are exactly the same as what you described, and the steps strictly follow yours, so I may have a problem with my hyperparameter settings.
Here are my hyperparameter settings:
task: pose
mode: train
model: ultralytics/cfg/models/v8pose/yolov8-pose.yaml
data: ultralytics/cfg/datasets/coco-pose.yaml
epochs: 300
time: null
patience: 50
batch: 310
imgsz: 640
save: true
save_period: -1
cache: true
device: '0'
workers: 20
project: runs/train
name: yolov8s-pose
exist_ok: false
pretrained: true
optimizer: SGD
verbose: false
seed: 0
deterministic: true
single_cls: true
rect: false
cos_lr: false
close_mosaic: 10
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
save_hybrid: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: false
opset: null
workspace: 4
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
label_smoothing: 0.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.5
mosaic: 0.5
mixup: 0.2
copy_paste: 0.0
auto_augment: randaugment
erasing: 0.4
crop_fraction: 1.0
cfg: null
tracker: botsort.yaml
save_dir: runs/train/yolov8s-pose

@MlLearnerAkash

Thank you @glenn-jocher for such great open-source work.
After going through the discussion above, I couldn't find the answer to one question: is it required to train a ReID model (e.g. a FastReID model) on a custom dataset for tracking, as is usually done when using ByteTrack or BoT-SORT?

This seems quite necessary for custom training.

@bsljsljy

How can I use TensorBoard to inspect the BatchNorm weights (bn_weight) in YOLOv8?
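A minimal sketch of one standard way: log the BatchNorm gamma weights of a loaded model as TensorBoard histograms with plain PyTorch (nothing YOLOv8-specific is assumed beyond loading the checkpoint):

import torch
from torch.utils.tensorboard import SummaryWriter
from ultralytics import YOLO

model = YOLO('yolov8n.pt').model  # underlying torch nn.Module
writer = SummaryWriter('runs/bn_weights')
for name, m in model.named_modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        writer.add_histogram(f'bn_weight/{name}', m.weight.detach().cpu(), global_step=0)
writer.close()  # then: tensorboard --logdir runs/bn_weights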

@Vikas-ABD

I want to do YOLOv8 tracking using a .onnx model. Can anyone help me?
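As @glenn-jocher noted earlier in the thread, exported models work directly with track mode, e.g.:

from ultralytics import YOLO

model = YOLO("yolov8n.onnx")  # exported ONNX model
results = model.track(source="video.mp4", tracker="bytetrack.yaml")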
