1. Introduction
1.1 Object Detection: A Foundational Computer Vision Task
Object detection stands as a cornerstone task within computer vision, aiming to identify and precisely locate object instances belonging to predefined categories within digital images or video streams.1 This task inherently combines object classification (determining the category of an object) and object localization (determining the spatial extent, typically via a bounding box).1 Its significance is underscored by its wide-ranging applicability across numerous domains, including autonomous driving systems for recognizing pedestrians, vehicles, and obstacles 2, video surveillance for security and monitoring 2, robotics for environment interaction and navigation 2, medical imaging analysis 2, retail automation 2, and industrial quality control.2
1.2 The Evolving Landscape of SOTA Models
The field of object detection has witnessed remarkable progress, evolving from traditional computer vision techniques reliant on hand-crafted features (e.g., Histogram of Oriented Gradients – HOG combined with Support Vector Machines – SVM 11) to sophisticated deep learning architectures. Early deep learning approaches bifurcated into two main categories: two-stage detectors and one-stage detectors.2 Two-stage detectors, pioneered by the R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN 11), first generate region proposals potentially containing objects and then classify these proposals. While often achieving high accuracy, they historically suffered from slower inference speeds.11 One-stage detectors, exemplified by the You Only Look Once (YOLO) series 8 and Single Shot MultiBox Detector (SSD) 2, perform localization and classification in a single pass, prioritizing real-time performance.7 More recently, Transformer-based architectures, particularly the Detection Transformer (DETR) family 4, have emerged, offering an end-to-end approach that eliminates the need for hand-crafted components like Non-Maximum Suppression (NMS).20 This rapid proliferation of models and architectural paradigms presents a significant challenge for practitioners seeking to select the most suitable detector for their specific needs.8
1.3 Report Purpose and Scope
This report aims to provide an expert-level, comprehensive comparison of prominent contemporary object detection models. The analysis encompasses core models including YOLOv8, YOLOv9, YOLOv10, YOLOv11, RT-DETR, RF-DETR, YOLO-World, EfficientDet, SSD, and Faster R-CNN. Additionally, relevant contemporary models such as YOLOv12, RTMDet, DINO-DETR, and Deformable DETR are included to provide a broader perspective on the state-of-the-art. The comparison focuses on key technical aspects: architectural design and innovations, performance metrics (accuracy, speed, efficiency), training and deployment considerations (ease of use, convergence, resource requirements, licensing), robustness to distribution shifts and challenging conditions, and suitability for specific use cases. The analysis draws upon recent benchmarks, research papers, and technical documentation to offer an objective and data-driven evaluation.4
1.4 Target Audience
This report is intended for technical professionals, including computer vision engineers, machine learning researchers, and technical leads, who require a detailed understanding of the current object detection landscape to make informed decisions regarding model selection for research or deployment.
2. Object Detection Model Overviews
To facilitate a structured comparison, the models under review are grouped into families based on their core architectural paradigms: the YOLO family, the DETR family, Open-Vocabulary models, and other significant CNN-based architectures. This grouping highlights the distinct design philosophies and evolutionary trajectories within the field. The YOLO family has historically prioritized real-time performance through efficient single-stage CNN designs.8 The DETR family represents a shift towards end-to-end, transformer-based approaches, eliminating traditional post-processing steps like NMS.20 Open-vocabulary models tackle the limitation of fixed category sets, enabling detection based on arbitrary text prompts.118 Finally, other influential CNN architectures like SSD, Faster R-CNN, and EfficientDet represent important milestones and alternative design choices.
2.1 YOLO Family (You Only Look Once)
General Concept: The YOLO family represents a dominant paradigm in real-time object detection, characterized by its single-stage approach.7 Fundamentally, YOLO models frame object detection as a regression problem, directly predicting bounding box coordinates and class probabilities from the entire input image in a single forward pass through the network.7 This unified approach contrasts sharply with two-stage detectors and is the key to YOLO’s renowned speed. The series has evolved significantly since YOLOv1 (2015), with versions like YOLOv2, v3, v4, v5, v6, and v7 progressively introducing architectural refinements (e.g., Darknet backbones, anchor boxes, FPN/PANet necks, improved loss functions, data augmentation) to enhance the trade-off between inference speed and detection accuracy.9
YOLOv8: Developed by Ultralytics, YOLOv8 marks a significant iteration, featuring a backbone inspired by CSPDarknet, a neck structure resembling PANet, and notably, an anchor-free detection head.23 This anchor-free design simplifies the architecture and potentially improves generalization compared to earlier anchor-based YOLOs.73 A key strength of YOLOv8 is its unified framework, extending beyond detection to support instance segmentation, pose estimation, image classification, and oriented bounding box (OBB) detection within the same architecture.59 It offers scalable models (n, s, m, l, x variants) catering to different resource constraints.59 YOLOv8 benefits immensely from the Ultralytics ecosystem, known for its ease of use, comprehensive documentation, active community, and integration with tools like Ultralytics HUB.59 However, its models are typically released under the restrictive AGPL-3.0 license, which has significant implications for commercial use.58
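To illustrate the ecosystem point concretely, here is a minimal sketch of the Ultralytics Python API (the weight and image file names are illustrative):

```python
# Minimal sketch of the Ultralytics API (pip install ultralytics).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # pre-trained nano detection weights
results = model("street_scene.jpg")   # inference on a single image

for r in results:
    print(r.boxes.xyxy, r.boxes.cls, r.boxes.conf)  # boxes, class ids, scores

model.export(format="onnx")           # one-line export for deployment
```

The same API surface covers training (`model.train(...)`) and validation, which is a large part of why the framework is popular for rapid iteration.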
YOLOv9: Introduced in early 2024, YOLOv9 tackles the challenge of information loss in deep networks.23 Its core innovations are Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN).4 PGI utilizes an auxiliary reversible branch during training to generate more reliable gradients and preserve information flow, mitigating the information bottleneck problem without adding inference cost.24 GELAN extends the ELAN structure found in models like YOLOv7, allowing the integration of various computational blocks (e.g., CSP, Res blocks) to optimize parameter utilization and efficiency for different hardware.23 These innovations result in improved accuracy with significant reductions in parameters (up to 49%) and computation (up to 43%) compared to YOLOv8.62 YOLOv9 continues the anchor-free head design.144 Licensing is typically GPL-3.0 for the original implementation 58, though integration into frameworks like Ultralytics might involve AGPL-3.0.139
YOLOv10: Also emerging in 2024, YOLOv10 focuses on achieving true end-to-end real-time detection by eliminating the need for NMS post-processing during inference.7 This is accomplished through a novel training strategy called Consistent Dual Assignments, which provides richer supervision during training using both one-to-one and one-to-many matching, while enabling an efficient NMS-free one-to-one head for inference.7 Architecturally, YOLOv10 incorporates several efficiency-driven designs: a lightweight classification head, spatial-channel decoupled downsampling, rank-guided block design for optimizing stage computation, large-kernel convolutions, and partial self-attention (PSA) for enhanced feature extraction with minimal overhead.7 These optimizations lead to significant latency reductions and parameter efficiency gains compared to YOLOv9 and RT-DETR variants, achieving state-of-the-art speed-accuracy trade-offs.7 YOLOv10 is available under the AGPL-3.0 license through Ultralytics.58
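To make concrete what YOLOv10's NMS-free design removes, here is a minimal sketch of the greedy non-maximum suppression step that conventional one-to-many heads require after every forward pass:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.65) -> list:
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # process boxes from highest to lowest score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top-scoring box against all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thr]  # discard overlapping lower-scoring boxes
    return keep
```

Because YOLOv10's one-to-one head emits at most one prediction per object, this data-dependent loop, and the latency variance it introduces, drops out of the inference path entirely.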
YOLOv11: Developed by Ultralytics, YOLOv11 builds upon its predecessors, aiming for further improvements in accuracy and efficiency.74 Architecturally, it replaces the C2f block (used in YOLOv8/v10) with a C3k2 block (Cross Stage Partial with kernel size 2) for potentially better computational efficiency.149 It retains the SPPF (Spatial Pyramid Pooling – Fast) module but introduces a C2PSA (Convolutional block with Parallel Spatial Attention) module after the backbone’s final stage to enhance feature extraction, particularly for small or occluded objects.149 Like YOLOv8, it supports multiple tasks including detection, segmentation, pose estimation, classification, and OBB.74 Benchmarks suggest YOLOv11m achieves higher mAP than YOLOv8m with 22% fewer parameters and slightly faster inference than YOLOv10.111 It is available in nano to extra-large variants.74 License: AGPL-3.0.58
YOLOv12: The most recent iteration (proposed early 2025), YOLOv12 introduces an “attention-centric” design, aiming to integrate the modeling power of attention mechanisms while maintaining the speed advantages of CNNs.6 Key architectural components include Area Attention (A2), potentially accelerated by FlashAttention for efficiency, and Residual Efficient Layer Aggregation Networks (R-ELAN) which modifies the ELAN structure with block-level residual connections for improved training stability and convergence.42 YOLOv12 claims to surpass previous YOLO versions and RT-DETR variants in accuracy-speed trade-offs, with the nano version reportedly outperforming YOLOv10-N and YOLOv11-N in mAP with comparable latency.6 It is available in scalable variants (n/s/m/l/x).6 License: AGPL-3.0.139
2.2 DETR Family (Detection Transformer)
General Concept: The DETR family initiated a paradigm shift by applying the Transformer architecture, originally successful in Natural Language Processing, to object detection.4 The core innovation is framing object detection as a direct set prediction problem.20 Using a Transformer encoder-decoder structure combined with a bipartite matching loss during training, DETR models predict a fixed-size set of objects, uniquely assigning predictions to ground truths.20 This elegant end-to-end design eliminates the need for hand-designed components like anchor generation and, crucially, Non-Maximum Suppression (NMS) post-processing.20 However, the original DETR faced significant challenges: notoriously slow training convergence, high computational complexity due to global self-attention (which scales as O(N²) in the number of feature-map tokens), and relatively poor performance on small objects.35 Subsequent DETR variants primarily focus on addressing these limitations.
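The bipartite matching at the heart of the set-prediction loss can be sketched with the Hungarian algorithm; the cost weights below are illustrative (DETR's actual matching cost combines class probability, L1 box distance, and generalized IoU):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(cls_prob, pred_boxes, gt_labels, gt_boxes,
                      w_cls=1.0, w_box=5.0):
    """One-to-one assignment of N predictions to M ground truths (N >= M).

    cls_prob:   (N, num_classes) predicted class probabilities
    pred_boxes: (N, 4) and gt_boxes: (M, 4), normalized cxcywh
    Returns index arrays (pred_idx, gt_idx), each of length M.
    """
    cost_cls = -cls_prob[:, gt_labels]  # (N, M): -p(class of each GT)
    cost_box = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    cost = w_cls * cost_cls + w_box * cost_box
    return linear_sum_assignment(cost)  # Hungarian matching on the (N, M) cost
```

Each ground truth receives exactly one prediction; every unmatched query is trained toward the "no object" class, which is what makes NMS unnecessary at inference.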
Deformable DETR: Deformable DETR was a major step in making DETR more practical.159 Its key innovation is the Deformable Attention Module.158 Instead of attending to all pixels in the feature map, deformable attention focuses on a small, fixed number of key sampling points around a reference point for each query.158 The locations of these sampling points are learned offsets from the reference point, allowing the model to adaptively focus on relevant image regions. This sparse sampling significantly reduces the computational complexity, particularly for high-resolution feature maps, and enables the effective processing of multi-scale features derived from the CNN backbone (typically ResNet 158). As a result, Deformable DETR converges much faster (reportedly 10x) than the original DETR and demonstrates improved accuracy, especially for detecting small objects.158 The standard architecture consists of 6 encoder and 6 decoder layers.158 A ResNet-50 based Deformable DETR has approximately 41 million parameters and requires around 86 GFLOPs.166 It is released under the permissive Apache 2.0 license.159
RT-DETR (Real-Time DETR) & Variants (RT-DETRv2/v3): Developed by Baidu, RT-DETR was arguably the first DETR variant to achieve true real-time performance competitive with, and often exceeding, the YOLO family.5 It achieves this through several key architectural innovations. Firstly, it employs an Efficient Hybrid Encoder that processes multi-scale features from a CNN backbone (e.g., ResNet, HGNetv2 33) by decoupling intra-scale interaction (using an attention mechanism, AIFI) and cross-scale fusion (using CNN-based modules, CCFM).33 This design significantly reduces the computational bottleneck typically found in DETR encoders.35 Secondly, it introduces IoU-aware Query Selection (or Uncertainty-Minimal Query Selection) to choose a fixed number of high-quality initial object queries from the encoder features, improving accuracy and potentially aiding convergence.20 Thirdly, leveraging the standard DETR decoder structure, RT-DETR allows flexible adjustment of inference speed by simply using fewer decoder layers without retraining.34 RT-DETRv2 and RT-DETRv3 further refine the model, primarily through improved training strategies like dynamic data augmentation and hierarchical dense positive supervision to address the sparse supervision issue inherent in DETR’s one-to-one matching.20 RT-DETR models are released under the Apache 2.0 license.56
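RT-DETR is also integrated into the Ultralytics package, so a usage sketch mirrors the YOLO one (weight name as published by Ultralytics; the image path is illustrative):

```python
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")        # RT-DETR large, COCO pre-trained
results = model("street_scene.jpg")  # end-to-end, NMS-free inference
```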
RF-DETR (Roboflow DETR): Developed and released by Roboflow in 2025, RF-DETR is another real-time DETR variant aiming for state-of-the-art performance, particularly focusing on domain adaptability.54 Its architecture builds upon the foundations of Deformable DETR and incorporates ideas from LW-DETR.66 A key component is its use of a pre-trained DINOv2 Vision Transformer as the backbone.54 This leverages the strong generalization capabilities learned by DINOv2 through self-supervised pre-training on large datasets, aiming for better performance when fine-tuned on novel or smaller datasets.66 Unlike Deformable DETR which often uses multi-scale features, RF-DETR employs a single-scale feature extraction strategy from the backbone, likely to improve efficiency.66 It maintains the anchor-free and NMS-free characteristics of the DETR family.77 RF-DETR is available in Base (29M parameters) and Large (128M or 129M parameters) variants.68 It is released under the permissive Apache 2.0 license.54
DINO-DETR: DINO (DETR with Improved deNoising anchor boxes) represents another significant advancement in the DETR lineage, focusing on improving training stability and final performance.48 It integrates several techniques: Contrastive DeNoising (CDN) training, where the model learns to reconstruct original boxes from noisy versions; Mixed Query Selection, combining anchor points and content features for better query initialization; and a Look Forward Twice scheme for box prediction refinement.48 DINO-DETR, particularly when paired with large backbones like Swin-L, achieved state-of-the-art results on COCO at the time of its release.49 While powerful, it generally requires significant computational resources for training and inference. The code is likely available under an Apache 2.0 license, given its origins in the DETR family and release by IDEA-Research.49
2.3 Open-Vocabulary Models
General Concept: Open-Vocabulary Detection (OVD) addresses a major limitation of traditional object detectors: their inability to recognize object categories not seen during training.70 OVD models aim to detect objects based on arbitrary textual descriptions or prompts provided at inference time, leveraging the power of Vision-Language Models (VLMs).118 Early OVD approaches often involved computationally expensive methods, requiring real-time encoding of text prompts and image features, hindering real-time application.84
YOLO-World: Developed by Tencent AI Lab CVC, YOLO-World specifically targets real-time OVD by adapting the efficient YOLO architecture (likely YOLOv8-based 67) for open-vocabulary tasks.118 Its key innovation is the “prompt-then-detect” paradigm 118: user prompts (desired categories) are encoded offline into vocabulary embeddings, which are then re-parameterized into the model weights.118 This avoids costly online text encoding during inference, enabling significantly faster speeds.118 Architecturally, it features a Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) to fuse visual features (from a DarkNet-like backbone 124) and text embeddings (from a pre-trained text encoder like CLIP 70).70 It’s pre-trained on large-scale detection, grounding, and image-text datasets using a region-text contrastive loss.118 YOLO-World is available in S, M, L, and X variants 118, with PyTorch code and pre-trained weights released.118 Export options like ONNX and TFLite are available or planned.118 License: GPL-3.0 118 or AGPL-3.0 via Ultralytics.139
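A sketch of the prompt-then-detect workflow, as exposed through the Ultralytics integration (class prompts and file names are illustrative):

```python
from ultralytics import YOLOWorld

model = YOLOWorld("yolov8s-world.pt")             # pre-trained open-vocabulary weights
model.set_classes(["forklift", "safety helmet"])  # offline: encode prompts once
results = model("warehouse.jpg")                  # online: YOLO-speed inference
```

Because the vocabulary embeddings are computed once and re-parameterized into the weights, switching target classes requires only a new `set_classes` call, not retraining.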
2.4 Other Architectures
EfficientDet: Developed by Google Brain, EfficientDet prioritizes computational efficiency and scalability.9 It achieves this through three main innovations: 1) Using the highly efficient EfficientNet as its backbone network for feature extraction 9; 2) Introducing the Bi-directional Feature Pyramid Network (BiFPN), a novel neck module that allows efficient and weighted multi-scale feature fusion with both top-down and bottom-up pathways 9; 3) Employing a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, BiFPN, and prediction heads simultaneously using a single compound coefficient φ.12 This compound scaling creates a family of models (EfficientDet-D0 to D7x, plus Lite versions for edge devices) offering a spectrum of accuracy-efficiency trade-offs.3 EfficientDet models were state-of-the-art in efficiency upon release, achieving high mAP on COCO with significantly fewer parameters and FLOPs than contemporary detectors.9 The official code is likely under Apache 2.0, but weight licenses should be checked.
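The compound scaling rules reported in the EfficientDet paper can be written out directly; a sketch for a given coefficient φ (note the paper rounds the resulting widths to hardware-friendly values):

```python
import math

def efficientdet_scaling(phi: int) -> dict:
    """Compound scaling rules from the EfficientDet paper (D0 is phi = 0)."""
    return {
        "bifpn_width": 64 * (1.35 ** phi),       # W_bifpn = 64 * 1.35^phi
        "bifpn_depth": 3 + phi,                  # D_bifpn = 3 + phi
        "head_depth": 3 + math.floor(phi / 3),   # D_class = D_box = 3 + floor(phi/3)
        "input_resolution": 512 + 128 * phi,     # R_input = 512 + 128 * phi
    }

print(efficientdet_scaling(0))  # D0: 512 px input, BiFPN depth 3
print(efficientdet_scaling(4))  # D4: 1024 px input, BiFPN depth 7
```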
SSD (Single Shot MultiBox Detector): SSD was one of the pioneering single-stage object detectors, designed for real-time performance.2 Its core idea is to predict object categories and bounding box offsets directly from feature maps at multiple scales within a single forward pass of the network.19 It typically uses a standard classification network (like VGG16 19) as a base network, truncated before the final classification layers.27 Auxiliary convolutional layers are added on top, progressively decreasing in size, to enable detection at multiple feature map resolutions – larger feature maps detect smaller objects, smaller feature maps detect larger objects.3 At each location on these feature maps, SSD predicts scores and offsets relative to a set of pre-defined default boxes (similar to anchors) with varying aspect ratios and scales.19 Small convolutional filters (e.g., 3×3) are applied to generate these predictions.27 Finally, NMS is applied to filter redundant detections.19 While faster than two-stage methods of its time, SSD’s accuracy, particularly for small objects, could be lower.11
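The default-box scales follow a simple linear rule from the SSD paper; a short sketch using the paper's default s_min and s_max:

```python
def ssd_scales(num_maps: int = 6, s_min: float = 0.2, s_max: float = 0.9):
    """Per-feature-map scales: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)."""
    return [round(s_min + (s_max - s_min) * k / (num_maps - 1), 3)
            for k in range(num_maps)]

print(ssd_scales())  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

Early (larger) feature maps receive the small scales and late (smaller) maps the large ones, which is precisely the multi-resolution division of labor described above.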
Faster R-CNN: Faster R-CNN represents a significant milestone in the evolution of two-stage object detectors, dramatically improving speed over its predecessors (R-CNN, Fast R-CNN).8 Its key innovation is the Region Proposal Network (RPN), a fully convolutional network that learns to propose object regions, replacing the slow, external Selective Search algorithm used previously.11 The RPN shares convolutional features with the main detection network (Fast R-CNN head), making region proposal nearly cost-free at inference time.13 The RPN slides a small network over the shared feature map, and at each location, predicts objectness scores and bounding box refinements relative to a set of predefined anchor boxes of different scales and aspect ratios.13 These proposals are then fed into the second stage, the Fast R-CNN detector, which uses RoI Pooling to extract features for each proposal from the shared map, followed by classification and final bounding box regression.13 This two-stage approach, combined with feature sharing, allowed Faster R-CNN to achieve state-of-the-art accuracy while being significantly faster than previous R-CNN methods, though still generally slower than single-stage detectors.8
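The RPN's anchor set is compact enough to write down; a sketch generating the paper's default 9 anchors (3 scales × 3 aspect ratios) centered at one feature-map location:

```python
import numpy as np

def rpn_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """9 (x1, y1, x2, y2) anchors centered at the origin, as in Faster R-CNN."""
    anchors = []
    for s in scales:       # s^2 is the anchor area
        for r in ratios:   # r is the width/height ratio
            w, h = s * np.sqrt(r), s / np.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)

# In the full model these 9 anchors are replicated at every position of the
# shared feature map, and the RPN predicts objectness + box refinements per anchor.
print(rpn_anchors().round(1))
```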
RTMDet: Developed within the MMDetection/OpenMMLab ecosystem, RTMDet (Real-Time Model Detection) is designed as an efficient real-time object detector aiming to surpass the YOLO series.200 Its architecture focuses on having compatible capacities in the backbone and neck, potentially utilizing large-kernel depth-wise convolutions for efficiency.202 A key feature is its use of dynamic label assignment, specifically incorporating soft labels during the matching cost calculation to improve accuracy.200 RTMDet is designed to be easily extensible to other tasks like instance segmentation and rotated object detection.200 It is available in various sizes (tiny to extra-large) to cater to different application needs.202 RTMDet is released under the Apache 2.0 license.201
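Within MMDetection 3.x, RTMDet can be exercised through the high-level inferencer; a minimal sketch, assuming the `DetInferencer` interface and a model-zoo config alias (the image path is illustrative):

```python
from mmdet.apis import DetInferencer

# Resolves the RTMDet-tiny config and its COCO checkpoint from the model zoo
inferencer = DetInferencer("rtmdet_tiny_8xb32-300e_coco")
inferencer("demo.jpg", out_dir="outputs/")  # runs detection, saves visualizations
```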
3. Comparative Analysis
This section provides a detailed comparison of the selected object detection models based on the defined criteria: architecture, performance (accuracy, speed, efficiency), training and deployment factors, and robustness.
3.1 Architectural Comparison
The evolution of object detection architectures reveals distinct philosophies and trade-offs.
- Backbone Networks: There has been a clear progression from standard CNNs like VGG 19 and ResNet 13 towards more efficient CNN designs like EfficientNet (used in EfficientDet 9) and specialized YOLO backbones (CSPDarknet variants 20, GELAN 23). Concurrently, Vision Transformers (ViTs) have emerged as powerful backbones, particularly in DETR variants, with models like DINOv2 being leveraged for their strong pre-training capabilities and domain adaptability (e.g., in RF-DETR 54). The choice of backbone significantly impacts both feature representation quality and computational cost.
- Neck Designs: Feature Pyramid Networks (FPNs) 13 and their enhancement, Path Aggregation Networks (PANets) 20, became standard for fusing multi-scale features in CNN-based detectors like YOLO. EfficientDet introduced the more complex BiFPN with weighted fusion.9 Recent YOLO models incorporate specialized aggregation networks like GELAN 23 and R-ELAN 42, while YOLO-World uses the RepVL-PAN to integrate language features.70 DETR models typically rely on the Transformer encoder itself for feature fusion, though variants like RT-DETR incorporate CNN-based cross-scale fusion modules (CCFM).33
- Head Architectures: A major trend is the shift from anchor-based prediction heads (used in SSD, Faster R-CNN, early YOLOs 13) to anchor-free designs (YOLOv8 onwards 23, RTMDet 202, CenterNet 219). Anchor-free heads simplify the detection pipeline and can improve generalization. DETR models employ a unique set prediction head based on object queries and bipartite matching.20
- Attention Mechanisms: Transformers inherently use self-attention for global context modeling. DETR variants have focused on making this more efficient, notably through Deformable Attention 158, which is used in Deformable DETR, RT-DETR, and RF-DETR; a minimal single-head sketch of its sampling step follows after this list. Recent YOLO versions (v10, v11, v12) are also incorporating attention mechanisms (Partial Self-Attention, C2PSA, Area Attention) to enhance feature representation while trying to maintain CNN efficiency.7
- Overall Trends: The architectural landscape shows two primary trajectories. The YOLO lineage continues to refine CNN-based single-stage detection, focusing on optimizing the speed-accuracy curve through architectural tweaks (GELAN, R-ELAN), efficiency improvements (anchor-free heads, NMS-free training in YOLOv10), and cautiously integrating attention mechanisms.4 The DETR family, starting from a fundamentally different Transformer base, has focused on overcoming its initial limitations (slow convergence, high cost, small object issues) through innovations like deformable attention, hybrid encoders, and improved query selection/training strategies, aiming for high accuracy and end-to-end simplicity.20 Models like RF-DETR attempt to bridge these worlds by combining strong pre-trained ViT backbones with efficient DETR components.66 YOLO-World represents a distinct branch, adapting the efficient YOLO structure for the novel task of open-vocabulary detection.118
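To ground the deformable-attention idea referenced above, here is a single-scale, single-head sketch of the sampling step (PyTorch; a simplification of the Deformable DETR formulation, which uses multiple scales and heads):

```python
import torch
import torch.nn.functional as F

def deformable_sample(value, ref_points, offsets, weights):
    """Attend to K learned locations per query instead of all H*W pixels.

    value:      (B, C, H, W) image features
    ref_points: (B, Q, 2)    per-query reference points in [0, 1] (x, y)
    offsets:    (B, Q, K, 2) learned sampling offsets (normalized units)
    weights:    (B, Q, K)    attention weights, softmax-normalized over K
    returns:    (B, Q, C)    aggregated features
    """
    loc = ref_points[:, :, None, :] + offsets                  # locations in [0, 1]
    grid = 2.0 * loc - 1.0                                     # grid_sample expects [-1, 1]
    sampled = F.grid_sample(value, grid, align_corners=False)  # (B, C, Q, K)
    out = (sampled * weights[:, None, :, :]).sum(dim=-1)       # weighted sum over K
    return out.permute(0, 2, 1)
```

With K typically 4 per head, the per-query cost is independent of feature-map resolution, which is what makes multi-scale, high-resolution features tractable.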
3.2 Performance Benchmarks
Performance is typically evaluated based on accuracy (mAP), speed (FPS/latency), and efficiency (parameters/FLOPs). These metrics are often interdependent, representing a trade-off that model designers navigate.
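Since every accuracy figure below rests on IoU-thresholded matching, a small helper makes the metric concrete (mAP 50:95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05):

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```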
Table 1: Accuracy Comparison (COCO val2017 / LVIS val)
Model Variant | Dataset | Input Size | mAP (50:95) | mAP (50) | mAP (75) | AP_S | AP_M | AP_L | Source(s) |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Faster R-CNN (ResNet50-FPN) | COCO val | – | ~37-41% | ~59-62% | ~40-44% | ~21-24% | ~40-44% | ~48-52% | 13 (Typical ranges) |
SSD300 (VGG16) | COCO val | 300×300 | 25.1% | 43.1% | 25.8% | 6.6% | 25.9% | 41.4% | 19 |
SSD512 (VGG16) | COCO val | 512×512 | 28.8% | 48.5% | 30.3% | 10.9% | 31.8% | 43.5% | 19 |
EfficientDet-D0 | COCO val | 512×512 | 34.6% | 53.0% | 37.1% | 12.4% | 39.0% | 52.7% | 61 |
EfficientDet-D1 | COCO val | 640×640 | 40.5% | 59.1% | 43.7% | 18.3% | 45.0% | 57.5% | 61 |
EfficientDet-D4 | COCO val | 1024×1024 | 49.7% | 68.4% | 53.9% | 30.7% | 53.2% | 63.2% | 61 |
EfficientDet-D7x | COCO test-dev | 1536×1536 | 55.1% | 74.3% | 59.9% | 37.2% | 57.9% | 68.0% | 61 |
Deformable DETR (R50) | COCO val | – | 45.2% | – | – | – | – | – | 166 |
Deformable DETR (R101+DCN) | COCO test-dev | – | 52.3% | 71.9% | 58.1% | 34.4% | 54.4% | 65.6% | 204 |
DINO-DETR (Swin-L) | COCO val | – | 57.0% | – | – | – | – | – | 116 (Paper value) |
RT-DETR-R50 | COCO val | 640×640 | 53.1% | 71.3% | – | – | – | – | 43 |
RT-DETR-R101 | COCO val | 640×640 | 54.3% | 72.7% | – | – | – | – | 43 |
RT-DETR-HGNetv2-L | COCO val | 640×640 | 53.0% | 71.6% | – | – | – | – | 50 |
RT-DETR-HGNetv2-X | COCO val | 640×640 | 54.8% | 73.1% | – | – | – | – | 50 |
RT-DETRv2-S (R18) | COCO val | 640×640 | 48.1% | 65.1% | – | – | – | – | 4 |
RT-DETRv2-L (R50) | COCO val | 640×640 | 53.4% | 71.6% | – | – | – | – | 4 |
RT-DETRv2-X (R101) | COCO val | 640×640 | 54.3% | 72.8% | – | – | – | – | 4 |
RF-DETR-Base | COCO val | 640×640 | 53.3% | – | – | – | – | – | 72 |
RF-DETR-Large | COCO val | 728×728 | 60.5% | – | – | – | – | – | 54 |
RTMDet-tiny | COCO val | 640×640 | 41.1% | – | – | – | – | – | 58 |
RTMDet-m | COCO val | 640×640 | 50.0% | – | – | – | – | – | 58 |
RTMDet-x | COCO val | 640×640 | 52.8% | – | – | – | – | – | 58 |
YOLOv8n | COCO val | 640×640 | 37.3% | 52.5% | – | – | – | – | 59 |
YOLOv8s | COCO val | 640×640 | 44.9% | 61.8% | – | – | – | – | 59 |
YOLOv8m | COCO val | 640×640 | 50.2% | 67.2% | – | – | – | – | 59 |
YOLOv8l | COCO val | 640×640 | 52.9% | 69.8% | – | – | – | – | 59 |
YOLOv8x | COCO val | 640×640 | 53.9% | 71.0% | – | – | – | – | 59 |
YOLOv9t | COCO val | 640×640 | 38.3% | – | – | – | – | – | 4 |
YOLOv9s | COCO val | 640×640 | 46.8% | 63.4% | 50.7% | – | – | – | 4 |
YOLOv9m | COCO val | 640×640 | 51.4% | 68.1% | 56.1% | – | – | – | 4 |
YOLOv9c | COCO val | 640×640 | 53.0% | 70.2% | 57.8% | – | – | – | 4 |
YOLOv9e | COCO val | 640×640 | 55.6% | 72.8% | 60.6% | – | – | – | 4 |
YOLOv10n | COCO val | 640×640 | 39.5% | 55.1% | – | – | – | – | 91 |
YOLOv10s | COCO val | 640×640 | 46.7% | 63.1% | – | – | – | – | 91 |
YOLOv10m | COCO val | 640×640 | 51.3% | 67.8% | – | – | – | – | 91 |
YOLOv10b | COCO val | 640×640 | 52.7% | 69.4% | – | – | – | – | 60 |
YOLOv10l | COCO val | 640×640 | 53.4% | 70.1% | – | – | – | – | 60 |
YOLOv10x | COCO val | 640×640 | 54.4% | 71.1% | – | – | – | – | 60 |
YOLOv11n | COCO val | 640×640 | 39.5% | – | – | – | – | – | 74 |
YOLOv11s | COCO val | 640×640 | 47.0% | – | – | – | – | – | 74 |
YOLOv11m | COCO val | 640×640 | 51.5% | – | – | – | – | – | 60 |
YOLOv11l | COCO val | 640×640 | 53.4% | – | – | – | – | – | 74 |
YOLOv11x | COCO val | 640×640 | 54.7% | – | – | – | – | – | 60 |
YOLOv12n | COCO val | 640×640 | 40.6% | – | – | – | – | – | 151 |
YOLOv12s | COCO val | 640×640 | 48.0% | – | – | – | – | – | 60 |
YOLOv12m | COCO val | 640×640 | 52.5% | – | – | – | – | – | 60 |
YOLOv12l | COCO val | 640×640 | 53.7% | – | – | – | – | – | 42 |
YOLOv12x | COCO val | 640×640 | 55.2% | – | – | – | – | – | 60 |
YOLO-World-L (Zero-shot) | LVIS val | – | 35.4% | – | – | – | – | – | 70 |
YOLO-World-L (Fine-tuned) | COCO val | – | 52.7% | – | – | – | – | – | 191 (Paper value) |
Note: AP values are approximate ranges or specific reported values. Small object (AP_S) data is sparse in the cited sources. '–' indicates data not found in the cited sources.
Accuracy Discussion: The table highlights the continuous improvement in accuracy within the YOLO family, with newer versions like YOLOv9, v10, v11, and v12 generally achieving higher mAP scores than YOLOv8 at comparable scales.4 DETR variants, particularly RT-DETR and RF-DETR, demonstrate strong performance, often matching or exceeding YOLO models, especially the larger variants.4 RF-DETR-Large notably achieves over 60% mAP, a significant milestone for real-time models.54 EfficientDet models show good scalability, with D7x reaching high accuracy, though potentially slower than recent YOLOs or DETRs.61 Older models like SSD and Faster R-CNN generally lag behind newer architectures in standard COCO benchmarks. Small object detection (AP_S) remains a challenge, though Deformable DETR shows good results 204, and newer YOLOs (like v11 150) and DETRs (like RT-DETR 167) claim improvements. YOLO-World demonstrates impressive zero-shot performance on LVIS and strong fine-tuned performance on COCO.86
Table 2: Speed & Efficiency Comparison
Model Variant | Params (M) | FLOPs (G) | Latency (ms) / FPS | Hardware / Precision | Source(s) |
--- | --- | --- | --- | --- | --- |
Faster R-CNN (VGG16) | ~138+ | High | ~140 ms (7 FPS) | GPU | 11 |
SSD300 (VGG16) | ~26.2 | ~30 | 17.2 ms (58 FPS) | Titan X / FP32 | 19 |
SSD512 (VGG16) | ~26.2 | ~80 | – | – | 19 |
EfficientDet-D0 | 3.9 | 2.54 | 10.2 ms (97 FPS) | V100 / FP32 | 61 |
EfficientDet-D1 | 6.6 | 6.10 | 13.5 ms (74 FPS) | V100 / FP32 | 61 |
EfficientDet-D4 | 20.7 | 55.2 | 42.8 ms (23 FPS) | V100 / FP32 | 61 |
EfficientDet-D7x | 77.0 | 410 | 153 ms (6.5 FPS) | V100 / FP32 | 61 |
Deformable DETR (R50) | ~41 | 86 | ~103 ms (9.7 FPS) | RTX 3090Ti | 162 |
Deformable DETR (R50) | ~41 | 173 | ~200 ms (5 FPS) | T4 GPU | 35 (Approx. from DINO comparison) |
DINO-DETR (R50) | – | – | ~200 ms (5 FPS) | T4 GPU | 35 (Paper value) |
RT-DETR-R50 | 42 | 136 | 9.2 ms (108 FPS) | T4 / TensorRT FP16 | 203 |
RT-DETR-R101 | 76 | 259 | 13.5 ms (74 FPS) | T4 / TensorRT FP16 | 203 |
RT-DETR-HGNetv2-L | 32 | 110 | 8.8 ms (114 FPS) | T4 / TensorRT FP16 | 203 |
RT-DETR-HGNetv2-X | 67 | 234 | 13.5 ms (74 FPS) | T4 / TensorRT FP16 | 203 |
RT-DETRv2-S (R18) | 20 | 60 | 5.03 ms (198 FPS) | T4 / TensorRT FP16 | 4 |
RT-DETRv2-L (R50) | 42 | 136 | 9.76 ms (102 FPS) | T4 / TensorRT FP16 | 4 |
RT-DETRv2-X (R101) | 76 | 259 | 15.03 ms (66 FPS) | T4 / TensorRT FP16 | 4 |
RF-DETR-Base | 29 | – | 6.0 ms (167 FPS) | T4 / TensorRT10 FP16 | 72 |
RF-DETR-Large | 128/129 | – | ~40 ms (25 FPS) @728 | T4 / TensorRT FP16 | 54 |
RTMDet-x | 94.9 | – | 3.1 ms (322 FPS) | 3090 / TensorRT FP16 | 202 |
YOLOv8n | 3.2 | 8.7 | 1.47 ms (680 FPS) | T4 / TensorRT FP16 | 59 |
YOLOv8s | 11.2 | 28.6 | 2.66 ms (376 FPS) | T4 / TensorRT FP16 | 59 |
YOLOv8m | 25.9 | 78.9 | 5.86 ms (171 FPS) | T4 / TensorRT FP16 | 59 |
YOLOv8l | 43.7 | 165.2 | 9.06 ms (110 FPS) | T4 / TensorRT FP16 | 59 |
YOLOv8x | 68.2 | 257.8 | 14.37 ms (70 FPS) | T4 / TensorRT FP16 | 59 |
YOLOv9t | 2.0 | 7.7 | 2.3 ms (435 FPS) | T4 / TensorRT10 FP16 | 4 |
YOLOv9s | 7.1 | 26.4 | 3.54 ms (282 FPS) | T4 / TensorRT10 FP16 | 4 |
YOLOv9m | 20.0 | 76.3 | 6.43 ms (155 FPS) | T4 / TensorRT10 FP16 | 4 |
YOLOv9c | 25.3 | 102.1 | 7.16 ms (140 FPS) | T4 / TensorRT10 FP16 | 4 |
YOLOv9e | 57.3 | 189.0 | 16.77 ms (60 FPS) | T4 / TensorRT10 FP16 | 4 |
YOLOv10n | 2.3 | 6.7 | 1.56-1.84 ms (~543-641 FPS) | T4 / TensorRT FP16 | 91 |
YOLOv10s | 7.2 | 21.6 | 2.49-2.66 ms (~376-401 FPS) | T4 / TensorRT FP16 | 91 |
YOLOv10m | 15.4 | 59.1 | 4.74-5.48 ms (~182-211 FPS) | T4 / TensorRT FP16 | 91 |
YOLOv10b | 19.1 | 92.0 | 5.74-6.54 ms (~153-174 FPS) | T4 / TensorRT FP16 | 91 |
YOLOv10l | 24.4 | 120.3 | 7.28-8.33 ms (~120-137 FPS) | T4 / TensorRT FP16 | 91 |
YOLOv10x | 29.5 | 160.4 | 10.7-12.2 ms (~82-93 FPS) | T4 / TensorRT FP16 | 60 |
YOLOv11n | 2.6 | 6.5 | 1.5 ms (667 FPS) | T4 / TensorRT10 FP16 | 74 |
YOLOv11s | 9.4 | 21.5 | 2.5 ms (400 FPS) | T4 / TensorRT10 FP16 | 74 |
YOLOv11m | 20.1 | 68.0 | 4.7 ms (212 FPS) | T4 / TensorRT10 FP16 | 60 |
YOLOv11l | 25.3 | 86.9 | 6.2 ms (161 FPS) | T4 / TensorRT10 FP16 | 74 |
YOLOv11x | 56.9 | 194.9 | 11.3 ms (88 FPS) | T4 / TensorRT10 FP16 | 60 |
YOLOv12n | – | – | 1.64 ms (610 FPS) | T4 / TensorRT FP16 | 151 |
YOLOv12s | 9.3 | 21.4 | 2.61 ms (383 FPS) | T4 / TensorRT FP16 | 6 |
YOLOv12m | 20.2 | 67.5 | 4.86 ms (206 FPS) | T4 / TensorRT FP16 | 6 |
YOLOv12l | – | 88.9 | 6.5 ms (154 FPS) | T4 / TensorRT FP16 | 42 |
YOLOv12x | – | 194.9 | 11.5 ms (87 FPS) | T4 / TensorRT FP16 | 60 |
YOLO-World-S | 13 (77 rep) | – | 13.5 ms (74.1 FPS) | V100 / FP32 | 84 |
YOLO-World-L | 48 (110 rep) | – | 19.2 ms (52.0 FPS) | V100 / FP32 | 70 |
YOLOv8n (Jetson Orin Nano 8GB) | 3.2 | 8.7 | ~40 ms (25 FPS) | Orin Nano / TensorRT | 223 |
YOLOv8s (Jetson AGX Orin 32GB) | 11.2 | 28.6 | ~3.2 ms (313 FPS) | AGX Orin / TensorRT INT8 | 135 |
YOLOv8x (Jetson AGX Orin 32GB) | 68.2 | 257.8 | ~13.3 ms (75 FPS) | AGX Orin / TensorRT INT8 | 135 |
RT-DETR-R50 (Jetson Orin Nano 8GB) | 42 | 136 | ~41.7 ms (24 FPS) | Orin Nano / TensorRT FP16 | 113 |
Note: FPS/Latency values are highly dependent on hardware, batch size, precision (FP32/FP16/INT8), and optimization frameworks (TensorRT, OpenVINO). Values are indicative based on reported benchmarks. Reparameterized parameters for YOLO-World are shown in parentheses. '–' indicates data not found in the cited sources.
Speed & Efficiency Discussion: The YOLO family consistently demonstrates superior inference speed, particularly the smaller variants (n, s, t) on both GPU (especially with TensorRT) and CPU platforms.4 YOLOv10 and YOLOv12, in particular, push the boundaries of low latency.6 DETR variants like RT-DETR achieve real-time speeds on GPUs (e.g., >100 FPS on T4 for RT-DETR-R50/L 203), often outperforming larger YOLO models in accuracy-per-FLOP but potentially lagging in raw FPS compared to the most optimized small YOLOs.4 RF-DETR also shows competitive latency.72 Efficiency, measured by Params/FLOPs relative to accuracy, has improved significantly. Newer YOLOs (v9, v10, v11, v12) generally offer better parameter efficiency than YOLOv8 for similar mAP.4 DETR models tend to have higher parameter counts and FLOPs compared to YOLOs of similar speed class, but their architectural differences mean FLOPs don’t always directly translate to latency on parallel hardware like GPUs.4 Edge device benchmarks (Jetson Orin 2, Coral TPU 2) show significant performance variations and emphasize the need for hardware-specific optimization (TensorRT, INT8 quantization).135 YOLO-World achieves real-time speeds (50-70 FPS on V100) despite its complexity, thanks to its YOLO backbone and offline vocabulary.70
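Because most of the T4 figures above are TensorRT FP16 results, reproducing them requires an export step; a sketch using the Ultralytics exporter (the flags shown are part of its documented API; the weight name is illustrative):

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")
# Build a TensorRT FP16 engine; requires a CUDA machine with TensorRT installed.
model.export(format="engine", half=True, imgsz=640)

trt_model = YOLO("yolov8s.engine")   # reload the compiled engine for benchmarking
trt_model("street_scene.jpg")
```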
3.3 Training & Deployment Factors
Beyond raw performance, practical considerations heavily influence model selection.
Table 3: Training & Deployment Factors
Model Family/Name | Ease of Use (Ecosystem) | Training Convergence | GPU Memory Needs | Export Formats | Licensing |
--- | --- | --- | --- | --- | --- |
YOLO Family (Ultralytics) | High (Unified API, Docs, Community) 4 | Fast (Relative to DETR) 5 | Lower (vs. DETR) 4 | ONNX, TensorRT, OpenVINO, CoreML, TFLite, etc. 4 | AGPL-3.0 (v8, v10, v11, v12, World) / GPL-3.0 (v9 original) 118 |
DETR Family (General) | Medium-High (MMDetection, Paddle, Ultralytics, Roboflow) 4 | Slow (Original DETR) to Moderate (Deformable, RT-DETR, DINO, RF-DETR) 20 | High 4 | ONNX, TensorRT, OpenVINO (Support varies by framework) 4 | Apache 2.0 (Most variants like Deformable, RT-DETR, RF-DETR, DINO) 49 |
EfficientDet | Medium (Google AutoML Repo, TF Hub) 61 | Moderate (300-600 epochs) 61 | Medium 61 | TFLite, ONNX, TensorRT, SavedModel 61 | Apache 2.0 (Code) |
SSD | Medium (Various implementations) | Fast | Low | Various | Permissive (likely) |
Faster R-CNN | Medium (Various implementations) | Slow | High | Various | Permissive (likely) |
RTMDet | Medium (MMDetection) 201 | Fast 202 | Medium | ONNX, TensorRT (via MMDetection) | Apache 2.0 201 |
- Ease of Use & Ecosystem: The Ultralytics framework provides a significant advantage for the YOLO family (v8 onwards) and its RT-DETR integration, offering a unified API, extensive documentation, pre-trained weights, and strong community support, simplifying training and deployment.4 Frameworks like MMDetection 118 and PaddleDetection 20 support various models (including RTMDet and the original RT-DETR) but can present a steeper learning curve or documentation challenges.53 Roboflow offers a platform-centric approach, simplifying dataset management and training for models like RF-DETR.54 User experiences reported in forums and GitHub issues often highlight challenges with setting up and fine-tuning DETR variants compared to YOLO models.5
- Training Convergence & GPU Memory: A well-known characteristic of the original DETR was its slow convergence, requiring significantly more training epochs (e.g., 500) compared to typical CNN detectors.32 While variants like Deformable DETR (50 epochs 159), RT-DETR (72-120 epochs 20), DINO-DETR, LW-DETR, and RF-DETR have substantially improved convergence speed 20, YOLO models generally train faster.5 Transformer-based models (DETRs) also typically demand significantly more GPU memory during training than CNN-based models like YOLO, potentially limiting accessibility for users with constrained hardware.4
- Deployment Flexibility & Licensing: Support for various export formats (ONNX, TensorRT, OpenVINO, CoreML, TFLite) is crucial for deployment across different hardware platforms (server GPUs, edge devices, mobile).4 The Ultralytics ecosystem provides robust export capabilities for its supported models. Licensing is a critical, often overlooked factor. Many recent YOLO models (v8, v10, v11, v12, YOLO-World via Ultralytics) are released under the AGPL-3.0 license.58 This license requires users who modify the code and offer it as a network service to release their modifications, posing significant challenges for commercial use unless an enterprise license is purchased.138 In contrast, most DETR variants (Deformable DETR, RT-DETR, RF-DETR, DINO-DETR), as well as RTMDet and EfficientDet, are released under the more permissive Apache 2.0 license, which is generally preferred for commercial applications as it does not have the same network service distribution clause.49 YOLOv9's original license is GPL-3.0 58, which also has copyleft provisions but lacks the specific network clause of AGPL.
3.4 Robustness
Robustness refers to a model’s ability to maintain performance under challenging conditions, such as variations in lighting, weather, occlusion, clutter, or domain shifts (differences between training and deployment data).38
- Common Corruptions (COCO-C): Benchmarks like COCO-C evaluate performance on synthetically corrupted images (e.g., noise, blur, weather effects).36 While specific COCO-C results for all models discussed here are not available in the cited sources, studies comparing DETR and CNN-based models on such benchmarks sometimes suggest transformers might offer better robustness due to their global attention mechanism, though this is not universally conclusive.37 YOLOv8 and YOLOv9 documentation mentions robustness improvements, but quantitative COCO-C data is lacking in the cited sources.23 RF-DETR's DINOv2 backbone, pre-trained via self-supervision, is specifically designed for better generalization and robustness to domain shifts.66
- Natural Distribution Shifts (COCO-O, Domain Adaptation): COCO-O benchmarks robustness against natural shifts like artistic styles (sketch, cartoon, painting).41 Studies using COCO-O suggest that backbone architecture is crucial for robustness, and while end-to-end DETR designs didn't inherently enhance robustness over CNNs in early evaluations, large-scale foundation models (like those potentially used as backbones in newer DETRs or OVD models) show significant improvement.262 Domain adaptation studies, like evaluating COCO-trained models in specific environments (e.g., Kazakhstan driving scenes 37), often show performance drops for all models, highlighting the challenge of domain shift. In one such study, RT-DETR demonstrated better generalization than YOLOv8s and YOLO-NAS, suggesting its transformer architecture might handle unfamiliar conditions more robustly.37
- Occlusion and Clutter: Handling occluded or cluttered objects is a persistent challenge.38 Transformer-based models like DETR variants, with their global attention mechanisms, are theoretically better equipped to reason about context and partially visible objects compared to CNNs that rely more on local features.77 Recent YOLO versions like YOLOv11 also claim specific architectural improvements (e.g., C2PSA) for better handling of occlusion.150
- Overall Robustness Comparison: Evidence suggests that while newer models generally improve, significant robustness gaps remain, especially against domain shifts and challenging conditions like occlusion.37 Transformer-based models (DETR variants, especially those with strong pre-trained backbones like DINOv2 in RF-DETR) may offer an advantage in handling global context and domain shifts compared to standard CNN-based YOLOs.37 However, the specific architecture, training data, and fine-tuning strategy heavily influence robustness. Open-vocabulary models face unique robustness challenges related to prompt sensitivity and detecting unusual objects.41
3.5 Strengths, Weaknesses, and Use Cases
YOLO Family (v8-v12):
- Strengths: Excellent speed-accuracy trade-off, particularly for real-time applications 4; High efficiency (low Params/FLOPs, especially newer versions) 4; Strong ecosystem (Ultralytics) providing ease of use, training, deployment, and community support 4; Versatility (detection, segmentation, pose, etc.).59
- Weaknesses: Restrictive licensing (AGPL-3.0) for commercial use without an enterprise license 58; Potentially lower peak accuracy than large DETR models in complex scenes 82; Historically weaker on small/occluded objects compared to some specialized or transformer models (though improving).23
- Use Cases: Real-time surveillance 4, autonomous driving (ADAS) 4, robotics 4, edge computing 4, mobile applications 4, rapid prototyping.76
DETR Family (RT-DETR, RF-DETR, Deformable, DINO):
- Strengths: Potential for higher peak accuracy, especially larger variants 4; End-to-end design (NMS-free) simplifies pipeline 20; Good performance on small/occluded objects (especially Deformable, RT-DETR, RF-DETR) 39; Potentially better robustness/generalization due to global attention and strong backbones (e.g., DINOv2 in RF-DETR) 37; Permissive licensing (Apache 2.0 for most).49
- Weaknesses: Higher computational cost and GPU memory requirements, especially for training 4; Slower convergence historically (though improved) 35; Can be slower than optimized YOLOs on CPU/edge devices 4; Potentially more complex to implement/fine-tune.24
- Use Cases: High-accuracy applications where resources permit (e.g., medical imaging 4, high-resolution analysis 4); Complex scenes with occlusion/clutter 77; Autonomous driving 4; Robotics.4
YOLO-World:
- Strengths: Open-vocabulary detection capability (detects unseen classes via prompts) 118; Real-time performance (leveraging YOLO architecture) 118; Efficient inference via offline vocabulary 118; Good zero-shot and fine-tuning performance.118
- Weaknesses: Relies on quality of text prompts; Performance might be lower than specialized closed-set models for known categories; Licensing (GPL/AGPL) limits commercial use.118 Robustness to OOD/corruptions needs further study.41
- Use Cases: Dynamic environments where object classes are not known beforehand; Interactive detection systems; Automated data labeling/bootstrapping 67; Applications requiring flexibility without retraining (e.g., surveillance with changing targets, robotics in novel environments).67
EfficientDet:
- Strengths: High efficiency (good accuracy for Params/FLOPs) 9; Scalable architecture (D0-D7x) 3; Effective multi-scale feature fusion (BiFPN).9
- Weaknesses: Can be slower than latest YOLOs/RT-DETRs on GPU 81; Anchor-based design.193
- Use Cases: Mobile/edge deployment (Lite versions) 3; Applications needing good accuracy-efficiency balance where Apache 2.0 license is preferred.
SSD:
- Strengths: Very fast inference speed (single-stage) 8; Simple architecture.19
- Weaknesses: Generally lower accuracy than two-stage or newer single-stage models, especially for small objects.11
- Use Cases: Resource-constrained environments where speed is paramount and accuracy requirements are moderate.2
Faster R-CNN:
- Strengths: Historically high accuracy, good baseline 11; Two-stage design can be robust for localization.
- Weaknesses: Relatively slow inference speed compared to single-stage models.8 More complex architecture.
- Use Cases: Offline processing where accuracy is prioritized over speed; Research baseline.
4. Use Case Suitability Analysis
Selecting the optimal object detection model depends heavily on the specific requirements of the application, balancing accuracy, speed, resource constraints, robustness needs, and licensing considerations.
- Real-Time Applications (High FPS needed): Models from the YOLO family (v8-v12), particularly the smaller variants (n, s, t), consistently demonstrate the highest FPS on both GPU and CPU, making them prime candidates.4 RT-DETR and RF-DETR also achieve real-time speeds (>30 FPS, often >100 FPS) on GPUs, especially with TensorRT optimization, offering a high-accuracy alternative.4 SSD and EfficientDet-Lite are also very fast, particularly on edge devices.2 Faster R-CNN is generally too slow for real-time use.11
- Edge Computing (Resource Constrained): Lightweight models are essential. YOLO nano/tiny/small variants (v8-v12) are highly optimized for edge deployment (e.g., Jetson Orin, Coral TPU) due to low parameter counts and FLOPs, combined with efficient export formats (TFLite, TensorRT INT8).2 EfficientDet-Lite models are specifically designed for edge TPUs.3 RT-DETR variants (e.g., R18/R50) can run on Jetson Orin, achieving reasonable FPS with optimization, offering a transformer alternative.2 RF-DETR Base is also positioned for edge use.75 SSD MobileNet/MobileDet are classic lightweight options.2
- High Accuracy Required: When peak accuracy is paramount and computational resources are less constrained, larger models are preferred. Large DETR variants (RT-DETR-X, RF-DETR-Large, DINO-DETR Swin-L) often achieve the highest mAP scores on benchmarks like COCO.4 Large YOLO variants (v9e, v11x, v12x, v8x) are also highly competitive, sometimes surpassing DETRs in mAP while being more efficient.4 EfficientDet-D7x is another high-accuracy option.61 Faster R-CNN remains a strong baseline for accuracy.11
- Small Object Detection: This remains a challenging area.39 DETR variants, particularly Deformable DETR and RT-DETR, were designed with multi-scale feature handling and attention mechanisms that theoretically benefit small object detection.158 Some studies suggest DETR variants outperform YOLOs here.37 However, recent YOLO versions (YOLOv8, v9, v11, v12) have also incorporated architectural improvements (e.g., finer-grained feature maps, attention) aimed at improving small object performance.20 Specific benchmarks like VisDrone are often used to evaluate this capability.21 The choice depends on the specific dataset and the required accuracy-speed trade-off.
- Open-Vocabulary Needs: For applications requiring detection of objects not defined during training, YOLO-World is the primary real-time option discussed, offering flexibility via text prompts.118 Its "prompt-then-detect" paradigm makes it efficient for dynamic scenarios.118 Other OVD models like Grounding DINO exist but are generally slower.84
- Commercial Use (Licensing): Licensing is a critical non-technical factor. RT-DETR, RF-DETR, Deformable DETR, DINO-DETR, RTMDet, EfficientDet, SSD, Faster R-CNN (most implementations) use permissive licenses like Apache 2.0 or MIT, making them suitable for commercial products without requiring source code release.49 In contrast, many recent YOLO models (v8, v10, v11, v12, YOLO-World) distributed by Ultralytics use the AGPL-3.0 license, which requires modifications to be open-sourced if the software is used over a network, effectively necessitating an enterprise license for most commercial deployments.53 YOLOv9's original license is GPL-3.0.58 This makes DETR variants or older/alternatively licensed YOLOs more attractive for businesses unwilling or unable to comply with AGPL or purchase enterprise licenses.
5. Conclusion
The landscape of object detection is characterized by rapid innovation and a diversification of architectural approaches. This analysis reveals several key trends and trade-offs:
- The YOLO vs. DETR Dichotomy: The field is largely shaped by the ongoing evolution of two dominant paradigms. The YOLO family continues its trajectory of optimizing single-stage CNNs for exceptional speed and efficiency, incorporating innovations like anchor-free designs, advanced feature aggregation (GELAN, R-ELAN), NMS-free training (YOLOv10), and attention mechanisms (YOLOv11, YOLOv12) to push the accuracy-speed frontier, particularly for real-time and edge applications. The DETR family, leveraging the power of Transformers, offers an elegant end-to-end solution, eliminating NMS and potentially achieving higher peak accuracy, especially in complex scenes or for small objects, through mechanisms like deformable attention and sophisticated query/training strategies. Variants like RT-DETR and RF-DETR have successfully bridged the gap to real-time performance on GPUs.
- Performance Trade-offs: No single model reigns supreme across all metrics. YOLO models generally offer the best raw speed, especially smaller variants on diverse hardware including CPUs and edge devices. DETR variants often achieve higher peak accuracy (mAP50:95), particularly larger models on powerful GPUs, but typically come with higher computational costs (Params/FLOPs) and memory requirements, especially during training. The choice necessitates a careful balancing of accuracy requirements against latency budgets and available hardware resources. RF-DETR's use of a DINOv2 backbone shows promise for domain adaptability, a crucial factor often overlooked in standard benchmarks.
- Deployment and Usability: Practical deployment hinges on factors beyond raw metrics. The Ultralytics ecosystem provides significant ease-of-use advantages for the YOLO models it supports, streamlining training, validation, and export. DETR models, while powerful, can sometimes involve more complex setup and tuning. Export format support (ONNX, TensorRT, OpenVINO, TFLite) is critical for optimizing performance on target hardware, and support varies across models and frameworks.
- Licensing Implications: The shift of many popular YOLO models (v8 onwards via Ultralytics) to the AGPL-3.0 license presents a major hurdle for commercial adoption without purchasing an enterprise license. Permissively licensed models like RT-DETR, RF-DETR, Deformable DETR, RTMDet, and EfficientDet offer a significant advantage for commercial applications due to their Apache 2.0 licensing.
- Open-Vocabulary Future: YOLO-World demonstrates the feasibility of real-time open-vocabulary detection, merging the efficiency of YOLO with the flexibility of vision-language models. This "prompt-then-detect" approach opens up new possibilities for dynamic applications where object classes are not fixed a priori, though performance relative to specialized closed-set models needs consideration.
Future Directions: Research continues to focus on improving the accuracy-efficiency trade-off, particularly for small object detection and robustness to domain shifts and corruptions. Integrating attention mechanisms into efficient CNNs (as seen in recent YOLOs) and making Transformer models more lightweight and faster to converge remain active areas. The development of robust, efficient open-vocabulary models is also a key frontier.
Recommendation: Model selection should be driven by specific application needs. For maximum speed and efficiency, especially on edge devices, lightweight YOLO variants (considering licensing) are strong choices. For highest accuracy where resources allow, large DETR variants (RT-DETR-X, RF-DETR-Large, DINO-DETR) or large YOLO models (YOLOv9e, YOLOv11x, YOLOv12x) are top contenders, with licensing often favoring the DETR family for commercial use. For dynamic scenarios requiring flexibility, YOLO-World offers a compelling real-time open-vocabulary solution. Thorough benchmarking on target hardware and datasets remains crucial for optimal selection.
References

[1] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices (https://ouci.dntb.gov.ua/en/works/7BYjr0Z9/)
[1] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices (https://www.researchgate.net/publication/384367118_Benchmarking_Deep_Learning_Models_for_Object_Detection_on_Edge_Computing_Devices)
[2] EfficientDet object detection architecture, Google Brain (https://arxiv.org/pdf/2409.16808)
[3] YOLOv12 COCO benchmark mAP/FPS (https://arxiv.org/abs/2502.12524)
[4] YOLOv8 and RT-DETR pedestrian detection benchmark comparison (KITTI, CityPersons, MOT) (https://www.joig.net/2024/JOIG-V12N3-292.pdf)
[5] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices (https://arxiv.org/html/2403.02619v2?ref=labelvisor.com)
[6] Object detection robustness to occlusion, clutter, and domain shift: YOLO vs. DETR (https://arxiv.org/html/2406.19407v5)
[7] Faster R-CNN architecture (https://arxiv.org/pdf/1506.01497)
[8] SSD: Single Shot MultiBox Detector (https://arxiv.org/abs/1512.02325)
[9] Object detection models comparison survey, 2024-2025 (https://arxiv.org/html/2503.19202v1)
[10] Faster R-CNN architecture (https://www.thinkautonomous.ai/blog/faster-rcnn/)
[11] YOLOv12 to Its Genesis: A Decadal and Comprehensive Review of The You Only Look Once (YOLO) Series (https://www.researchgate.net/publication/381929610_YOLOv12_to_Its_Genesis_A_Decadal_and_Comprehensive_Review_of_The_You_Only_Look_Once_YOLO_Series)
[12] Faster R-CNN object detector guide (https://developers.arcgis.com/python/latest/guide/faster-rcnn-object-detector/)
[13] YOLOR object detection architecture (https://arxiv.org/html/2408.09332v1)
[14] YOLOv11 COCO benchmark, Ultralytics docs (https://docs.ultralytics.com/datasets/detect/coco/)
[15] RT-DETR repository (Jetson Orin FPS benchmarks) (https://github.com/lyuwenyu/RT-DETR)
[16] EfficientDet COCO benchmark, Google AutoML repository (https://github.com/google/automl/blob/master/efficientdet/README.md)
[17] Faster R-CNN explained (https://www.digitalocean.com/community/tutorials/faster-r-cnn-explained-object-detection)
[18] SSD: Single Shot MultiBox Detector (https://arxiv.org/abs/1512.02325)
[19] LW-DETR FLOPs and parameters (https://arxiv.org/abs/2406.03459)
[20] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices (https://arxiv.org/abs/2409.16808)
[21] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices (https://www.researchgate.net/publication/384367118_Benchmarking_Deep_Learning_Models_for_Object_Detection_on_Edge_Computing_Devices)
[22] YOLOv9 architecture (https://arxiv.org/abs/2409.07813)
[23] Real-time object detection on COCO, leaderboard (https://paperswithcode.com/sota/real-time-object-detection-on-coco)
[24] YOLOv10: Real-Time End-to-End Object Detection (https://www.researchgate.net/publication/380821008_YOLOv10_Real-Time_End-to-End_Object_Detection)
[25] SSD: Single Shot MultiBox Detector (https://arxiv.org/pdf/1512.02325)
[26] Deformable DETR architecture: Deformable Attention Module, convergence, small-object performance, parameters/FLOPs, license (https://ar5iv.labs.arxiv.org/html/2104.01318)
[27] The Current Trends of Object Detection Algorithms: A Review (https://www.researchgate.net/publication/373392107_The_Current_Trends_of_Object_Detection_Algorithms_A_Review)
[27] RT-DETR architecture as described in the RT-DETRv3 paper: Efficient Hybrid Encoder, IoU-aware query selection, NMS-free decoding (https://arxiv.org/pdf/2409.08475)
[28] SSD with multi-scale feature fusion and attention mechanism, PMC (https://pmc.ncbi.nlm.nih.gov/articles/PMC10695922/)
[29] Deformable DETR FLOPs and parameters (https://arxiv.org/html/2308.01300v2)
[30] Object detection edge device benchmarks (Jetson Orin; YOLO, RT-DETR, RF-DETR) (https://www.digitalocean.com/community/tutorials/yolov8-a-revolutionary-advancement-in-object-detection-2)
[31] YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR 2024 (https://cvpr.thecvf.com/virtual/2024/poster/30009)
[32] RT-DETR object detection architecture, Baidu PaddlePaddle (https://arxiv.org/html/2409.08475v3)
[33] DINO-DETR vs. RT-DETR vs. YOLO benchmark discussion (https://www.reddit.com/r/MachineLearning/comments/12rywz6/rdetrs_beat_yolos_on_realtime_object_detection/)
[34] SSD architecture: single-shot design, base network, multi-scale auxiliary layers, default boxes, NMS (https://arxiv.org/pdf/1512.02325)
[35] YOLO vs. DETR robustness: COCO-C and domain adaptation comparison (https://arxiv.org/html/2412.18718v1)
[36] YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR 2024 paper (https://openaccess.thecvf.com/content/CVPR2024/papers/Cheng_YOLO-World_Real-Time_Open-Vocabulary_Object_Detection_CVPR_2024_paper.pdf)
[37] Comparison of YOLOv8 and MPE-YOLO on the RSOD/VisDrone dataset (https://www.researchgate.net/figure/Comparison-of-YOLOv8midand-MPE-YOLOrighton-the-RSOD-VisDrone-dataset_fig6_382796923)
[38] SSD: Single Shot MultiBox Detector (https://research.google.com/pubs/archive/44872.pdf)
[39] COCO-C robustness benchmark: YOLO vs. DETR comparison (https://arxiv.org/html/2405.14874v2)
[40] YOLOv12 COCO benchmark mAP/FPS (https://arxiv.org/html/2504.11995v1)
[41] RT-DETR and Deformable DETR COCO-C robustness benchmark (mAP_C) (https://www.engineeringletters.com/issues_v33/issue_1/EL_33_1_15.pdf)
[42] Better Sampling towards Better End-to-end Small Object Detection (https://www.researchgate.net/publication/380637118_Better_Sampling_towards_Better_End-to-end_Small_Object_Detection)
[43] DETR vs. YOLO architecture comparison: robustness and small objects (https://arxiv.org/abs/2304.08069)
[44] COCO-C robustness benchmark: YOLO vs. DETR comparison (https://arxiv.org/html/2412.18718v1)
[45] YOLOv9 COCO benchmark, Ultralytics docs (https://docs.ultralytics.com/models/yolov9/)
[46] YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review (https://www.researchgate.net/publication/388354012_YOLOv8_to_YOLO11_A_Comprehensive_Architecture_In-depth_Comparative_Review)
[47] RT-DETR and Deformable DETR COCO-C robustness benchmark (https://www.tandfonline.com/doi/full/10.1080/15440478.2025.2476634?src=exp-la)
[48] RT-DETR COCO benchmark, PaddlePaddle GitHub (https://github.com/lyuwenyu/RT-DETR)
[48] YOLO-World COCO benchmark, GitHub (https://github.com/zsxkib/cog-yolo-world)
[49],”object detection model training convergence comparison YOLO DETR ArXiv (https://arxiv.org/html/2409.08475v3)”
[50],”object detection ease of implementation deployment comparison YOLO DETR Ultralytics Roboflow (https://arxiv.org/html/2409.08475v3)”
[51],”RF-DETR object detection architecture ArXiv (https://arxiv.org/abs/2504.13099)”
[52],”YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://pmc.ncbi.nlm.nih.gov/articles/PMC11629979/)”
[49],”RT-DETR COCO benchmark mAP FPS PaddlePaddle GitHub ArXiv (https://arxiv.org/html/2409.08475v3)”
[36],”YOLOv9 architecture ArXiv (https://arxiv.org/abs/2409.07813)”
[36] YOLOv8 COCO benchmark mAP FPS Ultralytics GitHub (https://github.com/ultralytics/ultralytics/blob/main/docs/en/models/yolov8.md)
[53] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://arxiv.org/html/2410.17725v1)
[54] EfficientDet COCO benchmark mAP FPS Google Brain GitHub ArXiv (https://github.com/google/automl/blob/master/efficientdet/README.md)
[55] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://arxiv.org/html/2409.16808v1)
[56] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://www.researchgate.net/figure/Performance-values-of-YOLOv8-YOLOv9-YOLOv10-and-RT-DETR_tbl1_382961382)
[27] YOLO-World Real-Time Open-Vocabulary Object Detection cited by COCO evaluation benchmark (https://github.com/AILab-CVC/YOLO-World)
[57] YOLO vs DETR robustness COCO-C domain adaptation comparison ArXiv (https://arxiv.org/html/2412.12349v1)
[58] PapersWithCode COCO object detection benchmark (https://paperswithcode.com/sota/object-detection-on-coco-1)
[59] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://www.mdpi.com/2504-446X/8/12/713)
[60] Tencent AI Lab CVC YOLO-World COCO performance evaluation blog post talk (https://openlibrary.telkomuniversity.ac.id/pustaka/files/219195/abstraksi/document-analysis-and-recognition-icdar-2023-17th-international-conference-san-jos-ca-usa-august-21-26-2023-proceedings-part-ii.pdf)
[58] What are the reported COCO AP_S (small object AP) values for models like YOLOv8, YOLOv9, RT-DETR, or RF-DETR listed on this benchmark page? (https://paperswithcode.com/sota/object-detection-on-coco-1)
[61] COCO-C robustness benchmark YOLO DETR comparison ArXiv (https://arxiv.org/html/2412.13490v1)
[62] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://yolov8.org/yolov8-coco-dataset/)
[63] Deformable DETR FLOPs parameters ArXiv (https://arxiv.org/html/2405.17677v1)
[23] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://paperswithcode.com/sota/real-time-object-detection-on-coco)
[64] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://docs.ultralytics.com/models/yolo11/)
[65] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://arxiv.org/pdf/2203.04799)
[66] RT-DETR Jetson Orin FPS benchmark blog forum GitHub (https://forums.developer.nvidia.com/t/problem-with-running-jetson-benchmarks-from-github/144581)
[67] Deformable DETR FLOPs parameters ArXiv (https://arxiv.org/pdf/2403.10913)
[68] DINOv2 backbone FLOPs ViT-L ViT-B (https://openaccess.thecvf.com/content/WACV2024/supplemental/Berrada_Guided_Distillation_for_WACV_2024_supplemental.pdf)
[33] RF-DETR COCO benchmark mAP FPS Roboflow GitHub (https://blog.roboflow.com/train-rf-detr-on-a-custom-dataset/)
[69] Object detection models comparison survey 2024 2025 ArXiv CVPR ICCV ECCV (https://www.sciopen.com/article/10.26599/BDMA.2024.9020098?issn=2096-0654)
[70] PapersWithCode COCO object detection benchmark (https://paperswithcode.com/datasets?q=coco&v=lst&o=match&task=object-detection&page=1)
[71] RT-DETR Deformable DETR COCO-C robustness benchmark mAP_C (https://pmc.ncbi.nlm.nih.gov/articles/PMC11280984/)
[72] YOLOv9 FLOPs parameters (https://www.ikomia.ai/blog/train-yolov9-custom-object-detection-guide)
[73] RT-DETR COCO benchmark mAP FPS PaddlePaddle GitHub ArXiv (https://docs.ultralytics.com/models/rtdetr/)
[74] RT-DETR object detection architecture ArXiv Baidu PaddlePaddle (https://debuggercafe.com/rt-detr/)
[75] RF-DETR object detection architecture ArXiv (https://roboflow.com/model/rf-detr)
[33] RF-DETR Jetson Orin benchmark FPS performance Roboflow (https://blog.roboflow.com/train-rf-detr-on-a-custom-dataset/)
[76] RF-DETR Jetson Orin FPS benchmark blog forum GitHub (https://github.com/roboflow/rf-detr/issues)
[77] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://openaccess.thecvf.com/content/ICCV2023/papers/Mao_COCO-O_A_Benchmark_for_Object_Detectors_under_Natural_Distribution_Shifts_ICCV_2023_paper.pdf)
[78] RT-DETR Deformable DETR COCO-C robustness benchmark mAP_C (https://www.mdpi.com/1424-8220/24/13/4262)
[79] YOLOv8 FLOPs parameters (https://huggingface.co/Ultralytics/YOLOv8)
[80] EfficientDet object detection architecture Google Brain ArXiv (https://arxiv.org/pdf/2409.16808)
[81] DINOv2 backbone FLOPs ViT-L ViT-B (https://arxiv.org/html/2403.13043v2)
[7] Faster R-CNN architecture ArXiv (https://arxiv.org/pdf/1506.01497)
[61] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://arxiv.org/html/2412.13490v1)
[82] COCO-C robustness benchmark YOLO DETR comparison ArXiv (https://arxiv.org/html/2405.14874v2)
[57] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://arxiv.org/html/2412.12349v1)
[83] VisDrone object detection benchmark leaderboard YOLO RT-DETR (https://arxiv.org/html/2404.16944v1)
[84] VisDrone object detection benchmark leaderboard YOLO RT-DETR (https://github.com/ultralytics/ultralytics/issues/16380)
[85] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://lnu.diva-portal.org/smash/get/diva2:1939025/FULLTEXT01.pdf)
[86] YOLOv9 FLOPs parameters (https://viso.ai/computer-vision/yolov9/)
[87] DINOv2 backbone FLOPs ViT-L ViT-B (https://openreview.net/forum?id=Bf6WFWNCUP&noteId=tiAWYwVQNO)
[73] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://docs.ultralytics.com/models/rtdetr/)
[27] Tencent AI Lab CVC YOLO-World COCO performance evaluation blog post talk (https://github.com/AILab-CVC/YOLO-World)
[88] VisDrone object detection benchmark leaderboard YOLO RT-DETR (https://www.mdpi.com/2079-9292/13/24/5014)
[89] Roboflow RF-DETR FLOPs technical details blog post (https://learnopencv.com/rf-detr-object-detection/)
[90] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://arxiv.org/html/2504.11995v1)
[91] Tencent AI Lab CVC YOLO-World COCO performance evaluation blog post talk (https://blog.roboflow.com/how-to-detect-objects-with-yolo-world/)
[92] RF-DETR FLOPs parameters (https://github.com/facebookresearch/detr)
[93] RT-DETR Jetson Orin FPS benchmark blog forum GitHub (https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_rtdetr_graph-orin_nx.json)
[94] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://www.techscience.com/cmc/v78n3/55942/html)
[22] open vocabulary object detection use cases YOLO-World (https://docs.ultralytics.com/models/yolo-world/)
[22] YOLO-World Real-Time Open-Vocabulary Object Detection cited by COCO evaluation benchmark (https://docs.ultralytics.com/models/yolo-world/)
[95] object detection model training convergence comparison YOLO DETR ArXiv (https://arxiv.org/html/2504.13099v1)
[96] YOLO-World object detection architecture Tencent ArXiv (https://arxiv.org/html/2412.20645v1/)
[97] RT-DETR Deformable DETR COCO-C robustness benchmark mAP_C (https://www.engineeringletters.com/issues_v33/issue_1/EL_33_1_15.pdf)
[98] RT-DETR Deformable DETR COCO-C robustness benchmark mAP_C (https://paperswithcode.com/paper/deformable-detr-deformable-transformers-for-1)
[61] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://arxiv.org/html/2412.13490v1)
[27] YOLO-World COCO benchmark mAP FPS GitHub Tencent ArXiv (https://github.com/AILab-CVC/YOLO-World)
[99] COCO-C robustness benchmark YOLO DETR comparison ArXiv (https://arxiv.org/html/2405.14874v3)
[100] YOLO-World fine-tuned COCO benchmark mAP FPS (https://arxiv.org/html/2409.07907v1)
[101] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://docs.ultralytics.com/compare/rtdetr-vs-yolov9/)
[102] Faster R-CNN architecture ArXiv (https://developers.arcgis.com/python/latest/guide/faster-rcnn-object-detector/)
[103] object detection robustness occlusion clutter domain shift comparison YOLO DETR ArXiv (https://arxiv.org/html/2504.09480v1)
[27] YOLO-World object detection architecture Tencent ArXiv (https://github.com/AILab-CVC/YOLO-World)
[104] VisDrone object detection benchmark leaderboard YOLO RT-DETR (https://www.mdpi.com/2504-446X/9/2/143)
[105] object detection small object performance comparison YOLO DETR ArXiv CVPR (https://pmc.ncbi.nlm.nih.gov/articles/PMC11598026/)
[106] YOLO-World Real-Time Open-Vocabulary Object Detection ArXiv paper (https://arxiv.org/html/2408.11221v1)
[49] RT-DETR object detection architecture ArXiv Baidu PaddlePaddle (https://arxiv.org/html/2409.08475v3)
[107] Describe the YOLOv8 architecture: backbone (CSPDarknet variant?), neck (PANet variant?), head (anchor-free?), key features (unified framework for detection, segmentation, pose, classification), strengths (ease of use, performance balance, ecosystem). (https://docs.ultralytics.com/compare/yolov8-vs-damo-yolo/)
[101] YOLOv8 architecture Ultralytics ArXiv (https://docs.ultralytics.com/compare/yolov8-vs-damo-yolo/)
[89] RF-DETR COCO benchmark mAP FPS Roboflow GitHub (https://learnopencv.com/rf-detr-object-detection/)
[108] object detection ease of implementation deployment comparison YOLO DETR Ultralytics Roboflow (https://docs.ultralytics.com/compare/rtdetr-vs-yolov9/)
[73] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/)
[61] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://docs.ultralytics.com/compare/rtdetr-vs-yolov9/)
[109] YOLOv8 FLOPs parameters (https://www.digitalocean.com/community/tutorials/yolov8-a-revolutionary-advancement-in-object-detection-2)
[110] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://docs.ultralytics.com/compare/rtdetr-vs-yolov9/)
[111] Faster R-CNN architecture ArXiv (https://arxiv.org/pdf/1504.08083)
[22] YOLO-World COCO LVIS benchmark mAP FPS GitHub Tencent (https://docs.ultralytics.com/models/yolo-world/)
[112] RF-DETR FLOPs Roboflow GitHub (https://github.com/roboflow/rf-detr)
[113] RT-DETR FLOPs parameters (https://pmc.ncbi.nlm.nih.gov/articles/PMC11903742/)
[75] RF-DETR FLOPs parameters (https://roboflow.com/model/rf-detr)
[114] Describe the YOLOv9 architecture: key innovations (PGI, GELAN), backbone (GELAN integration), neck (PGI integration), head (anchor-free?), comparison to YOLOv8 (parameter/computation reduction, accuracy improvement). (https://arxiv.org/abs/2409.07813)
[115] YOLO vs DETR training convergence speed data augmentation comparison ArXiv (https://arxiv.org/pdf/2409.08475)
[116] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://www.researchgate.net/publication/373392107_The_Current_Trends_of_Object_Detection_Algorithms_A_Review)
[117] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://arxiv.org/html/2501.01855v2)
[95] RF-DETR object detection architecture ArXiv (https://arxiv.org/html/2504.13099v1)
[118] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://arxiv.org/html/2412.13490v2)
[119] LW-DETR FLOPs parameters ArXiv (https://openaccess.thecvf.com/content/CVPR2023/papers/Xu_Q-DETR_An_Efficient_Low-Bit_Quantized_Detection_Transformer_CVPR_2023_paper.pdf)
[54] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://www.researchgate.net/publication/383436528_Drone-DETR_Efficient_Small_Object_Detection_for_Remote_Sensing_Image_Using_Enhanced_RT-DETR_Model)
[115] RT-DETR object detection architecture ArXiv Baidu PaddlePaddle (https://arxiv.org/pdf/2409.08475)
[73] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://docs.ultralytics.com/models/rtdetr/)
[120] YOLOv9 vs RT-DETR small object detection COCO AP_S VisDrone benchmark ArXiv CVPR (https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_DETRs_Beat_YOLOs_on_Real-time_Object_Detection_CVPR_2024_paper.pdf)
[121] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://arxiv.org/html/2503.01601v1)
[122] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (https://github.com/shefalishr95/Object-detection-using-YOLOv8-and-RT-DETR)
[27] Are there specific COCO benchmark results (mAP, FPS, hardware) reported for YOLO-World models (S, M, L) in this README? Look for tables comparing performance on COCO val2017 or test-dev. (https://github.com/AILab-CVC/YOLO-World)
[123] RF-DETR object detection architecture ArXiv (https://blog.roboflow.com/rf-detr/)
[112] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://github.com/roboflow/rf-detr)
[95] YOLO vs DETR robustness COCO-C domain adaptation comparison ArXiv (https://arxiv.org/html/2504.13099v1)
[124] YOLO vs DETR training convergence speed data augmentation comparison ArXiv (https://docs.ultralytics.com/compare/rtdetr-vs-yolox/)
[125] LW-DETR FLOPs parameters ArXiv (https://ojs.aaai.org/index.php/AAAI/article/view/29487/30803)
[126] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices ArXiv alternative link summary (https://www.mdpi.com/1424-8220/22/11/4205)
[127] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://docs.ultralytics.com/guides/yolo-performance-metrics/)
[128] YOLOv12 COCO benchmark mAP FPS ArXiv (https://arxiv.org/html/2411.00201v4)
[129] Deformable DETR FLOPs parameters ArXiv (https://github.com/fundamentalvision/Deformable-DETR)
[130] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://www.mdpi.com/2078-2489/15/8/469)
[45] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://docs.ultralytics.com/models/yolov9/)
[131] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://arxiv.org/abs/2304.08069)
[6] EfficientDet COCO benchmark mAP FPS Google Brain GitHub ArXiv (https://arxiv.org/html/2406.19407v5)
[132] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://www.reddit.com/r/computervision/comments/1h35tpi/whats_the_fastest_object_detection_model/)
[49] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://arxiv.org/html/2409.08475v3)
[58] What are the reported COCO object detection benchmark results (Box AP / mAP) and FPS (if available, specify hardware) for recent models like RF-DETR, RT-DETR, YOLOv8, YOLOv9, YOLO-World, EfficientDet, Faster R-CNN, SSD, and other SOTA models? (https://paperswithcode.com/sota/object-detection-on-coco-1)
[133] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://www.mdpi.com/2504-4990/5/4/83)
[134] Tencent AI Lab CVC YOLO-World COCO performance evaluation blog post talk (https://blog.roboflow.com/what-is-yolo-world/)
[135] YOLO-World COCO benchmark evaluation comparison mAP FPS (https://www.researchgate.net/figure/FPS-vs-mAP-of-Different-Methods-on-COCO-2017-Validation-Dataset_fig8_369221038)
[136] EfficientDet object detection architecture Google Brain ArXiv (https://ar5iv.labs.arxiv.org/html/1911.09070)
[137] RT-DETR FLOPs parameters (https://docs.ultralytics.com/compare/rtdetr-vs-yolov6/)
[6] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://arxiv.org/html/2406.19407v5)
[138] YOLO-World fine-tuned COCO benchmark mAP FPS (https://github.com/yukaryote/darknet)
[139] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (http://thesis.univ-biskra.dz/6587/1/Thesis_Doctorate%20in%20Sciences_CS_Khebbache_FinalVersion.pdf)
[140] DINOv2 backbone FLOPs ViT-L ViT-B (https://arxiv.org/html/2309.16588v2)
[141] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://pmc.ncbi.nlm.nih.gov/articles/PMC11207305/)
[142] VisDrone object detection benchmark leaderboard YOLO RT-DETR (https://paperswithcode.com/sota/object-detection-on-visdrone-det2019-1)
[120] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_DETRs_Beat_YOLOs_on_Real-time_Object_Detection_CVPR_2024_paper.pdf)
[143] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://paperswithcode.com/paper/yolov11-an-overview-of-the-key-architectural)
[127] YOLO-World fine-tuned COCO benchmark mAP FPS (https://docs.ultralytics.com/guides/yolo-performance-metrics/)
[144] RT-DETR FLOPs parameters (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0317114)
[145] RT-DETR object detection architecture ArXiv Baidu PaddlePaddle (https://developers.arcgis.com/python/latest/guide/rt-detrv2-object-detector/)
[146] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://huggingface.co/blog/samuellimabraz/signature-detection-model)
[147] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=13716&context=etd)
[148] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://docs.ultralytics.com/compare/yolov9-vs-rtdetr/)
[89] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://learnopencv.com/rf-detr-object-detection/)
[149] Describe the detailed architecture of YOLOv9: key innovations (PGI, GELAN), backbone (GELAN integration), neck (PGI integration), head (anchor-free?), comparison to YOLOv8 (parameter/computation reduction, accuracy improvement), and annotation format. (https://arxiv.org/html/2409.07813v1)
[150] RT-DETR Jetson Orin benchmark FPS performance (https://arxiv.org/html/2412.02171v1)
[134] YOLO-World Real-Time Open-Vocabulary Object Detection cited by COCO evaluation benchmark (https://blog.roboflow.com/what-is-yolo-world/)
[151] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://montague.law/blog/open-source-licenses-to-avoid/)
[152] EfficientDet object detection architecture Google Brain ArXiv (https://docs.ultralytics.com/compare/efficientdet-vs-damo-yolo/)
[89] RT-DETR Deformable DETR COCO-C robustness benchmark mAP_C (https://learnopencv.com/rf-detr-object-detection/)
[148] YOLO vs DETR object detection comparison applications autonomous driving surveillance robotics (https://docs.ultralytics.com/compare/yolov9-vs-rtdetr/)
[153] object detection model selection guide use cases accuracy speed (https://www.hitechbpo.com/blog/top-object-detection-models.php)
[95] COCO-C robustness benchmark YOLO DETR comparison ArXiv (https://arxiv.org/html/2504.13099v1)
[95] RT-DETR Deformable DETR COCO-C robustness benchmark mAP_C (https://arxiv.org/html/2504.13099v1)
[41] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://arxiv.org/html/2411.00201v2)
[147] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=13716&context=etd)
[154] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (https://www.researchgate.net/figure/The-first-set-of-visual-comparison-results-between-RT-DETR-left-and-RT-DETR-improved_fig2_390311220)
[55] EfficientDet COCO benchmark mAP FPS Google Brain GitHub ArXiv (https://arxiv.org/html/2409.16808v1)
[155] YOLOv8 COCO benchmark mAP FPS Ultralytics GitHub (https://github.com/ultralytics/ultralytics/blob/main/docs/en/models/yolov8.md)
[148] YOLOv9 vs RT-DETR small object detection COCO AP_S VisDrone benchmark ArXiv CVPR (https://docs.ultralytics.com/compare/yolov9-vs-rtdetr/)
[61] YOLO vs DETR training GPU memory requirement comparison (https://arxiv.org/html/2412.13490v1)
[123] RF-DETR COCO benchmark mAP FPS Roboflow GitHub (https://blog.roboflow.com/rf-detr/)
[123] Describe the RF-DETR architecture based on Deformable DETR. What is the backbone (DINOv2)? How does it differ from Deformable DETR (e.g., single-scale features)? What are the model variants (Base/Large parameters)? What is the claimed COCO mAP and FPS (mention hardware if specified)? (https://blog.roboflow.com/rf-detr/)
[156] YOLOv9 COCO benchmark mAP FPS ArXiv GitHub (https://www.restack.io/p/real-time-ai-inference-answer-images-github-cat-ai)
[75] Roboflow RF-DETR FLOPs technical details blog post (https://roboflow.com/model/rf-detr)
[55] object detection edge device benchmark Jetson Orin FPS YOLO RT-DETR RF-DETR (https://arxiv.org/html/2409.16808v1)
[95] RF-DETR COCO benchmark mAP FPS Roboflow GitHub (https://arxiv.org/html/2504.13099v1)
[123] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://blog.roboflow.com/rf-detr/)
[157] YOLO vs DETR training GPU memory requirement comparison (https://arxiv.org/html/2406.03459v1)
[157] LW-DETR FLOPs parameters ArXiv (https://arxiv.org/html/2406.03459v1)
[158] Roboflow RF-DETR FLOPs technical details blog post (https://www.analyticsvidhya.com/blog/2025/03/roboflows-rf-detr/)
[159] RT-DETR FLOPs parameters (https://docs.ultralytics.com/compare/rtdetr-vs-yolov10/)
[160] object detection model selection guide use cases accuracy speed (https://mobidev.biz/blog/object-detection-recognition-tracking-guide-use-cases-approaches)
[161] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/IRCVLab/YOLOv8-for-Jetson-Orin)
[90] YOLO vs DETR robustness COCO-C domain adaptation comparison ArXiv (https://arxiv.org/html/2504.11995v1)
[147] RT-DETR Jetson Orin benchmark FPS performance (https://ir.lib.uwo.ca/cgi/viewcontent.cgi?article=13716&context=etd)
[112] RF-DETR Jetson Orin benchmark FPS performance Roboflow (https://github.com/roboflow/rf-detr)
[55] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://arxiv.org/html/2409.16808v1)
[101] object detection ease of implementation deployment comparison YOLO DETR Ultralytics Roboflow (https://docs.ultralytics.com/compare/rtdetr-vs-yolov9/)
[162] object detection small object performance comparison YOLO DETR ArXiv CVPR (https://cvpr.thecvf.com/virtual/2024/poster/31301)
[163] object detection model training convergence comparison YOLO DETR ArXiv (https://arxiv.org/html/2402.16370v1)
[164] SSD Single Shot MultiBox Detector architecture ArXiv (http://arxiv.org/pdf/1512.02325)
[165] YOLO-World fine-tuned COCO benchmark mAP FPS (https://github.com/AILab-CVC/YOLO-World/blob/master/docs/finetuning.md)
[166] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://arxiv.org/html/2304.08069v3)
[167] YOLOv12 COCO benchmark mAP FPS ArXiv (https://www.researchgate.net/publication/389548112_Improved_YOLOv12_with_LLM-Generated_Synthetic_Data_for_Enhanced_Apple_Detection_and_Benchmarking_Against_YOLOv11_and_YOLOv10)
[168] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://arxiv.org/html/2501.05885v1)
[169] Object detection models comparison survey 2024 2025 ArXiv CVPR ICCV ECCV (https://www.ibm.com/think/topics/object-detection)
[170] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://www.mdpi.com/2226-4310/12/4/356)
[171] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://github.com/ultralytics/ultralytics)
[158] RF-DETR FLOPs parameters (https://www.analyticsvidhya.com/blog/2025/03/roboflows-rf-detr/)
[172] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://coral.ai/docs/edgetpu/benchmarks/)
[75] RF-DETR Jetson Orin benchmark FPS performance Roboflow (https://roboflow.com/model/rf-detr)
[171] YOLOv8 COCO benchmark mAP FPS Ultralytics GitHub (https://github.com/ultralytics/ultralytics)
[112] Roboflow RF-DETR FLOPs technical details blog post (https://github.com/roboflow/rf-detr)
[173] open vocabulary object detection use cases YOLO-World (https://www.visionplatform.ai/yolo-world/)
[174] YOLO vs DETR object detection comparison applications autonomous driving surveillance robotics (https://www.labelvisor.com/applications-of-yolov10-in-object-detection/)
[175] object detection ease of implementation deployment comparison YOLO DETR Ultralytics Roboflow (https://roboflow.com/model-alternatives/yolov8)
[176] RT-DETR Jetson Orin FPS benchmark blog forum GitHub (https://github.com/orgs/ultralytics/discussions/13117)
[177] YOLOv9 COCO benchmark mAP FPS ArXiv GitHub (https://arxiv.org/html/2411.11477v3)
[178] What are the reported inference performance benchmarks (FPS or latency) for object detection models (like YOLO variants, ResNet-50, RetinaNet, SSD, EfficientDet, DETR variants if available) on NVIDIA Jetson hardware, particularly Jetson AGX Orin or Orin NX/Nano? (https://developer.nvidia.com/embedded/jetson-benchmarks)
[179] YOLOv12 COCO benchmark mAP FPS ArXiv (https://github.com/sunsmarterjie/yolov12)
[172] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://roboflow.com/licensing)
[180] DINOv2 backbone FLOPs ViT-L ViT-B (https://sslneurips22.github.io/paper_pdfs/paper_10.pdf)
[181] YOLO-World COCO benchmark mAP FPS GitHub Tencent ArXiv (https://arxiv.org/abs/2401.17270)
[182] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://docs.ultralytics.com/compare/yolov8-vs-rtdetr/)
[183] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://www.dfrobot.com/blog-13914.html)
[184] object detection edge device benchmark Jetson Orin FPS YOLO RT-DETR RF-DETR (https://www.researchgate.net/publication/311609522_You_Only_Look_Once_Unified_Real-Time_Object_Detection)
[185] YOLOv8 architecture Ultralytics ArXiv (https://docs.ultralytics.com/compare/yolov8-vs-damo-yolo/)
[27] YOLO-World COCO LVIS benchmark mAP FPS GitHub Tencent (https://github.com/AILab-CVC/YOLO-World)
[131] RT-DETR COCO benchmark mAP FPS PaddlePaddle GitHub ArXiv (https://arxiv.org/abs/2304.08069)
[127] YOLO-World COCO benchmark evaluation comparison mAP FPS (https://docs.ultralytics.com/guides/yolo-performance-metrics/)
[186] YOLO-World object detection architecture Tencent ArXiv (https://www.e2enetworks.com/blog/step-by-step-guide-to-unlocking-open-vocabulary-object-detection-with-yolo-world)
[187] YOLOv9 FLOPs parameters (https://visionplatform.ai/yolov9/)
[185] Describe the YOLOv8 architecture: backbone (CSPDarknet variant?), neck (PANet variant?), head (anchor-free?), key features (unified framework for detection, segmentation, pose, classification), strengths (ease of use, performance balance, ecosystem). (https://docs.ultralytics.com/compare/yolov8-vs-damo-yolo/)
[188] open vocabulary object detection use cases YOLO-World (https://encord.com/blog/yolo-world-object-detection/)
[189] RT-DETR Jetson Orin FPS benchmark blog forum GitHub (https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_benchmark/blob/main/results/isaac_ros_rtdetr_graph-agx_orin.json)
[190] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://xis.ai/blogpost/RT-DETR/)
[191] RF-DETR FLOPs Roboflow GitHub (https://github.com/roboflow/rf-detr/actions)
[192] YOLOv8 FLOPs parameters (https://mmyolo.readthedocs.io/en/latest/recommended_topics/algorithm_descriptions/yolov8_description.html)
[57] COCO-C robustness benchmark YOLO DETR comparison ArXiv (https://arxiv.org/html/2412.12349v1)
[134] YOLO-World fine-tuned COCO benchmark mAP FPS (https://blog.roboflow.com/what-is-yolo-world/)
[118] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://arxiv.org/html/2412.13490v2)
[193] small object detection comparison YOLO DETR review ArXiv VisDrone COCO AP_S (https://pmc.ncbi.nlm.nih.gov/articles/PMC11550436/)
[194] YOLOv8 COCO benchmark mAP FPS Ultralytics GitHub (https://github.com/orgs/ultralytics/discussions/8790)
[123] Roboflow RF-DETR FLOPs technical details blog post (https://blog.roboflow.com/rf-detr/)
[195] object detection edge device benchmark Jetson Orin FPS YOLO RT-DETR RF-DETR (https://www.dfrobot.com/blog-13914.html)
[134] YOLO-World COCO benchmark evaluation comparison mAP FPS (https://blog.roboflow.com/what-is-yolo-world/)
[196] YOLOv12 COCO benchmark mAP FPS ArXiv (https://paperswithcode.com/sota/real-time-object-detection-on-coco?p=scaled-yolov4-scaling-cross-stage-partial)
[197] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (https://pmc.ncbi.nlm.nih.gov/articles/PMC11016902/)
[198] Describe the detailed architecture of YOLOv8: backbone (CSPDarknet variant?), neck (optimized PANet?), head (anchor-free), key features (unified framework), model variants (n/s/m/l/x parameters), and reported COCO mAP / inference times from the table. (https://arxiv.org/html/2408.15857v1)
[11] YOLOv12 COCO benchmark mAP FPS ArXiv (https://www.researchgate.net/publication/381929610_YOLOv12_to_Its_Genesis_A_Decadal_and_Comprehensive_Review_of_The_You_Only_Look_Once_YOLO_Series)
[44] YOLO-World Real-Time Open-Vocabulary Object Detection cited by COCO evaluation benchmark (https://arxiv.org/html/2401.17270v3)
[199] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/IRCVLab/YOLOv8-for-Jetson-Orin)
[200] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://github.com/ultralytics/ultralytics/issues/8800)
[201] YOLOv8 COCO benchmark mAP FPS Ultralytics GitHub (https://github.com/orgs/ultralytics/discussions/9137)
[202] RF-DETR FLOPs parameters (https://github.com/facebookresearch/detr/issues/110)
[90] YOLOv12 COCO benchmark mAP FPS ArXiv (https://arxiv.org/html/2504.11995v1)
[163] YOLO vs DETR training convergence speed data augmentation comparison ArXiv (https://arxiv.org/html/2402.16370v1)
[203] open vocabulary object detection use cases YOLO-World (https://www.pipeless.ai/blog/yolo-world/YOLO-World:%20Open%20vocabulary%20object%20detection.%20No%20more%20model%20training)
[204] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://www.researchgate.net/publication/384535670_CDF-YOLOv8_City_recognition_system_based_on_improved_YOLOv8)
[205] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://docs.ultralytics.com/compare/rtdetr-vs-yolov8/)
[123] RF-DETR Jetson Orin benchmark FPS performance Roboflow (https://blog.roboflow.com/rf-detr/)
[198] Describe the detailed architecture of YOLOv8: backbone, neck, head (anchor-free), key features (unified framework), model variants (n/s/m/l/x parameters), and reported COCO mAP / inference times from the table. (https://arxiv.org/html/2408.15857v1)
[95] YOLOv12 COCO benchmark mAP FPS ArXiv (https://arxiv.org/html/2504.13099v1)
[206] object detection robustness occlusion clutter domain shift comparison YOLO DETR ArXiv (https://www.frontiersin.org/articles/10.3389/fcomp.2025.1437664/full)
[207] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://github.com/bethgelab/robust-detection-benchmark)
[208] YOLOv12 COCO benchmark mAP FPS ArXiv (https://arxiv.org/html/2503.00057v1)
[209] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices ArXiv alternative link summary (https://arxiv.org/html/2501.03265v1)
[210] DINOv2 backbone FLOPs ViT-L ViT-B (https://github.com/facebookresearch/dinov2/blob/main/MODEL_CARD.md)
[211] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://www.reddit.com/r/computervision/comments/1hp2440/fast_object_detection_models_and_their_licenses/)
[212] What are the reported inference performance benchmarks (FPS or latency) for object detection models (like YOLO variants, ResNet-50, RetinaNet, SSD, EfficientDet, DETR variants if available) on NVIDIA Jetson hardware, particularly Jetson AGX Orin or Orin NX/Nano? (https://www.jetson-ai-lab.com/benchmarks.html)
[213] COCO AP_S benchmark YOLOv8 YOLOv9 RT-DETR RF-DETR (https://arxiv.org/html/2308.05480v2)
[214] Tencent AI Lab CVC YOLO-World COCO performance evaluation blog post talk (https://www.mdpi.com/bookfiles/book/1573/Machine_Learning_and_Embedded_Computing_in_Advanced_Driver_Assistance_Systems_ADAS.pdf)
[131] YOLO vs DETR training convergence speed data augmentation comparison ArXiv (https://arxiv.org/abs/2304.08069)
[215] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://wiki.seeedstudio.com/YOLOv8-DeepStream-TRT-Jetson/)
[172] object detection edge device benchmark Jetson Orin FPS YOLO RT-DETR RF-DETR (https://blog.roboflow.com/train-deploy-rf-detr/)
[216] Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices ArXiv alternative link summary (https://www.mdpi.com/2079-9292/14/3/638)
[217] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://arxiv.org/html/2406.14239v1)
[218] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://encord.com/blog/yolov9-sota-machine-learning-object-dection-model/)
[219] YOLO vs DETR training GPU memory requirement comparison (https://docs.ultralytics.com/compare/yolov5-vs-rtdetr/)
[220] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://docs.ultralytics.com/tasks/pose/)
[221] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://www.reddit.com/r/computervision/comments/1gh6bw6/dear_researchers_stop_this_nonsense/)
[178] object detection edge device benchmark Jetson Orin FPS YOLO RT-DETR RF-DETR (https://developer.nvidia.com/embedded/jetson-benchmarks)
[131] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://arxiv.org/abs/2304.08069)
[222] RT-DETR Jetson Orin FPS benchmark blog forum GitHub (https://github.com/orgs/ultralytics/discussions/2545)
[223] Roboflow RF-DETR FLOPs technical details blog post (https://blog.roboflow.com/yolov10-how-to-train/)
[224] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://www.reddit.com/r/computervision/comments/1hc9p19/i_compared_the_object_detection_outputs_of_yolo/)
[212] YOLOv8 Jetson Orin benchmark FPS performance (https://www.jetson-ai-lab.com/benchmarks.html)
[182] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://docs.ultralytics.com/compare/yolov8-vs-rtdetr/)
[225] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://docs.ultralytics.com/compare/yolov8-vs-yolov9/)
[226] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/04462.pdf)
[227] YOLO-World fine-tuned COCO benchmark mAP FPS (https://docs.ultralytics.com/models/yoloe/)
[228] EfficientDet object detection architecture Google Brain ArXiv (https://docs.ultralytics.com/compare/efficientdet-vs-yolov9/)
[95] Describe the detailed architecture of RF-DETR, including backbone (DINOv2), neck, head, key innovations (Deformable DETR/LW-DETR inspiration, single-scale features, anchor/NMS free), variants (Base/Large parameters), reported COCO mAP, and training details (optimizer, LR, epochs, pre-trained weights). (https://arxiv.org/html/2504.13099v1)
[229] DETR vs YOLO architecture theoretical comparison robustness small objects ArXiv review (https://www.mdpi.com/2079-9292/14/8/1624)
[230] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://www.reddit.com/r/computervision/comments/1e3uxro/ultralytics_new_agpl30_license_exploiting/)
[231] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://github.com/keras-team/keras-cv/discussions/2032)
[232] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://www.reddit.com/r/opensource/comments/rjvr33/apache_20_vs_agpl30/)
[233] RF-DETR Jetson Orin FPS benchmark blog forum GitHub (https://blog.roboflow.com/run-inference/)
[234] object detection model selection guide use cases accuracy speed (https://labelyourdata.com/articles/object-detection-metrics)
[235] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://github.com/ultralytics/ultralytics/issues/2129)
[236] SSD Single Shot MultiBox Detector architecture ArXiv (https://ar5iv.labs.arxiv.org/html/1512.02325)
[73] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://docs.ultralytics.com/models/rtdetr/)
[237] YOLO vs DETR training GPU memory requirement comparison (https://blog.roboflow.com/train-rt-detr-custom-dataset-transformers/)
[238] YOLO-World fine-tuned COCO benchmark mAP FPS (https://paperswithcode.com/sota/object-detection-on-coco)
[25] What are the key findings regarding the performance sensitivity of YOLO models (v5, v8, v9, v10, v11) to small object sizes (e.g., objects occupying 1%, 2.5%, 5% of image area)? Provide quantitative results if available. (https://arxiv.org/abs/2504.09900)
[205] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://docs.ultralytics.com/compare/rtdetr-vs-yolov8/)
[239] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://www.dfrobot.com/blog-13998.html)
[240] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/ultralytics/ultralytics/issues/5019)
[241] RF-DETR Jetson Orin FPS benchmark blog forum GitHub (https://forums.developer.nvidia.com/t/vlm-refresh-rate/315785)
[242] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://roboflow.com/licensing)
[243] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://visionplatform.ai/yolov10-object-detection/)
[131] COCO-C robustness benchmark YOLO DETR comparison ArXiv (https://arxiv.org/abs/2304.08069)
[183] YOLO vs DETR object detection comparison applications autonomous driving surveillance robotics (https://www.dfrobot.com/blog-13914.html)
[244] YOLO vs DETR training GPU memory requirement comparison (https://docs.ultralytics.com/compare/rtdetr-vs-yolov5/)
[245] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://www.ultralytics.com/blog/comparing-ultralytics-yolo11-vs-previous-yolo-models)
[246] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://test.dfrobot.com/blog-13998.html)
[247] RT-DETR Jetson Orin FPS benchmark blog forum GitHub (https://b.savant-ai.io/2023/11/29/running-the-rt-detr-detection-model-efficiently-with-savant/)
[248] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/orgs/ultralytics/discussions/9942)
[249] YOLO vs DETR training GPU memory requirement comparison (https://www.ikomia.ai/blog/top-object-detection-models-review)
[250] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/)
[251] YOLO AGPL vs DETR Apache 2.0 license comparison discussion practical impact (https://fossa.com/blog/fall-2024-software-licensing-roundup/)
[183] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (https://www.dfrobot.com/blog-13914.html)
[252] YOLOv8 Jetson Orin benchmark FPS performance (https://www.jetson-ai-lab.com/tutorial_ultralytics.html)
[253] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://docs.ultralytics.com/models/yolo11/)
[233] RT-DETR Jetson Orin benchmark FPS performance (https://blog.roboflow.com/run-inference/)
[247] RT-DETR Jetson Orin benchmark FPS performance (https://b.savant-ai.io/2023/11/29/running-the-rt-detr-detection-model-efficiently-with-savant/)
[250] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/)
[254] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (https://www.mdpi.com/1424-8220/24/22)
[255] RF-DETR Jetson Orin FPS benchmark blog forum GitHub (https://www.kaggle.com/datasets/dschettler8845/rf-detr-github)
[256] YOLO vs DETR object detection comparison applications autonomous driving surveillance robotics (https://www.digitalocean.com/community/tutorials/yolov12-next-big-leap-in-object-detection)
[257] Tencent AI Lab CVC YOLO-World COCO performance evaluation blog post talk (https://www.youtube.com/watch?v=lF1BtQL16l4)
[225] YOLOv9 FLOPs parameters (https://docs.ultralytics.com/compare/yolov8-vs-yolov9/)
[55] What were the measured inference times or FPS for YOLOv8, SSD, and EfficientDet models when deployed on the Jetson Orin Nano using TensorRT, as reported in this benchmarking study? (https://arxiv.org/html/2409.16808v1)
[250] YOLOv8 Jetson Orin benchmark FPS performance (https://www.seeedstudio.com/blog/2023/03/30/yolov8-performance-benchmarks-on-nvidia-jetson-devices/)
[258] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://github.com/ultralytics/ultralytics/issues/3287)
[183] YOLO vs DETR training GPU memory requirement comparison (https://www.dfrobot.com/blog-13914.html)
[259] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://docs.ultralytics.com/models/yolov10/)
[260] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://www.reddit.com/r/computervision/comments/19bzffg/is_there_a_better_algorithm_for_object/)
[261] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://coral.ai/docs/edgetpu/benchmarks/)
[262] YOLO-World fine-tuned COCO benchmark mAP FPS (https://learnopencv.com/performance-comparison-of-yolo-models/)
[263] YOLOv8 RT-DETR pedestrian detection benchmark comparison KITTI CityPersons MOT (https://www.youtube.com/watch?v=g4Q1tW988eI)
[264] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://www.labelvisor.com/solutions-and-workarounds-for-yolov10-challenges/)
[265] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://www.labelvisor.com/common-issues-in-yolov10-implementation/)
[150] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://arxiv.org/html/2412.02171v1)
[23] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://paperswithcode.com/sota/real-time-object-detection-on-coco)
[266] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/marcoslucianops/DeepStream-Yolo/issues/390)
[267] YOLOv8 Jetson Orin benchmark FPS performance (https://www.stereolabs.com/blog/performance-of-yolo-v5-v7-and-v8)
[224] DINO-DETR vs RT-DETR vs YOLO benchmark comparison ArXiv (https://www.reddit.com/r/computervision/comments/1hc9p19/i_compared_the_object_detection_outputs_of_yolo/)
[132] object detection model selection guide use cases accuracy speed (https://www.reddit.com/r/computervision/comments/1h35tpi/whats_the_fastest_object_detection_model/)
[268] object detection ease of implementation deployment comparison YOLO DETR Ultralytics Roboflow (https://roboflow.com/compare/rf-detr-vs-yolov8)
[249] YOLOv9 vs RT-DETR practical comparison ease of use GitHub issues (https://www.ikomia.ai/blog/top-object-detection-models-review)
[269] YOLOv8 vs RT-DETR reddit discussion experience fine-tuning deployment (https://www.reddit.com/r/computervision/best/?after=dDNfMWpvcHh6aA%3D%3D&sort=best&t=hour)
[270] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://github.com/ultralytics/ultralytics/issues/4089)
[247] YOLOv8 YOLOv9 RT-DETR OpenVINO benchmark FPS latency comparison (https://b.savant-ai.io/2023/11/29/running-the-rt-detr-detection-model-efficiently-with-savant/)
[271] object detection ease of implementation deployment comparison YOLO DETR Ultralytics Roboflow (https://blog.roboflow.com/ai-models/)
[272] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/marcoslucianops/DeepStream-Yolo/issues/605)
[273] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://encord.com/blog/performanceyolov9-vs-yolov8-custom-dataset/)
[194] YOLOv11 COCO benchmark mAP FPS Ultralytics (https://github.com/orgs/ultralytics/discussions/8790)
[247] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://b.savant-ai.io/2023/11/29/running-the-rt-detr-detection-model-efficiently-with-savant/)
[274] YOLOv8 YOLOv9 COCO-C robustness benchmark mAP_C (https://roboflow.com/compare-model-sizes/yolov9e-vs-yolov9s)
[275] YOLO-World fine-tuned COCO benchmark mAP FPS (https://www.reddit.com/r/computervision/comments/1g2m688/yolo_metrics_comparison/)
[276] YOLOv8 Jetson Orin FPS benchmark blog forum GitHub (https://github.com/ultralytics/ultralytics/issues/17640)
[267] YOLOv8 RT-DETR Jetson Orin community benchmark comparison FPS (https://www.stereolabs.com/blog/performance-of-yolo-v5-v7-and-v8)
[277] YOLO vs DETR training GPU memory requirement comparison (https://github.com/orgs/ultralytics/discussions/4977)
[278] YOLOv8 RT-DETR RF-DETR Coral Edge TPU benchmark FPS (https://www.youtube.com/watch?v=sOxQTRRh9tw)