YOLOv8-World v2: ONNX with dynamic text prompts

ONNX exports of Ultralytics YOLOv8-World v2 (yolov8s-worldv2.pt, yolov8l-worldv2.pt) with a dynamic txt_feats input so class prompts can be re-encoded at runtime (zero-shot open-vocabulary detection), rather than being baked into the graph.

Files

File                    Size       Params
yolov8s-worldv2.onnx    48.8 MB    12.7 M
yolov8l-worldv2.onnx    178.8 MB   46.8 M

Input schema

Two named inputs (all axes dynamic):

  • images: float32[batch, 3, height, width], normalized to [0, 1], RGB, imgsz=640 recommended.
  • txt_feats: float32[batch, num_classes, 512], CLIP ViT-B/32 text embeddings, L2-normalized (though the graph re-normalizes internally).
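
As a minimal sketch of the images layout described above, the helper below converts an RGB image array into the expected tensor. The function name to_model_input is hypothetical, and the sketch assumes the image has already been resized (or letterboxed) to the target size, for example 640x640; it does no resizing itself.

```python
import numpy as np


def to_model_input(rgb_image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to float32 [1, 3, H, W] in [0, 1].

    Assumption: the caller has already resized the image to the desired
    input size; this helper only changes layout, dtype, and value range.
    """
    if rgb_image.ndim != 3 or rgb_image.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    chw = np.transpose(rgb_image, (2, 0, 1))         # HWC -> CHW
    scaled = chw.astype(np.float32) / 255.0          # uint8 -> [0, 1]
    return np.ascontiguousarray(scaled[np.newaxis])  # add the batch axis
```

Images loaded with OpenCV are BGR, so they would need a channel swap (for example image[..., ::-1]) before being passed to this helper.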

Output

  • output0: float32[batch, 4 + num_classes, num_anchors] (YOLO head: 4 box regressors + per-class logits; anchors depend on input HxW).
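
A rough sketch of running the graph with ONNX Runtime and sanity-checking the output shape follows. The helper names are hypothetical. The anchor count formula assumes the usual YOLOv8 head strides of 8, 16, and 32, which this card does not state explicitly, so treat it as a plausibility check rather than a guarantee.

```python
import math

import numpy as np


def expected_num_anchors(height: int, width: int, strides=(8, 16, 32)) -> int:
    """Anchor count for an HxW input, assuming one anchor per grid cell
    at each stride (assumption: standard YOLOv8 strides 8, 16, 32)."""
    return sum(math.ceil(height / s) * math.ceil(width / s) for s in strides)


def run_model(model_path: str, images: np.ndarray, txt_feats: np.ndarray) -> np.ndarray:
    """Run one forward pass and return output0."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package

    if images.dtype != np.float32 or txt_feats.dtype != np.float32:
        raise TypeError("both inputs must be float32")
    if images.shape[0] != txt_feats.shape[0]:
        raise ValueError("images and txt_feats must share the batch size")

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    (output0,) = session.run(["output0"], {"images": images, "txt_feats": txt_feats})

    # Shape check against the documented layout [batch, 4 + num_classes, num_anchors].
    assert output0.shape[1] == 4 + txt_feats.shape[1]
    return output0
```

For a 640x640 input, expected_num_anchors gives 8400, so with N prompts output0 would be expected to have shape (1, 4 + N, 8400) under that stride assumption.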

Post-processing: apply sigmoid to the class channels, then standard NMS.
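
The post-processing step can be sketched in NumPy as below. This is an illustration, not the exporter's own code: it assumes the four box channels are already decoded as center-x, center-y, width, height in input pixels (as in standard Ultralytics exports), which should be verified against the actual model output. The thresholds and function names are hypothetical defaults.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def nms(boxes, scores, iou_thres):
    """Greedy NMS over xyxy boxes; returns the kept indices."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        if rest.size == 0:
            break
        # Intersection of the top box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou <= iou_thres]
    return keep


def postprocess(output0, conf_thres=0.25, iou_thres=0.45):
    """Decode one image's [4 + num_classes, num_anchors] slice of output0."""
    preds = output0.T                                   # [num_anchors, 4 + nc]
    class_scores = sigmoid(preds[:, 4:])                # logits -> probabilities
    class_ids = class_scores.argmax(axis=1)
    scores = class_scores[np.arange(len(preds)), class_ids]
    mask = scores >= conf_thres
    preds, scores, class_ids = preds[mask], scores[mask], class_ids[mask]

    # Assumption: boxes are (cx, cy, w, h); convert to corner format.
    cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

    keep = []
    for c in np.unique(class_ids):                      # NMS per class
        idx = np.where(class_ids == c)[0]
        keep.extend(idx[nms(boxes[idx], scores[idx], iou_thres)].tolist())
    keep = np.array(keep, dtype=np.int64)
    return boxes[keep], scores[keep], class_ids[keep]
```

Called as postprocess(output0[0]), this returns corner-format boxes, confidences, and indices into the prompt list used to build txt_feats.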

Text embedding

Use CLIP ViT-B/32 (openai/clip-vit-base-patch32) to encode each class prompt into a 512-dim vector, then stack the N vectors into an array of shape (1, N, 512).
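
One possible way to produce txt_feats is with the Hugging Face transformers implementation of CLIP, sketched below. The function names are hypothetical, and using CLIPTextModelWithProjection is an assumption about tooling: this card names only the checkpoint, not the library used to encode prompts.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along `axis` to unit L2 norm."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def encode_prompts(prompts, model_id="openai/clip-vit-base-patch32") -> np.ndarray:
    """Encode N class prompts into a float32 array of shape (1, N, 512)."""
    # Imported lazily; requires the torch and transformers packages.
    import torch
    from transformers import CLIPTextModelWithProjection, CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained(model_id)
    model = CLIPTextModelWithProjection.from_pretrained(model_id).eval()

    tokens = tokenizer(list(prompts), padding=True, return_tensors="pt")
    with torch.no_grad():
        embeds = model(**tokens).text_embeds             # [N, 512]

    feats = l2_normalize(embeds.cpu().numpy().astype(np.float32))
    return feats[np.newaxis]                             # [1, N, 512]
```

For example, encode_prompts(["person", "red backpack"]) would yield an array of shape (1, 2, 512). The embeddings depend only on the prompt list, so they can be computed once and reused across frames.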

Export

Exported from Ultralytics 8.4.37 with a custom torch.onnx.export path that patches WorldDetect.forward to use torch.split sized from text.shape[1] (so N stays dynamic). Opset 18. Simplified with onnxslim.

License

AGPL-3.0 (inherits from Ultralytics YOLOv8-World weights).
