---
license: agpl-3.0
library_name: cerberus-det
tags:
- object-detection
- computer-vision
- yolo
- pytorch
datasets:
- pascal_voc
- objects365
metrics:
- map
---

# CerberusDet (VOC + Objects365 Animals)

📜 Paper on arXiv | GitHub | 🤗 All CerberusDet Models

**CerberusDet** is a unified multi-dataset object detection framework based on the YOLO architecture. This checkpoint was trained jointly on two datasets: **PASCAL VOC** and the **Animals subset of Objects365**. It demonstrates that a single model can handle multiple domains with conflicting class definitions, achieving state-of-the-art accuracy with lower inference cost than running separate models.

## Model Details

- **Architecture:** Multi-headed YOLOv8x (shared backbone/neck, separate heads)
- **Task:** Object Detection
- **Training Datasets:** PASCAL VOC (2007+2012), Objects365 (Animals subset)
- **Input Resolution:** 640x640
- **Precision:** FP16 (inference)
- **Params:** 105M
- **FLOPs (B):** 381.3

## Performance

| Dataset | mAP@0.5 | mAP@0.5:0.95 |
| :--- | :---: | :---: |
| **PASCAL VOC 2007** | 0.92 | 0.75 |
| **Objects365 Animals** | 0.57 | 0.43 |

**Inference Speed:** 7.2 ms on NVIDIA V100 (FP16, batch 32). This is faster than running two separate YOLOv8 models sequentially (approx. 11.2 ms).
## Usage

### Installation

Requirements:

- python3.10
- [CerberusDet](https://github.com/ai-forever/CerberusDet)
- transformers==4.50.0
- accelerate==1.8.1

### Use with transformers (with trust_remote_code=True)

```bash
pip install huggingface_hub git+https://github.com/ai-forever/CerberusDet transformers==4.50.0 accelerate==1.8.1
```

```python
from transformers import AutoImageProcessor, AutoConfig, AutoModel
import cv2
import torch

model_name = "iitolstykh/cerberusdet-yolov8x-voc-o365-animals"

device = torch.device("cpu") if not torch.cuda.is_available() else torch.device("cuda:0")
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
cerberus_model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    config=config,
    torch_dtype=torch_dtype,
).to(device)

image_processor = AutoImageProcessor.from_pretrained(
    model_name,
    trust_remote_code=True,
    half=cerberus_model.half,
    stride=config.stride,
)

img_path = "image.jpg"  # path to your input image
image = cv2.imread(img_path)  # BGR
inputs = image_processor(images=[image], device=device)

# inference
output = cerberus_model(**inputs, return_dict=True)

# print results
batch_size = len(output.boxes)
for i in range(batch_size):
    print(f"\n--- Results for Image {i} ---")
    boxes = output.boxes[i]
    scores = output.scores[i]
    labels = output.labels[i]
    task_indices = output.tasks_ids[i]
    for j in range(len(scores)):
        score = scores[j].item()
        label_id = int(labels[j].item())
        task_idx = int(task_indices[j].item())
        class_name = config.all_class_names[label_id]
        task_name = config.task_ids[task_idx]
        box = boxes[j].tolist()  # [x1, y1, x2, y2]
        print(f"Object {j}:")
        print(f"  • Class: {class_name} (ID: {label_id})")
        print(f"  • Task: {task_name}")
        print(f"  • Conf: {score:.4f}")
        print(f"  • Box: {box}")
```

### Use with CerberusDet

```bash
pip install git+https://github.com/ai-forever/CerberusDet
```

```python
import cv2
from cerberusdet.cerberusdet_inference import CerberusDetInference, CerberusVisualizer
from cerberusdet.cerberusdet_preprocessor import CerberusPreprocessor
from huggingface_hub import hf_hub_download

# 1. Download model weights
model_path = hf_hub_download(
    repo_id="iitolstykh/cerberusdet-yolov8x-voc-o365-animals",
    filename="voc_obj365_animals_v8x_best.pt",
    repo_type="model",
)

# 2. Initialize the inference engine, preprocessor, and visualizer
device = "cuda:0"
inferencer = CerberusDetInference(
    weights=model_path,
    device=device,
    conf_thres=0.3,
    iou_thres=0.45,
    half=True,
)

# Note: pass the model's stride to the preprocessor
preprocessor = CerberusPreprocessor(
    img_size=640,
    stride=inferencer.stride,
    half=inferencer.half,
    auto=True,
)
visualizer = CerberusVisualizer(line_thickness=2, text_scale=0.5)

# 3. Load images
# The preprocessor expects a list of numpy arrays (BGR)
img_path = "image.jpg"  # path to your input image
images = [cv2.imread(img_path)]
original_shapes = [img.shape[:2] for img in images]

# 4. Run inference
img_tensor = preprocessor.preprocess(images, device=inferencer.device)
detections = inferencer.predict(img_tensor, original_shape=original_shapes)

# Visualization
res_image = visualizer.draw_detections(
    images[0],
    detections[0],
    hide_task=False,  # show task name (VOC, O365, etc.)
    hide_conf=False,  # show confidence score
)

# 5. Output / save results
print(f"Found objects: {len(detections[0])}")
for det in detections[0]:
    print(f"{det['label_name']} ({det['score']:.2f}) - Task: {det['task']}")

cv2.imshow("CerberusDet Result", res_image)
cv2.imwrite("result.jpg", res_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

## Citation

If you use this model in your research, please cite our paper:

```bibtex
@article{cerberusdet,
  author = {Irina Tolstykh and Michael Chernyshov and Maksim Kuprashevich},
  title = {CerberusDet: Unified Multi-Dataset Object Detection},
  year = {2024},
  eprint = {arXiv:2407.12632},
}
```