Model Description

This is a fine-tuned version of the PaddleOCR v5 Server Detection Model. It has been trained on a dataset of manga speech bubble crops to improve detection for:

  • Speech Bubble Lines: Standard dialogue detection.
  • Vertical Text Lines: Improved bounding boxes for Japanese vertical writing (tategaki).
  • Text Lines Outside Bubbles: Narration boxes and floating text.
  • Text Lines With Furigana: Far fewer separate bounding regions are created for furigana.

This model outputs bounding boxes (polygons) for text regions. It does not perform text recognition; you will need a separate recognition model for that.
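Because the output is a set of polygons rather than text, a typical next step is to turn each polygon into a crop region for the recognition model. The following is a minimal sketch of that step, assuming each polygon arrives as a list of (x, y) points. The helper names `polygon_to_box` and `is_vertical` and the 1.5 aspect-ratio threshold are illustrative assumptions, not part of this model or of PaddleOCR.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def polygon_to_box(polygon: List[Point]) -> Box:
    """Return the axis-aligned box that encloses a detection polygon.

    Hypothetical helper: the polygon format (a list of (x, y) points)
    is an assumption about how the detector output has been unpacked.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    # Round outward so the crop never cuts into the text region.
    return (
        math.floor(min(xs)),
        math.floor(min(ys)),
        math.ceil(max(xs)),
        math.ceil(max(ys)),
    )


def is_vertical(box: Box, ratio: float = 1.5) -> bool:
    """Guess whether a box holds vertical text (tategaki).

    The ratio threshold is an illustrative assumption, not a tuned value.
    """
    left, top, right, bottom = box
    return (bottom - top) > ratio * (right - left)


# Example: a tall, narrow polygon such as a vertical text line.
polygon = [(10.2, 5.0), (30.7, 5.5), (30.0, 90.1), (10.0, 89.6)]
box = polygon_to_box(polygon)
vertical = is_vertical(box)
```

The resulting box can be used to crop the page image before passing it to whichever recognition model you pair with this detector. The orientation guess may help when choosing between horizontal and vertical recognition settings.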

Note that this model is still a work in progress and may improve with a better dataset or better hyperparameters.

Training Data

The dataset consisted largely of synthetic data because few real samples were available.

  • ~400 randomly sampled speech bubble crops from Manga109-s
  • ~200k synthetic images

Acknowledgments

This project was made using:

  • Manga109-s dataset
  • CC-100 dataset
  • MangaOCR synthetic data generation (code was edited for speedups, bounding box additions, and improved representation of manga)