Model Description
This is a fine-tuned version of the PaddleOCR v5 Server Detection Model. It has been trained on a dataset of manga speech bubble crops to improve detection for:
- Speech Bubble Lines: Standard dialogue detection.
- Vertical Text Lines: Improved bounding boxes for Japanese vertical writing (tategaki).
- Text Lines Outside Bubbles: Narration boxes and floating text.
- Text Lines With Furigana: Far fewer separate bounding regions are created for furigana.
This model outputs bounding boxes (polygons) for text regions. It does not perform text recognition; you will need a separate recognition model for that.
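Because the model only returns polygons, a typical next step is to turn each polygon into a crop rectangle and pass the crops to a recognition model in reading order. The sketch below illustrates that step in plain Python. It assumes each polygon arrives as a sequence of (x, y) points. The helper names `polygon_to_box` and `sort_vertical_columns` are hypothetical and not part of PaddleOCR, and the right-to-left ordering is a simple heuristic for vertical text, not the method used by this model.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def polygon_to_box(polygon: Sequence[Point], pad: int = 0) -> Box:
    """Return the axis-aligned box enclosing a polygon (hypothetical helper).

    `pad` widens the box on every side; left and top are clamped at 0.
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    left = max(math.floor(min(xs)) - pad, 0)
    top = max(math.floor(min(ys)) - pad, 0)
    right = math.ceil(max(xs)) + pad
    bottom = math.ceil(max(ys)) + pad
    return (left, top, right, bottom)


def sort_vertical_columns(boxes: List[Box]) -> List[Box]:
    """Order boxes right-to-left, then top-to-bottom.

    A rough heuristic for vertical Japanese text (assumption): columns
    are read from the right edge of the bubble toward the left.
    """
    return sorted(boxes, key=lambda b: (-b[2], b[1]))


# Two made-up vertical text columns, listed left column first.
polygons = [
    [(10, 5), (30, 5), (30, 100), (10, 100)],
    [(50, 8), (70, 8), (70, 90), (50, 90)],
]
boxes = [polygon_to_box(p) for p in polygons]
ordered = sort_vertical_columns(boxes)
```

Each resulting tuple uses the (left, top, right, bottom) layout that Pillow's `Image.crop` accepts, so the crops can be handed directly to whichever recognition model you choose.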
Note that this model is still under development and may improve with a better dataset or hyperparameter tuning.
Training Data
The dataset consisted largely of synthetic data because few real samples were available.
- ~400 randomly sampled speech bubble crops from Manga109-s
- ~200k synthetic images
Acknowledgments
This project made use of:
- Manga109-s dataset
- CC-100 dataset
- MangaOCR synthetic data generation (code was edited for speedups, bounding box additions, and improved representation of manga)
Model tree for bluolightning/PaddleOCRv5-Server-Det-For-Manga
Base model
PaddlePaddle/PP-OCRv5_server_det