Valerii02/ukr-htr-convtext

# 🇺🇦 Ukrainian OCR / ICR (HTR-ConvText)

Handwritten & printed text recognition for Ukrainian

Upload an image → Get recognized text

📋 Table of Contents

- ✨ Highlights
- 🚀 Quickstart
- 📖 Model Description
- 🖼️ Recognition Examples
- 🛠️ Tools & Scripts
- 📊 Evaluation
- 🙏 Attribution & Citation

✨ Highlights

| Feature | Description |
|---|---|
| Language | Ukrainian (handwritten + printed) |
| Architecture | HTR-ConvText (ResNet-18 + MobileViT), CTC decoding |
| Input | 64×3072 px, grayscale line images |
| Training | 1.7M samples, SAM, EMA, scan simulation |
| Formats | PyTorch, ONNX, Hugging Face `AutoModel` |
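To make the 64×3072 input size concrete, here is a minimal sketch of how a line image could be fitted to that canvas. It assumes 64 is the height and 3072 the width, and that the image is resized with its aspect ratio preserved and then padded on the right. Both are assumptions for illustration; `fit_line_to_canvas` is a hypothetical helper, and the bundled `AutoProcessor` may resize or pad differently.

```python
def fit_line_to_canvas(width, height, target_h=64, max_w=3072):
    """Return (resized_width, right_padding) for a line image scaled to target_h.

    Assumption: aspect-ratio-preserving resize followed by right padding.
    The real processor shipped with the model may behave differently.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_h / height
    # Keep the aspect ratio, but never exceed the canvas or collapse to zero width.
    resized_w = max(1, min(max_w, round(width * scale)))
    return resized_w, max_w - resized_w


# A 1200x48 px line crop scales to 1600x64 and needs 1472 px of padding.
print(fit_line_to_canvas(1200, 48))   # (1600, 1472)
# A very long line is capped at the canvas width (and would be squeezed).
print(fit_line_to_canvas(10000, 64))  # (3072, 0)
```

Under these assumptions, lines wider than 48 times their height would be compressed horizontally, which is one plausible reason very long lines are harder to recognize.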

🚀 Quickstart

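```python
from transformers import AutoModel, AutoProcessor

# trust_remote_code=True lets transformers load the custom model and
# processor code that ships with the repository.
processor = AutoProcessor.from_pretrained("Valerii02/ukr-htr-convtext", trust_remote_code=True)
model = AutoModel.from_pretrained("Valerii02/ukr-htr-convtext", trust_remote_code=True)

# Preprocess one line image, run the model, and decode the output to text.
inputs = processor(images="sample.png", return_tensors="pt")
logits = model(**inputs).logits
text = processor.batch_decode(logits)[0]
print(text)
```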

💡 Try it now: Open the Gradio demo — no code required!

📖 Model Description

This repository packages a Ukrainian OCR/ICR model for handwritten and partially printed text with a Hugging Face–native API (AutoModel + AutoProcessor).

Architecture

- Backbone: ResNet-18 + MobileViT (MVP), hierarchical ConvText encoder (U-Net-like down/upsampling)
- Decoding: CTC greedy
- Vocabulary: 151 characters (Ukrainian + symbols)
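For readers unfamiliar with CTC greedy decoding, the sketch below shows the general technique: take the best class per frame, collapse consecutive repeats, then drop the blank symbol. It is not the code shipped in this repository. The blank index (0), the alphabet offset, and the three-letter alphabet are assumptions chosen for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding over per-frame class scores.

    Assumption: class 0 is the CTC blank and class i maps to alphabet[i - 1].
    """
    # 1. Pick the highest-scoring class in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse runs of the same class into a single symbol.
    collapsed = [idx for idx, _ in groupby(best)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[idx - 1] for idx in collapsed if idx != blank)


def one_hot(index, size=4):
    """Toy per-frame scores with all mass on one class."""
    return [1.0 if j == index else 0.0 for j in range(size)]


alphabet = ["р", "у", "х"]  # toy alphabet, not the model's 151-character vocabulary
frames = [one_hot(i) for i in [1, 1, 0, 2, 2, 3, 0, 2]]
print(ctc_greedy_decode(frames, alphabet))  # руху
```

A doubled letter survives only when a blank separates its two occurrences, which is why ambiguous character boundaries can cause dropped or merged characters.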

Training Data

| Source | Contents |
|---|---|
| ukrainian-handwriting-synth | Synthetic handwritten lines |
| Ukrainian Handwritten Text | ~37k segmented lines |
| Total | 1,696,499 samples (Train 90% / Val 5% / Test 5%) |

Training

- 500k iterations, batch size 16 with 4 gradient-accumulation steps (effective batch size 64)
- SAM optimizer, EMA (decay 0.9999), TCM warmup for 40k iterations
- Scan-simulation and detector-error augmentations
- Hardware: NVIDIA B200 (180 GB VRAM)
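Two of the settings above can be illustrated with simple arithmetic: the effective batch size under gradient accumulation and the exponential moving average (EMA) of the weights. The sketch uses plain Python lists in place of model tensors. `ema_update` is a hypothetical helper showing the standard EMA rule, not this repository's training code, and SAM is not covered.

```python
def ema_update(ema_weights, weights, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * current weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Gradient accumulation: 4 micro-batches of 16 samples per optimizer step.
micro_batch, accum_steps = 16, 4
effective_batch = micro_batch * accum_steps
print(effective_batch)  # 64

# With decay 0.9999 the average reacts slowly: after 10,000 updates toward a
# constant weight of 1.0 (starting from 0.0) it has covered only about 63%
# of the gap, since the remaining gap is decay ** steps.
ema = [0.0]
for _ in range(10_000):
    ema = ema_update(ema, [1.0])
print(round(ema[0], 3))  # 0.632
```

The slow response is the point of a high decay: the averaged weights smooth out step-to-step noise, at the cost of lagging behind the raw weights early in training.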

🖼️ Recognition Examples

| Example | GT | Prediction | CER | WER |
|---|---|---|---|---|
| 1 | Департаменту патрульної поліції | Департаменту нагрульної поліції | 0.065 | 0.33 |
| 2 | за порушення правил дорожнього руху | за порушення правил дорожнього Дуку | 0.057 | 0.20 |

Real-world inference on scanned Ukrainian documents. GT = ground truth.
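The CER and WER values for these two examples can be reproduced with a plain Levenshtein distance, as in the sketch below. It assumes case-sensitive comparison and whitespace tokenization with no further normalization, which may differ from the evaluation script used for the reported benchmark numbers.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


examples = [
    ("Департаменту патрульної поліції", "Департаменту нагрульної поліції"),
    ("за порушення правил дорожнього руху", "за порушення правил дорожнього Дуку"),
]
for gt, pred in examples:
    print(f"CER={cer(gt, pred):.3f} WER={wer(gt, pred):.2f}")
# CER=0.065 WER=0.33
# CER=0.057 WER=0.20
```

In both examples two substituted characters fall inside a single word, so the word-level error is much larger than the character-level error.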

🛠️ Tools & Scripts

| File | Purpose |
|---|---|
| `prepare_hf_artifacts.py` | Convert `.pth` checkpoint → HF artifacts |
| `export_onnx.py` | Export to ONNX |
| `validate_parity.py` | OpenCV vs PIL, PyTorch vs ONNX parity checks |
| `predict.py` | Single-image CLI inference |

Conversion

```bash
python prepare_hf_artifacts.py \
  --checkpoint-path /path/to/best_CER.pth \
  --alphabet-path /path/to/alphabet.json \
  --output-dir ./release
```

ONNX Export

```bash
python export_onnx.py --hf-model-dir ./release --output-dir ./onnx
```

📊 Evaluation

| Split | CER | WER | Notes |
|---|---|---|---|
| real-world (124) | 0.176 | 0.440 | Scanned docs, handwritten + printed |

Metrics are micro-averaged; text is normalized with `format_string_for_wer`.
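Micro-averaging pools edit counts over the whole split before dividing, so long lines weigh more than short ones. The sketch below contrasts it with macro-averaging (the mean of per-line rates), using the edit counts worked out from the two lines in the Recognition Examples table. The helper names are hypothetical, and the `format_string_for_wer` normalization is not reproduced here.

```python
def micro_average(counts):
    """Pool (edits, reference_length) pairs: sum of edits / sum of lengths."""
    total_edits = sum(edits for edits, _ in counts)
    total_length = sum(length for _, length in counts)
    return total_edits / total_length


def macro_average(counts):
    """Average the per-sample error rates, weighting every sample equally."""
    return sum(edits / length for edits, length in counts) / len(counts)


# (edits, reference length) for the two example lines:
# 2 wrong characters out of 31 and 35; 1 wrong word out of 3 and 5.
char_counts = [(2, 31), (2, 35)]
word_counts = [(1, 3), (1, 5)]

print(f"micro WER = {micro_average(word_counts):.3f}")  # micro WER = 0.250
print(f"macro WER = {macro_average(word_counts):.3f}")  # macro WER = 0.267
print(f"micro CER = {micro_average(char_counts):.3f}")  # micro CER = 0.061
```

The two averages diverge more as line lengths vary, so reported numbers are only comparable when the averaging scheme matches.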

⚠️ Limitations

- Sensitive to severe blur, low contrast, and non-standard page artifacts
- Performance may drop on long lines far from the training distribution
- CTC decoding can fail on highly ambiguous character boundaries

🙏 Attribution & Citation

This implementation adapts ideas from DAIR-Group/HTR-ConvText. See NOTICE and CITATION.cff for details. Upstream (HTR-ConvText):

```bibtex
@misc{truc2025htrconvtext,
  title={HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition},
  author={Pham Thach Thanh Truc and Dang Hoai Nam and Huynh Tong Dang Khoa and Vo Nguyen Le Duy},
  year={2025},
  eprint={2512.05021},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.05021},
}
```

This model: See CITATION.cff for full attribution.

📄 License

Apache-2.0. See LICENSE.

⭐ Star this repo if you find it useful! · Report issues · Contributions welcome
