Ri-Gemma-Vision-v1: Khasi OCR Vision Model
A fine-tuned vision-language model for Optical Character Recognition (OCR) of Khasi language documents, built on top of Gemma-4-E2B-it.
Model Summary
| Property | Details |
|---|---|
| Base Model | unsloth/gemma-4-E2B-it |
| Task | OCR → Markdown transcription |
| Languages | Khasi (kha), English (en) |
| Fine-tuning Method | QLoRA (4-bit) via Unsloth |
| Training Samples | 22,985 |
| Validation Samples | 1,300 |
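As a quick sanity check on the split sizes in the table, the sketch below adds the two counts and computes the validation share. It assumes the training and validation samples are disjoint and together make up the full dataset, which this card does not state explicitly.

```python
# Split sizes copied from the Model Summary table.
train_samples = 22_985
val_samples = 1_300

# Assumption: the two splits are disjoint and cover the whole dataset.
total_samples = train_samples + val_samples
val_share = val_samples / total_samples

print(f"total samples: {total_samples:,}")
print(f"validation share: {val_share:.2%}")
```

Under that assumption the total comes to 24,285 samples, which is consistent with the "24K" in the dataset name.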
Dataset
Trained on toiar/Khasi-Gemma-OCR-24K, a dataset of roughly 24K samples consisting of:
- Real scanned Khasi books and articles
- Synthetic Khasi text images
- Real scanned English books
Each sample contains a scanned page image paired with its ground truth Markdown transcription.
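To make the image/transcription pairing concrete, the sketch below wraps one sample in a chat-style conversation that mirrors the prompt used in the Inference section. The helper name `to_conversation`, the assistant-turn layout, and the placeholder image value are assumptions for illustration; this card does not document the actual training format or the dataset's column names.

```python
from typing import Any

# Same instruction as the inference example in this card.
INSTRUCTION = "Convert to Markdown."


def to_conversation(image: Any, markdown: str) -> list[dict]:
    """Pair one page image with its Markdown transcription (hypothetical layout)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": INSTRUCTION},
                {"type": "image", "image": image},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": markdown}],
        },
    ]


# Placeholder values; a real sample would carry a PIL image of the scanned page.
example = to_conversation(image="<page image>", markdown="# Title\n\nBody text.")
print(example[1]["content"][0]["text"])
```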
Inference
from unsloth import FastVisionModel, get_chat_template
from PIL import Image
from transformers import TextIteratorStreamer
from threading import Thread
import torch

# Load the fine-tuned model and its processor
model, processor = FastVisionModel.from_pretrained(
    "toiar/Ri-Gemma-Vision-v1",
    load_in_4bit = False,
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)
FastVisionModel.for_inference(model)
processor = get_chat_template(processor, "gemma-4")

# Load image
image = Image.open("your_image.jpg").convert("RGB")

# Build the chat prompt: one instruction plus one image placeholder
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert to Markdown."},
            {"type": "image"},
        ],
    }
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Streaming inference
streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True
)
# Run generation in a background thread so the main thread can read the streamer
thread = Thread(target=model.generate, kwargs=dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=4096,
    use_cache=True,
    do_sample=False,
))
thread.start()
for token in streamer:
    print(token, end="", flush=True)
thread.join()
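The streaming loop above prints each chunk as it arrives and then discards it. If the full transcription is needed afterwards (for example, to write it to a `.md` file), the chunks can be accumulated instead. This is a minimal sketch: `collect_stream` is a hypothetical helper, and it only assumes the streamer is an iterable of strings, so a plain list stands in for `TextIteratorStreamer` here.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Consume text chunks, optionally echoing them, and return the joined text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


# Stand-in for a TextIteratorStreamer: any iterable of strings works.
fake_stream = ["# Title", "\n\n", "Body text."]
markdown = collect_stream(fake_stream, echo=False)
```

With the real streamer, the call would be `markdown = collect_stream(streamer)`, placed between `thread.start()` and `thread.join()` in place of the `for` loop.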
Citation
@misc{ri-gemma-vision-v1,
author = {Toiarbor Mawlieh},
title = {Ri-Gemma-Vision-v1: Khasi OCR Vision Model},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/toiar/Ri-Gemma-Vision-v1}
}