
# Ri-Gemma-Vision-v1: Khasi OCR Vision Model

A fine-tuned vision-language model for Optical Character Recognition (OCR) of Khasi language documents, built on top of Gemma-4-E2B-it.


## Model Summary

| Property | Details |
|---|---|
| Base Model | `unsloth/gemma-4-E2B-it` |
| Task | OCR → Markdown transcription |
| Languages | Khasi (`kha`), English (`en`) |
| Fine-tuning Method | QLoRA (4-bit) via Unsloth |
| Training Samples | 22,985 |
| Validation Samples | 1,300 |

## Dataset

Trained on `toiar/Khasi-Gemma-OCR-24K`, a 24K-sample dataset consisting of:

- Real scanned Khasi books and articles
- Synthetic Khasi text images
- Real scanned English books

Each sample contains a scanned page image paired with its ground truth Markdown transcription.
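As an illustration of the image–transcription pairing described above, the sketch below shows how one sample could be wrapped into the chat format the model expects. The field names (`image`, `text`) and the toy Khasi text are assumptions for illustration, not the dataset's documented schema.

```python
# Hypothetical sketch: field names "image" and "text" are assumed,
# as is the placeholder content.
sample = {
    "image": "page_001.png",  # in the real dataset this is a page image
    "text": "# Ka Khubor\n\nKa jingthoh ha ka ktien Khasi ...",
}

def to_conversation(sample, instruction="Convert to Markdown."):
    """Pair one scanned page with its ground-truth Markdown as a chat turn."""
    return [
        {"role": "user", "content": [
            {"type": "text", "text": instruction},
            {"type": "image"},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["text"]},
        ]},
    ]

conv = to_conversation(sample)
```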


## Inference

```python
from unsloth import FastVisionModel, get_chat_template
from PIL import Image
from transformers import TextIteratorStreamer
from threading import Thread
import torch

# Load the fine-tuned model in bfloat16
# (set load_in_4bit=True to reduce VRAM usage at a small quality cost)
model, processor = FastVisionModel.from_pretrained(
    "toiar/Ri-Gemma-Vision-v1",
    load_in_4bit=False,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Switch to inference mode and attach the Gemma chat template
FastVisionModel.for_inference(model)
processor = get_chat_template(processor, "gemma-4")

# Load the scanned page
image = Image.open("your_image.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert to Markdown."},
            {"type": "image"},
        ],
    }
]

input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Stream tokens as they are generated
streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
)

# Greedy decoding (do_sample=False) keeps transcriptions deterministic
thread = Thread(target=model.generate, kwargs=dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=4096,
    use_cache=True,
    do_sample=False,
))
thread.start()

for token in streamer:
    print(token, end="", flush=True)
thread.join()
```
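For bulk OCR over many pages, the single-image snippet above can be wrapped in a small helper. This is an illustrative sketch, not part of the released code: `transcribe` is a placeholder for a function that wraps the `model.generate()` call and returns the Markdown string for one image path.

```python
from pathlib import Path
from typing import Callable, Iterable

def transcribe_folder(image_paths: Iterable[str],
                      transcribe: Callable[[str], str],
                      out_dir: str = "ocr_out") -> dict:
    """Run `transcribe` on each image and save one .md file per page.

    `transcribe` is injected so the helper stays model-agnostic; in
    practice it would wrap the generate() call shown above.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for path in image_paths:
        md = transcribe(path)
        (out / (Path(path).stem + ".md")).write_text(md, encoding="utf-8")
        results[path] = md
    return results

# Usage with a stub standing in for the real model call:
results = transcribe_folder(["page_01.jpg"], lambda p: f"# Transcription of {p}")
```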

## Citation

```bibtex
@misc{ri-gemma-vision-v1,
  author    = {Toiarbor Mawlieh},
  title     = {Ri-Gemma-Vision-v1: Khasi OCR Vision Model},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/toiar/Ri-Gemma-Vision-v1}
}
```