
Poor performance with simple table extraction task

#43
by hanshupe - opened

There is a lot of hype around multimodal models, such as GOT.
I would like to know whether others have had a similar experience in practice: while these models can do impressive things, they still struggle with table extraction, even in cases that are straightforward for humans.

Attached is a simple example. All I need is a reconstruction of the table as a flat CSV that preserves all empty cells correctly. Which open-source model is able to do that?

page_1.png

https://huggingface.co/spaces/yonigozlan/GOT-OCR-Transformers
Just use this demo; it should satisfy your requirement.
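
If the model returns the table as formatted text rather than CSV, a small post-processing step can flatten it. Below is a minimal sketch, assuming the output is a pipe-delimited Markdown table (an assumption on my part; the model may emit LaTeX or another format, which would need a different parser). The function name `markdown_table_to_csv` is made up for illustration and is not part of GOT or Transformers.

```python
import csv
import io
import re


def markdown_table_to_csv(markdown: str) -> str:
    """Convert a pipe-delimited Markdown table to CSV, keeping empty cells.

    Hypothetical helper: assumes every table row starts with '|' and that
    cells do not contain escaped pipes.
    """
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        # Remove exactly one trailing and one leading pipe so that
        # empty cells at the start or end of a row are preserved.
        if line.endswith("|"):
            line = line[:-1]
        cells = [cell.strip() for cell in line[1:].split("|")]
        # Skip the header separator row, e.g. |---|:---:|---|
        if all(re.fullmatch(r":?-{3,}:?", cell) for cell in cells):
            continue
        rows.append(cells)

    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = (
        "| Name | Qty | Note |\n"
        "|---|---|---|\n"
        "| A | | x |\n"
        "| B | 2 | |\n"
    )
    print(markdown_table_to_csv(sample))
```

Empty cells come out as empty CSV fields, so the column count stays the same in every row. This only helps if the model put the empty cells in the right place to begin with.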
