Full-text search
1,000+ results
Aleistar / Gengar_toy_example
README.md
dataset
1 match
dhirajudhani / image
README.md
model
5 matches
nvidia / nemotron-ocr-v1
README.md
model
24 matches
tags: image, ocr, object recognition, text recognition, layout analysis, ingestion, image-to-text, en, license:other, region:us
### **Description**
The Nemotron OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
qpqpqpqpqpqp / Ovis_Image_7B_fp8
README.md
model
2 matches
tags: image generation, comfyui, text-to-image, en, zh, base_model:AIDC-AI/Ovis-Image-7B, base_model:finetune:AIDC-AI/Ovis-Image-7B, license:apache-2.0, region:us
Ovis Image 7B!
<img src=/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F636f4c6b5d2050767e4a1491%2FcfsnngElzYv8DbTKsLohl.png width="40%"/>Enjoy!
</div>
nvidia / nemotron-page-elements-v3
README.md
model
16 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, license:other, region:us
### Description
The **Nemotron Page Elements v3** model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from [Megvii Technology](https://github.com/Megvii-BaseDetection/YOLOX), we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once) that combines a simpler architecture with enhanced performance. The model is trained to detect **tables**, **charts**, **infographics**, **titles**, **headers/footers**, and **text** in documents.
xinyu1205 / recognize_anything_model
README.md
model
12 matches
tags: image tagging, image captioning, image-to-text, en, arxiv:2306.03514, arxiv:2303.05657, license:mit, region:us
rong Image Tagging Model </a> and <a href="https://tag2text.github.io/">Tag2Text: Guiding Vision-Language Model via Image Tagging</a>.
**Recognition and localization are two foundation computer vision tasks.**
- **The Segment Anything Model (SAM)** excels in **localization capabilities**, while it falls short when it comes to **recognition tasks**.
- **The Recognize Anything Model (RAM) and Tag2Text** exhibit **exceptional recognition abilities**, in terms of **both accuracy and scope**.
kviai / Kvi-Upscale-V1
README.md
model
5 matches
tags: diffusers, Image Upscaling, Img2Img, image-to-image, en, license:cc-by-4.0, region:us
### Image Upscaling Model
This repository contains the PyTorch model for upscaling images. The model has been trained to upscale low-resolution images to higher resolution using convolutional neural networks.
## Model Details
huwhitememes / laptophunterbiden_v1-qwen_image
README.md
model
6 matches
tags: image, lora, qwen, hunter-biden, generative-image, huwhitememes, Meme King Studio, Green Frog Labs, NSFW, text-to-image, base_model:Qwen/Qwen-Image, base_model:adapter:Qwen/Qwen-Image, license:apache-2.0, region:us
Qwen Image V1
This is a custom-trained **LoRA (Low-Rank Adapter)** for **Qwen Image**, fine-tuned on 85+ upscaled and varied images sourced from the infamous Hunter Biden iCloud laptop archive. Designed for **Qwen-based image generation**, this LoRA supports photorealistic and meme-style compositions for digital propaganda, viral satire, and social media chaos. Trained by [@huwhitememes](https://x.com/huwhitememes) using the [WaveSpeedAI LoRA Trainer](https://wavespeed.ai/models/wavespeed-ai/qwen-image-lora-trainer) pipeline.
## 🎯 Use Cases
gymball / FatimaFellowship-UpsideDown
README.md
model
2 matches
unography / PP-HumanSegV1-Lite
README.md
model
2 matches
tags: image matting, image segmentation, en, license:apache-2.0, region:us
PP-HumanSeg v1 model, released by [Paddle](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/contrib/PP-HumanSeg).
Tested on the [PP-HumanSeg-14K](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/paper.md) dataset.
| Model Name | Best Input Shape | mIoU (%) | Inference Time on Arm CPU (ms) | Model Size (MB) |
unography / PP-HumanSegV2-Lite
README.md
model
2 matches
tags: image matting, image segmentation, en, license:apache-2.0, region:us
PP-HumanSeg v2 model, released by [Paddle](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/contrib/PP-HumanSeg).
Tested on the [PP-HumanSeg-14K](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/paper.md) dataset.
| Model Name | Best Input Shape | mIoU (%) | Inference Time on Arm CPU (ms) | Model Size (MB) |
johko / capdec_015
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_0
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_001
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_005
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_025
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_05
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
jhorwath / AI-NERD
README.md
model
1 match
tags: Image Classification, arxiv:2212.03984, en, region:us
# AI-NERD
AI-NERD (Artificial Intelligence for Non-Equilibrium Relaxation Dynamics) uses unsupervised deep learning to classify relaxation behavior in complex fluids directly from X-ray Photon Correlation Spectroscopy (XPCS) data.
This repo contains a pre-trained model trained on rheo-XPCS data collected at Advanced Photon Source beamline 8-ID-I.
dreMaz / AnimeInstanceSegmentation
README.md
model
1 match
tags: image processing, segmentation, en, arxiv:2312.01943, license:mit, region:us
# Instance-guided Cartoon Editing with a Large-scale Dataset
[](http://arxiv.org/abs/2312.01943)
For more technical details and the codebase, please refer to our <a href="https://github.com/CartoonSegmentation/CartoonSegmentation">Github Repo</a>.