Full-text search
1,000+ results
Aleistar / Gengar_toy_example
README.md
dataset
1 match
dhirajudhani / image
README.md
model
5 matches
nvidia / nemotron-ocr-v1
README.md
model
24 matches
tags: image, ocr, object recognition, text recognition, layout analysis, ingestion, image-to-text, en, license:other, region:us
### **Description**
The Nemotron OCR v1 model is a state-of-the-art text recognition model designed for robust end-to-end optical character recognition (OCR) on complex real-world images. It integrates three core neural network modules: a detector for text region localization, a recognizer for transcription of detected regions, and a relational model for layout and structure analysis.
qpqpqpqpqpqp / Ovis_Image_7B_fp8
README.md
model
2 matches
tags: image generation, comfyui, text-to-image, en, zh, base_model:AIDC-AI/Ovis-Image-7B, base_model:finetune:AIDC-AI/Ovis-Image-7B, license:apache-2.0, region:us
Ovis Image 7B!
<img src=/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F636f4c6b5d2050767e4a1491%2FcfsnngElzYv8DbTKsLohl.png width="40%"/>Enjoy!
</div>
nvidia / nemotron-page-elements-v3
README.md
model
16 matches
tags: image, detection, pdf, ingestion, yolox, object-detection, en, arxiv:2107.08430, license:other, region:us
### Description
The **Nemotron Page Elements v3** model is a specialized object detection model designed to identify and extract elements from document pages. While the underlying technology builds upon work from [Megvii Technology](https://github.com/Megvii-BaseDetection/YOLOX), we developed our own base model through complete retraining rather than using pre-trained weights. YOLOX is an anchor-free version of YOLO (You Only Look Once) that combines a simpler architecture with enhanced performance. The model is trained to detect **tables**, **charts**, **infographics**, **titles**, **headers/footers**, and **text** in documents.
xinyu1205 / recognize_anything_model
README.md
model
12 matches
tags: image tagging, image captioning, image-to-text, en, arxiv:2306.03514, arxiv:2303.05657, license:mit, region:us
rong Image Tagging Model </a> and <a href="https://tag2text.github.io/">Tag2Text: Guiding Vision-Language Model via Image Tagging</a>.
**Recognition and localization are two foundation computer vision tasks.**
- **The Segment Anything Model (SAM)** excels in **localization capabilities**, while it falls short when it comes to **recognition tasks**.
- **The Recognize Anything Model (RAM) and Tag2Text** exhibit **exceptional recognition abilities**, in terms of **both accuracy and scope**.
kviai / Kvi-Upscale-V1
README.md
model
5 matches
tags: diffusers, Image Upscaling, Img2Img, image-to-image, en, license:cc-by-4.0, region:us
### Image Upscaling Model
This repository contains the PyTorch model for upscaling images. The model has been trained to upscale low-resolution images to higher resolution using convolutional neural networks.
## Model Details
huwhitememes / laptophunterbiden_v1-qwen_image
README.md
model
6 matches
tags: image, lora, qwen, hunter-biden, generative-image, huwhitememes, Meme King Studio, Green Frog Labs, NSFW, text-to-image, base_model:Qwen/Qwen-Image, base_model:adapter:Qwen/Qwen-Image, license:apache-2.0, region:us
Qwen Image V1
This is a custom-trained **LoRA (Low-Rank Adapter)** for **Qwen Image**, fine-tuned on 85+ upscaled and varied images sourced from the infamous Hunter Biden iCloud laptop archive. Designed for **Qwen-based image generation**, this LoRA supports photorealistic and meme-style compositions for digital propaganda, viral satire, and social media chaos. Trained by [@huwhitememes](https://x.com/huwhitememes) using the [WaveSpeedAI LoRA Trainer](https://wavespeed.ai/models/wavespeed-ai/qwen-image-lora-trainer) pipeline.
## 🎯 Use Cases
gymball / FatimaFellowship-UpsideDown
README.md
model
2 matches
unography / PP-HumanSegV1-Lite
README.md
model
2 matches
tags: image matting, image segmentation, en, license:apache-2.0, region:us
PP-HumanSeg v1 model, released by [Paddle](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/contrib/PP-HumanSeg).
Tested on the [PP-HumanSeg-14K](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/paper.md) dataset.
| Model Name | Best Input Shape | mIoU (%) | Inference Time on Arm CPU (ms) | Model Size (MB) |
unography / PP-HumanSegV2-Lite
README.md
model
2 matches
tags: image matting, image segmentation, en, license:apache-2.0, region:us
PP-HumanSeg v2 model, released by [Paddle](https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/contrib/PP-HumanSeg).
Tested on the [PP-HumanSeg-14K](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/paper.md) dataset.
| Model Name | Best Input Shape | mIoU (%) | Inference Time on Arm CPU (ms) | Model Size (MB) |
johko / capdec_015
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_0
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_001
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_005
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_025
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
johko / capdec_05
README.md
model
3 matches
tags: Image Captioning, image-to-text, en, dataset:MS-COCO, dataset:Flickr30k, arxiv:2211.00575, license:apache-2.0, region:us
for Image Captioning using Noise-Injected CLIP](https://arxiv.org/pdf/2211.00575.pdf).
Their method aims to train CLIP with only text samples; therefore, they inject zero-mean Gaussian noise into the text embeddings before decoding.
In their words:
jhorwath / AI-NERD
README.md
model
1 match
tags: Image Classification, arxiv:2212.03984, en, region:us
# AI-NERD
AI-NERD (Artificial Intelligence for Non-Equilibrium Relaxation Dynamics) uses unsupervised deep learning to classify relaxation behavior in complex fluids directly from X-ray Photon Correlation Spectroscopy (XPCS) data.
This repo contains a pre-trained model trained on rheo-XPCS data collected at Advanced Photon Source beamline 8-ID-I.
dreMaz / AnimeInstanceSegmentation
README.md
model
1 match
tags: image processing, segmentation, en, arxiv:2312.01943, license:mit, region:us
# Instance-guided Cartoon Editing with a Large-scale Dataset
[](http://arxiv.org/abs/2312.01943)
For more technical details and the codebase, please refer to our <a href="https://github.com/CartoonSegmentation/CartoonSegmentation">Github Repo</a>.