Hugging Face
Building on HF · 12.8 TFLOPS
Anurag
edwixx
14 followers · 53 following
https://anuragkanade.com/
edwixxxx
anurag12-webster
anurag-kanade
AI & ML interests
Machine Learning and Speech
Recent Activity
New activity about 10 hours ago in huggingface/InferenceSupport: edwixx/whisper-large-hebrew-finetune
Reacted to sagar007's post with 🤗 about 11 hours ago:
I built a Multimodal Vision-Language Model using Gemma-270M + CLIP! Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

https://huggingface.co/sagar007/multigemma

Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

**Try it yourself:**
- Model: https://huggingface.co/sagar007/multigemma
- Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD! Would love to hear your feedback!

#multimodal #gemma #clip #llava #vision-language #pytorch
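The trainable-parameter figure in this post can be sanity-checked with simple arithmetic. The sketch below assumes the standard LoRA formulation (a frozen weight plus a low-rank update B·A, where B is d_out x r and A is r x d_in); the 1024 x 1024 layer and rank 16 are hypothetical values chosen for illustration, not settings taken from the post or its repository.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one frozen d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)


def trainable_fraction(trainable: float, total: float) -> float:
    """Share of parameters that receive gradients, as a percentage."""
    return 100.0 * trainable / total


# Hypothetical layer, NOT from the post: a 1024 x 1024 projection adapted at rank 16.
full = 1024 * 1024
adapter = lora_params(1024, 1024, rank=16)
print(adapter, f"{trainable_fraction(adapter, full):.3f}%")

# Figures reported in the post: 18.6M trainable out of 539M total.
print(f"{trainable_fraction(18.6e6, 539e6):.2f}%")
```

The reported counts give roughly 3.45%, which matches the post's "3.4%" once rounding of the 18.6M and 539M figures is taken into account.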
Reacted to the same sagar007 post with 🔥 about 11 hours ago
edwixx's Spaces (2)
Dejavugfbh Gemma 2b Mt G2E (Build error)
Life Expectancy Pred (Running): Predict life expectancy based on country data