
Qwen3.5 Dense-to-MoE Weight Transfer

updated 9 days ago

Qwen3.5 MoE models built by dual-source weight transfer (dense backbone plus 35B-A3B experts), using hybrid DeltaNet + GQA attention.


  • kshitijthakkar/qwen3.5-moe-0.87B-d0.8B

    Image-Text-to-Text • 1B • Updated 9 days ago • 136

  • kshitijthakkar/qwen3.5-moe-2.3B-d2B

    Image-Text-to-Text • 3B • Updated 9 days ago • 73

  • kshitijthakkar/qwen3.5-moe-4.7B-d4B

    Image-Text-to-Text • 5B • Updated 9 days ago • 82

  • kshitijthakkar/qwen3.5-tiny-test

    Image-Text-to-Text • 0.1B • Updated 16 days ago • 286

  • kshitijthakkar/qwen3.5-from-scratch-tiny

    2B • Updated 9 days ago • 73

  • kshitijthakkar/qwen3.5-0.8b-moe-from-scratch

    3B • Updated 9 days ago • 57