---
library_name: mlc-llm
pipeline_tag: text-generation
quantization: q4f16_1
base_model: Qwen/Qwen3-1.7B
tags:
- mlc
- android
- on-device
- vulkan
- qwen3
- quantization
license: other
---

# Qwen3-1.7B-LOMO-q4f16_1-MLC2

**MLC-LLM formatted** weights for **on-device** inference (Android / Vulkan, CPU, etc.).

- **Base model**: `Qwen/Qwen3-1.7B`
- **Quantization**: `q4f16_1` (group size 32, int4 weights + f16 scales)
- **Conversation template**: `chatml` (the runtime prompt format must match the template used in training)
- **Files**
  - `mlc-chat-config.json`
  - `params_shard_*.bin`
  - `tensor-cache.json`
  - tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`)

## Quick test (CLI)

```bash
mlc_llm chat HF://raining-codes/Qwen3-1.7B-LOMO-q4f16_1-MLC2 --temperature 0.7 --top-p 0.9 --max-gen-len 256
```

## Notes

- This repo contains weights in the **MLC** execution format; they are **not** directly loadable with HF Transformers.
- In apps (e.g., **MLCChat** for Android), use "Add remote model" with `raining-codes/Qwen3-1.7B-LOMO-q4f16_1-MLC2`.
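As a back-of-envelope check on device requirements: `q4f16_1` stores int4 weights in groups of 32 with one f16 scale per group, i.e. roughly 4 + 16/32 = 4.5 bits per weight. The sketch below estimates the resulting weight footprint; the 1.7B nominal parameter count is an assumption, and real shard sizes will differ (some tensors, such as embeddings and norms, may be kept in higher precision).

```python
# Rough weight-memory estimate for q4f16_1 (group size 32):
# each weight costs 4 bits, plus a 16-bit scale shared by 32 weights.
# Assumption: all ~1.7e9 parameters are quantized this way.

def q4f16_1_bits_per_weight(group_size: int = 32) -> float:
    """4-bit weight plus a 16-bit f16 scale amortized over the group."""
    return 4 + 16 / group_size

def weight_footprint_gib(n_params: float, group_size: int = 32) -> float:
    """Approximate quantized weight size in GiB (ignores metadata/activations)."""
    total_bits = n_params * q4f16_1_bits_per_weight(group_size)
    return total_bits / 8 / 1024**3

if __name__ == "__main__":
    print(f"{q4f16_1_bits_per_weight():.2f} bits/weight")      # 4.50
    print(f"~{weight_footprint_gib(1.7e9):.2f} GiB of weights")
```

This is why the shards total well under the ~3.4 GB an f16 checkpoint of the same model would need, which is what makes the repo practical for phones.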