---
library_name: mlc-llm
pipeline_tag: text-generation
quantization: q4f16_1
base_model: Qwen/Qwen3-1.7B
tags:
- mlc
- android
- on-device
- vulkan
- qwen3
- quantization
license: other
---

# Qwen3-1.7B-LOMO-q4f16_1-MLC2

**MLC-LLM formatted** weights for **on-device** inference (Android / Vulkan, CPU, etc.).

- **Base model**: `Qwen/Qwen3-1.7B`
- **Quantization**: `q4f16_1` (group size 32, int4 weights + f16 scales)
- **Conversation template**: `chatml` (the runtime prompt format must match the template used in training)
- **Files**
  - `mlc-chat-config.json`
  - `params_shard_*.bin`
  - `tensor-cache.json`
  - tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`)

## Quick test (CLI)

```bash
mlc_llm chat HF://raining-codes/Qwen3-1.7B-LOMO-q4f16_1-MLC2 --temperature 0.7 --top-p 0.9 --max-gen-len 256
```

## Notes

- This repo contains weights in the **MLC** execution format; they are **not** directly loadable with HF Transformers.
- In apps (e.g., **MLCChat** for Android), use "Add remote model" with `raining-codes/Qwen3-1.7B-LOMO-q4f16_1-MLC2`.
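As a back-of-envelope check on device requirements: `q4f16_1` stores int4 weights in groups of 32 with one f16 scale per group, i.e. roughly 4 + 16/32 = 4.5 bits per weight. The sketch below estimates the resulting weight footprint; the 1.7B nominal parameter count is an assumption, and real shard sizes will differ (some tensors, such as embeddings and norms, may be kept in higher precision).

```python
# Rough weight-memory estimate for q4f16_1 (group size 32):
# each weight costs 4 bits, plus a 16-bit scale shared by 32 weights.
# Assumption: all ~1.7e9 parameters are quantized this way.

def q4f16_1_bits_per_weight(group_size: int = 32) -> float:
    """4-bit weight plus a 16-bit f16 scale amortized over the group."""
    return 4 + 16 / group_size

def weight_footprint_gib(n_params: float, group_size: int = 32) -> float:
    """Approximate quantized weight size in GiB (ignores metadata/activations)."""
    total_bits = n_params * q4f16_1_bits_per_weight(group_size)
    return total_bits / 8 / 1024**3

if __name__ == "__main__":
    print(f"{q4f16_1_bits_per_weight():.2f} bits/weight")      # 4.50
    print(f"~{weight_footprint_gib(1.7e9):.2f} GiB of weights")
```

This is why the shards total well under the ~3.4 GB an f16 checkpoint of the same model would need, which is what makes the repo practical for phones.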