## Use it from Swift

### Add the package

In `Package.swift`:

```swift
.package(url: "https://github.com/john-rocky/CoreML-LLM", branch: "main"),
// In your target:
.product(name: "CoreMLLLM", package: "CoreML-LLM"),
```

Platforms: iOS 18+ / macOS 15+.
### Download + chat (one call)

```swift
import CoreMLLLM

let llm = try await CoreMLLLM.load(repo: "mlboydaisuke/qwen3-vl-2b-coreml")
let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user, content: "Hello!")],
    maxTokens: 256
)
for await chunk in stream { print(chunk, terminator: "") }
```
### With an image

```swift
import CoreGraphics

let cgImage: CGImage = ... // your CGImage
let stream = try await llm.generate(
    [CoreMLLLM.Message(role: .user,
                       content: "What's in this image?")],
    image: cgImage,
    maxTokens: 256
)
for await chunk in stream { print(chunk, terminator: "") }
```
# Qwen3-VL 2B → Core ML (recurrent path, v1.4.0)

Core ML port of Qwen/Qwen3-VL-2B-Instruct: text + vision, INT8 chunked, runs on the iPhone A18 ANE.

**Heads up:** for new work, prefer the stateful variant at
`mlboydaisuke/qwen3-vl-2b-stateful-coreml`: same model, but the KV cache lives inside the ANE via `MLState` + `slice_update`, so memory is 6× lower (264 MB vs 1.7 GB) and decode is 2× faster (24 vs 10 tok/s) on iPhone 17 Pro. This recurrent repo is kept for backward compatibility with the v1.4.0 runtime.
## Files

```
qwen3_vl_2b_decode_chunks/
├── chunk_0.mlpackage                 # 353 MB – text path: embed + L0-6
├── chunk_1.mlpackage                 # 353 MB – L7-13
├── chunk_2.mlpackage                 # 353 MB – L14-20
├── chunk_3.mlpackage                 # 353 MB – L21-27
├── chunk_head.mlpackage              # 311 MB – final_norm + lm_head + argmax
├── chunk_0_vision.mlpackage          # 353 MB – chunk_0 with DeepStack injection
├── prefill_chunk_{0..3}.mlpackage    # 353 MB each – T=32 batched prefill bodies
├── prefill_chunk_0_vision.mlpackage  # vision-aware prefill chunk_0
└── embed_weight.bin                  # 622 MB – raw fp16 embed (151936 × 2048)
qwen3_vl_2b_vision/
└── vision.mlpackage                  # 406 MB – 448×448 → 196 tokens + 3 DeepStack taps
```
The vision encoder is loaded only when an image is in the prompt. DeepStack taps from vision layers 5/11/17 are injected into text layers 0/1/2 via `chunk_0_vision`.
## What this repo does NOT ship

- No `model_config.json`: Core ML serializes shapes into each `.mlpackage`; `coremltools` opens them without an external config.
- No tokenizer / processor: fetch them from the base model:
```python
from transformers import AutoTokenizer, AutoProcessor

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")
proc = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")
```
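For example, the standard `transformers` chat-template API (nothing repo-specific) turns a conversation into the prompt token ids the decode chunks consume:

```python
# Build prompt token ids with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Hello!"}]
ids = tok.apply_chat_template(messages, add_generation_prompt=True)
print(len(ids))  # prompt length in tokens
```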
Vision preprocessing note: Qwen3-VL uses `mean=std=0.5` (not the CLIP defaults), and the vision encoder expects `pixel_values` of shape `(3, 2, 448, 448)`, already pre-patchified; see `conversion/build_qwen3_vl_2b_vision.py` for the exact transform.
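A minimal sketch of the resize/normalize/stack part of that transform, assuming the temporal axis of 2 is the usual Qwen-VL frame duplication for still images (the conversion script above is authoritative, including the patch reordering):

```python
import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    """Resize + normalize a still image for the vision encoder (sketch)."""
    img = Image.open(path).convert("RGB").resize((448, 448))
    x = np.asarray(img, dtype=np.float32) / 255.0  # HWC, [0, 1]
    x = (x - 0.5) / 0.5                            # mean = std = 0.5
    x = x.transpose(2, 0, 1)                       # -> (3, 448, 448)
    return np.stack([x, x], axis=1)                # -> (3, 2, 448, 448)
```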
## Standalone usage (Python / Mac)

```python
import coremltools as ct
import numpy as np
from huggingface_hub import snapshot_download

local = snapshot_download("mlboydaisuke/qwen3-vl-2b-coreml")
root = f"{local}/qwen3_vl_2b_decode_chunks"

decode_chunks = [ct.models.MLModel(f"{root}/chunk_{i}.mlpackage") for i in range(4)]
head = ct.models.MLModel(f"{root}/chunk_head.mlpackage")
embed = np.memmap(f"{root}/embed_weight.bin",
                  dtype=np.float16, mode="r",
                  shape=(151936, 2048))
vision = ct.models.MLModel(f"{local}/qwen3_vl_2b_vision/vision.mlpackage")
```
For text-only prompts, skip the vision encoder and chain chunk_0..3 → chunk_head per step. For image prompts, run `vision.predict` once, swap chunk_0 for chunk_0_vision, and inject the 3 DeepStack tensors during the first token of each image span.

Reference loop: `conversion/qwen3_vl_2b_parity.py`; a simplified sketch follows.
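For orientation only, a stripped-down text-only step using the objects loaded above. The feature names (`hidden_states`, `token_id`) are illustrative assumptions, and the real chunks also thread per-layer recurrent KV-cache tensors, which this sketch omits; check each `.mlpackage` spec and the parity script for the actual interface:

```python
def decode_step(token_id: int) -> int:
    """One hypothetical greedy step (KV-cache plumbing omitted)."""
    h = embed[token_id][None, None, :].astype(np.float16)  # (1, 1, 2048)
    for chunk in decode_chunks:                            # chunk_0 .. chunk_3
        h = chunk.predict({"hidden_states": h})["hidden_states"]
    out = head.predict({"hidden_states": h})               # final_norm + lm_head + argmax
    return int(out["token_id"])                            # argmax is baked into the head
```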
## iOS / Mac app

Swift runtime: `Qwen3VL2BGenerator.swift`. Pick "Qwen3-VL 2B (recurrent, v1.4.0)" in the model picker.
## Architecture

28-layer GQA text backbone + ViT vision tower.

- Text: hidden=2048, num_heads=16, num_kv=8, head_dim=128, vocab=151936, tie_embeddings=True, rope_theta=5e6, mRoPE section=[24,20,20] interleaved (collapses to standard 1D RoPE for text-only; see the sketch below).
- Vision: 448×448 fixed, 196 tokens after spatial_merge=2, DeepStack taps at vision layers 5/11/17 → text layers 0/1/2.
- Chunks: 4 body chunks × 7 layers each.
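The mRoPE sections check out against the head size: 24 + 20 + 20 = 64 = head_dim/2 rotary frequency pairs, split across the (temporal, height, width) position components. A hedged sketch of why this collapses for text (the real layout interleaves the sections; contiguous here for brevity):

```python
import numpy as np

SECTIONS = [24, 20, 20]           # temporal / height / width frequency pairs
assert sum(SECTIONS) == 128 // 2  # head_dim = 128 -> 64 rotary pairs

def mrope_angles(t, h, w, theta=5e6):
    inv_freq = theta ** (-np.arange(64) / 64)  # one frequency per pair
    pos = np.concatenate([np.full(n, p, dtype=np.float64)
                          for n, p in zip(SECTIONS, (t, h, w))])
    return pos * inv_freq                      # rotation angle per pair

# Text-only tokens: t == h == w == sequence index, so every pair sees the
# same position and this reduces to plain 1D RoPE.
i = 7
assert np.allclose(mrope_angles(i, i, i), i * 5e6 ** (-np.arange(64) / 64))
```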
## License

Apache 2.0 (inherits from the base model).