ANEMLL Gemma 3 270M - Apple Neural Engine (Monolithic, LUT6)

Pre-converted Gemma 3 270M IT model optimized for Apple Neural Engine inference using ANEMLL.

This is a monolithic (single-file) CoreML model with in-model argmax, ideal for quick testing and development on any Apple Silicon device.

Model Details

| Property | Value |
|----------|-------|
| Base Model | google/gemma-3-270m-it |
| Architecture | Gemma 3 (gemma3_text) |
| Parameters | 270M |
| Context Length | 512 |
| Batch Size | 64 |
| Quantization | LUT6 (6-bit, per-channel group size 4) |
| Argmax | In-model (outputs token IDs) |
| Format | Monolithic (single CoreML file) |
| Dedup | ANEMLL-Dedup enabled |
| ANEMLL Version | 0.3.5 |
| Model Size | ~335 MB (compiled) |
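
"In-model argmax" means the CoreML graph itself reduces the final logits to the highest-scoring token ID, so the host app receives a single integer per step rather than copying a vocabulary-sized float vector off the Neural Engine. A minimal pure-Python sketch of the difference (illustrative only; the toy 4-token vocabulary and `logits_head` stand-in are not part of the real model):

```python
# Illustrative sketch: an argmax model returns a token ID directly,
# while a logits model returns per-token scores the app must reduce itself.

def logits_head(hidden):
    # Hypothetical stand-in for the LM head over a toy 4-token vocabulary.
    return [0.1, 2.5, -1.0, 0.7]

def model_without_argmax(hidden):
    # App receives the full score vector and must copy + argmax it.
    return logits_head(hidden)

def model_with_argmax(hidden):
    # The graph reduces the scores itself; app receives one token ID.
    scores = logits_head(hidden)
    return max(range(len(scores)), key=scores.__getitem__)

print(model_with_argmax(None))  # a single int, not a score vector
```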

Files

| File | Size | Description |
|------|------|-------------|
| gemma3_monolithic_full_lut6.mlmodelc/ | 335 MB | Compiled CoreML model (infer + prefill) |
| meta.yaml | 2 KB | Model configuration |
| tokenizer.json | 32 MB | Tokenizer data |
| tokenizer.model | 4.5 MB | SentencePiece model |
| tokenizer_config.json | 1.1 MB | Tokenizer configuration |
| chat_template.jinja | 1.5 KB | Chat template |
| config.json | 66 B | iOS tokenizer config |
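
The compiled model bundles two functions: prefill, which ingests the prompt in fixed windows of the batch size (64 tokens here) to populate the KV cache, and infer, which then produces one token per call. The windowing can be sketched as follows (an illustrative sketch of the scheduling, not ANEMLL's actual API):

```python
CONTEXT_LEN = 512  # maximum KV-cache length for this conversion
BATCH_SIZE = 64    # prefill consumes the prompt 64 tokens at a time

def prefill_chunks(prompt_len):
    """Return the (start, end) token windows prefill would process."""
    chunks = []
    for start in range(0, min(prompt_len, CONTEXT_LEN), BATCH_SIZE):
        end = min(start + BATCH_SIZE, prompt_len, CONTEXT_LEN)
        chunks.append((start, end))
    return chunks

# A 100-token prompt needs two prefill calls: tokens 0-63, then 64-99;
# a full 512-token context needs eight.
print(prefill_chunks(100))
```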

Quick Start

Download

```bash
# Clone with git-lfs
git lfs install
git clone https://huggingface.co/anemll/anemll-gemma-3-270m-it-ctx512-lut6

# Or use huggingface-cli
huggingface-cli download anemll/anemll-gemma-3-270m-it-ctx512-lut6 \
  --local-dir ~/Models/ANE/gemma3-270m
```

Run with ANEMLL

```bash
# Install ANEMLL
git clone https://github.com/Anemll/Anemll.git
cd Anemll
./create_uv_env.sh
source env-anemll/bin/activate
./install_dependencies.sh

# Chat with the model
python tests/chat.py \
  --meta ~/Models/ANE/gemma3-270m/meta.yaml \
  --prompt "Who are you?"

# Full conversation mode
python tests/chat_full.py \
  --meta ~/Models/ANE/gemma3-270m/meta.yaml
```
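
Under the hood, generation with an argmax model is a simple loop: prefill the prompt, then call infer repeatedly, feeding each returned token ID back in until an end-of-sequence token appears. A stubbed sketch of that loop (the real runner manages CoreML state and the KV cache; `toy_infer` and `EOS_ID` here are hypothetical stand-ins):

```python
def toy_infer(tokens):
    # Hypothetical stand-in for the CoreML `infer` call: with in-model
    # argmax the model returns the next token ID directly.
    return (tokens[-1] + 1) % 8  # toy rule, not a real model

EOS_ID = 7  # assumed end-of-sequence ID for this toy vocabulary

def generate(prompt_tokens, max_new=16):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        next_id = toy_infer(tokens)
        tokens.append(next_id)   # feed the prediction back in
        if next_id == EOS_ID:    # stop at end-of-sequence
            break
    return tokens

print(generate([3]))
```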

Run with ANEMLL Chat (macOS app)

  1. Open ANEMLL Chat
  2. Go to Models > Link Local Model
  3. Select the downloaded model directory
  4. Start chatting

Conversion

This model was converted using:

```bash
python tests/test_gemma3_model.py \
  --model google/gemma-3-270m-it \
  --lut 6,4 \
  --lut-embeddings 6,4 \
  --lut-lmhead 6,4 \
  --context 512 \
  --batch 64
```

Or equivalently:

```bash
./anemll/utils/convert_monolith.sh \
  --model google/gemma-3-270m-it \
  --output ./output \
  --lut 6,4 \
  --lut-embeddings 6,4 \
  --lut-lmhead 6,4 \
  --context 512 \
  --batch 64 \
  --argmax \
  --prefix gemma3
```
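
The `6,4` arguments select 6-bit lookup-table quantization (a 2^6 = 64-entry codebook) with a per-channel group size of 4: each group of channels shares one codebook, and every weight is stored as a 6-bit index into it. A rough sketch of the idea using a uniform codebook (illustrative only; real converters fit the codebook to the weights, e.g. via k-means, and this is not ANEMLL's actual kernel):

```python
import numpy as np

BITS, GROUP = 6, 4    # "--lut 6,4": 64-entry LUT, 4 channels per group
LUT_SIZE = 1 << BITS  # 64 representable values per group

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 16)).astype(np.float32)  # 8 output channels

# Quantize each group of 4 channels against its own 64-entry codebook.
recon = np.empty_like(weights)
for g in range(0, weights.shape[0], GROUP):
    block = weights[g:g + GROUP]
    lut = np.linspace(block.min(), block.max(), LUT_SIZE, dtype=np.float32)
    idx = np.abs(block[..., None] - lut).argmin(axis=-1)  # 6-bit indices
    recon[g:g + GROUP] = lut[idx]                         # dequantized weights

print(f"max reconstruction error: {np.abs(weights - recon).max():.4f}")
```

Storage drops from 32 bits per weight to 6 bits plus a small shared codebook per group, which is where the ~335 MB compiled size comes from.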

Notes

  • 270M is a tiny model: it is designed for testing the conversion pipeline, not for production-quality output. For better quality, use larger models (1B+).
  • Runs entirely on the Apple Neural Engine; no GPU or cloud required.
  • Supports Apple Silicon: M1, M2, M3, M4, and later.
  • Inference speed: roughly 200 tokens/sec or more on M-series chips.

License

This model conversion is released under the MIT license. The base model (Gemma 3) is subject to Google's Gemma Terms of Use.
