ANEMLL Gemma 3 270M - Apple Neural Engine (Monolithic, LUT6)

Pre-converted Gemma 3 270M IT model optimized for Apple Neural Engine inference using ANEMLL.

This is a monolithic (single-file) CoreML model with in-model argmax, ideal for quick testing and development on any Apple Silicon device.

Model Details

| Property | Value |
|----------|-------|
| Base Model | google/gemma-3-270m-it |
| Architecture | Gemma 3 (gemma3_text) |
| Parameters | 270M |
| Context Length | 512 |
| Batch Size | 64 |
| Quantization | LUT6 (6-bit, per-channel group size 4) |
| Argmax | In-model (outputs token IDs) |
| Format | Monolithic (single CoreML file) |
| Dedup | ANEMLL-Dedup enabled |
| ANEMLL Version | 0.3.5 |
| Model Size | ~335 MB (compiled) |
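
"In-model argmax" means the CoreML graph itself reduces the final logits to the highest-scoring token ID, so the host app receives a single integer per step rather than copying a vocabulary-sized float vector off the Neural Engine. A minimal pure-Python sketch of the difference (illustrative only; the toy 4-token vocabulary and `logits_head` stand-in are not part of the real model):

```python
# Illustrative sketch: an argmax model returns a token ID directly,
# while a logits model returns per-token scores the app must reduce itself.

def logits_head(hidden):
    # Hypothetical stand-in for the LM head over a toy 4-token vocabulary.
    return [0.1, 2.5, -1.0, 0.7]

def model_without_argmax(hidden):
    # App receives the full score vector and must copy + argmax it.
    return logits_head(hidden)

def model_with_argmax(hidden):
    # The graph reduces the scores itself; app receives one token ID.
    scores = logits_head(hidden)
    return max(range(len(scores)), key=scores.__getitem__)

print(model_with_argmax(None))  # a single int, not a score vector
```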

Files

| File | Size | Description |
|------|------|-------------|
| gemma3_monolithic_full_lut6.mlmodelc/ | 335 MB | Compiled CoreML model (infer + prefill) |
| meta.yaml | 2 KB | Model configuration |
| tokenizer.json | 32 MB | Tokenizer data |
| tokenizer.model | 4.5 MB | SentencePiece model |
| tokenizer_config.json | 1.1 MB | Tokenizer configuration |
| chat_template.jinja | 1.5 KB | Chat template |
| config.json | 66 B | iOS tokenizer config |
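
The compiled model bundles two functions: prefill, which ingests the prompt in fixed windows of the batch size (64 tokens here) to populate the KV cache, and infer, which then produces one token per call. The windowing can be sketched as follows (an illustrative sketch of the scheduling, not ANEMLL's actual API):

```python
CONTEXT_LEN = 512  # maximum KV-cache length for this conversion
BATCH_SIZE = 64    # prefill consumes the prompt 64 tokens at a time

def prefill_chunks(prompt_len):
    """Return the (start, end) token windows prefill would process."""
    chunks = []
    for start in range(0, min(prompt_len, CONTEXT_LEN), BATCH_SIZE):
        end = min(start + BATCH_SIZE, prompt_len, CONTEXT_LEN)
        chunks.append((start, end))
    return chunks

# A 100-token prompt needs two prefill calls: tokens 0-63, then 64-99;
# a full 512-token context needs eight.
print(prefill_chunks(100))
```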

Quick Start

Download

```bash
# Clone with git-lfs
git lfs install
git clone https://huggingface.co/anemll/anemll-gemma-3-270m-it-ctx512-lut6

# Or use huggingface-cli
huggingface-cli download anemll/anemll-gemma-3-270m-it-ctx512-lut6 \
  --local-dir ~/Models/ANE/gemma3-270m
```

Run with ANEMLL

```bash
# Install ANEMLL
git clone https://github.com/Anemll/Anemll.git
cd Anemll
./create_uv_env.sh
source env-anemll/bin/activate
./install_dependencies.sh

# Chat with the model
python tests/chat.py \
  --meta ~/Models/ANE/gemma3-270m/meta.yaml \
  --prompt "Who are you?"

# Full conversation mode
python tests/chat_full.py \
  --meta ~/Models/ANE/gemma3-270m/meta.yaml
```
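
Under the hood, generation with an argmax model is a simple loop: prefill the prompt, then call infer repeatedly, feeding each returned token ID back in until an end-of-sequence token appears. A stubbed sketch of that loop (the real runner manages CoreML state and the KV cache; `toy_infer` and `EOS_ID` here are hypothetical stand-ins):

```python
def toy_infer(tokens):
    # Hypothetical stand-in for the CoreML `infer` call: with in-model
    # argmax the model returns the next token ID directly.
    return (tokens[-1] + 1) % 8  # toy rule, not a real model

EOS_ID = 7  # assumed end-of-sequence ID for this toy vocabulary

def generate(prompt_tokens, max_new=16):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        next_id = toy_infer(tokens)
        tokens.append(next_id)   # feed the prediction back in
        if next_id == EOS_ID:    # stop at end-of-sequence
            break
    return tokens

print(generate([3]))
```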

Run with ANEMLL Chat (macOS app)

  1. Open ANEMLL Chat
  2. Go to Models > Link Local Model
  3. Select the downloaded model directory
  4. Start chatting

Conversion

This model was converted using:

```bash
python tests/test_gemma3_model.py \
  --model google/gemma-3-270m-it \
  --lut 6,4 \
  --lut-embeddings 6,4 \
  --lut-lmhead 6,4 \
  --context 512 \
  --batch 64
```

Or equivalently:

```bash
./anemll/utils/convert_monolith.sh \
  --model google/gemma-3-270m-it \
  --output ./output \
  --lut 6,4 \
  --lut-embeddings 6,4 \
  --lut-lmhead 6,4 \
  --context 512 \
  --batch 64 \
  --argmax \
  --prefix gemma3
```
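
The `6,4` arguments select 6-bit lookup-table quantization (a 2^6 = 64-entry codebook) with a per-channel group size of 4: each group of channels shares one codebook, and every weight is stored as a 6-bit index into it. A rough sketch of the idea using a uniform codebook (illustrative only; real converters fit the codebook to the weights, e.g. via k-means, and this is not ANEMLL's actual kernel):

```python
import numpy as np

BITS, GROUP = 6, 4    # "--lut 6,4": 64-entry LUT, 4 channels per group
LUT_SIZE = 1 << BITS  # 64 representable values per group

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 16)).astype(np.float32)  # 8 output channels

# Quantize each group of 4 channels against its own 64-entry codebook.
recon = np.empty_like(weights)
for g in range(0, weights.shape[0], GROUP):
    block = weights[g:g + GROUP]
    lut = np.linspace(block.min(), block.max(), LUT_SIZE, dtype=np.float32)
    idx = np.abs(block[..., None] - lut).argmin(axis=-1)  # 6-bit indices
    recon[g:g + GROUP] = lut[idx]                         # dequantized weights

print(f"max reconstruction error: {np.abs(weights - recon).max():.4f}")
```

Storage drops from 32 bits per weight to 6 bits plus a small shared codebook per group, which is where the ~335 MB compiled size comes from.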

Notes

  • 270M is a tiny model: it is designed for testing the conversion pipeline, not for production-quality output. For better quality, use larger models (1B+).
  • Runs entirely on the Apple Neural Engine; no GPU or cloud required.
  • Supports Apple Silicon: M1, M2, M3, M4, and later.
  • Inference speed: roughly 200 tokens/sec or more on M-series chips.

License

This model conversion is released under the MIT license. The base model (Gemma 3) is subject to Google's Gemma Terms of Use.
