LCO-Embedding-Omni-7B-GGUF

GGUF quantizations of LCO-Embedding/LCO-Embedding-Omni-7B for use with llama.cpp.

Converted using ht-llama.cpp, a fork with added support for the Qwen2_5OmniThinkerForConditionalGeneration architecture.

About the model

LCO-Embedding-Omni-7B is a 9B-parameter multimodal embedding model based on the Thinker component of Qwen 2.5 Omni, fine-tuned with LoRA and contrastive learning to produce 3584-dimensional embeddings from text, images, audio, and video. It achieves state-of-the-art results on MIEB-Lite (68.8 mean across 51 tasks), outperforming models trained on up to about 21x more data. The model uses last-token pooling.
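
To make the pooling choice concrete, here is a minimal sketch of last-token pooling next to mean pooling. It illustrates the general technique on plain Python lists and is not the llama.cpp implementation, which performs pooling internally when `--pooling last` is passed. The function names and the right-padding convention are assumptions made for illustration.

```python
def last_token_pool(hidden, mask):
    """Return the hidden state of the last real token in each sequence.

    hidden: list of sequences, each a list of per-token vectors
    mask:   list of 0/1 lists, 1 marking real tokens
    ASSUMPTION: sequences are right-padded, so real tokens come first.
    """
    pooled = []
    for states, m in zip(hidden, mask):
        last = sum(m) - 1  # index of the last non-padding token
        pooled.append(states[last])
    return pooled


def mean_pool(hidden, mask):
    """Average the hidden states of the real tokens, shown for comparison."""
    pooled = []
    for states, m in zip(hidden, mask):
        real = [s for s, keep in zip(states, m) if keep]
        dim = len(real[0])
        pooled.append([sum(v[d] for v in real) / len(real) for d in range(dim)])
    return pooled
```

With a decoder-only model, the last token has attended to every earlier token, which is why its hidden state can stand in for the whole input.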

See Scaling Language-Centric Omnimodal Representation Learning (NeurIPS 2025) for details.

Original model benchmarks (MIEB-Lite, 51 tasks)

Model	Data	Mean
GME (7B)	8.0M pairs	64.5
mmE5 (11B)	2.1M pairs	61.8
Voyage Multimodal 3	--	58.1
LCO-Emb-Omni (7B)	370k pairs	68.8

Available files

Standard quantizations

File	Quant	Size	Description
`LCO-Embedding-Omni-7B-BF16.gguf`	BF16	15 GB	Full precision, no quality loss
`LCO-Embedding-Omni-7B-Q8_0.gguf`	Q8_0	7.6 GB	Near-lossless quantization
`LCO-Embedding-Omni-7B-Q4_K_M.gguf`	Q4_K_M	4.4 GB	Good balance of quality and size
`LCO-Embedding-Omni-7B-Q3_K_M.gguf`	Q3_K_M	3.6 GB	Smaller, some quality loss
`LCO-Embedding-Omni-7B-Q2_K.gguf`	Q2_K	2.9 GB	Smallest, more quality loss

Importance matrix (imatrix) quantizations

Quantized with an importance matrix computed from WikiText-2 calibration data for improved quality at low bit widths.

File	Quant	Size	Description
`LCO-Embedding-Omni-7B-IQ4_XS.gguf`	IQ4_XS	4.0 GB	4.25 bpw, imatrix-optimized
`LCO-Embedding-Omni-7B-IQ3_M.gguf`	IQ3_M	3.4 GB	3.66 bpw, imatrix-optimized
`LCO-Embedding-Omni-7B-IQ3_XS.gguf`	IQ3_XS	3.2 GB	3.3 bpw, imatrix-optimized
`LCO-Embedding-Omni-7B-IQ2_M.gguf`	IQ2_M	2.6 GB	2.7 bpw, imatrix-optimized
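
The bits-per-weight (bpw) figures relate to file size through simple arithmetic: size is roughly the weight count times bpw, divided by 8 bits per byte. The sketch below works that out. The weight count of about 7.6 billion is an assumption (it is close to the 15 GB BF16 file divided by 2 bytes per weight), and real files mix tensor types, so the results are only ballpark figures.

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough file size in GB (10^9 bytes) if every weight used the same bit width."""
    return n_params * bits_per_weight / 8 / 1e9


# ASSUMPTION: about 7.6e9 weights in the text model; not stated in this card.
N_PARAMS = 7.6e9

if __name__ == "__main__":
    for name, bpw in [("IQ4_XS", 4.25), ("IQ3_M", 3.66), ("IQ3_XS", 3.3), ("IQ2_M", 2.7)]:
        print(f"{name}: ~{estimated_size_gb(N_PARAMS, bpw):.1f} GB")
```

Under that assumption the estimates land within about 0.1 GB of the sizes listed in the table.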

Multimodal projection

File	Quant	Size	Description
`mmproj-LCO-Embedding-Omni-7b-F16.gguf`	F16	2.5 GB	Vision + audio projection (required for multimodal)

For text-only embedding, you only need one of the text model GGUFs. For multimodal (image/audio/video), you also need the mmproj file.

Quantization quality

Measured on 8 diverse text sentences (3584-dim embeddings). BF16 is the reference.

Embedding quality vs BF16

Quant	Type	Size	Speedup	Mean Abs Diff	Pearson r	Spearman rho	Vec Cosine
Q8_0	Standard	7.6 GB	1.7x	0.0025	0.9997	0.9956	0.9998
Q4_K_M	Standard	4.4 GB	2.4x	0.0073	0.9974	0.9951	0.9948
IQ4_XS	imatrix	4.0 GB	2.6x	0.0145	0.9942	0.9918	0.9944
Q3_K_M	Standard	3.6 GB	2.8x	0.0165	0.9839	0.9770	0.9825
IQ3_M	imatrix	3.4 GB	2.9x	0.0248	0.9825	0.9693	0.9825
IQ3_XS	imatrix	3.2 GB	3.0x	0.0224	0.9753	0.9600	0.9797
Q2_K	Standard	2.9 GB	3.1x	0.0429	0.9126	0.8506	0.9111
IQ2_M	imatrix	2.6 GB	3.4x	0.0465	0.8636	0.7258	0.9395
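
The column definitions are not spelled out above, so the following is a sketch of one plausible way to compute such metrics, not the script that produced the table. It assumes that Mean Abs Diff, Pearson r, and Spearman rho are computed over the pairwise cosine similarities of the sentence set (28 pairs for 8 sentences), and that Vec Cosine is the mean cosine similarity between each quantized embedding and its BF16 counterpart. All function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pairwise_sims(embs):
    """Cosine similarity for every unordered pair of embeddings."""
    return [cosine(embs[i], embs[j]) for i, j in combinations(range(len(embs)), 2)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def compare(ref_embs, quant_embs):
    """ASSUMED metric definitions; see the paragraph above."""
    ref, quant = pairwise_sims(ref_embs), pairwise_sims(quant_embs)
    return {
        "mean_abs_diff": sum(abs(a - b) for a, b in zip(ref, quant)) / len(ref),
        "pearson_r": pearson(ref, quant),
        "spearman_rho": pearson(ranks(ref), ranks(quant)),  # Spearman = Pearson on ranks
        "vec_cosine": sum(cosine(a, b) for a, b in zip(ref_embs, quant_embs)) / len(ref_embs),
    }
```

Calling `compare(bf16_embeddings, q4_embeddings)` on two lists of vectors for the same sentences returns one row of metrics.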

pgvector retrieval quality (query with quant, corpus in BF16)

Quant	Recall@1	Recall@3	Mean Drift	Max Drift
Q8_0	100%	100%	0.0002	0.0003
Q4_K_M	100%	100%	0.0052	0.0092
Q3_K_M	100%	100%	0.0175	0.0354
Q2_K	100%	100%	0.0889	0.1380
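
The Mean Drift values here equal 1 minus the Vec Cosine values in the previous table, which suggests that drift is the cosine distance between a quantized embedding and its BF16 counterpart (the same quantity pgvector's `<=>` operator returns). The sketch below computes drift and Recall@k under that assumption, in memory rather than through pgvector. The function names and the `relevant_idx` argument (the index of the correct corpus item for each query) are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def drift(quant_vec, ref_vec):
    """ASSUMPTION: drift is cosine distance, i.e. 1 - cosine similarity."""
    return 1.0 - cosine(quant_vec, ref_vec)


def recall_at_k(query_embs, corpus_embs, relevant_idx, k):
    """Fraction of queries whose relevant corpus item appears in the top k."""
    hits = 0
    for q, target in zip(query_embs, relevant_idx):
        ranked = sorted(
            range(len(corpus_embs)),
            key=lambda i: cosine(q, corpus_embs[i]),
            reverse=True,  # most similar first
        )
        hits += target in ranked[:k]
    return hits / len(query_embs)
```

In the setup described by the heading, `query_embs` would come from the quantized model and `corpus_embs` from BF16.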

Recommendations:

Q8_0: essentially lossless, best quality
Q4_K_M: excellent quality/size tradeoff for most use cases
IQ3_M / IQ3_XS: best options for constrained environments; smaller than Q3_K_M with comparable quality
Q2_K / IQ2_M: functional, but with noticeable embedding drift (mean drift of about 0.09 for Q2_K); use only when size is critical

Usage

Build llama.cpp (ht-llama.cpp fork)

```bash
git clone https://github.com/heiervang-technologies/ht-llama.cpp
cd ht-llama.cpp
cmake -B build
cmake --build build --target llama-embedding llama-server -j$(nproc)
```

Text embeddings (CLI)

```bash
./build/bin/llama-embedding \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --pooling last \
  -p "Your text here"
```

Text embeddings (server)

```bash
./build/bin/llama-server \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --embedding --pooling last
```

```bash
curl -s http://localhost:8080/embeddings \
  -d '{"content": "Your text here"}'
```
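
The same request can be sent from Python using only the standard library. This is a sketch under assumptions: the response layout is not documented in this card, so the hypothetical `extract_vector` helper accepts either an OpenAI-style object or a list of items, with the vector optionally nested one level deeper. Check the actual response of your build before relying on it.

```python
import json
import urllib.request

# ASSUMPTION: the server was started as shown above and listens on the default port.
SERVER_URL = "http://localhost:8080/embeddings"


def extract_vector(response):
    """Pull one embedding out of a parsed response, tolerating a few layouts.

    ASSUMPTION: the layout is either {"data": [{"embedding": [...]}]} or a list
    of {"embedding": ...} items, where the vector may be wrapped as [[...]].
    """
    item = response["data"][0] if isinstance(response, dict) else response[0]
    vec = item["embedding"]
    if vec and isinstance(vec[0], list):  # nested [[...]]: take the single pooled row
        vec = vec[0]
    return [float(x) for x in vec]


def embed(text, url=SERVER_URL):
    """POST the same JSON body as the curl example and return one vector."""
    body = json.dumps({"content": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return extract_vector(json.load(resp))
```

With the server running, `embed("Your text here")` should return a list of 3584 floats.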

Multimodal embeddings (vision + audio)

Requires the mmproj file:

```bash
./build/bin/llama-server \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --mmproj mmproj-LCO-Embedding-Omni-7b-F16.gguf \
  --embedding --pooling last
```

```bash
# Image embedding (base64-encoded image)
curl -s http://localhost:8080/embeddings \
  -d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-image-data>"]}]}'

# Audio embedding (base64-encoded WAV)
curl -s http://localhost:8080/embeddings \
  -d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-audio-data>"]}]}'
```
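
Filling in the base64 placeholders by hand is error-prone, so here is a small sketch that builds the same JSON body from raw bytes or a file. The helper names are hypothetical; the body layout is copied from the curl examples above.

```python
import base64
import json


def build_media_request(media_bytes):
    """Build the JSON body from the curl examples for a single media item."""
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {"content": [{"prompt_string": "<__media__>", "multimodal_data": [encoded]}]}


def build_media_request_from_file(path):
    """Read an image or WAV file and wrap it in the request body."""
    with open(path, "rb") as f:
        return build_media_request(f.read())


if __name__ == "__main__":
    # Prints a body that can be passed to curl with -d, using stand-in bytes.
    print(json.dumps(build_media_request(b"example-bytes")))
```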

JSON output (for programmatic use)

```bash
./build/bin/llama-embedding \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --pooling last \
  --embd-output-format json \
  -p "Your text here"
```
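
One way to consume that output from Python is sketched below. It assumes that stdout carries only a JSON document with a `data` list of `{"embedding": [...]}` items. That layout is an assumption, not something this card documents, so inspect the output of your build first. The helper names are hypothetical, and the command line is the one shown above.

```python
import json
import math
import subprocess


def parse_embeddings(stdout_text):
    """ASSUMPTION: output is a JSON object with a "data" list of embedding items."""
    doc = json.loads(stdout_text)
    return [[float(x) for x in item["embedding"]] for item in doc["data"]]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def embed_cli(text, model="LCO-Embedding-Omni-7B-Q8_0.gguf",
              binary="./build/bin/llama-embedding"):
    """Run the CLI with the flags shown above and return the parsed vectors."""
    cmd = [binary, "-m", model, "--pooling", "last",
           "--embd-output-format", "json", "-p", text]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_embeddings(out)
```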

Notes

This is a quantization of LCO-Embedding/LCO-Embedding-Omni-7B; see the original model card for benchmarks, training details, and licensing.
The `--pooling last` flag is required: this model uses last-token pooling, not mean pooling.
Embedding dimensions: 3584.
All three modalities (text, vision, audio) have been tested and verified working.
Contributions and bug reports are welcome at ht-llama.cpp.

Citations

LCO-Embedding

@article{xiao2025scaling,
  title={Scaling Language-Centric Omnimodal Representation Learning},
  author={Xiao, Chenghao and Chan, Hou Pong and Zhang, Hao and Xu, Weiwen and Aljunied, Mahani and Rong, Yu},
  journal={arXiv preprint arXiv:2510.11693},
  year={2025}
}

Qwen 2.5 Omni

@article{Qwen2.5-Omni,
  title={Qwen2.5-Omni Technical Report},
  author={Jin Xu and Zhifang Guo and Jinzheng He and Hangrui Hu and Ting He and Shuai Bai and Keqin Chen and Jialin Wang and Yang Fan and Kai Dang and Bin Zhang and Xiong Wang and Yunfei Chu and Junyang Lin},
  journal={arXiv preprint arXiv:2503.20215},
  year={2025}
}
