---
library_name: llama.cpp
license: apache-2.0
language:
- en
- zh
- multilingual
tags:
- gguf
- embedding
- multimodal
- qwen2
- qwen2.5-omni
- feature-extraction
- text-embedding
- image-embedding
- audio-embedding
- imatrix
pipeline_tag: feature-extraction
model_type: qwen2vl
quantized_by: marksverdhei
datasets:
- MIEB
model_name: LCO-Embedding-Omni-7B-GGUF
model-index:
- name: LCO-Embedding-Omni-7B-GGUF
  results:
  - task:
      type: feature-extraction
    metrics:
    - name: Embedding Dimensions
      type: embedding_dimensions
      value: 3584
    - name: Pooling Method
      type: pooling
      value: last-token
---

# LCO-Embedding-Omni-7B-GGUF

Converted using [ht-llama.cpp](https://github.com/heiervang-technologies/ht-llama.cpp).

## About the model

[LCO-Embedding-Omni-7B](https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-7B) is a 9B-parameter multimodal embedding model based on the Thinker component of [Qwen 2.5 Omni](https://huggingface.co/Qwen/Qwen2.5-Omni-7B), fine-tuned with LoRA and contrastive learning to produce 3584-dimensional embeddings from text, images, audio, and video. It achieves **state-of-the-art results on [MIEB-Lite](https://huggingface.co/spaces/MTEB/MIEB) (68.8 mean across 51 tasks)**, outperforming models trained on 21x more data. It uses last-token pooling.

See [Scaling Language-Centric Omnimodal Representation Learning](https://arxiv.org/abs/2510.11693) (NeurIPS 2025) for details.

### Original model benchmarks (MIEB-Lite, 51 tasks)

| Model | Data | Mean |
|-------|------|------|
| GME (7B) | 8.0M pairs | 64.5 |
| mmE5 (11B) | 2.1M pairs | 61.8 |
| Voyage Multimodal 3 | -- | 58.1 |
| **LCO-Emb-Omni (7B)** | **370k pairs** | **68.8** |

## Available files

### Standard quantizations

| File | Quant | Size | Description |
|------|-------|------|-------------|
| `LCO-Embedding-Omni-7B-BF16.gguf` | BF16 | 15 GB | Full precision, no quality loss |
| `LCO-Embedding-Omni-7B-Q8_0.gguf` | Q8_0 | -- | Essentially lossless |
| `LCO-Embedding-Omni-7B-Q4_K_M.gguf` | Q4_K_M | 4.4 GB | Good balance of quality and size |
| `LCO-Embedding-Omni-7B-Q3_K_M.gguf` | Q3_K_M | 3.6 GB | Smaller, some quality loss |
| `LCO-Embedding-Omni-7B-Q2_K.gguf` | Q2_K | 2.9 GB | Smallest, more quality loss |

### Importance matrix (imatrix) quantizations

Quantized with an importance matrix computed from WikiText-2 calibration data for improved quality at low bit widths.

| File | Quant | Size | Description |
|------|-------|------|-------------|
| `LCO-Embedding-Omni-7B-IQ4_XS.gguf` | IQ4_XS | 4.0 GB | 4.25 bpw, imatrix-optimized |
| `LCO-Embedding-Omni-7B-IQ3_M.gguf` | IQ3_M | 3.4 GB | 3.66 bpw, imatrix-optimized |
| `LCO-Embedding-Omni-7B-IQ3_XS.gguf` | IQ3_XS | -- | 3.3 bpw, imatrix-optimized |
| `LCO-Embedding-Omni-7B-IQ2_M.gguf` | IQ2_M | -- | 2.7 bpw, imatrix-optimized |
| `LCO-Embedding-Omni-7B-IQ2_XS.gguf` | IQ2_XS | -- | 2.31 bpw, imatrix-optimized |

### Multimodal projection

| File | Quant | Size | Description |
|------|-------|------|-------------|
| `mmproj-LCO-Embedding-Omni-7b-F16.gguf` | F16 | 2.5 GB | Vision + audio projection (required for multimodal) |

For text-only embedding, you only need one of the text model GGUFs. For multimodal (image/audio/video), you also need the `mmproj` file.

## Quantization quality

Measured on 8 diverse text sentences (3584-dim embeddings). BF16 is the reference.

### Similarity matrix quality vs BF16

| Quant | Speedup | Mean Abs Diff | Max Diff | Pearson r | Spearman rho |
|-------|---------|---------------|----------|-----------|--------------|
| Q8_0 | 1.7x | 0.0025 | 0.009 | 0.9997 | 0.9956 |
| Q4_K_M | 2.4x | 0.0073 | 0.022 | 0.9974 | 0.9951 |
| Q3_K_M | 2.8x | 0.0165 | 0.063 | 0.9839 | 0.9770 |
| Q2_K | 3.1x | 0.0429 | 0.175 | 0.9126 | 0.8506 |

### Embedding vector cosine similarity vs BF16

| Quant | Mean | Min | Max |
|-------|------|-----|-----|
| Q8_0 | 0.9998 | 0.9997 | 0.9999 |
| Q4_K_M | 0.9948 | 0.9908 | 0.9965 |
| Q3_K_M | 0.9825 | 0.9646 | 0.9882 |
| Q2_K | 0.9111 | 0.8620 | 0.9432 |

### pgvector retrieval quality (query with quant, corpus in BF16)

| Quant | Recall@1 | Recall@3 | Mean Drift | Max Drift |
|-------|----------|----------|------------|-----------|
| Q8_0 | 100% | 100% | 0.0002 | 0.0003 |
| Q4_K_M | 100% | 100% | 0.0052 | 0.0092 |
| Q3_K_M | 100% | 100% | 0.0175 | 0.0354 |
| Q2_K | 100% | 100% | 0.0889 | 0.1380 |

**Recommendation:** Q8_0 is essentially lossless for retrieval. Q4_K_M offers an excellent quality/size tradeoff. Q3_K_M is viable for constrained environments. Q2_K works but shows meaningful embedding drift (~9%).
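The cosine-similarity comparison behind the tables above is easy to reproduce. A minimal sketch in plain Python (no dependencies; the two short vectors are hypothetical stand-ins, while real embeddings from this model are 3584-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Compare e.g. a BF16 embedding against a Q4_K_M embedding of the same text.
bf16_vec = [0.12, -0.05, 0.33]  # stand-in values
q4_vec = [0.11, -0.06, 0.34]    # stand-in values
print(cosine_similarity(bf16_vec, q4_vec))
```

A value near 1.0 means the quantized embedding points in almost the same direction as the reference, which is what the Q8_0 and Q4_K_M rows show.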

## Usage

### Build llama.cpp

```bash
git clone https://github.com/heiervang-technologies/ht-llama.cpp
cd ht-llama.cpp
cmake -B build
cmake --build build --target llama-embedding llama-server -j$(nproc)
```

### Text embeddings (CLI)

```bash
./build/bin/llama-embedding \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --pooling last \
  -p "Your text here"
```

### Text embeddings (server)

```bash
./build/bin/llama-server \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --embedding --pooling last

curl -s http://localhost:8080/embeddings \
  -d '{"content": "Your text here"}'
```
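The same request can be made from Python with only the standard library. A minimal sketch mirroring the curl call above (the parsing at the end is deliberately generic, since the `/embeddings` response shape has varied across llama.cpp server versions):

```python
import json
import urllib.request

SERVER = "http://localhost:8080"  # assumes llama-server is running as shown above

def build_body(text: str) -> bytes:
    # Same JSON body as the curl example: {"content": "..."}
    return json.dumps({"content": text}).encode("utf-8")

def embed_text(text: str):
    req = urllib.request.Request(
        f"{SERVER}/embeddings",
        data=build_body(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # inspect this; the shape varies by server version

# result = embed_text("Your text here")
```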
+
|
| 157 |
+
### Multimodal embeddings (vision + audio)
|
| 158 |
+
|
| 159 |
+
Requires the mmproj file:
|
| 160 |
|
| 161 |
```bash
|
| 162 |
./build/bin/llama-server \
|
| 163 |
-m LCO-Embedding-Omni-7B-Q8_0.gguf \
|
| 164 |
--mmproj mmproj-LCO-Embedding-Omni-7b-F16.gguf \
|
| 165 |
+
--embedding --pooling last
|
|
|
|
| 166 |
```
|
| 167 |
|
|
|
|
|
|
|
| 168 |
```bash
|
| 169 |
+
# Image embedding (base64-encoded image)
|
| 170 |
curl -s http://localhost:8080/embeddings \
|
| 171 |
+
-d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-image-data>"]}]}'
|
| 172 |
|
| 173 |
+
# Audio embedding (base64-encoded WAV)
|
| 174 |
curl -s http://localhost:8080/embeddings \
|
| 175 |
+
-d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-audio-data>"]}]}'
|
| 176 |
+
```
|
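The multimodal payloads above can likewise be built programmatically. A stdlib-only Python sketch that constructs the request body, assuming the same `prompt_string` / `multimodal_data` schema as the curl examples (the `fake_png` bytes are a placeholder for real image or audio file bytes):

```python
import base64
import json

def build_media_body(media_bytes: bytes) -> str:
    # One <__media__> placeholder in the prompt, matched by one base64 item.
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return json.dumps(
        {"content": [{"prompt_string": "<__media__>", "multimodal_data": [encoded]}]}
    )

fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder; use real file bytes in practice
body = build_media_body(fake_png)
# POST `body` to http://localhost:8080/embeddings as in the curl examples.
```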

### JSON output (for programmatic use)

```bash
./build/bin/llama-embedding \
  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
  --pooling last \
  --embd-output-format json \
  -p "Your text here"
```

## Notes

- This is a quantization of [LCO-Embedding/LCO-Embedding-Omni-7B](https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-7B) -- see the original model card for benchmarks, training details, and licensing
- The `--pooling last` flag is required -- this model uses last-token pooling, not mean pooling
- Embedding dimensions: 3584
- All three modalities (text, vision, audio) have been tested and verified working
- Contributions and bug reports welcome at [ht-llama.cpp](https://github.com/heiervang-technologies/ht-llama.cpp/issues)

## Citations

### LCO-Embedding

```bibtex
@article{xiao2025scaling,
  title={Scaling Language-Centric Omnimodal Representation Learning},
  author={Xiao, Chenghao and Chan, Hou Pong and Zhang, Hao and Xu, Weiwen and Aljunied, Mahani and Rong, Yu},
  journal={arXiv preprint arXiv:2510.11693},
  year={2025}
}
```

### Qwen 2.5 Omni

```bibtex
@article{Qwen2.5-Omni,
  title={Qwen2.5-Omni Technical Report},
  author={Jin Xu and Zhifang Guo and Jinzheng He and Hangrui Hu and Ting He and Shuai Bai and Keqin Chen and Jialin Wang and Yang Fan and Kai Dang and Bin Zhang and Xiong Wang and Yunfei Chu and Junyang Lin},
  journal={arXiv preprint arXiv:2503.20215},
  year={2025}
}
```