marksverdhei committed on
Commit c281a75 · verified · 1 Parent(s): 019af2f

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +146 -16
README.md CHANGED
@@ -4,14 +4,37 @@ library_name: llama.cpp
 license: apache-2.0
 language:
 - en
 tags:
 - gguf
 - embedding
 - multimodal
 - qwen2
 - qwen2.5-omni
 pipeline_tag: feature-extraction
 quantized_by: marksverdhei
 ---

 # LCO-Embedding-Omni-7B-GGUF
@@ -22,10 +45,23 @@ Converted using [ht-llama.cpp](https://github.com/heiervang-technologies/ht-llam
 ## About the model

-LCO-Embedding-Omni-7B is a multimodal embedding model based on the Thinker component of [Qwen 2.5 Omni](https://huggingface.co/Qwen/Qwen2.5-Omni-7B), fine-tuned with contrastive learning to produce embeddings from text, images, audio, and video. It uses last-token pooling to extract embeddings from the final hidden state.

 ## Available files

 | File | Quant | Size | Description |
 |------|-------|------|-------------|
 | `LCO-Embedding-Omni-7B-BF16.gguf` | BF16 | 15 GB | Full precision, no quality loss |
@@ -33,10 +69,60 @@ LCO-Embedding-Omni-7B is a multimodal embedding model based on the Thinker compo
 | `LCO-Embedding-Omni-7B-Q4_K_M.gguf` | Q4_K_M | 4.4 GB | Good balance of quality and size |
 | `LCO-Embedding-Omni-7B-Q3_K_M.gguf` | Q3_K_M | 3.6 GB | Smaller, some quality loss |
 | `LCO-Embedding-Omni-7B-Q2_K.gguf` | Q2_K | 2.9 GB | Smallest, more quality loss |
-| `mmproj-LCO-Embedding-Omni-7b-F16.gguf` | F16 | 1.3 GB | Vision + audio projection (required for multimodal) |

 For text-only embedding, you only need one of the text model GGUFs. For multimodal (image/audio/video), you also need the `mmproj` file.

 ## Usage

 ### Build llama.cpp
@@ -45,10 +131,10 @@ For text-only embedding, you only need one of the text model GGUFs. For multimod
 git clone https://github.com/heiervang-technologies/ht-llama.cpp
 cd ht-llama.cpp
 cmake -B build
-cmake --build build --target llama-embedding -j$(nproc)
 ```

-### Text embeddings

 ```bash
 ./build/bin/llama-embedding \
@@ -57,32 +143,76 @@ cmake --build build --target llama-embedding -j$(nproc)
   -p "Your text here"
 ```

-### Multimodal embeddings

-Multimodal embedding support (vision + audio) requires the mmproj file and the llama.cpp server:

 ```bash
 ./build/bin/llama-server \
   -m LCO-Embedding-Omni-7B-Q8_0.gguf \
   --mmproj mmproj-LCO-Embedding-Omni-7b-F16.gguf \
-  --embedding \
-  --pooling last
 ```

-Then use the `/embeddings` endpoint:
-
 ```bash
-# Text embedding
 curl -s http://localhost:8080/embeddings \
-  -d '{"content": "Your text here"}'

-# Image embedding (base64)
 curl -s http://localhost:8080/embeddings \
-  -d '{"content": [{"prompt_string": "Describe this image", "multimodal_data": [{"type": "image", "data": "<base64>"}]}]}'
 ```

 ## Notes

 - This is a quantization of [LCO-Embedding/LCO-Embedding-Omni-7B](https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-7B) -- see the original model card for benchmarks, training details, and licensing
-- Embedding inference has not been fully validated yet -- contributions and bug reports welcome at [ht-llama.cpp](https://github.com/heiervang-technologies/ht-llama.cpp/issues)
-- The `--pooling last` flag is important -- this model uses last-token pooling, not mean pooling
@@ -4,14 +4,37 @@ library_name: llama.cpp
 license: apache-2.0
 language:
 - en
+- zh
+- multilingual
 tags:
 - gguf
 - embedding
 - multimodal
 - qwen2
 - qwen2.5-omni
+- feature-extraction
+- text-embedding
+- image-embedding
+- audio-embedding
+- imatrix
 pipeline_tag: feature-extraction
+model_type: qwen2vl
 quantized_by: marksverdhei
+datasets:
+- MIEB
+model_name: LCO-Embedding-Omni-7B-GGUF
+model-index:
+- name: LCO-Embedding-Omni-7B-GGUF
+  results:
+  - task:
+      type: feature-extraction
+    metrics:
+    - name: Embedding Dimensions
+      type: embedding_dimensions
+      value: 3584
+    - name: Pooling Method
+      type: pooling
+      value: last-token
 ---

 # LCO-Embedding-Omni-7B-GGUF
@@ -22,10 +45,23 @@ Converted using [ht-llama.cpp](https://github.com/heiervang-technologies/ht-llam
 ## About the model

+[LCO-Embedding-Omni-7B](https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-7B) is a 9B-parameter multimodal embedding model based on the Thinker component of [Qwen 2.5 Omni](https://huggingface.co/Qwen/Qwen2.5-Omni-7B), fine-tuned with LoRA and contrastive learning to produce 3584-dimensional embeddings from text, images, audio, and video. It achieves **state-of-the-art results on [MIEB-Lite](https://huggingface.co/spaces/MTEB/MIEB) (68.8 mean across 51 tasks)**, outperforming models trained on 21x more data. It uses last-token pooling.
+
+See [Scaling Language-Centric Omnimodal Representation Learning](https://arxiv.org/abs/2510.11693) (NeurIPS 2025) for details.
+
+### Original model benchmarks (MIEB-Lite, 51 tasks)
+
+| Model | Data | Mean |
+|-------|------|------|
+| GME (7B) | 8.0M pairs | 64.5 |
+| mmE5 (11B) | 2.1M pairs | 61.8 |
+| Voyage Multimodal 3 | -- | 58.1 |
+| **LCO-Emb-Omni (7B)** | **370k pairs** | **68.8** |

 ## Available files

+### Standard quantizations
+
 | File | Quant | Size | Description |
 |------|-------|------|-------------|
 | `LCO-Embedding-Omni-7B-BF16.gguf` | BF16 | 15 GB | Full precision, no quality loss |
 
@@ -33,10 +69,60 @@ LCO-Embedding-Omni-7B is a multimodal embedding model based on the Thinker compo
 | `LCO-Embedding-Omni-7B-Q4_K_M.gguf` | Q4_K_M | 4.4 GB | Good balance of quality and size |
 | `LCO-Embedding-Omni-7B-Q3_K_M.gguf` | Q3_K_M | 3.6 GB | Smaller, some quality loss |
 | `LCO-Embedding-Omni-7B-Q2_K.gguf` | Q2_K | 2.9 GB | Smallest, more quality loss |
+
+### Importance matrix (imatrix) quantizations
+
+Quantized with an importance matrix computed from WikiText-2 calibration data for improved quality at low bit widths.
+
+| File | Quant | Size | Description |
+|------|-------|------|-------------|
+| `LCO-Embedding-Omni-7B-IQ4_XS.gguf` | IQ4_XS | 4.0 GB | 4.25 bpw, imatrix-optimized |
+| `LCO-Embedding-Omni-7B-IQ3_M.gguf` | IQ3_M | 3.4 GB | 3.66 bpw, imatrix-optimized |
+| `LCO-Embedding-Omni-7B-IQ3_XS.gguf` | IQ3_XS | -- | 3.3 bpw, imatrix-optimized |
+| `LCO-Embedding-Omni-7B-IQ2_M.gguf` | IQ2_M | -- | 2.7 bpw, imatrix-optimized |
+| `LCO-Embedding-Omni-7B-IQ2_XS.gguf` | IQ2_XS | -- | 2.31 bpw, imatrix-optimized |
+
+### Multimodal projection
+
+| File | Quant | Size | Description |
+|------|-------|------|-------------|
+| `mmproj-LCO-Embedding-Omni-7b-F16.gguf` | F16 | 2.5 GB | Vision + audio projection (required for multimodal) |

 For text-only embedding, you only need one of the text model GGUFs. For multimodal (image/audio/video), you also need the `mmproj` file.

+## Quantization quality
+
+Measured on 8 diverse text sentences (3584-dim embeddings). BF16 is the reference.
+
+### Similarity matrix quality vs BF16
+
+| Quant | Speedup | Mean Abs Diff | Max Diff | Pearson r | Spearman rho |
+|-------|---------|---------------|----------|-----------|--------------|
+| Q8_0 | 1.7x | 0.0025 | 0.009 | 0.9997 | 0.9956 |
+| Q4_K_M | 2.4x | 0.0073 | 0.022 | 0.9974 | 0.9951 |
+| Q3_K_M | 2.8x | 0.0165 | 0.063 | 0.9839 | 0.9770 |
+| Q2_K | 3.1x | 0.0429 | 0.175 | 0.9126 | 0.8506 |
+
+### Embedding vector cosine similarity vs BF16
+
+| Quant | Mean | Min | Max |
+|-------|------|-----|-----|
+| Q8_0 | 0.9998 | 0.9997 | 0.9999 |
+| Q4_K_M | 0.9948 | 0.9908 | 0.9965 |
+| Q3_K_M | 0.9825 | 0.9646 | 0.9882 |
+| Q2_K | 0.9111 | 0.8620 | 0.9432 |
+
+### pgvector retrieval quality (query with quant, corpus in BF16)
+
+| Quant | Recall@1 | Recall@3 | Mean Drift | Max Drift |
+|-------|----------|----------|------------|-----------|
+| Q8_0 | 100% | 100% | 0.0002 | 0.0003 |
+| Q4_K_M | 100% | 100% | 0.0052 | 0.0092 |
+| Q3_K_M | 100% | 100% | 0.0175 | 0.0354 |
+| Q2_K | 100% | 100% | 0.0889 | 0.1380 |
+
+**Recommendation:** Q8_0 is essentially lossless for retrieval. Q4_K_M offers an excellent quality/size tradeoff. Q3_K_M is viable for constrained environments. Q2_K works but shows meaningful embedding drift (~9%).
+
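The cosine-similarity and recall numbers in the tables above can be recomputed from dumped embedding vectors. A minimal pure-Python sketch on toy 3-dimensional vectors; the function names and data here are illustrative, not the actual benchmark harness:

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall_at_k(query_embs, ref_query_embs, corpus_embs, k=1):
    # fraction of queries whose top-k neighbours (by cosine) under the
    # quantized query embeddings include the top-1 neighbour under BF16
    def rank(e):
        return sorted(range(len(corpus_embs)),
                      key=lambda i: -cosine(e, corpus_embs[i]))
    hits = 0
    for q, ref_q in zip(query_embs, ref_query_embs):
        gold = rank(ref_q)[0]
        hits += gold in rank(q)[:k]
    return hits / len(query_embs)

# toy "embeddings": the quantized vectors are slightly perturbed BF16 ones
bf16 = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.2]]
quant = [[0.99, 0.01, 0.11], [0.01, 0.98, 0.21]]
corpus = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]

print(round(cosine(bf16[0], quant[0]), 4))
print(recall_at_k(quant, bf16, corpus, k=1))
```

The same two functions, applied to real vectors from the BF16 and quantized models, reproduce the "Embedding vector cosine similarity" and "Recall@k" columns.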
 ## Usage

 ### Build llama.cpp

@@ -45,10 +131,10 @@ For text-only embedding, you only need one of the text model GGUFs. For multimod
 git clone https://github.com/heiervang-technologies/ht-llama.cpp
 cd ht-llama.cpp
 cmake -B build
+cmake --build build --target llama-embedding llama-server -j$(nproc)
 ```

+### Text embeddings (CLI)

 ```bash
 ./build/bin/llama-embedding \
 
@@ -57,32 +143,76 @@ cmake --build build --target llama-embedding -j$(nproc)
   -p "Your text here"
 ```

+### Text embeddings (server)
+
+```bash
+./build/bin/llama-server \
+  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
+  --embedding --pooling last
+
+# in a separate terminal:
+curl -s http://localhost:8080/embeddings \
+  -d '{"content": "Your text here"}'
+```
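The same endpoint can be called from Python with the standard library. A minimal sketch: the payload matches the curl example above, but the response shape is an assumption here (it has varied across llama.cpp versions and is not verified against this build), so the extractor tolerates both a bare object and a list:

```python
import json
import urllib.request

SERVER = "http://localhost:8080"  # llama-server started with --embedding

def extract_embedding(payload):
    # Assumed response shapes: either {"embedding": [...]} or a list of
    # such objects, possibly with the vector nested one level deeper.
    if isinstance(payload, list):
        payload = payload[0]
    emb = payload["embedding"]
    return emb[0] if emb and isinstance(emb[0], list) else emb

def embed(text: str):
    # POST the same body as the curl example above
    req = urllib.request.Request(
        f"{SERVER}/embeddings",
        data=json.dumps({"content": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_embedding(json.load(resp))

# vec = embed("Your text here")  # requires the server above to be running
# len(vec) should be 3584 for this model
```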

+### Multimodal embeddings (vision + audio)
+
+Requires the mmproj file:

 ```bash
 ./build/bin/llama-server \
   -m LCO-Embedding-Omni-7B-Q8_0.gguf \
   --mmproj mmproj-LCO-Embedding-Omni-7b-F16.gguf \
+  --embedding --pooling last
 ```

 ```bash
+# Image embedding (base64-encoded image)
 curl -s http://localhost:8080/embeddings \
+  -d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-image-data>"]}]}'

+# Audio embedding (base64-encoded WAV)
 curl -s http://localhost:8080/embeddings \
+  -d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-audio-data>"]}]}'
+```
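Building the multimodal request body from a real file is mostly base64 plumbing. A small helper mirroring the curl examples above (the helper name is mine, not part of the API):

```python
import base64
import json

def multimodal_payload(media_bytes: bytes, prompt: str = "<__media__>") -> dict:
    # Build the request body used in the curl examples above: the media is
    # base64-encoded and the <__media__> placeholder marks where it goes.
    return {
        "content": [{
            "prompt_string": prompt,
            "multimodal_data": [base64.b64encode(media_bytes).decode("ascii")],
        }]
    }

# Usage (file path is illustrative):
# with open("photo.jpg", "rb") as f:
#     body = json.dumps(multimodal_payload(f.read()))
# then POST body to http://localhost:8080/embeddings as above
```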

+### JSON output (for programmatic use)
+
+```bash
+./build/bin/llama-embedding \
+  -m LCO-Embedding-Omni-7B-Q8_0.gguf \
+  --pooling last \
+  --embd-output-format json \
+  -p "Your text here"
 ```
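The JSON printed by the command above can be consumed directly. The exact shape below is an assumption (an OpenAI-style list with one entry per prompt, not verified against this build), so treat this as a sketch:

```python
import json

# Assumed output shape of llama-embedding --embd-output-format json:
sample = '''{"object": "list",
             "data": [{"object": "embedding", "index": 0,
                       "embedding": [0.1, 0.2, 0.3]}]}'''

def load_embeddings(raw: str):
    # one vector per prompt, returned in input order
    doc = json.loads(raw)
    return [item["embedding"]
            for item in sorted(doc["data"], key=lambda d: d["index"])]

vectors = load_embeddings(sample)
print(len(vectors), len(vectors[0]))
```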

 ## Notes

 - This is a quantization of [LCO-Embedding/LCO-Embedding-Omni-7B](https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-7B) -- see the original model card for benchmarks, training details, and licensing
+- The `--pooling last` flag is required -- this model uses last-token pooling, not mean pooling
+- Embedding dimensions: 3584
+- All three modalities (text, vision, audio) have been tested and verified working
+- Contributions and bug reports welcome at [ht-llama.cpp](https://github.com/heiervang-technologies/ht-llama.cpp/issues)
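The last-token pooling noted above can be illustrated in a few lines: the embedding is the final layer's hidden state at the last non-padding token, not a mean over all positions. A toy sketch with plain lists (hidden size 4 instead of 3584):

```python
def last_token_pool(hidden_states, attention_mask):
    # hidden_states: [seq_len][hidden_size]; attention_mask: [seq_len] of 0/1.
    # Pick the hidden state of the last position the mask marks as real.
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last]

states = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
mask = [1, 1, 0]  # third position is padding
print(last_token_pool(states, mask))
```

This is what `--pooling last` selects for you; with mean pooling the padding handling and averaging would differ, which is why the flag matters.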
+
+## Citations
+
+### LCO-Embedding
+
+```bibtex
+@article{xiao2025scaling,
+  title={Scaling Language-Centric Omnimodal Representation Learning},
+  author={Xiao, Chenghao and Chan, Hou Pong and Zhang, Hao and Xu, Weiwen and Aljunied, Mahani and Rong, Yu},
+  journal={arXiv preprint arXiv:2510.11693},
+  year={2025}
+}
+```
+
+### Qwen 2.5 Omni
+
+```bibtex
+@article{Qwen2.5-Omni,
+  title={Qwen2.5-Omni Technical Report},
+  author={Jin Xu and Zhifang Guo and Jinzheng He and Hangrui Hu and Ting He and Shuai Bai and Keqin Chen and Jialin Wang and Yang Fan and Kai Dang and Bin Zhang and Xiong Wang and Yunfei Chu and Junyang Lin},
+  journal={arXiv preprint arXiv:2503.20215},
+  year={2025}
+}
+```