Merge branch 'main' of https://huggingface.co/jinaai/jina-embeddings-v5-text-small-text-matching

Changed file: README.md

### **jina-embeddings-v5-text-small-text-matching**: Text-Matching-Targeted Embedding Distillation

[Elastic Inference Service](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) | [ArXiv](https://arxiv.org/abs/2602.15547) | [Release Note](https://jina.ai/news/jina-embeddings-v5-text-distilling-4b-quality-into-sub-1b-multilingual-embeddings) | [Blog](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-text)

### Model Overview

| Pooling Strategy | Last-token pooling |
| Base Model | jinaai/jina-embeddings-v5-text-small |

![MTEB(Multilingual, v2) Benchmark Results](https://jina-ai-gmbh.ghost.io/content/images/2026/02/benchmark_table.png)

### Training and Evaluation

</details>

<details>
<summary> via <a href="https://huggingface.co/docs/optimum/index">Optimum (ONNX)</a></summary>

You can run the ONNX-optimized version of the model locally using Hugging Face's `optimum` library. Make sure the required dependencies are installed (e.g., `pip install "optimum[onnxruntime]" transformers torch`):

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
import torch

model_id = "jinaai/jina-embeddings-v5-text-small-text-matching"

# 1. Load the tokenizer and the ONNX model.
#    The ONNX weights live in the 'onnx' subfolder of the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx",
    file_name="model.onnx",
    provider="CPUExecutionProvider",  # or "CUDAExecutionProvider" for GPU
    trust_remote_code=True,
)

# 2. Prepare the input.
texts = ["Document: How do I use Jina ONNX models?", "Document: Information about semantic matching."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# 3. Run inference.
with torch.no_grad():
    outputs = model(**inputs)

# 4. Pooling (crucial for Jina-v5): the model uses LAST-token pooling,
#    so take the hidden state of the last non-padding token in each sequence.
last_hidden_state = outputs.last_hidden_state
sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]

print('embeddings shape:', embeddings.shape)
print('embeddings:', embeddings)
```
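
Because this checkpoint is distilled for text matching, pairs of texts are typically compared by cosine similarity of their pooled embeddings. A minimal follow-on sketch (the tensor values and the `scores` name below are illustrative stand-ins, not part of the model's API):

```python
import torch
import torch.nn.functional as F

# Stand-in for the `embeddings` tensor pooled above (illustrative values;
# in practice, reuse the last-token-pooled model outputs).
embeddings = torch.tensor([[0.1, 0.3, 0.5],
                           [0.2, 0.1, 0.4]])

# L2-normalize so that cosine similarity reduces to a dot product.
normalized = F.normalize(embeddings, p=2, dim=1)
scores = normalized @ normalized.T

print('similarity matrix:', scores)  # symmetric, with 1.0 on the diagonal
```

Higher scores indicate closer semantic matches; the diagonal is each text's similarity with itself.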
</details>

### License

The model is licensed under CC BY-NC 4.0. For commercial use, please [contact us]([email protected]).