Michael Günther committed
Commit 22b43cf · 2 Parent(s): a7c1eab f69955f

Merge branch 'main' of https://huggingface.co/jinaai/jina-embeddings-v5-text-small-text-matching

Files changed (1)
  1. README.md +48 -13
README.md CHANGED
@@ -26,7 +26,7 @@ library_name: llama.cpp
 
 ### **jina-embeddings-v5-text-small-text-matching**: Text-Matching-Targeted Embedding Distillation
 
-[Blog](https://jina.ai/news/jina-embeddings-v5-text-distilling-4b-quality-into-sub-1b-multilingual-embeddings) | [Elastic Inference Service](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) | [ArXiv](https://arxiv.org/abs/2602.15547) | [Blog](https://jina.ai/news/jina-embeddings-v5-text-distilling-4b-quality-into-sub-1b-multilingual-embeddings)
+[Elastic Inference Service](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) | [ArXiv](https://arxiv.org/abs/2602.15547) | [Release Note](https://jina.ai/news/jina-embeddings-v5-text-distilling-4b-quality-into-sub-1b-multilingual-embeddings) | [Blog](https://www.elastic.co/search-labs/blog/jina-embeddings-v5-text)
 
 ### Model Overview
 
@@ -48,18 +48,7 @@ Trained using a novel approach that combines distillation with task-specific con
 | Pooling Strategy | Last-token pooling |
 | Base Model | jinaai/jina-embeddings-v5-text-small |
 
-<p align="center">
-<img src="https://jina-ai-gmbh.ghost.io/content/images/2026/02/v5_mmteb-4.png" alt="MMTEB Multilingual Benchmark" width="500px">
-</p>
-
-<p align="center">
-<img src="https://jina-ai-gmbh.ghost.io/content/images/2026/02/v5_mteb_en-4.png" alt="MTEB English Benchmark" width="500px">
-</p>
-
-<p align="center">
-<img src="https://jina-ai-gmbh.ghost.io/content/images/2026/02/v5_retrieval-4.png" alt="Retrieval Benchmark Results" width="500px">
-</p>
-
+![v5_benchmarks_combined](https://cdn-uploads.huggingface.co/production/uploads/6476ff2699a5ce743ccea3fc/7WjMQChM6XAOI9LhREChg.png)
 
 ### Training and Evaluation
 
@@ -246,6 +235,52 @@ curl -X POST "http://127.0.0.1:8080/v1/embeddings" \
 
 </details>
 
+<details>
+<summary> via <a href="https://huggingface.co/docs/optimum/index">Optimum (ONNX)</a></summary>
+
+You can run the ONNX-optimized version of the model locally using Hugging Face's `optimum` library. Make sure you have the required dependencies installed (e.g., `pip install optimum[onnxruntime] transformers torch`):
+
+```python
+from optimum.onnxruntime import ORTModelForFeatureExtraction
+from transformers import AutoTokenizer
+import torch
+
+model_id = "jinaai/jina-embeddings-v5-text-small-text-matching"
+
+# 1. Load the tokenizer and the ONNX model
+# We specify the subfolder 'onnx' where the weights are located
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = ORTModelForFeatureExtraction.from_pretrained(
+    model_id,
+    subfolder="onnx",
+    file_name="model.onnx",
+    provider="CPUExecutionProvider",  # Or "CUDAExecutionProvider" for GPU
+    trust_remote_code=True,
+)
+
+# 2. Prepare the input
+texts = ["Document: How do I use Jina ONNX models?", "Document: Information about semantic matching."]
+inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
+
+
+# 3. Inference
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# 4. Pooling (crucial for Jina-v5)
+# Jina-v5 uses LAST-TOKEN pooling.
+# We take the hidden state of the last non-padding token.
+last_hidden_state = outputs.last_hidden_state
+# Find the indices of the last token (usually the end of the sequence)
+sequence_lengths = inputs.attention_mask.sum(dim=1) - 1
+embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]
+
+print('embeddings shape:', embeddings.shape)
+print('embeddings:', embeddings)
+```
+
+</details>
+
 ### License
 
 The model is licensed under CC BY-NC 4.0. For commercial use, please [contact us](mailto:[email protected]).
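The ONNX snippet added in this diff prints raw embeddings; for the text-matching use case those vectors are typically compared by cosine similarity. A minimal standalone sketch of the last-token pooling and scoring steps, using random tensors in place of real model outputs (the shapes and the self-comparison are illustrative assumptions, not part of the repository):

```python
import torch
import torch.nn.functional as F

# Stand-ins for model outputs: batch of 2 sequences, seq len 5, hidden size 8.
torch.manual_seed(0)
last_hidden_state = torch.randn(2, 5, 8)
attention_mask = torch.tensor([[1, 1, 1, 0, 0],   # two padding tokens
                               [1, 1, 1, 1, 1]])  # no padding

# Last-token pooling: pick the hidden state of the final non-padding token.
sequence_lengths = attention_mask.sum(dim=1) - 1
embeddings = last_hidden_state[torch.arange(last_hidden_state.size(0)), sequence_lengths]

# L2-normalize so cosine similarity reduces to a plain dot product.
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = embeddings @ embeddings.T

print('embeddings shape:', tuple(embeddings.shape))
print('pairwise cosine scores:', scores)
```

With real inputs, `last_hidden_state` and `attention_mask` would come from the model and tokenizer, and each row of `scores` ranks the other texts by semantic similarity.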