---
pipeline_tag: sentence-similarity
tags:
  - gguf
  - embedding
  - qwen3
  - llama-cpp
  - jina-embeddings-v5
language:
  - multilingual
base_model: jinaai/jina-embeddings-v5-text-small
base_model_relation: quantized
inference: false
license: cc-by-nc-4.0
library_name: llama.cpp
---

# jina-embeddings-v5-text-small-retrieval-GGUF

GGUF quantizations of [jina-embeddings-v5-text-small-retrieval](https://huggingface.co/jinaai/jina-embeddings-v5-text-small), built with llama.cpp: a 677M-parameter multilingual embedding model quantized for efficient inference.

[Elastic Inference Service](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) | [ArXiv](https://arxiv.org/abs/2602.15547) | [Blog](https://jina.ai/news/jina-embeddings-v5-text-distilling-4b-quality-into-sub-1b-multilingual-embeddings)

> [!IMPORTANT]
> We highly recommend first reading [this blog post for more technical details and the customized llama.cpp build](https://jina.ai/news/optimizing-ggufs-for-decoder-only-embedding-models).

## Overview

*Figure: jina-embeddings-v5-text architecture.*

`jina-embeddings-v5-text-small-retrieval` is a task-specific embedding model for **retrieval**, part of the [jina-embeddings-v5-text](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) model family.

| Feature | Value |
| --- | --- |
| Parameters | 677M |
| Task | `retrieval` |
| Embedding Dimension | 1024 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768, 1024 |
| Pooling Strategy | Last-token pooling |
| Base Model | [jina-embeddings-v5-text-small](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) |
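The Matryoshka dimensions in the table mean an embedding can be shortened to any of the listed prefix lengths and re-normalized, trading accuracy for storage. A minimal sketch of that truncation with NumPy (the 1024-dim vector here is random stand-in data, not real model output):

```python
import numpy as np

def truncate_embedding(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka components and re-normalize to unit length."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a real 1024-dim embedding produced by the model
full = np.random.default_rng(0).normal(size=1024)
small = truncate_embedding(full, 256)
print(small.shape)                 # (256,)
print(float(np.linalg.norm(small)))  # 1.0 (unit length after re-normalization)
```

Cosine similarities computed on the truncated vectors remain directly comparable because both sides are re-normalized after the cut.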

*Figure: MMTEB multilingual benchmark results.*

*Figure: MTEB English benchmark results.*

*Figure: Retrieval benchmark results.*

## Usage

### Via Elastic Inference Service

The fastest way to use v5-text in production: Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.

```bash
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
```

See the [Elastic Inference Service documentation](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) for setup details.
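However the embeddings are generated (via EIS or via llama.cpp below), retrieval with this model reduces to ranking documents by cosine similarity against the query embedding. A minimal sketch with random stand-in vectors (real usage would plug in model outputs of dimension 1024 or a Matryoshka prefix):

```python
import numpy as np

def cosine_rank(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by descending cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document against the query
    return np.argsort(-scores)

rng = np.random.default_rng(42)
docs = rng.normal(size=(5, 1024))               # stand-in for 5 document embeddings
query = docs[2] + 0.01 * rng.normal(size=1024)  # a query nearly identical to doc 2
ranking = cosine_rank(query, docs)
print(int(ranking[0]))  # 2 — the most similar document comes first
```

Because the vectors are normalized, the same ranking is obtained from a plain dot product, which is what most vector stores compute at scale.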
### Via llama.cpp

```bash
# Build llama.cpp (upstream)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# Run embedding
./build/bin/llama-embedding -m jina-embeddings-v5-text-small-retrieval-Q8_0.gguf \
  --pooling last -p "Your text here"
```

## License

CC-BY-NC-4.0. For commercial use, please [contact us](https://jina.ai/contact-sales).