---
pipeline_tag: sentence-similarity
tags:
- gguf
- embedding
- qwen3
- llama-cpp
- jina-embeddings-v5
language:
- multilingual
base_model: jinaai/jina-embeddings-v5-text-small
base_model_relation: quantized
inference: false
license: cc-by-nc-4.0
library_name: llama.cpp
---
# jina-embeddings-v5-text-small-retrieval-GGUF
GGUF quantizations of [jina-embeddings-v5-text-small-retrieval](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) using llama.cpp: a 677M-parameter multilingual embedding model quantized for efficient inference.
[Elastic Inference Service](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) | [arXiv](https://arxiv.org/abs/2602.15547) | [Blog](https://jina.ai/news/jina-embeddings-v5-text-distilling-4b-quality-into-sub-1b-multilingual-embeddings)
> [!IMPORTANT]
> We highly recommend first reading [this blog post for technical details and a customized llama.cpp build](https://jina.ai/news/optimizing-ggufs-for-decoder-only-embedding-models).
## Overview
`jina-embeddings-v5-text-small-retrieval` is a task-specific embedding model for **retrieval**, part of the [jina-embeddings-v5-text](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) model family.
| Feature | Value |
| --- | --- |
| Parameters | 677M |
| Task | `retrieval` |
| Embedding Dimension | 1024 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768, 1024 |
| Pooling Strategy | Last-token pooling |
| Base Model | [jina-embeddings-v5-text-small](https://huggingface.co/jinaai/jina-embeddings-v5-text-small) |
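The Matryoshka dimensions in the table mean an embedding can be truncated to one of the listed prefix lengths and re-normalized, trading a small amount of quality for lower storage and compute cost. A minimal sketch with NumPy (the 1024-dim vector here is random placeholder data, not real model output):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and L2-normalize the result."""
    head = vec[:dim]
    return head / np.linalg.norm(head)

# Placeholder for a full 1024-dim embedding from the model
full = np.random.default_rng(0).standard_normal(1024)

# 256 is one of the supported Matryoshka sizes
small = truncate_embedding(full, 256)
print(small.shape, float(np.linalg.norm(small)))
```

Any of the listed sizes (32 through 1024) works the same way; re-normalizing after truncation keeps cosine similarity well defined.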
## Usage
### Via Elastic Inference Service
The fastest way to use v5-text in production: Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
```console
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
```
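Once the endpoint exists, you can request embeddings from it with the inference API. A sketch (the endpoint name `jina-v5` matches the PUT request above; the response contains one embedding vector per input string):

```console
POST _inference/text_embedding/jina-v5
{
  "input": ["Your text here"]
}
```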
See the [Elastic Inference Service documentation](https://www.elastic.co/docs/explore-analyze/elastic-inference/eis) for setup details.
### Via llama.cpp
```bash
# Build llama.cpp (upstream)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release
# Run embedding
./build/bin/llama-embedding -m jina-embeddings-v5-text-small-retrieval-Q8_0.gguf \
--pooling last -p "Your text here"
```
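To compare texts for retrieval, score the embeddings that `llama-embedding` prints with cosine similarity. A hedged sketch in Python, using short toy vectors as stand-ins for real model output (llama.cpp L2-normalizes embeddings by default, in which case this reduces to a plain dot product):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for a query embedding and a document embedding
query_vec = [0.1, 0.3, -0.2]
doc_vec = [0.1, 0.25, -0.15]
print(cosine_similarity(query_vec, doc_vec))
```

For retrieval, rank documents by their similarity to the query embedding, highest first.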
## License
CC-BY-NC-4.0. For commercial use, please [contact us](https://jina.ai/contact-sales).