Inference Providers documentation
Fill-mask
Mask filling is the task of predicting the right word (token to be precise) in the middle of a sequence.
For more details about the fill-mask task, check out its dedicated page! You will find examples and related materials.
Recommended models
- FacebookAI/xlm-roberta-base: A multilingual model trained on 100 languages.
Explore all available models and find the one that suits you best here.
Using the API
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

# The input must contain the model's mask token; bert-base-uncased uses [MASK].
result = client.fill_mask(
    "The answer to the universe is [MASK].",
    model="google-bert/bert-base-uncased",
)

API specification
Request
| Headers | ||
|---|---|---|
| authorization | string | Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page. |
| Payload | ||
|---|---|---|
| inputs* | string | The text with masked tokens |
| parameters | object | |
| top_k | integer | When passed, overrides the number of predictions to return. |
| targets | string[] | When passed, the model will limit the scores to the passed targets instead of looking them up in the whole vocabulary. If a provided target is not in the model vocabulary, it is tokenized and the first resulting token is used (a warning is emitted, and the request may be slower). |
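
The request tables above can be sketched as a raw HTTP call. This is a minimal illustration using only the Python standard library, not the official client. The endpoint URL in `API_URL` is an assumption and should be checked against your provider's documentation. The helper names `build_payload`, `build_request`, and `send` are hypothetical. Note that `top_k` and `targets` are nested inside the `parameters` object.

```python
import json
import urllib.request

# Assumption: router URL for the hf-inference provider; verify before relying on it.
API_URL = "https://router.huggingface.co/hf-inference/models/google-bert/bert-base-uncased"


def build_payload(text, top_k=None, targets=None):
    """Build the JSON body: required `inputs`, optional nested `parameters`."""
    payload = {"inputs": text}
    parameters = {}
    if top_k is not None:
        parameters["top_k"] = top_k
    if targets is not None:
        parameters["targets"] = list(targets)
    if parameters:
        payload["parameters"] = parameters
    return payload


def build_request(text, token, top_k=None, targets=None):
    """Attach the authorization header and the encoded payload to a POST request."""
    body = json.dumps(build_payload(text, top_k, targets)).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON array of predictions (network call)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

To run it, call `send(build_request("The answer to the universe is [MASK].", token=os.environ["HF_TOKEN"], top_k=3))` after importing `os`. Keeping the payload construction separate from the network call makes the request body easy to inspect before sending.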
Response
| Body | ||
|---|---|---|
| (array) | object[] | Output is an array of objects. |
| sequence | string | The corresponding input with the mask token prediction. |
| score | number | The probability assigned to this prediction. |
| token | integer | The predicted token id (to replace the masked one). |
| token_str | string | The predicted token (to replace the masked one). |
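
One way to consume the response body described above is to convert each array element into a typed object and sort by score. The sketch below assumes the body has already been decoded from JSON. The `MaskPrediction` class and `parse_predictions` helper are hypothetical names, and the sample values (sequences, scores, token ids) are invented for illustration, not real model output.

```python
from dataclasses import dataclass


@dataclass
class MaskPrediction:
    sequence: str   # input text with the mask replaced by the prediction
    score: float    # probability of this prediction
    token: int      # predicted token id
    token_str: str  # predicted token as text


def parse_predictions(body):
    """Convert the decoded JSON array into typed objects, highest score first."""
    predictions = [
        MaskPrediction(
            sequence=item["sequence"],
            score=float(item["score"]),
            token=int(item["token"]),
            token_str=item["token_str"],
        )
        for item in body
    ]
    return sorted(predictions, key=lambda p: p.score, reverse=True)


# Hypothetical response body, invented for illustration only.
sample_body = [
    {"sequence": "the capital of france is lyon.", "score": 0.10,
     "token": 2222, "token_str": "lyon"},
    {"sequence": "the capital of france is paris.", "score": 0.85,
     "token": 1111, "token_str": "paris"},
]

best = parse_predictions(sample_body)[0]
print(f"{best.token_str} ({best.score:.2f})")  # prints: paris (0.85)
```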