---
language: id
license: mit
tags:
- indonesian
- nli
- natural-language-inference
- text-classification
- afaji--indonli
datasets:
- afaji/indonli
---
# Indo-RoBERTa for Indonesian Natural Language Inference

This model is fine-tuned on the [IndoNLI dataset](https://huggingface.co/datasets/afaji/indonli) for natural language inference in Indonesian.

## Model Description

- **Model Type:** RoBERTa-based model (indo-roberta-base-epoch-4)
- **Task:** Natural Language Inference (Textual Entailment)
- **Language:** Indonesian
- **License:** MIT

## Performance

Results are currently reported for the IndoNLI validation split only:

### Validation Set
- **Accuracy:** 0.7692
- **F1 Score:** 0.7662
- **Precision:** 0.7680
- **Recall:** 0.7654

In our benchmark, this checkpoint achieved the best validation-set performance among the model variants we compared.
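For readers who want to reproduce numbers of this kind, the sketch below shows one way to compute accuracy, precision, recall, and F1 for a three-class NLI task. This card does not state which averaging was used for the reported scores, so macro-averaging is an assumption here, and `macro_metrics` is a hypothetical helper name rather than part of any library.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Assumption: scores are averaged per class with equal weight (macro);
    the model card does not say which averaging produced its numbers.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fp = sum(t != c and p == c for t, p in pairs)  # predicted c, but wrong
        fn = sum(t == c and p != c for t, p in pairs)  # missed instances of c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example: label ids follow the usage section (0 entailment, 1 neutral, 2 contradiction).
scores = macro_metrics([0, 1, 2, 2], [0, 1, 2, 1])
print(scores)
```

In practice, `y_true` would hold the gold labels of the validation split and `y_pred` the argmax of the model logits for each premise-hypothesis pair.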

## Training Procedure

The model was fine-tuned from indo-roberta-base for 4 epochs on the IndoNLI training split, with a three-way classification head (entailment, neutral, contradiction) for the NLI task.

## Dataset

This model was trained on the [IndoNLI dataset](https://huggingface.co/datasets/afaji/indonli), a benchmark of nearly 18k sentence pairs for natural language inference (NLI) in Indonesian.

The dataset is split into:
- Training set: 10k pairs
- Validation set: 2.5k pairs
- Test set (lay): 2.5k pairs
- Test set (expert): 2.5k pairs
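As a rough sketch, a split can be loaded and inspected with the `datasets` library as shown below. The split name `validation` and the `label` column are assumptions about how the dataset is published on the Hub, and `load_indonli_split` and `label_distribution` are hypothetical helper names introduced for illustration.

```python
from collections import Counter

# Label ids as used in this card's usage example.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def label_distribution(label_ids):
    """Count how often each label name occurs in a sequence of integer label ids."""
    counts = Counter(ID2LABEL[i] for i in label_ids)
    return {name: counts.get(name, 0) for name in ID2LABEL.values()}


def load_indonli_split(split="validation"):
    """Load one IndoNLI split from the Hub (requires `datasets` and network access).

    Assumption: the split names and the dataset layout match the Hub repository.
    """
    from datasets import load_dataset  # lazy import so the helper above has no dependency

    return load_dataset("afaji/indonli", split=split)


# Example, not executed here because it downloads data:
# val = load_indonli_split("validation")
# print(len(val), label_distribution(val["label"]))
```

Checking the label distribution of each split before evaluation is a quick way to confirm that the label ids match the mapping used at inference time.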

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("fabhiansan/indo-roberta-nli")
model = AutoModelForSequenceClassification.from_pretrained("fabhiansan/indo-roberta-nli")

# Prepare the input
premise = "Seorang wanita sedang makan di restoran."
hypothesis = "Seorang wanita sedang berada di luar ruangan."

# Tokenize the input
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

# Get the prediction
outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=1)

# Map predictions to labels
id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
predicted_label = id2label[predictions.item()]
print(f"Predicted label: {predicted_label}")
```

## Citation

If you use this model, please cite the IndoNLI paper:

```bibtex
@inproceedings{mahendra-etal-2021-indonli,
    title = {IndoNLI: A Natural Language Inference Dataset for Indonesian},
    author = {Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara},
    booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
    year = {2021},
    publisher = {Association for Computational Linguistics},
}
```