Text Classification
Transformers
Safetensors
PyTorch
English
distilbert
reward-hacking
ai-safety
misalignment-detection
llm-safety
alignment
Eval Results (legacy)
Instructions to use Aerosta/rewardhackwatch with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Aerosta/rewardhackwatch with Transformers:
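    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="Aerosta/rewardhackwatch")

    # Load model directly
    # Note: the original snippet imported RewardHackClassifier from transformers,
    # but transformers does not export a class by that name. The standard
    # sequence-classification auto class is assumed here instead.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("Aerosta/rewardhackwatch")
    model = AutoModelForSequenceClassification.from_pretrained("Aerosta/rewardhackwatch")
- Notebooks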
- Google Colab
- Kaggle
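The pipeline snippet above returns one dict per input with a `label` and a `score`. A minimal sketch of turning that output into a list of flagged inputs is shown below. The helper names `flag_suspicious` and `classify_and_flag`, the positive label name `LABEL_1`, and the 0.5 threshold are assumptions for illustration; the actual label names for this model should be checked in its config (`id2label`).

```python
from typing import Dict, List


def flag_suspicious(
    results: List[Dict[str, object]],
    positive_label: str = "LABEL_1",  # assumed label name, check the model's id2label
    threshold: float = 0.5,           # assumed cutoff, tune for your use case
) -> List[int]:
    """Return indices of inputs whose predicted label is `positive_label`
    with a score at or above `threshold`."""
    return [
        i
        for i, r in enumerate(results)
        if r["label"] == positive_label and float(r["score"]) >= threshold
    ]


def classify_and_flag(texts: List[str]) -> List[int]:
    """Hypothetical convenience wrapper: run the text-classification pipeline
    on `texts` and return the indices that were flagged.
    Downloads the model on first use."""
    from transformers import pipeline  # imported lazily so the helper above has no dependencies

    pipe = pipeline("text-classification", model="Aerosta/rewardhackwatch")
    results = pipe(texts, truncation=True)  # truncate inputs longer than the model's max length
    return flag_suspicious(results)
```

Calling `classify_and_flag(["first trajectory text", "second trajectory text"])` would return the positions of the inputs the classifier scored as the positive class, under the assumptions stated above.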