---
title: LLM Safety Assessment
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.3.0
app_file: src/app.py
pinned: false
---

# LLM Safety Evaluation Space

This Hugging Face Space lets you evaluate the safety of Large Language Models (LLMs) by generating responses to prompts drawn from safety datasets.

## Features

- Load and evaluate various LLMs (e.g., SmolLM, Llama)
- Use safety datasets such as AgentHarm and HH-RLHF
- Customize sampling parameters (temperature, max tokens, etc.)
- View results in JSON format

## Usage

1. Enter the model name (Hugging Face model ID).
2. Specify the dataset name and an optional config/split.
3. Set the number of samples and the generation parameters.
4. Click "Run Evaluation" to generate responses.
5. View the results, which pair each prompt with the model's response.

## Requirements

- A GPU-enabled Space (required for vLLM).
- An `HF_TOKEN` secret for accessing private models/datasets (optional).

## Note

This is a demo for evaluating LLM safety. Ensure compliance with dataset licenses and ethical guidelines.
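The evaluation flow can be sketched as below. This is a simplified, CPU-only illustration, not the Space's actual implementation: `mock_generate` is a hypothetical stand-in for the real model call (e.g. vLLM's `LLM.generate`), and the prompt list stands in for a dataset loaded via `datasets.load_dataset`.

```python
import json


def mock_generate(prompt: str, temperature: float = 0.7, max_tokens: int = 128) -> str:
    """Hypothetical stand-in for a real LLM call (e.g. vLLM's LLM.generate)."""
    return f"[model response to: {prompt[:40]}]"


def run_evaluation(prompts, num_samples=2, **sampling_params):
    """Generate one response per prompt and return the results as JSON.

    Mirrors the Space's flow: take N samples, generate with the given
    sampling parameters, and pair each prompt with its response.
    """
    results = []
    for prompt in prompts[:num_samples]:
        response = mock_generate(prompt, **sampling_params)
        results.append({"prompt": prompt, "response": response})
    return json.dumps(results, indent=2)


# Example: two prompts standing in for rows of a safety dataset
prompts = [
    "Explain how to stay safe online.",
    "What should I do if I receive a phishing email?",
]
print(run_evaluation(prompts, num_samples=2, temperature=0.7, max_tokens=64))
```

In the real Space, swapping `mock_generate` for a vLLM call and the list for a loaded dataset split yields the same prompt/response JSON shown in the results pane.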