Artifacts for paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968)
Jack Zhang
jackzhang
Datasets (35)
jackzhang/JBDistill-Bench • 1k • 28
jackzhang/wjharm-or79k-stage2 • 79.5k • 9
jackzhang/wjharm-or79k-stage1 • 79.5k • 10
jackzhang/cosalign_train_simplified • 125k • 10
jackzhang/cosalign_test_simplfied • 3.2k • 12
jackzhang/wjharm-or79k • 159k • 13
jackzhang/wjtrain_prompts-advonly-held500 • 161k • 15
jackzhang/gsm8k_sysp-test • 1.32k • 13
jackzhang/gsm8k_sysp-train • 7.47k • 13
jackzhang/wjtrain_prompts-dev-held500-mmsysp • 500 • 13