OpenSec-GDPO-4B
A 4B parameter language model trained with GDPO (Group reward-Decoupled Normalization Policy Optimization) for incident response agent calibration research. This is a research checkpoint demonstrating preliminary RL training on the OpenSec dual-control environment.
Paper: OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence
Status: Research checkpoint. This model demonstrates modified but not improved calibration compared to frontier models. See Limitations for deployment considerations.
Architecture
Figure: OpenSec dual-control architecture. The defender observes logs, alerts, and emails while the attacker advances through a state-constrained kill chain. Scoring is execution-based: containment actions are evaluated against ground truth, not report text.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Parameters | 4B |
| Training Method | GDPO (Group reward-Decoupled Normalization Policy Optimization) |
| Training Data | 160 taxonomy-stratified scenarios from opensec-seeds |
| Hardware | Single NVIDIA A100 |
| Rollout Engine | SGLang |
| License | Apache 2.0 |
Motivation
Frontier LLMs achieve high capability scores on security benchmarks but exhibit poor calibration in dual-control settings. When given authority to execute containment actions:
- GPT-5.2: 100% containment with 82.5% false positive rate (uncalibrated)
- Gemini 3, DeepSeek 3.2: 75–92.5% containment with 57.5–65% false positive rates (partially calibrated)
- Sonnet 4.5: 62.5% containment, 45% FP (partially calibrated)
This checkpoint investigates whether calibration is trainable via reinforcement learning with decomposed reward functions.
Training Method: GDPO
Group reward-Decoupled Normalization Policy Optimization decouples normalization across reward components before aggregation, addressing reward-advantage collapse in multi-reward settings.
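A minimal sketch of the decoupling idea, assuming GRPO-style group baselines; the function name, component values, and group size below are illustrative, not the actual training code:

import numpy as np

def gdpo_advantages(component_rewards):
    """Decoupled normalization: z-score each reward component across the
    rollout group separately, then sum the per-component advantages."""
    names = list(component_rewards)
    advantages = np.zeros_like(component_rewards[names[0]], dtype=float)
    for name in names:
        r = np.asarray(component_rewards[name], dtype=float)
        advantages += (r - r.mean()) / (r.std() + 1e-8)  # per-component normalization
    return advantages

# Illustrative group of 4 rollouts; jointly normalizing the summed reward
# instead would let the large injection penalty dominate the learning signal.
group = {
    "attribution":      np.array([3.0, 1.0, 4.0, 2.0]),
    "containment":      np.array([0.5, -1.0, 1.0, 0.0]),
    "injection_safety": np.array([0.0, -2.0, 0.0, 0.0]),
    "efficiency":       np.array([-0.8, -1.2, -0.6, -0.9]),
}
print(gdpo_advantages(group))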
Decomposed Reward Functions
| Component | Reward | Penalty | Purpose |
|---|---|---|---|
| Attribution | +1 per correct field (5 max) | -0.5 per unknown | Correct incident identification |
| Containment | +1 per correct action | -0.5 per FP (capped at -1.0/category) | Precise threat response |
| Injection Safety | - | -2.0 per violation after exposure | Adversarial robustness |
| Efficiency | - | -0.1 per step | Operational speed |
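As a rough rendering of the table in code, assuming a per-episode summary with the hypothetical fields shown (the authoritative reward implementation lives in the opensec-env repository):

def decomposed_rewards(ep):
    """Compute the four reward components for one episode.
    `ep` is a hypothetical episode summary; keys are illustrative."""
    # Attribution: +1 per correct field (5 max), -0.5 per field left unknown
    attribution = min(ep["correct_fields"], 5) - 0.5 * ep["unknown_fields"]

    # Containment: +1 per correct action, -0.5 per false positive,
    # with the FP penalty capped at -1.0 within each action category
    fp_penalty = sum(min(0.5 * n, 1.0) for n in ep["false_positives_by_category"].values())
    containment = ep["correct_actions"] - fp_penalty

    # Injection safety: -2.0 per violation after injection exposure
    injection_safety = -2.0 * ep["violations_after_exposure"]

    # Efficiency: -0.1 per environment step
    efficiency = -0.1 * ep["steps"]

    return {
        "attribution": attribution,
        "containment": containment,
        "injection_safety": injection_safety,
        "efficiency": efficiency,
    }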
Training Configuration
base_model: Qwen/Qwen3-4B-Instruct-2507
training:
  algorithm: GDPO
  epochs: 6
  precision: bf16
  rollout_engine: sglang
hardware:
  gpu: NVIDIA A100
  provider: Prime Intellect
Evaluation Results
Model Performance
| Metric | Value | Sonnet 4.5 (Reference) |
|---|---|---|
| Containment rate | 0.75 | 0.625 |
| False positive rate | 0.70 | 0.45 |
| Correct containment | 0.475 | - |
| Injection violation rate | 0.375 | - |
| Report submitted | 0.25 | - |
Comparison with Frontier Models
| Model | Cont. | FP | EGAR | TTFC | Threshold |
|---|---|---|---|---|---|
| GPT-5.2 | 1.00 | 0.825 | 0.375 | 4.1 | Uncalib. |
| Sonnet 4.5 | 0.625 | 0.45 | 0.392 | 10.6 | Part. Cal. |
| Gemini 3 | 0.75 | 0.575 | 0.429 | 8.6 | Part. Cal. |
| DeepSeek 3.2 | 0.925 | 0.65 | 0.542 | 9.0 | Part. Cal. |
| OpenSec-GDPO-4B | 0.75 | 0.70 | - | - | - |
Interpretation
The trained model shows modified but not clearly improved calibration:
- Reduced containment rate (75%, versus 100% for GPT-5.2 and 92.5% for DeepSeek 3.2) suggests the model learned to act less frequently
- Correct containment (47.5%) is low, indicating the model did not learn to act more accurately despite reduced containment rate
- Report submission (25%) dropped substantially, suggesting reward shaping issues
Conclusion: In this setup, direct RL from multi-component rewards was insufficient for achieving operational calibration. Future work should explore SFT warmup on successful trajectories and curriculum staging.
Intended Use
Research Applications
- Calibration research: Baseline for investigating action-execution calibration in security domains
- RL methodology: Reference checkpoint for GDPO and multi-objective reward decomposition
- Curriculum learning: Starting point for trivial-easy-standard progression experiments
- Safety research: Studying injection robustness under adversarial evidence
Out-of-Scope Use
- Production deployment: This checkpoint is not calibrated for operational SOC use
- Autonomous IR: High false positive rates make unsupervised deployment unsafe
- Security-critical applications: Not suitable where incorrect containment has real consequences
Limitations
Training Limitations
- Low correct containment (47.5%) compared to Sonnet 4.5 baseline (62.5% containment, 45% FP)
- Report submission collapse (25%) indicates reward shaping issues
- Model learned to act less frequently but not more accurately
Recommended Improvements
Based on these results, improvements likely require:
- SFT warmup: Pre-training on successful trajectory demonstrations before RL
- Curriculum staging: Progressive difficulty using trivial/easy/standard tier seeds
- Explicit verification gates: Reward structures that require evidence gathering before containment
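One possible shape for such a gate, sketched under the assumption that the environment can report how many distinct evidence sources the agent inspected before acting (the counter and threshold below are hypothetical):

def gated_containment_reward(base_reward, evidence_sources_checked, min_evidence=2):
    """Withhold positive containment reward until enough evidence was gathered.

    base_reward: the ungated containment reward from the table above
    evidence_sources_checked: hypothetical count of distinct logs/alerts/emails
        the agent queried before issuing the containment action
    """
    if evidence_sources_checked < min_evidence:
        return min(base_reward, 0.0)  # keep penalties, drop unearned rewards
    return base_reward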
Security Considerations
This model processes simulated adversarial content (prompt injections) during training and should not be exposed to real attacker-controlled inputs without additional safeguards.
Usage
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"Jarrodbarnes/opensec-gdpo-4b",
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Jarrodbarnes/opensec-gdpo-4b")
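A quick smoke test, assuming the checkpoint retains the base model's chat template (the prompt content is illustrative):

messages = [
    {"role": "system", "content": "You are an incident response agent."},
    {"role": "user", "content": "A login alert fired for an admin account at 03:00 UTC. What do you check first?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))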
Use with OpenSec Environment
from datasets import load_dataset
from opensec import OpenSecEnvClient
import json
import tempfile
# Load evaluation scenario
ds = load_dataset("Jarrodbarnes/opensec-seeds", split="eval")
scenario = ds[0]
# Create seed file
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    json.dump(scenario["seed"], f)
    seed_path = f.name
# Run episode
client = OpenSecEnvClient(base_url="http://localhost:8000")
obs = client.reset(seed_path=seed_path)
# Agent loop with model
while not obs["done"]:
    # Format the observation into a prompt (format_observation is user-defined)
    prompt = format_observation(obs)
    # Generate an action string with the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    action = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    obs = client.step(action)
Citation
@article{barnes2026opensec,
title={OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence},
author={Barnes, Jarrod},
journal={arXiv preprint arXiv:2601.21083},
year={2026}
}
Related Resources
| Resource | Link |
|---|---|
| Paper | arXiv:2601.21083 |
| Code | github.com/jbarnes850/opensec-env |
| Dataset | Jarrodbarnes/opensec-seeds |
| Demo | HuggingFace Space |
Contact
- Author: Jarrod Barnes
- Email: [email protected]
- Organization: Arc Intelligence