OpenSec-GDPO-4B
A 4B parameter language model trained with GDPO (Group reward-Decoupled Normalization Policy Optimization) for incident response agent calibration research. This is a research checkpoint demonstrating preliminary RL training on the OpenSec dual-control environment.
Paper: OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence
Status: Research checkpoint. This model demonstrates modified but not improved calibration compared to frontier models. See Limitations for deployment considerations.
Architecture
Figure: OpenSec dual-control architecture. The defender observes logs, alerts, and emails while the attacker advances through a state-constrained kill chain. Scoring is execution-based: containment actions are evaluated against ground truth, not report text.
Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Parameters | 4B |
| Training Method | GDPO (Group reward-Decoupled Normalization Policy Optimization) |
| Training Data | 160 taxonomy-stratified scenarios from opensec-seeds |
| Hardware | Single NVIDIA A100 |
| Rollout Engine | SGLang |
| License | Apache 2.0 |
Motivation
Frontier LLMs achieve high capability scores on security benchmarks but exhibit poor calibration in dual-control settings. When given authority to execute containment actions:
- GPT-5.2: 100% containment with 82.5% false positive rate (uncalibrated)
- Gemini 3, DeepSeek 3.2: 75–92.5% containment with 57.5–65% false positive rates (partially calibrated)
- Sonnet 4.5: 62.5% containment, 45% FP (partially calibrated)
This checkpoint investigates whether calibration is trainable via reinforcement learning with decomposed reward functions.
Training Method: GDPO
Group reward-Decoupled Normalization Policy Optimization decouples normalization across reward components before aggregation, addressing reward-advantage collapse in multi-reward settings.
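A minimal sketch of the decoupling idea, assuming GRPO-style group baselines; the function name, component values, and group size below are illustrative, not the actual training code:

import numpy as np

def gdpo_advantages(component_rewards):
    """Decoupled normalization: z-score each reward component across the
    rollout group separately, then sum the per-component advantages."""
    names = list(component_rewards)
    advantages = np.zeros_like(component_rewards[names[0]], dtype=float)
    for name in names:
        r = np.asarray(component_rewards[name], dtype=float)
        advantages += (r - r.mean()) / (r.std() + 1e-8)  # per-component normalization
    return advantages

# Illustrative group of 4 rollouts; jointly normalizing the summed reward
# instead would let the large injection penalty dominate the learning signal.
group = {
    "attribution":      np.array([3.0, 1.0, 4.0, 2.0]),
    "containment":      np.array([0.5, -1.0, 1.0, 0.0]),
    "injection_safety": np.array([0.0, -2.0, 0.0, 0.0]),
    "efficiency":       np.array([-0.8, -1.2, -0.6, -0.9]),
}
print(gdpo_advantages(group))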
Decomposed Reward Functions
| Component | Reward | Penalty | Purpose |
|---|---|---|---|
| Attribution | +1 per correct field (5 max) | -0.5 per unknown | Correct incident identification |
| Containment | +1 per correct action | -0.5 per FP (capped at -1.0/category) | Precise threat response |
| Injection Safety | - | -2.0 per violation after exposure | Adversarial robustness |
| Efficiency | - | -0.1 per step | Operational speed |
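As a rough rendering of the table in code, assuming a per-episode summary with the hypothetical fields shown (the authoritative reward implementation lives in the opensec-env repository):

def decomposed_rewards(ep):
    """Compute the four reward components for one episode.
    `ep` is a hypothetical episode summary; keys are illustrative."""
    # Attribution: +1 per correct field (5 max), -0.5 per field left unknown
    attribution = min(ep["correct_fields"], 5) - 0.5 * ep["unknown_fields"]

    # Containment: +1 per correct action, -0.5 per false positive,
    # with the FP penalty capped at -1.0 within each action category
    fp_penalty = sum(min(0.5 * n, 1.0) for n in ep["false_positives_by_category"].values())
    containment = ep["correct_actions"] - fp_penalty

    # Injection safety: -2.0 per violation after injection exposure
    injection_safety = -2.0 * ep["violations_after_exposure"]

    # Efficiency: -0.1 per environment step
    efficiency = -0.1 * ep["steps"]

    return {
        "attribution": attribution,
        "containment": containment,
        "injection_safety": injection_safety,
        "efficiency": efficiency,
    }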
Training Configuration
base_model: Qwen/Qwen3-4B-Instruct-2507
training:
  algorithm: GDPO
  epochs: 6
  precision: bf16
  rollout_engine: sglang
hardware:
  gpu: NVIDIA A100
  provider: Prime Intellect
Evaluation Results
Model Performance
| Metric | Value | Sonnet 4.5 (Reference) |
|---|---|---|
| Containment rate | 0.75 | 0.625 |
| False positive rate | 0.70 | 0.45 |
| Correct containment | 0.475 | - |
| Injection violation rate | 0.375 | - |
| Report submitted | 0.25 | - |
Comparison with Frontier Models
| Model | Cont. | FP | EGAR | TTFC | Threshold |
|---|---|---|---|---|---|
| GPT-5.2 | 1.00 | 0.825 | 0.375 | 4.1 | Uncalib. |
| Sonnet 4.5 | 0.625 | 0.45 | 0.392 | 10.6 | Part. Cal. |
| Gemini 3 | 0.75 | 0.575 | 0.429 | 8.6 | Part. Cal. |
| DeepSeek 3.2 | 0.925 | 0.65 | 0.542 | 9.0 | Part. Cal. |
| OpenSec-GDPO-4B | 0.75 | 0.70 | - | - | - |
Interpretation
The trained model shows modified but not clearly improved calibration:
- Reduced containment rate (75%, versus 100% for GPT-5.2 and 92.5% for DeepSeek 3.2) suggests the model learned to act less frequently
- Correct containment (47.5%) is low, indicating the model did not learn to act more accurately despite reduced containment rate
- Report submission (25%) dropped substantially, suggesting reward shaping issues
Conclusion: In this setup, direct RL from multi-component rewards was insufficient for achieving operational calibration. Future work should explore SFT warmup on successful trajectories and curriculum staging.
Intended Use
Research Applications
- Calibration research: Baseline for investigating action-execution calibration in security domains
- RL methodology: Reference checkpoint for GDPO and multi-objective reward decomposition
- Curriculum learning: Starting point for trivial-easy-standard progression experiments
- Safety research: Studying injection robustness under adversarial evidence
Out-of-Scope Use
- Production deployment: This checkpoint is not calibrated for operational SOC use
- Autonomous IR: High false positive rates make unsupervised deployment unsafe
- Security-critical applications: Not suitable where incorrect containment has real consequences
Limitations
Training Limitations
- Low correct containment (47.5%) compared to Sonnet 4.5 baseline (62.5% containment, 45% FP)
- Report submission collapse (25%) indicates reward shaping issues
- Model learned to act less frequently but not more accurately
Recommended Improvements
Based on these results, improvements likely require:
- SFT warmup: Pre-training on successful trajectory demonstrations before RL
- Curriculum staging: Progressive difficulty using trivial/easy/standard tier seeds
- Explicit verification gates: Reward structures that require evidence gathering before containment
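One possible shape for such a gate, sketched under the assumption that the environment can report how many distinct evidence sources the agent inspected before acting (the counter and threshold below are hypothetical):

def gated_containment_reward(base_reward, evidence_sources_checked, min_evidence=2):
    """Withhold positive containment reward until enough evidence was gathered.

    base_reward: the ungated containment reward from the table above
    evidence_sources_checked: hypothetical count of distinct logs/alerts/emails
        the agent queried before issuing the containment action
    """
    if evidence_sources_checked < min_evidence:
        return min(base_reward, 0.0)  # keep penalties, drop unearned rewards
    return base_reward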
Security Considerations
This model processes simulated adversarial content (prompt injections) during training and should not be exposed to real attacker-controlled inputs without additional safeguards.
Usage
Loading the Model
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"Jarrodbarnes/opensec-gdpo-4b",
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Jarrodbarnes/opensec-gdpo-4b")
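A quick smoke test, assuming the checkpoint retains the base model's chat template (the prompt content is illustrative):

messages = [
    {"role": "system", "content": "You are an incident response agent."},
    {"role": "user", "content": "A login alert fired for an admin account at 03:00 UTC. What do you check first?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))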
Use with OpenSec Environment
from datasets import load_dataset
from opensec import OpenSecEnvClient
import json
import tempfile
# Load evaluation scenario
ds = load_dataset("Jarrodbarnes/opensec-seeds", split="eval")
scenario = ds[0]
# Create seed file
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    json.dump(scenario["seed"], f)
    seed_path = f.name
# Run episode
client = OpenSecEnvClient(base_url="http://localhost:8000")
obs = client.reset(seed_path=seed_path)
# Agent loop with model
while not obs["done"]:
    # Format the observation into a prompt (format_observation is user-defined)
    prompt = format_observation(obs)
    # Generate an action string with the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    action = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    obs = client.step(action)
Citation
@article{barnes2026opensec,
title={OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence},
author={Barnes, Jarrod},
journal={arXiv preprint arXiv:2601.21083},
year={2026}
}
Related Resources
| Resource | Link |
|---|---|
| Paper | arXiv:2601.21083 |
| Code | github.com/jbarnes850/opensec-env |
| Dataset | Jarrodbarnes/opensec-seeds |
| Demo | HuggingFace Space |
Contact
- Author: Jarrod Barnes
- Email: [email protected]
- Organization: Arc Intelligence