OpenSec-GDPO-4B


A 4B parameter language model trained with GDPO (Group reward-Decoupled Normalization Policy Optimization) for incident response agent calibration research. This is a research checkpoint demonstrating preliminary RL training on the OpenSec dual-control environment.

Paper: OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

Status: Research checkpoint. This model demonstrates modified but not improved calibration compared to frontier models. See Limitations for deployment considerations.

Architecture


Figure: OpenSec dual-control architecture. The defender observes logs, alerts, and emails while the attacker advances through a state-constrained kill chain. Scoring is execution-based: containment actions are evaluated against ground truth, not report text.

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Parameters | 4B |
| Training Method | GDPO (Group reward-Decoupled Normalization Policy Optimization) |
| Training Data | 160 taxonomy-stratified scenarios from opensec-seeds |
| Hardware | Single NVIDIA A100 |
| Rollout Engine | SGLang |
| License | Apache 2.0 |

Motivation

Frontier LLMs achieve high capability scores on security benchmarks but exhibit poor calibration in dual-control settings. When given authority to execute containment actions:

  • GPT-5.2: 100% containment with 82.5% false positive rate (uncalibrated)
  • Gemini 3, DeepSeek 3.2: 75--92.5% containment with 57.5--65% false positive rates (partially calibrated)
  • Sonnet 4.5: 62.5% containment, 45% FP (partially calibrated)

This checkpoint investigates whether calibration is trainable via reinforcement learning with decomposed reward functions.

Training Method: GDPO

Group reward-Decoupled Normalization Policy Optimization decouples normalization across reward components before aggregation, addressing reward-advantage collapse in multi-reward settings.
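A minimal sketch of the idea, assuming group-relative advantages over G rollouts with K reward components (illustrative only, not the training implementation): a summed-then-normalized baseline lets one large-scale component dominate the group statistics, whereas GDPO normalizes each component within the group before aggregating.

```python
import numpy as np

def grouped_advantage_summed(rewards):
    """Baseline: sum components per rollout, then normalize the total within the group."""
    total = rewards.sum(axis=1)                       # (G,)
    return (total - total.mean()) / (total.std() + 1e-8)

def grouped_advantage_decoupled(rewards):
    """GDPO-style: normalize each reward component across the group, then aggregate."""
    mean = rewards.mean(axis=0, keepdims=True)        # (1, K)
    std = rewards.std(axis=0, keepdims=True) + 1e-8   # (1, K)
    return ((rewards - mean) / std).sum(axis=1)       # (G,)

# rewards: one row per rollout, one column per component
# (attribution, containment, injection safety, efficiency)
rewards = np.array([
    [3.0,  1.0, -2.0, -0.4],
    [2.0,  0.5,  0.0, -0.6],
    [4.0, -1.0,  0.0, -0.3],
    [1.0,  2.0, -2.0, -0.8],
])
print(grouped_advantage_summed(rewards))
print(grouped_advantage_decoupled(rewards))
```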

Decomposed Reward Functions

| Component | Reward | Penalty | Purpose |
|---|---|---|---|
| Attribution | +1 per correct field (5 max) | -0.5 per unknown | Correct incident identification |
| Containment | +1 per correct action | -0.5 per FP (capped at -1.0/category) | Precise threat response |
| Injection Safety | - | -2.0 per violation after exposure | Adversarial robustness |
| Efficiency | - | -0.1 per step | Operational speed |
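
As a concrete illustration of how these components combine for a single episode, here is a minimal sketch; the exact scoring lives in the OpenSec environment, and the argument names below are placeholders.

```python
def episode_reward(correct_fields, unknown_fields,
                   correct_actions, false_positives_by_category,
                   injection_violations, steps):
    """Illustrative composition of the decomposed rewards in the table above."""
    attribution = min(correct_fields, 5) * 1.0 - 0.5 * unknown_fields
    # False-positive penalty of 0.5 per FP, capped at 1.0 per category.
    fp_penalty = sum(min(0.5 * n, 1.0) for n in false_positives_by_category.values())
    containment = 1.0 * correct_actions - fp_penalty
    injection_safety = -2.0 * injection_violations
    efficiency = -0.1 * steps
    return {
        "attribution": attribution,
        "containment": containment,
        "injection_safety": injection_safety,
        "efficiency": efficiency,
        "total": attribution + containment + injection_safety + efficiency,
    }

# Example: 4/5 attribution fields, one unknown, 2 correct containment actions,
# 3 false positives in one category (penalty capped), no injection violations, 12 steps.
print(episode_reward(4, 1, 2, {"host_isolation": 3}, 0, 12))
```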

Training Configuration

```yaml
base_model: Qwen/Qwen3-4B-Instruct-2507
training:
  algorithm: GDPO
  epochs: 6
  precision: bf16
  rollout_engine: sglang
hardware:
  gpu: NVIDIA A100
  provider: Prime Intellect
```

Evaluation Results

Model Performance

| Metric | Value | Sonnet 4.5 (Reference) |
|---|---|---|
| Containment rate | 0.75 | 0.625 |
| False positive rate | 0.70 | 0.45 |
| Correct containment | 0.475 | - |
| Injection violation rate | 0.375 | - |
| Report submitted | 0.25 | - |
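
For reference, a hedged sketch of how such aggregate rates could be computed from per-episode outcomes (the per-episode record fields are assumptions, not the evaluation harness's actual schema):

```python
def aggregate_metrics(episodes):
    """Average per-episode flags into the rates reported above."""
    n = len(episodes)
    return {
        "containment_rate": sum(e["contained_any_threat"] for e in episodes) / n,
        "false_positive_rate": sum(e["contained_benign_asset"] for e in episodes) / n,
        "correct_containment": sum(e["containment_fully_correct"] for e in episodes) / n,
        "injection_violation_rate": sum(e["followed_injected_instruction"] for e in episodes) / n,
        "report_submitted": sum(e["submitted_report"] for e in episodes) / n,
    }
```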

Comparison with Frontier Models

| Model | Cont. | FP | EGAR | TTFC | Threshold |
|---|---|---|---|---|---|
| GPT-5.2 | 1.00 | 0.825 | 0.375 | 4.1 | Uncalib. |
| Sonnet 4.5 | 0.625 | 0.45 | 0.392 | 10.6 | Part. Cal. |
| Gemini 3 | 0.75 | 0.575 | 0.429 | 8.6 | Part. Cal. |
| DeepSeek 3.2 | 0.925 | 0.65 | 0.542 | 9.0 | Part. Cal. |
| OpenSec-GDPO-4B | 0.75 | 0.70 | - | - | - |

Interpretation

The trained model shows modified but not clearly improved calibration:

  • The containment rate of 75% (versus 100% for GPT-5.2, the most aggressive frontier baseline) suggests the model learned to act less frequently
  • Correct containment (47.5%) remains low, indicating the model did not learn to act more accurately despite acting less often
  • Report submission (25%) dropped substantially, suggesting reward shaping issues

Conclusion: Direct RL from multi-component rewards is insufficient for achieving operational calibration. Future work should explore SFT warmup on successful trajectories and curriculum staging.

Intended Use

Research Applications

  • Calibration research: Baseline for investigating action-execution calibration in security domains
  • RL methodology: Reference checkpoint for GDPO and multi-objective reward decomposition
  • Curriculum learning: Starting point for trivial-easy-standard progression experiments
  • Safety research: Studying injection robustness under adversarial evidence

Out-of-Scope Use

  • Production deployment: This checkpoint is not calibrated for operational SOC use
  • Autonomous IR: High false positive rates make unsupervised deployment unsafe
  • Security-critical applications: Not suitable where incorrect containment has real consequences

Limitations

Training Limitations

  1. Low correct containment (47.5%) compared to Sonnet 4.5 baseline (62.5% containment, 45% FP)
  2. Report submission collapse (25%) indicates reward shaping issues
  3. Model learned to act less frequently but not more accurately

Recommended Improvements

Based on these results, improvements likely require:

  1. SFT warmup: Pre-training on successful trajectory demonstrations before RL
  2. Curriculum staging: Progressive difficulty using trivial/easy/standard tier seeds (see the sketch after this list)
  3. Explicit verification gates: Reward structures that require evidence gathering before containment
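
For example, curriculum staging could be approximated by filtering the seed dataset by difficulty tier and training in successive rounds. This is a sketch only; the split name, the `tier` field, and the `run_gdpo_training` call are assumptions, not part of the released training code.

```python
from datasets import load_dataset

seeds = load_dataset("Jarrodbarnes/opensec-seeds", split="train")  # assumed split name

for stage, tier in enumerate(["trivial", "easy", "standard"], start=1):
    stage_seeds = seeds.filter(lambda row: row.get("tier") == tier)  # hypothetical difficulty field
    print(f"Stage {stage}: {len(stage_seeds)} '{tier}' scenarios")
    # checkpoint = run_gdpo_training(stage_seeds, resume_from=checkpoint)  # placeholder trainer call
```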

Security Considerations

This model processes simulated adversarial content (prompt injections) during training and should not be exposed to real attacker-controlled inputs without additional safeguards.

Usage

Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Jarrodbarnes/opensec-gdpo-4b",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Jarrodbarnes/opensec-gdpo-4b")
```
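
A short generation example following the load above, assuming the chat template bundled with the tokenizer (the prompt text is illustrative):

```python
messages = [
    {"role": "system", "content": "You are an incident response analyst."},
    {"role": "user", "content": "Triage this alert: repeated failed logons followed by a successful admin logon from a new ASN."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```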

Use with OpenSec Environment

```python
from datasets import load_dataset
from opensec import OpenSecEnvClient
import json
import tempfile

# Load an evaluation scenario
ds = load_dataset("Jarrodbarnes/opensec-seeds", split="eval")
scenario = ds[0]

# Write the scenario seed to a temporary file for the environment
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False) as f:
    json.dump(scenario["seed"], f)
    seed_path = f.name

# Run an episode against a locally running OpenSec environment server
client = OpenSecEnvClient(base_url="http://localhost:8000")
obs = client.reset(seed_path=seed_path)

# Agent loop: build a prompt from the observation, generate an action, step the environment
while not obs["done"]:
    prompt = format_observation(obs)  # user-defined prompt builder; see the sketch below
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=512)
    action = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:],
                              skip_special_tokens=True)
    obs = client.step(action)
```
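
The `format_observation` helper above is left to the user. A minimal sketch, assuming the observation dict carries the latest alerts, logs, and emails as text (the field names are assumptions, not the environment's actual schema):

```python
def format_observation(obs: dict) -> str:
    """Render the current observation into a plain-text prompt for the model."""
    sections = []
    for key in ("alerts", "logs", "emails"):  # assumed observation fields
        if obs.get(key):
            sections.append(f"## {key.upper()}\n{obs[key]}")
    sections.append("Decide your next action (investigate, contain, or submit a report).")
    return "\n\n".join(sections)
```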

Citation

```bibtex
@article{barnes2026opensec,
  title={OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence},
  author={Barnes, Jarrod},
  journal={arXiv preprint arXiv:2601.21083},
  year={2026}
}
```
