opnsense-agent-phi35: a LoRA that gives Phi-3 mini an OPNsense brain

"Did my firewall just get AI?"

Sort of, yes, but not the way you might think. This is a LoRA adapter on top of unsloth/Phi-3-mini-4k-instruct (a 3.8 B parameter base model), trained to emit structured JSON tool calls for the OPNsense REST API (firewall rules, NAT, WireGuard, Suricata, Unbound, IPsec, OpenVPN, traffic shaping, ACME, cron, monit, diagnostics…).

It does not chat about firewalls; it acts on them. You feed it an admin intent in natural language, and it picks the right OPNsense endpoint and produces a well-formed argument blob. A surrounding agent (provided in the training repo) then executes the call with the usual safety rails (scope confirmation, read/write separation, audit log).

⚠️ Note on naming: the repository name keeps the _phi35 suffix for historical reasons, but the actual base model is Phi-3 mini 4k, not Phi-3.5. A Phi-3.5 variant is in the backlog.


TL;DR

Property          Value
Base model        unsloth/Phi-3-mini-4k-instruct
Adapter type      LoRA (PEFT 0.18+)
Rank / alpha      r = 64, lora_alpha = 128
Target modules    q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training samples  13 701 SFT (combined dataset)
Final eval loss   0.29 (run v7)
Verification      102 / 102 OPNsense functions on CAP v1 packets (see Team play)
Context length    4 096
License           MIT (matches Phi-3)

What it can do

The training set covers 102 distinct OPNsense API functions across all the moving parts of a typical firewall deployment:

  • Filter rules: list / create / toggle / delete pf rules
  • NAT: port-forward, outbound NAT, 1:1
  • WireGuard: instances, peers, key rotation
  • OpenVPN: server/client, certificates
  • IPsec: phase1/phase2, mobile clients
  • Routing: static routes, default gateway
  • Traffic shaper: pipes, queues, rules (QoS)
  • Unbound DNS: overrides, blocklists, restart
  • Suricata: toggle rules, reload, alert tail (IDS/IPS)
  • ACME: Let's Encrypt certificates, renewal
  • Monit: service status, restart
  • Cron: list / schedule / toggle jobs
  • Diagnostics: interfaces, ARP, sockets, gateway status

Requests outside the training set produce either a polite refusal or hallucinated arguments. Always wrap the model with a tool whitelist on the client side.
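A client-side whitelist can be as small as a set of tool names checked before anything is dispatched. The sketch below is illustrative only: ALLOWED_TOOLS and filter_tool_calls are hypothetical names, and the input is assumed to be the OpenAI-style tool_calls list described later in this card.

```python
import json

# Hypothetical whitelist: only these tool names may ever be dispatched.
ALLOWED_TOOLS = {"get_cron_jobs", "create_filter_rule"}


def filter_tool_calls(tool_calls, allowed=ALLOWED_TOOLS):
    """Split model tool calls into (accepted, rejected) lists.

    Arguments arrive as a JSON string (OpenAI style), so a call whose
    arguments do not parse to a JSON object is rejected as well.
    """
    accepted, rejected = [], []
    for call in tool_calls:
        function = call.get("function", {})
        name = function.get("name")
        try:
            args = json.loads(function.get("arguments", "{}"))
        except (TypeError, ValueError):
            rejected.append((name, "arguments are not valid JSON"))
            continue
        if name not in allowed:
            rejected.append((name, "tool is not whitelisted"))
        elif not isinstance(args, dict):
            rejected.append((name, "arguments must be a JSON object"))
        else:
            accepted.append((name, args))
    return accepted, rejected
```

Rejected calls are returned with a reason rather than silently dropped, so the surrounding agent can log them in its audit trail.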


Deployment topology

The natural question is "can I just run this inside my OPNsense box?" You technically can: FreeBSD ports include misc/llama-cpp, and a Q4_K_M quant of Phi-3 mini fits in ~2.5 GB of RAM. We strongly recommend you don't. Here's why, and what to do instead.

Option A: embedded (NOT recommended)

┌──────────────────── OPNsense (FreeBSD) ────────────────────┐
│  pf rules · NAT · WireGuard · Suricata · …                 │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ llama-server (Phi-3 + LoRA)  ← CPU contention        │  │
│  │ python agent harness         ← extra attack surface  │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘

Problems: every inference (10–30 s on CPU) steals cycles from the packet-forwarding path, you've widened the attack surface of an edge router, and you're now responsible for a hand-built llama.cpp on each OPNsense upgrade.

Option B: sidecar VM (recommended)

┌── debian-llm VM ──┐         REST/HTTPS         ┌── OPNsense ──┐
│ llama-server      │ ─────  + API key   ─────►  │ pf · NAT · … │
│ + Phi-3 + LoRA    │ ◄── 200/4xx/5xx/JSON ───── │ /api/...     │
│ + agent harness   │                            │              │
└───────────────────┘                            └──────────────┘
        │
        ▼
   admin chat / scripts

OPNsense stays minimal and audited. The LLM runs on a Debian/Ubuntu VM with whatever resources you can afford to give it (CPU-only is fine; GPU optional). The agent talks to OPNsense exclusively via its authenticated REST API, with a scope_confirmed guard before any mutating call.

This is the topology used by the reference opnsense-wg-agent.py which drives this LoRA in production.
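To make the "authenticated REST API plus scope_confirmed guard" idea concrete, here is a minimal sketch of how a sidecar might build a request. It assumes the standard OPNsense scheme of an API key and secret sent as HTTP Basic credentials and a path of the form /api/module/controller/command. The function name, its parameters, and the example host are hypothetical and are not taken from opnsense-wg-agent.py.

```python
import base64
import urllib.request


def build_opnsense_request(base_url, path, api_key, api_secret,
                           mutating=False, scope_confirmed=False, body=b"{}"):
    """Build (but do not send) an authenticated OPNsense API request.

    Reads use GET. Mutating calls use POST and are refused unless the
    caller has explicitly confirmed the scope of the change.
    """
    if mutating and not scope_confirmed:
        raise PermissionError(f"scope not confirmed for mutating call: {path}")
    # The key acts as the user name and the secret as the password.
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/{path.lstrip('/')}",
        data=body if mutating else None,
        method="POST" if mutating else "GET",
    )
    request.add_header("Authorization", f"Basic {token}")
    if mutating:
        request.add_header("Content-Type", "application/json")
    return request


# Placeholder host and path, for illustration only.
read_request = build_opnsense_request(
    "https://fw.example", "module/controller/command", "key", "secret")
```

Keeping the guard inside the request builder means a mutating call cannot even be constructed without an explicit confirmation, which is easier to audit than a check scattered across callers.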


Team play: part of an agentic SOC

This LoRA was not trained in isolation. It is one of three specialist agents inside an agentic SOC built around a coordinator-pilot pattern. A natural-language request like "block all Chinese IPs that scanned port 22 in the last hour" is:

  1. Parsed by the coordinator's pilot agent (a larger reasoning LLM, e.g. Qwen 2.5 7B-Instruct).
  2. Decomposed into a plan of one or more CAP v1 packets (Coordinator-Agent Packet: a typed JSON envelope with a directive, named entities, and arguments).
  3. Dispatched to the right specialist:
    • OPNsense agent (this LoRA): firewall rules, NAT, VPN…
    • WireGuard agent: peer onboarding, key rotation
    • CrowdSec agent: decision lists, bouncers, scenarios
  4. Synthesised back into a single human-readable report.

                    natural-language admin request
                                    │
                                    ▼
                ┌───────────────────────────────────────┐
                │  Coordinator / Pilot (Qwen 2.5 7B)    │
                │  plan ▸ execute ▸ synthesise          │
                └───────────────┬───────────────────────┘
                                │  CAP v1 JSON
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
        ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
        │  OPNsense    │ │  WireGuard   │ │  CrowdSec    │
        │  LoRA (this) │ │  LoRA        │ │  LoRA        │
        │  Phi-3 mini  │ │  Phi-3 mini  │ │  Phi-3 mini  │
        └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
               │                │                │
               ▼                ▼                ▼
        OPNsense REST    /etc/wireguard       CrowdSec LAPI

The CAP v1 packet

Each specialist agent consumes a CAP v1 packet, a normalised envelope produced by the coordinator. Example:

{
  "directive": "block_ip",
  "entities": {
    "IP_ADDRESS":  ["203.0.113.42"],
    "INTERFACE":   ["wan"],
    "PORT_NUMBER": [],
    "HOSTNAME":    [],
    "IP_SUBNET":   []
  },
  "args":    {"action": "block"},
  "context": {"source": "coordinator", "run_id": "plan-abc-1234", "confidence": 0.97}
}

The OPNsense LoRA's job is then to map the directive to the right OPNsense tool call, with entities and args projected into the call's parameters. This is the format that production traffic actually uses, which is much narrower than free-form NL prompting, and the model is trained on both representations (~40 % CAP v1 / ~60 % NL → tool-call) in the combined dataset.
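The shape of that projection can be illustrated with a hand-written, rule-based stand-in for the model. This is only a sketch of the expected input and output, not how the LoRA works internally: the DIRECTIVE_TO_TOOL table and the argument names (interface, source_net) are assumptions made for the example, while create_filter_rule is the tool name mentioned elsewhere in this card.

```python
import json

# Hypothetical directive -> tool table; the model learns this mapping.
DIRECTIVE_TO_TOOL = {"block_ip": "create_filter_rule"}


def cap_to_tool_calls(packet):
    """Project a CAP v1 packet onto one OpenAI-style tool call per IP."""
    tool = DIRECTIVE_TO_TOOL.get(packet["directive"])
    if tool is None:
        raise ValueError(f"unknown directive: {packet['directive']}")
    entities = packet.get("entities", {})
    interfaces = entities.get("INTERFACE", [])
    if len(interfaces) != 1:
        # Ambiguous entity: a real agent would ask the coordinator.
        raise ValueError("expected exactly one INTERFACE entity")
    calls = []
    for ip in entities.get("IP_ADDRESS", []):
        arguments = {
            "action": packet.get("args", {}).get("action", "block"),
            "interface": interfaces[0],   # hypothetical parameter name
            "source_net": ip,             # hypothetical parameter name
        }
        calls.append({
            "type": "function",
            "function": {"name": tool, "arguments": json.dumps(arguments)},
        })
    return calls


packet = {
    "directive": "block_ip",
    "entities": {"IP_ADDRESS": ["203.0.113.42"], "INTERFACE": ["wan"]},
    "args": {"action": "block"},
}
tool_calls = cap_to_tool_calls(packet)
```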

Integration verification

scripts/verify_opnsense_v2.py exercises CAP v1 → tool_call on the full 102-function surface (one CAP packet per function, with realistic entity payloads):

Run         Loss    CAP v1 verify                    Coverage
v3 (Mar 1)  0.32    n/a                              92 functions
v5 (Mar 4)  0.30    ≥ 99 %                           99 functions
v6 (Mar 5)  0.28    100 % (69/69 on CAP v2 sample)   99 functions
v7 (Mar 7)  0.2876  100 % (102/102)                  102 functions

The lesson from the v1 → v7 cycles is in the training journal: the right verification target is the production format (CAP v1), not the free-form NL chat format. Once that was understood (around v5), targeted dataset augmentation became surgical instead of guesswork.

What "team play" means in practice

  • The OPNsense agent can ask the coordinator for clarification when an entity is ambiguous (e.g. INTERFACE: ["wan", "opt1"]).
  • The coordinator can chain CAP packets: "block IP X" may produce (1) CrowdSec.add_decision + (2) OPNsense.create_filter_rule to enforce the same intent at two layers.
  • Each agent runs on its own port (3000/3001/3002), so the failure of one specialist does not poison the others: the coordinator marks the failed step and continues the plan.
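The "mark the failed step and continue" behaviour can be sketched in a few lines. This is an assumption about the pattern, not the coordinator's actual code: run_plan, the step layout, and the dispatch callable are all hypothetical.

```python
def run_plan(steps, dispatch):
    """Run every step of a plan; record failures instead of aborting.

    steps is a list of {"agent": ..., "packet": ...} dicts and
    dispatch(agent, packet) is whatever sends a CAP packet to a specialist.
    """
    report = []
    for step in steps:
        try:
            result = dispatch(step["agent"], step["packet"])
            report.append({"agent": step["agent"], "status": "ok", "result": result})
        except Exception as exc:  # one failing specialist must not stop the rest
            report.append({"agent": step["agent"], "status": "failed", "error": str(exc)})
    return report


def fake_dispatch(agent, packet):
    # Stand-in transport: pretend the CrowdSec agent is unreachable.
    if agent == "crowdsec":
        raise ConnectionError("agent unreachable")
    return {"status": "success"}


plan = [
    {"agent": "crowdsec", "packet": {"directive": "block_ip"}},
    {"agent": "opnsense", "packet": {"directive": "block_ip"}},
]
report = run_plan(plan, fake_dispatch)
```

In this run the CrowdSec step is reported as failed while the OPNsense step still executes, which matches the isolation described above.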

The full coordinator + agents architecture is documented in the cyber-agent-engine repository (see coordinator/README.md and AGENTS.md).


Quickstart: running it on a Debian VM

1. Build llama.cpp from source (b3813 or later)

Older builds break LoRA loading (specifically, b1-9c69907 silently ignores --lora-init-without-apply, which causes the adapter to be merged at the wrong moment and produces garbage for OPNsense tools).

sudo apt install -y build-essential cmake git git-lfs
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && git checkout b3813
cmake -B build -DGGML_CUDA=OFF       # use -DGGML_CUDA=ON if you have an NVIDIA GPU
cmake --build build -j"$(nproc)"

2. Get the base model as GGUF

mkdir -p ~/models
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf \
    Phi-3-mini-4k-instruct-q4.gguf \
    --local-dir ~/models/phi-3-mini

(Or any other Q4 / Q5 / Q8 quant of Phi-3-mini-4k-instruct you trust.)

3. Clone this LoRA

git lfs install
git clone https://huggingface.co/patlegu/opnsense-agent-phi35 \
    ~/loras/opnsense-agent-phi35

4. Convert the LoRA to GGUF (one-time)

llama-server wants GGUF, not raw safetensors:

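cd ~/llama.cpp
# --base must point at a directory that holds the base model's Hugging Face
# config.json; the GGUF file alone is not enough for the converter.
python3 convert_lora_to_gguf.py \
    --base ~/models/phi-3-mini \
    ~/loras/opnsense-agent-phi35
# produces opnsense-agent-phi35-F16-LoRA.gguf next to the adapter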

5. Run llama-server with the LoRA applied at load time

~/llama.cpp/build/bin/llama-server \
    -m ~/models/phi-3-mini/Phi-3-mini-4k-instruct-q4.gguf \
    --lora ~/loras/opnsense-agent-phi35/opnsense-agent-phi35-F16-LoRA.gguf \
    --host 0.0.0.0 --port 8080 \
    --ctx-size 4096 \
    -t "$(nproc)"

Add -ngl 33 if you compiled with CUDA: this offloads all 32 layers plus the output layer to the GPU, and you'll get ~80 tok/s on a 24 GB card vs ~7 tok/s on CPU.

6. Smoke test

curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "phi-3-mini",
      "messages": [
        {"role": "system", "content": "You are an OPNsense agent."},
        {"role": "user",   "content": "List all scheduled cron jobs."}
      ],
      "tools": [{
        "type": "function",
        "function": {
          "name": "get_cron_jobs",
          "description": "Get list of all scheduled Cron jobs",
          "parameters": {"type": "object", "properties": {}, "required": []}
        }
      }],
      "tool_choice": "auto"
    }' | jq '.choices[0].message.tool_calls'

Expected output: a single tool_calls entry pointing at get_cron_jobs with arguments: "{}".
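If you would rather check the smoke test from a script than by eye, a small helper can assert on the response shape. The sketch assumes the response follows the OpenAI chat-completions layout used in the curl call above. extract_tool_call is a hypothetical helper, and sample_response is a hand-written example of the expected shape, not captured server output.

```python
import json


def extract_tool_call(response_json):
    """Return (name, arguments_dict) for the single expected tool call."""
    message = response_json["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    if len(tool_calls) != 1:
        raise ValueError(f"expected exactly one tool call, got {len(tool_calls)}")
    function = tool_calls[0]["function"]
    # Arguments are a JSON-encoded string, so decode them before use.
    return function["name"], json.loads(function["arguments"])


# Hand-written example of the expected response shape.
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_cron_jobs", "arguments": "{}"},
            }],
        }
    }]
}
name, arguments = extract_tool_call(sample_response)
```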


Tool-calling format

The LoRA was trained on OpenAI-style messages with tools and tool_calls. A typical training example looks like:

{
  "messages": [
    {"role": "user", "content": "Retrieve all scheduled Cron jobs."},
    {"role": "assistant", "content": null, "tool_calls": [{
      "id": "call_g2zmJfQcE",
      "type": "function",
      "function": {"name": "get_cron_jobs", "arguments": "{}"}
    }]},
    {"role": "tool", "tool_call_id": "call_g2zmJfQcE",
     "content": "{\"status\":\"success\",\"details\":\"…\"}"},
    {"role": "assistant", "content": "The following cron jobs are…"}
  ],
  "tools": [{
    "type": "function",
    "function": {"name": "get_cron_jobs",
                 "description": "Get list of all scheduled Cron jobs",
                 "parameters": {"type":"object","properties":{},"required":[]}}
  }]
}

A minimal agent loop:

  1. You: build the tools list (whitelist of safe OPNsense functions).
  2. Model: returns a tool_calls[] block.
  3. You: dispatch the call to your OPNsense API client, capture the result.
  4. You: append a role: "tool" message with the result.
  5. Re-prompt → the model summarises the result in natural language.

A reference Python implementation lives in agents/opnsense/ of the training repository.
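For orientation, the five steps can be sketched as a single function. This is not the reference implementation: run_agent_turn, the handlers mapping, and the chat callable (standing in for a call to the chat-completions endpoint) are all hypothetical, and the model is stubbed so the example runs offline.

```python
import json


def run_agent_turn(user_text, tools, handlers, chat):
    """One user turn: model -> tool call(s) -> tool result(s) -> summary.

    chat(messages, tools) must return an assistant message dict;
    handlers maps tool names to Python callables (your OPNsense client).
    """
    allowed = {tool["function"]["name"] for tool in tools}   # step 1: whitelist
    messages = [{"role": "user", "content": user_text}]
    reply = chat(messages, tools)                            # step 2: tool_calls[]
    messages.append(reply)
    for call in reply.get("tool_calls") or []:
        name = call["function"]["name"]
        if name in allowed and name in handlers:             # step 3: dispatch
            result = handlers[name](**json.loads(call["function"]["arguments"]))
        else:
            result = {"status": "error", "details": f"tool not allowed: {name}"}
        messages.append({"role": "tool", "tool_call_id": call["id"],
                         "content": json.dumps(result)})     # step 4: tool message
    if reply.get("tool_calls"):
        messages.append(chat(messages, tools))               # step 5: summary
    return messages


def fake_chat(messages, tools):
    # Offline stand-in for the model: call the tool first, then summarise.
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": "No cron jobs are scheduled."}
    return {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "get_cron_jobs", "arguments": "{}"}}]}


tools = [{"type": "function", "function": {
    "name": "get_cron_jobs",
    "description": "Get list of all scheduled Cron jobs",
    "parameters": {"type": "object", "properties": {}, "required": []}}}]
handlers = {"get_cron_jobs": lambda: {"status": "success", "jobs": []}}
history = run_agent_turn("List all scheduled cron jobs.", tools, handlers, fake_chat)
```

A production loop would add the scope_confirmed check before step 3 for any mutating tool.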


Limitations & safety

  • Tool selection only: the model does not execute anything on the firewall. Make sure your agent loop applies a scope_confirmed: bool guard before any mutating call (rule create/delete, service restart, etc.).
  • Argument hallucination: for tools outside the training set, the model will happily make up plausible-looking parameters. Whitelist your tools and validate every argument server-side.
  • Not an autonomous decision-maker: treat it as an admin shortcut ("which API do I call to do X?"), not as a replacement for a human reviewing the change before it lands.
  • English-first, French-aware: training set is mostly English with some French. Other languages will degrade fast.
  • llama.cpp version pin: LoRA loading is broken in some older llama.cpp builds. Use b3813 or later.
  • License of the base model still applies. Phi-3 mini is MIT, but you're responsible for downstream compliance as defined in Microsoft's model card.
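Argument validation can reuse the parameters schema already present in each tool definition. The sketch below is a deliberately small check (required keys, unexpected keys, basic types) and not a full JSON Schema validator. validate_arguments and the rule_tool definition with its interface parameter are hypothetical.

```python
import json

# Minimal JSON-type -> Python-type table; extend as needed.
_JSON_TYPES = {"string": str, "integer": int, "boolean": bool,
               "object": dict, "array": list}


def validate_arguments(tool, raw_arguments):
    """Return a list of problems; an empty list means the call may proceed."""
    schema = tool["function"]["parameters"]
    properties = schema.get("properties", {})
    try:
        args = json.loads(raw_arguments)
    except (TypeError, ValueError):
        return ["arguments are not valid JSON"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = [f"missing required argument: {key}"
                for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        if key not in properties:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = _JSON_TYPES.get(properties[key].get("type"))
        # bool is a subclass of int in Python, so reject it for integers.
        wrong_bool = expected is int and isinstance(value, bool)
        if expected is not None and (wrong_bool or not isinstance(value, expected)):
            problems.append(f"wrong type for argument: {key}")
    return problems


# Hypothetical tool definition used only for this example.
rule_tool = {"type": "function", "function": {
    "name": "create_filter_rule",
    "parameters": {"type": "object",
                   "properties": {"interface": {"type": "string"}},
                   "required": ["interface"]}}}
problems = validate_arguments(rule_tool, '{"iface": "wan"}')
```

Here the misspelled key is reported twice over, once as a missing required argument and once as an unexpected one, which is the kind of plausible-looking hallucination this check is meant to catch.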

Training details

Setting          Value
Base             unsloth/phi-3-mini-4k-instruct-bnb-4bit (4-bit base + LoRA)
Epochs           3
Batch            2 × 8 grad accum (= 16 effective)
Learning rate    2e-4, cosine, 10 % warmup
Sequence length  4 096
Optimizer        adamw_8bit (bf16)
Trainer          Unsloth + TRL SFT
Dataset          13 701 SFT examples (combined: base + IDS + traffic-shaping + ACME + IPsec/OpenVPN)
Final eval loss  0.2876
Verification     102 / 102 functions answered correctly on verify_opnsense_v2.py

The training script and dataset generation pipeline live in the cyber-agent-engine repository: scripts/train_opnsense_lora.py and data/sft/opnsense_combined_train.jsonl.
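As a back-of-the-envelope check on those settings, the step counts can be estimated from the table. This is an approximation under stated assumptions: a single device, all 13 701 examples used for training, and the last partial batch kept. The exact numbers depend on how the trainer handles the eval split and rounding, and training_schedule is a hypothetical helper.

```python
import math


def training_schedule(n_samples, per_device_batch, grad_accum, epochs, warmup_ratio):
    """Estimate optimizer steps for a single-device run."""
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(n_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, steps_per_epoch, total_steps, warmup_steps


effective, per_epoch, total, warmup = training_schedule(13701, 2, 8, 3, 0.10)
print(effective, per_epoch, total, warmup)
```

Under these assumptions the run comes to roughly 2 600 optimizer steps, of which about the first 260 are warmup.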


Citation

If you use this adapter in research or production, please cite the parent project:

@software{opnsense_agent_phi35_2026,
  author  = {Le Guyader, P. and contributors},
  title   = {opnsense-agent-phi35: a Phi-3 LoRA for OPNsense tool-calling},
  year    = {2026},
  url     = {https://huggingface.co/patlegu/opnsense-agent-phi35}
}