BaseX

May 2, 2026 · 8 min read

The Training Flywheel: How Every Argus Run Improves Aegis

Every engagement Argus runs produces structured trace data — observations, hypotheses, probes, findings, and verified artifacts. That data feeds directly back into Aegis.


Why this matters

Most AI products consume training data once and ship. The model is static. The product improves only when the team manually curates new data, runs another training job, and ships a new version. That cycle is slow and expensive.

Argus is designed differently. Every run it completes is a structured record of real-world offensive security reasoning — what the agents observed, what hypotheses Aegis generated, which probes confirmed or killed those hypotheses, and what artifacts were captured as proof. That record is the highest-signal training data possible: real targets, real reasoning chains, real success and failure labels from deterministic verification.

The flywheel is the mechanism that turns operational use into model improvement. The more Argus runs, the better Aegis gets. The better Aegis gets, the more effective Argus becomes.

What a run produces

A single Argus engagement produces an append-only JSONL trace file. Each line is one event — typed, time-ordered, and self-contained. The event types map directly to the phases of an offensive security engagement:

plan: The coordinator decomposes the engagement into a task graph.
observation: An agent records a fact — open port, discovered endpoint, leaked subdomain.
hypothesis: The correlator proposes a breakable invariant — "parameter X flows into an unparameterized SQL statement."
probe: The exploit agent sends a cheap request to kill or confirm the hypothesis.
exploit_attempt: A full exploit is fired against a confirmed hypothesis.
finding: A confirmed vulnerability with a captured artifact and deterministic re-run label.

Every event carries the full context needed to reconstruct the reasoning chain: the actor, the input, the output, a success/failure label, and a cost record (latency, tokens, bytes). Confirmed findings carry a determinism flag — the exploit was re-run and produced the same artifact.
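As a rough illustration, an event like this can be modeled as one JSONL record per line. The field and type names below are illustrative assumptions, not Argus's actual schema:

```python
import json
from dataclasses import dataclass, field

# Hypothetical event shape. Field names here are illustrative only,
# chosen to mirror the context described above, not the real trace format.
@dataclass
class TraceEvent:
    type: str                    # plan | observation | hypothesis | probe | exploit_attempt | finding
    actor: str                   # which agent emitted the event
    ts: float                    # events are time-ordered
    input: dict = field(default_factory=dict)
    output: dict = field(default_factory=dict)
    success: bool = False        # success/failure label
    cost: dict = field(default_factory=dict)   # latency_ms, tokens, bytes
    deterministic: bool = False  # set on findings whose re-run reproduced the artifact

def load_trace(path: str) -> list[TraceEvent]:
    """Read an append-only JSONL trace: one self-contained event per line."""
    events = []
    with open(path) as f:
        for line in f:
            events.append(TraceEvent(**json.loads(line)))
    return events
```

Because each line is self-contained, a trace can be replayed event by event to reconstruct the full reasoning chain without any external state.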

From trace to training signal

Raw traces are high-volume and noisy. The pipeline that converts them into training signal applies several filters: events without a confirmed finding downstream are weighted lower; novel vulnerability compositions are weighted higher; reasoning chains where Aegis's hypothesis was wrong are kept as negative examples.

The result is a structured dataset of multi-turn conversations — each one grounded in a real engagement, with real tool outputs, real reasoning steps, and a real ground-truth label. This is qualitatively different from synthetic data generated without a live target.
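The filtering step can be sketched as follows. The weights, field names, and heuristics are illustrative assumptions that mirror the filters described above, not the actual pipeline:

```python
def weight_events(events: list[dict]) -> list[dict]:
    """Assign a training weight and label to each event in one trace.

    Illustrative heuristics (assumed, not the real pipeline):
    - if no deterministically confirmed finding appears downstream, down-weight;
    - if the trace contains a novel vulnerability composition, up-weight;
    - failed hypotheses are kept as negative examples rather than dropped.
    """
    confirmed = any(e["type"] == "finding" and e.get("deterministic") for e in events)
    novel = any(e.get("novel") for e in events if e["type"] == "finding")

    weighted = []
    for e in events:
        w = 1.0
        if not confirmed:
            w *= 0.2   # no ground-truth payoff downstream: low weight
        if novel:
            w *= 2.0   # novel composition: high weight
        label = ("negative"
                 if e["type"] == "hypothesis" and not e.get("success")
                 else "positive")
        weighted.append({"event": e, "weight": w, "label": label})
    return weighted
```

The key design choice sketched here is that wrong hypotheses are labeled, not discarded: a reasoning chain that was deterministically falsified is exactly the negative signal a model cannot get from synthetic data.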

The compounding effect

A security LLM that improves every time it is used is a compounding asset. The first version of Aegis was trained on a curated static corpus. Every subsequent version incorporates traces from real Argus runs. The model learns from its own operational history.


What comes next

A scanner that produces exploits is useful. A scanner that trains the model that built it — and gets measurably better with every engagement — is a different kind of product entirely.