Blind Verification: How False Positives Get Killed

A security scan finds 200 “possible vulnerabilities.” Four hours of triage later, 190 are noise and the other 10 are maybes. The only way to confirm any of them is to write a manual proof of concept.

This is the state of security tooling in 2026. Fixing it required three architectural iterations.

Attempt 1: Template-Based Scanning

The first version of the engine was simple. YAML templates. Regex patterns. Send a payload, check if the response matches a known-bad pattern. This is how most scanners work — Nuclei, Nikto, the whole ecosystem.

# template-v1.yaml
id: ssrf-check
payloads:
  - "http://169.254.169.254/latest/meta-data/"
  - "http://localhost:6379"
matchers:
  - type: regex
    pattern: "(ami-id|instance-id|ERR wrong)"

It worked for the obvious cases. The false positive rate was brutal. A response containing the word “instance-id” in an error message? Flagged. An API that returns user input in the response body? Flagged. Regex cannot understand context — it sees patterns, not meaning.

Triage time exceeded the time it would have taken to pentest the target manually.
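
A minimal sketch of that failure mode, assuming the matcher runs against the raw response body. The regex is the one from the template above; the function name and both response bodies are made up for illustration.

```typescript
// The pattern from template-v1.yaml. No "g" flag, so test() is stateless.
const ssrfMatcher = /(ami-id|instance-id|ERR wrong)/;

// Hypothetical v1 check: flag the response if the pattern appears anywhere.
function isFlagged(responseBody: string): boolean {
  return ssrfMatcher.test(responseBody);
}

// Illustrative body of a real metadata-service listing.
const leakedMetadata = "ami-id\ninstance-id\ninstance-type";

// Illustrative validation error that merely mentions the same token.
const benignError = '{"error":"unknown field: instance-id"}';

const leakFlagged = isFlagged(leakedMetadata); // true
const errorFlagged = isFlagged(benignError);   // true, and this one is noise
```

Both responses match, and the matcher has no way to tell them apart. Tightening the pattern only moves the problem to the next string that happens to contain it.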

Attempt 2: Agentic Scanning

If regex cannot understand context, the next step is a scanner that can reason. The template engine was replaced with an AI agent that read the code, crafted payloads based on what it saw, and reasoned about responses.

This was substantially better. The agent could look at a function, understand the data flow, and craft a targeted attack. It could tell the difference between user input being reflected in an error message versus user input being passed to exec().

It had a new problem: hallucination.

The agent would find something that looked suspicious, then reason itself into a vulnerability that did not exist. “This function could be vulnerable if the input is not sanitized upstream…” It would check upstream, find no sanitization, and report a critical finding — without noticing the WAF sitting in front of the whole system, or the type coercion that made the payload harmless.

“Could be vulnerable” is not the same as “is vulnerable.” The agent could not always tell the difference.
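
The type coercion case is easy to picture with a small example. This is a hypothetical handler, not the engine's code: the function name and the command string are assumptions. Nothing in it looks like a sanitizer, yet the payload never reaches the sink intact.

```typescript
// Hypothetical handler: builds a command string from a user-supplied id.
function buildLookupCommand(rawId: string): string {
  // The coercion is the mitigation: parseInt keeps only the leading digits.
  const id = Number.parseInt(rawId, 10);
  if (!Number.isInteger(id) || id < 0) {
    throw new Error("invalid id");
  }
  return `lookup --id ${id}`;
}

// A classic injection attempt.
const injected = "42; cat /etc/passwd";

// The shell metacharacters are gone before the string is ever built.
const lookupCommand = buildLookupCommand(injected); // "lookup --id 42"
```

An agent scanning upstream for a function that sanitizes input would find none here and could still talk itself into a critical finding.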

Attempt 3: Single Agent with Proof of Concept

The next iteration forced the agent to prove it. Do not just report a finding — write a concrete proof of concept that demonstrates the exploit. No working PoC, no finding.

This eliminated most of the hallucinations. Either the PoC works or it does not.

There was a subtler problem: confirmation bias.

The same agent that decided something was vulnerable was also writing the PoC. If it already believed the vulnerability was real, it would write a PoC that looked convincing but did not actually prove anything. It tested the happy path. It assumed its payload got through. It wrote assertions that passed because they were testing the wrong thing.

This is the same problem that affects human pentesters. The person who found the bug is the worst person to verify it. They already believe it is real.
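
Here is a sketch of what "testing the wrong thing" can look like. Everything in it is hypothetical: the target function, the package-name filter, and the payload are stand-ins chosen to show the pattern, not code from the engine.

```typescript
// Hypothetical target: records the command it would run instead of running it.
const executedCommands: string[] = [];

function installPackage(pkg: string): void {
  // Mitigation the biased PoC never looks at: drop unsafe characters.
  const safe = pkg.replace(/[^a-z0-9@\/._-]/gi, "");
  executedCommands.push(`npm install ${safe}`);
}

const pocPayload = "left-pad; touch /tmp/pwned";
installPackage(pocPayload);

// Biased assertion: only proves a payload was sent and nothing threw.
const biasedCheckPasses = pocPayload.includes(";"); // true

// Honest assertion: proves the injected command reached the sink intact.
const honestCheckPasses = executedCommands.some(
  (cmd) => cmd.includes("; touch /tmp/pwned")
); // false
```

The first assertion passes whether or not the target is exploitable. Only the second one says anything about the vulnerability, and it fails.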

In academic peer review, the reviewer does not know who wrote the paper or what the author was thinking. They get the paper and nothing else. They must independently evaluate whether the conclusions follow from the evidence.

The same principle applies to vulnerability verification.

The research agent does its work — discovers attack surfaces, crafts payloads, launches multi-turn attacks, writes PoC code. One long agent session. Then only the PoC code and the file path are extracted, all reasoning and context stripped, and handed to a completely separate verify agent.

The verify agent has no idea why the researcher thought this was vulnerable. It does not know the attack narrative. It receives a PoC script and a file to examine. Its job: independently trace the data flow, run the PoC, and confirm whether the exploit actually works.

If it cannot confirm, the finding is killed. No negotiation.

// the pipeline

// 1. research agent: one multi-turn session
//    discovers + attacks + writes PoC
const findings = await engine.run({
  target: packageDir,
  mode: "audit"
});
// Returns: [{ file, vulnerability, poc, reasoning }]

// 2. verify agents: parallel, independent, blind
//    each gets ONLY poc + file path
const verified = await Promise.all(
  findings.map(f => verifyAgent.run({
    poc: f.poc,        // just the PoC code
    filePath: f.file   // just the file path
    // NO reasoning, NO context, NO attack narrative
  }))
);

// 3. only confirmed findings make the report
const confirmed = verified.filter(v => v.status === "confirmed");
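
One detail worth handling when wiring this up: Promise.all rejects as soon as any single promise rejects, so one crashed or timed-out verify agent would abort the whole batch. A possible way to keep the "cannot confirm means killed" rule is to use Promise.allSettled and fail closed. The sketch below assumes a simplified verdict shape; the Verdict type and both function names are illustrative.

```typescript
// Simplified, assumed verdict shape for this sketch.
interface Verdict {
  status: "confirmed" | "rejected";
  evidence: string;
}

// Map settled promises to verdicts. A verifier that threw counts as a
// rejection of the finding, never as a confirmation.
function toVerdicts(settled: PromiseSettledResult<Verdict>[]): Verdict[] {
  return settled.map((s): Verdict =>
    s.status === "fulfilled"
      ? s.value
      : { status: "rejected", evidence: `verifier error: ${String(s.reason)}` }
  );
}

// Run all verifier tasks in parallel and wait for every one to settle.
async function verifyAll(tasks: Promise<Verdict>[]): Promise<Verdict[]> {
  return toVerdicts(await Promise.allSettled(tasks));
}
```

Note that the promise-level status "rejected" and the verdict-level status "rejected" are different things that happen to share a name; the mapping above turns the first into the second.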

The Engine Scanned Itself

The cleanest way to test a security tool is to point it at itself.

The research agent went through the engine’s own codebase and surfaced 6 potential vulnerabilities:

Command injection via unsanitized package names passed to shell.
SSRF through the target URL parameter in scan mode.
Arbitrary file read via path traversal in the review command.
Prompt injection in the LLM-powered analysis pipeline.
Two more related to input validation edge cases.

Six findings. The pre-verification pipeline would have reported all six as vulnerabilities.

The blind verify agents independently rejected all six as false positives.

Every rejection was correct. The code had proper mitigations in place — input sanitization, URL validation, path normalization, sandboxed execution — that the research agent missed or underestimated. The verify agents, starting from scratch with only the PoC and file path, traced the actual data flow and found that none of the PoCs would succeed against the real code.

Verification Result

Metric	Count
Reported by research agent	6
Confirmed by verify agents	0
Correct rejections	6

An obvious objection: why not have the same agent verify its own findings, or pass the reasoning along so the verify agent has more context?

Because context is exactly how bias propagates. If the verify agent reads “this is a command injection because the package name flows into a shell command,” it will look for ways to confirm that narrative. It will focus on the shell command and miss the sanitization step three functions up the call stack.

Making the verification blind forces the verify agent to build its own understanding from the ground up. It must:

Read the PoC code and understand what it is trying to exploit.
Open the target file and trace the data flow independently.
Determine if the PoC would actually succeed against the real code.
Return a structured verdict: confirmed or rejected, with evidence.

If the research agent missed a sanitization function, the verify agent will find it. If the PoC makes assumptions about the runtime environment, the verify agent will catch that. Two independent analyses are much harder to fool than one, because both would have to make the same mistake.
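
The four steps above could be turned into the verify agent's instructions roughly like this. The prompt wording, the BlindInput type, and the function name are assumptions for illustration; the point is that the only inputs are the PoC, the file path, and the file's contents.

```typescript
// The only fields a verify agent is allowed to see (assumed shape).
interface BlindInput {
  poc: string;
  filePath: string;
}

// Build the verify agent's prompt from the blind input and the target source.
// There is deliberately no parameter for reasoning or attack narrative.
function buildVerifyPrompt(input: BlindInput, fileSource: string): string {
  return [
    "You are verifying a proof of concept. You have no other context.",
    "1. Read the PoC and state what it is trying to exploit.",
    "2. Trace the data flow in the target file from source to sink.",
    "3. Decide whether the PoC would succeed against this code.",
    "4. Return a structured verdict: confirmed or rejected, with evidence.",
    `--- PoC ---\n${input.poc}`,
    `--- ${input.filePath} ---\n${fileSource}`,
  ].join("\n\n");
}
```

Keeping the function signature this narrow is the design choice: context cannot leak into the prompt if there is nowhere to pass it.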

Parallel, Cheap, Fast

The verify agents run in parallel — one per finding. If the research agent reports 8 vulnerabilities, 8 verify agents spin up simultaneously. Each one is a short, focused session. They do not need multi-turn conversations or tool access. They read code, trace data flow, and output a verdict.

// structured output schema per verify agent

interface VerifyResult {
  finding_id: string;
  status: "confirmed" | "rejected";
  confidence: number;       // 0-100
  evidence: string;         // what the agent found
  data_flow_trace: string;  // source -> sink analysis
  rejection_reason?: string; // why it's a false positive
}

The structured output schema means every verify agent returns machine-parseable results. No regex parsing of natural language. No “let me summarize my findings” that might miss details. Just a typed verdict that pipes straight into the report.
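
A TypeScript interface only exists at compile time, so a verdict coming back from a model as text still needs a runtime check before it is trusted. The following is a minimal sketch of such a check using the VerifyResult fields shown above; the function names and error messages are assumptions.

```typescript
interface VerifyResult {
  finding_id: string;
  status: "confirmed" | "rejected";
  confidence: number; // 0-100
  evidence: string;
  data_flow_trace: string;
  rejection_reason?: string;
}

function isStatus(x: unknown): x is VerifyResult["status"] {
  return x === "confirmed" || x === "rejected";
}

// Parse one agent's raw output; throw on anything that is not a valid verdict.
function parseVerifyResult(raw: string): VerifyResult {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("verdict is not a JSON object");
  }
  const { finding_id, status, confidence, evidence, data_flow_trace, rejection_reason } =
    parsed as Record<string, unknown>;

  if (typeof finding_id !== "string") throw new Error("finding_id must be a string");
  if (!isStatus(status)) throw new Error("status must be confirmed or rejected");
  if (typeof confidence !== "number" || confidence < 0 || confidence > 100) {
    throw new Error("confidence must be a number from 0 to 100");
  }
  if (typeof evidence !== "string") throw new Error("evidence must be a string");
  if (typeof data_flow_trace !== "string") throw new Error("data_flow_trace must be a string");

  const result: VerifyResult = { finding_id, status, confidence, evidence, data_flow_trace };
  if (typeof rejection_reason === "string") {
    result.rejection_reason = rejection_reason;
  } else if (rejection_reason !== undefined) {
    throw new Error("rejection_reason must be a string when present");
  }
  return result;
}
```

A malformed verdict throws instead of slipping into the report, which fits the rule that anything unconfirmed is dropped.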

And because the engine is runtime-agnostic, this works with any backend — Claude, Codex, Gemini, or any other model API. Same pipeline, different backend.
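
One way to picture "same pipeline, different backend" is a thin adapter interface. The sketch below is an assumption about how that could look, not the engine's actual abstraction: ModelBackend, stubBackend, and verifyWith are invented names, and no vendor SDK calls are shown.

```typescript
// Hypothetical adapter: prompt in, raw model text out.
interface ModelBackend {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// A stub backend that returns a canned reply. A real adapter would wrap
// a vendor SDK call inside complete().
function stubBackend(name: string, cannedReply: string): ModelBackend {
  return { name, complete: async () => cannedReply };
}

// The pipeline depends only on the interface, never on a specific vendor.
async function verifyWith(backend: ModelBackend, prompt: string): Promise<string> {
  const reply = await backend.complete(prompt);
  return `${backend.name}: ${reply}`;
}
```

Swapping models then means writing one more adapter, with no change to the verification logic.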

The Pipeline, End to End

Step 1 — Research Agent

One multi-turn session. Reads code, maps attack surface, crafts payloads, launches attacks, writes a PoC for every finding.

Step 2 — Strip Context

Extract only PoC code and file path from each finding. Discard reasoning, attack narrative, and confidence scores.
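
A minimal sketch of this step, assuming the finding shape from the pipeline's return comment plus an optional confidence score. The type names and the function name are illustrative.

```typescript
// Assumed shape of a research-agent finding.
interface Finding {
  file: string;
  vulnerability: string;
  poc: string;
  reasoning: string;
  confidence?: number;
}

// What a verify agent receives.
interface BlindInput {
  poc: string;
  filePath: string;
}

// Build a fresh object from an allowlist of fields. Deleting fields from
// a copy would let any field added to Finding later leak by default.
function stripContext(finding: Finding): BlindInput {
  return { poc: finding.poc, filePath: finding.file };
}
```

This sketch does not inspect the PoC itself, so narrative left in PoC comments would still pass through; whether to strip those is a separate decision.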

Step 3 — Verify Agents (Parallel)

N agents spin up simultaneously. Each receives one PoC and one file. Independently traces data flow and confirms or rejects.

Step 4 — Report Generation

Only confirmed findings appear, emitted as SARIF for GitHub and as Markdown and JSON with full evidence chains.
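
For the SARIF output, a stripped-down sketch of the mapping might look like the following. It fills in only a small set of SARIF 2.1.0 fields; the ConfirmedFinding shape, the function name, the fixed "error" level, and the tool name passed in are all assumptions.

```typescript
// Assumed minimal shape of a confirmed finding for reporting.
interface ConfirmedFinding {
  finding_id: string;
  filePath: string;
  evidence: string;
}

// Map confirmed findings to a minimal SARIF 2.1.0 log object.
function toSarif(toolName: string, findings: ConfirmedFinding[]) {
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: toolName } },
        results: findings.map((f) => ({
          ruleId: f.finding_id,
          level: "error",
          message: { text: f.evidence },
          locations: [
            { physicalLocation: { artifactLocation: { uri: f.filePath } } },
          ],
        })),
      },
    ],
  };
}

// Usage: serialize the log for upload.
const sarifJson = JSON.stringify(
  toSarif("example-engine", [
    { finding_id: "cmd-injection-1", filePath: "src/cli.ts", evidence: "payload reaches exec()" },
  ]),
  null,
  2
);
```

A fuller report would likely add line regions and rule metadata; they are left out here to keep the shape readable.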

Why This Matters

False positives are not just annoying — they are actively harmful.

Every false positive erodes trust in the tool. After the third “critical” finding that turns out to be nothing, developers stop looking at the reports. The real vulnerability that comes next gets ignored because the signal-to-noise ratio trained them to ignore it.

Blind verification does not just reduce false positives. It makes every confirmed finding trustworthy. When the engine reports a vulnerability, it means two independent AI agents — one attacking, one verifying — both agree it is real. The verify agent has traced the data flow from source to sink and confirmed the PoC works. That is a finding worth acting on.

It is the same principle that makes peer review work in science. The same principle behind adversarial testing. The same principle behind separation of duties in security. The person who writes the check does not approve the check.

The Bottom Line

Blind verification is built into the engine. It runs automatically on every audit. The research agent finds what it finds. The verify agents kill what does not hold up. Only the real findings survive.