Daily Research Note - 2026-05-20

A claim that cannot be replayed should not become a reliable action.

Modern AI systems are good at producing answers. Reliable systems need a harder contract: every claim should carry enough evidence to be replayed, audited, bounded, and repaired after failure.


Figure: claim-to-replay evidence contract recorder in a stressed environment.

Core Thesis

Confidence is not enough. The claim needs a ledger.

In clean benchmarks, an output can look convincing because the evaluation surface is fixed. In deployment, the world is dirty: sensors drift, maps expire, logs go missing, and failure cases appear outside the prompt. A reliable system therefore needs a claim-to-replay contract before a claim is treated as action-worthy.

The contract is simple: record the input, assumptions, evidence path, failure samples, boundary conditions, and repair verification. If another process cannot replay the claim, the system should lower autonomy rather than pretend certainty.
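The rule in the last sentence can be sketched as a small gate. This is a minimal illustration, not a prescribed design: the three autonomy levels, the field names in REQUIRED_FIELDS, and the function names are all hypothetical, chosen only to mirror the six items listed above.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical autonomy levels; the note does not define a scale."""
    HUMAN_REVIEW = 0
    SUPERVISED = 1
    AUTONOMOUS = 2


# Assumed field names, one per item the contract asks to record.
REQUIRED_FIELDS = (
    "input",
    "assumptions",
    "evidence_path",
    "failure_samples",
    "boundary_conditions",
    "repair_verification",
)


def missing_fields(record: dict) -> list[str]:
    """Return the contract fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def allowed_autonomy(record: dict, replayed_ok: bool) -> Autonomy:
    """Lower autonomy when the claim cannot be replayed or is under-documented."""
    if not replayed_ok:
        return Autonomy.HUMAN_REVIEW
    if missing_fields(record):
        return Autonomy.SUPERVISED
    return Autonomy.AUTONOMOUS


complete = {name: ["recorded"] for name in REQUIRED_FIELDS}
partial = dict(complete, failure_samples=[])

print(allowed_autonomy(complete, replayed_ok=True).name)
print(allowed_autonomy(partial, replayed_ok=True).name)
print(allowed_autonomy(complete, replayed_ok=False).name)
```

The gate only ever lowers autonomy. A complete record that fails replay is treated more cautiously than an incomplete record that replays, which is one possible reading of "lower autonomy rather than pretend certainty".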

Evidence Contract

Every claim should be attached to a trace.

A claim-to-replay contract is not only a log file. It is a reliability interface between perception, reasoning, world state, memory, and recovery. It lets a system ask: what did I see, what did I assume, what contradicted me, what failed, and what repair changed the next decision?

Input trace: what entered the system before the claim.
Assumption boundary: what must be true for the claim to hold.
Evidence path: which observations and checks supported it.
Failure sample: what would make the claim degrade or reverse.
Repair verification: what changed after recovery.
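The five trace fields above map naturally onto a record type. The sketch below is one possible shape, assuming each field holds a list of free-text entries and that traces are stored as JSON; the class name ClaimTrace, the field names, and the example values are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ClaimTrace:
    """One claim plus the five trace fields listed above (hypothetical schema)."""
    claim: str
    input_trace: list[str]
    assumption_boundary: list[str]
    evidence_path: list[str]
    failure_sample: list[str]
    # Empty until a recovery has actually been verified.
    repair_verification: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with sorted keys so identical traces produce identical text."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ClaimTrace":
        """Rebuild a trace from its serialized form."""
        return cls(**json.loads(text))


trace = ClaimTrace(
    claim="corridor is passable",
    input_trace=["lidar frame 118", "map tile revision 7"],
    assumption_boundary=["map tile is less than one day old"],
    evidence_path=["no obstacle within 2 m", "map shows open corridor"],
    failure_sample=["obstacle detected inside 2 m"],
)

restored = ClaimTrace.from_json(trace.to_json())
print(restored == trace)
```

Keeping repair_verification empty by default makes it visible which claims have never been through a recovery cycle.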

Claim

The system emits a judgment, plan, route, detection, or recommendation.

Ledger

The claim carries its input, assumptions, evidence, uncertainty, and provenance.

Replay

Another process can reconstruct why the claim was made and where it may break.
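A replay step could look like the sketch below, under strong simplifying assumptions: the decision is a deterministic function of the recorded input, and each assumption can be written as a predicate over that input. The decide function, its 2.0 m threshold, the sensor_is_fresh check, and the record layout are all hypothetical.

```python
from typing import Callable


def decide(observation: dict) -> str:
    """Toy decision rule standing in for the system that made the claim."""
    return "proceed" if observation["obstacle_distance_m"] > 2.0 else "stop"


# Each assumption is a named predicate over the recorded input.
ASSUMPTIONS: dict[str, Callable[[dict], bool]] = {
    "sensor_is_fresh": lambda obs: obs["sensor_age_s"] < 1.0,
}


def replay(record: dict, decide_fn: Callable[[dict], str],
           assumptions: dict[str, Callable[[dict], bool]]) -> dict:
    """Re-run the decision on the recorded input and list violated assumptions."""
    observation = record["input"]
    reproduced = decide_fn(observation)
    violated = [name for name, holds in assumptions.items()
                if not holds(observation)]
    return {
        "matches": reproduced == record["claim"],
        "reproduced": reproduced,
        "violated_assumptions": violated,
    }


fresh = {"claim": "proceed",
         "input": {"obstacle_distance_m": 3.5, "sensor_age_s": 0.4}}
stale = {"claim": "proceed",
         "input": {"obstacle_distance_m": 3.5, "sensor_age_s": 5.0}}

print(replay(fresh, decide, ASSUMPTIONS))
print(replay(stale, decide, ASSUMPTIONS))
```

The second record reproduces the same claim but reports a violated assumption. That is the "where it may break" half of replay: agreement with the original output is not sufficient on its own.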

Recovery

When contradiction appears, the system repairs its state instead of hiding the failure.

Failure As Material

The evidence contract turns failure into a recoverable sample.

A failure that disappears into a metric cannot teach the next system anything. A failure with provenance becomes material: it can be compressed, cross-examined, turned into an antigen, added to a replay set, and used to test whether recovery actually improves.
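One way to test whether recovery actually improves is to score the policy before and after repair against the same replay set. The sketch below assumes failures are stored with a provenance tag and an expected outcome; the two policies, the thresholds, and the log identifiers are invented for illustration and are not measured results.

```python
from typing import Callable

# Hypothetical replay set: past failures plus one case that already worked.
REPLAY_SET = [
    {"provenance": "log-001", "expected": "stop",
     "input": {"obstacle_distance_m": 3.5, "sensor_age_s": 5.0}},
    {"provenance": "log-002", "expected": "stop",
     "input": {"obstacle_distance_m": 4.0, "sensor_age_s": 2.5}},
    {"provenance": "log-003", "expected": "stop",
     "input": {"obstacle_distance_m": 1.0, "sensor_age_s": 0.2}},
    {"provenance": "log-004", "expected": "proceed",
     "input": {"obstacle_distance_m": 3.0, "sensor_age_s": 0.1}},
]


def policy_before(obs: dict) -> str:
    """Original rule: trusts the distance reading regardless of its age."""
    return "proceed" if obs["obstacle_distance_m"] > 2.0 else "stop"


def policy_after(obs: dict) -> str:
    """Repaired rule: also requires a fresh sensor reading."""
    fresh = obs["sensor_age_s"] < 1.0
    return "proceed" if fresh and obs["obstacle_distance_m"] > 2.0 else "stop"


def failing_cases(policy: Callable[[dict], str], replay_set: list[dict]) -> list[str]:
    """Return the provenance tags of cases the policy still gets wrong."""
    return [case["provenance"] for case in replay_set
            if policy(case["input"]) != case["expected"]]


before = failing_cases(policy_before, REPLAY_SET)
after = failing_cases(policy_after, REPLAY_SET)

print("before repair:", before)
print("after repair:", after)
print("recovery improved:", len(after) < len(before))
```

Reporting provenance tags instead of a single pass rate keeps each failure addressable, so a regression on a previously passing case shows up by name.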

This connects the Wisdom Science portfolio across longitudinal evaluation, cognitive immunity, embodied recovery, anti-interference reliability, representation search, and reflexive world models. The public claim is bounded: this is an evidence architecture note, not a product guarantee and not a detector/tracker SOTA claim.

Practical Contract

What must be replayable?

Field      | Question                    | Reliability Function            | Boundary
Input      | What entered the system?    | Prevents hidden context drift   | Does not prove correctness alone
Assumption | What must be true?          | Makes failure surfaces visible  | May be incomplete
Evidence   | What supported the claim?   | Enables cross-checking          | Evidence can still be noisy
Failure    | What contradicted it?       | Feeds immunity and recovery     | Requires careful curation
Repair     | What changed after replay?  | Measures learning after failure | Not a universal deployment proof