Validation Route

Help break the claim, not the project.

This page is the public route for five useful attacks: formula counterexamples, data leakage reports, stronger baselines, reproduction failures, and claim-boundary overreach. It is a route for scientific challenges; it is not a security target and not a request for private data.

Challenge public claims with public evidence, not private people or private systems.

[Figure: counterexample route for public claim repair (claim, input, failure, repair).]

Template

Use the smallest reproducible objection.

A good counterexample names one challenge class, identifies the target claim, gives public input, states expected behavior, shows observed failure, and proposes the narrowest repair. Use the local template when filing an issue.

Open the counterexample issue template

See five minimal example packets

Class 1

Formula counterexample

Give a concrete variable assignment, boundary case, or toy state that violates a published formula or protocol rule.
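A formula counterexample can be as small as one assignment and one check. The sketch below replays an assignment against a rule; `blend` and `bounded_by_inputs` are hypothetical stand-ins, not formulas published by this project, so substitute the real formula id and rule.

```python
from typing import Callable, Dict


def replay_assignment(
    formula: Callable[..., float],
    rule: Callable[[Dict[str, float], float], bool],
    assignment: Dict[str, float],
) -> dict:
    """Evaluate a formula at one assignment and report whether the stated rule holds."""
    observed = formula(**assignment)
    return {
        "assignment": dict(assignment),
        "observed": observed,
        "rule_holds": rule(assignment, observed),
    }


# Hypothetical formula: a weighted blend of two inputs.
def blend(w: float, a: float, b: float) -> float:
    return w * a + (1.0 - w) * b


# Hypothetical rule: the blend never leaves the range spanned by its inputs.
def bounded_by_inputs(assignment: Dict[str, float], observed: float) -> bool:
    low = min(assignment["a"], assignment["b"])
    high = max(assignment["a"], assignment["b"])
    return low <= observed <= high


# Boundary case: a weight above 1 pushes the blend outside the input range.
report = replay_assignment(blend, bounded_by_inputs, {"w": 1.5, "a": 1.0, "b": 0.0})
print(report)
```

The report dictionary carries the assignment, the observed value, and the rule verdict together, which is the minimum a maintainer needs to replay the case.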

Class 2

Data leakage

Show train/test contamination, future information, duplicate rows, leaked labels, or provenance mismatch using public evidence.
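One way to demonstrate duplicate-row contamination with public evidence is to fingerprint rows and list the collisions. This is a minimal sketch: the normalization (trim and lowercase) and the helper names are assumptions, and the rows are invented toy data.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def row_fingerprint(row: Sequence[str]) -> str:
    """Hash a row after light normalization so trivially reformatted duplicates collide."""
    normalized = "\x1f".join(cell.strip().lower() for cell in row)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_overlap(
    train: Sequence[Sequence[str]], test: Sequence[Sequence[str]]
) -> List[Tuple[int, int]]:
    """Return (train_index, test_index) pairs whose fingerprints match."""
    first_seen: Dict[str, int] = {}
    for i, row in enumerate(train):
        first_seen.setdefault(row_fingerprint(row), i)

    overlap = []
    for j, row in enumerate(test):
        fingerprint = row_fingerprint(row)
        if fingerprint in first_seen:
            overlap.append((first_seen[fingerprint], j))
    return overlap


# Toy public rows: the first test row duplicates the first train row.
train_rows = [["what is 2+2?", "4"], ["capital of france?", "paris"]]
test_rows = [[" What is 2+2? ", "4"], ["capital of peru?", "lima"]]
print(find_overlap(train_rows, test_rows))
```

Reporting index pairs plus the hashes lets a reviewer confirm the overlap on the public artifacts without exchanging any private data.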

Class 3

Stronger baseline

Use the same scoring boundary to show a simpler baseline matches, beats, or removes the reported effect.
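A baseline challenge is easiest to verify when both systems pass through the same metric on the same items. The sketch below assumes accuracy as the metric and a majority-class baseline, and uses a percentile bootstrap for the difference; all names and the toy labels are illustrative, not the project's scoring contract.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def accuracy(labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of items where the prediction equals the label."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def majority_baseline(train_labels: Sequence[str], n: int) -> List[str]:
    """Predict the most frequent training label for every test item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


def bootstrap_diff_ci(
    labels: Sequence[str],
    system_preds: Sequence[str],
    baseline_preds: Sequence[str],
    n_resamples: int = 2000,
    seed: int = 0,
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for (system accuracy - baseline accuracy)."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        sys_acc = sum(labels[i] == system_preds[i] for i in idx) / n
        base_acc = sum(labels[i] == baseline_preds[i] for i in idx) / n
        diffs.append(sys_acc - base_acc)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: the same test labels and the same metric for both systems.
test_labels = ["a", "a", "b", "a"]
system_preds = ["a", "a", "b", "b"]  # hypothetical reported system
baseline_preds = majority_baseline(["a", "a", "b"], n=len(test_labels))

print(accuracy(test_labels, system_preds), accuracy(test_labels, baseline_preds))
print(bootstrap_diff_ci(test_labels, system_preds, baseline_preds))
```

Fixing the seed and resampling paired indices keeps the comparison replayable; an interval that includes zero suggests the reported effect may not survive the simpler baseline.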

Class 4

Repro failure

Give task id, command, environment, expected output, observed output, and the smallest public replay case.
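The fields listed above can be captured by a small replay script. This sketch runs a command, records the environment, and compares observed output with expected output; the task id, the command, and the expected value are hypothetical placeholders.

```python
import platform
import subprocess
import sys
from typing import List


def replay(task_id: str, command: List[str], expected_stdout: str,
           timeout_s: float = 60.0) -> dict:
    """Run a command and record what a maintainer needs to confirm the mismatch."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    expected = expected_stdout.strip()
    observed = result.stdout.strip()
    return {
        "task_id": task_id,
        "command": command,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "exit_code": result.returncode,
        "expected": expected,
        "observed": observed,
        "matches": observed == expected,
    }


# Hypothetical case: the documentation says this command prints 5.
report = replay(
    task_id="toy-001",
    command=[sys.executable, "-c", "print(2 + 2)"],
    expected_stdout="5",
)
print(report)
```

Passing the command as a list avoids shell quoting differences between environments, which removes one common source of false reproduction failures.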

Class 5

Claim-boundary overreach

Point to exact wording where a public claim exceeds the evidence envelope and propose a narrower boundary.

Rule

A useful objection must be reproducible.

The strongest critique is not a slogan. It names a claim, provides a public input or state, states the expected safe behavior, shows the observed failure, and identifies the missing evidence or receipt.

Accepted counterexamples create repair work orders, benchmark tasks, claim-boundary edits, or new gate tests. Repeated counterexamples become part of the evidence ladder.

Allowed Scope

Five useful ways to attack the work.

Submit only public, non-sensitive examples. Do not include credentials, customer data, private prompts, live trading logs, unpublished review material, or operational harm instructions.

| Class | Submit | Invalid if | Minimal verification |
| --- | --- | --- | --- |
| Formula counterexample | Formula/protocol id, variable assignment, expected result, observed contradiction. | It only says the formula feels wrong or needs private assumptions. | Replay the assignment; check whether the stated rule, inequality, gate, or invariant fails. |
| Data leakage | Dataset row ids, split ids, hashes, duplicate evidence, future timestamp, or provenance mismatch. | It requires private data, protected logs, or guesses about hidden pipelines. | Recompute split/provenance checks on public artifacts; confirm whether label, future, or duplicate leakage exists. |
| Stronger baseline | Baseline description, public code/command, same task set, same metric, seeds, and result table. | The baseline changes the task, metric, data boundary, or allowed information. | Run the baseline under the same scoring contract; compare effect size and confidence interval. |
| Repro failure | Command, environment, artifact version, expected output, observed output, and minimal failing case. | It omits the command, uses private dependencies, or reports only a screenshot without replay details. | Run the stated command from a clean checkout or artifact package; confirm the mismatch. |
| Claim-boundary overreach | Exact sentence, claimed scope, supporting evidence, missing evidence, and proposed narrower wording. | It attacks a claim the project does not make or asks for private disclosure. | Trace claim -> artifact -> metric -> limitation; decide whether wording, evidence, or boundary must change. |

Not In Scope

This is not an open attack surface.

Do not submit exploit instructions, credential tests, harassment, private deployment guesses, live trading log requests, customer-data requests, or claims that require protected commercial orchestration to verify.

A valid challenge should be public, minimal, reproducible, and connected to a named claim. If the challenge needs private material, the correct public outcome is usually a claim-boundary downgrade, not disclosure.

Submission Template

Make the objection useful enough to replay.

A strong challenge should fit this structure. If a field cannot be filled with public information, do not publish the private material; request a boundary downgrade instead.

| Field | What to provide | Why it matters | Safety boundary |
| --- | --- | --- | --- |
| Target claim | Exact claim, card, paper section, metric, or README line. | Prevents vague criticism. | No personal claims. |
| Counterexample class | Benchmark error, proof gap, false no-go, credit contamination, or boundary mismatch. | Routes it to the right repair queue. | No exploit class. |
| Public input | Minimal task, transcript, toy state, code snippet, or public dataset row. | Makes replay possible. | No credentials or private logs. |
| Expected behavior | What the protocol should have done under its own rules. | Tests the stated boundary. | No demand for private execution. |
| Observed failure | What actually happened, with artifact link or screenshot if safe. | Separates evidence from opinion. | Redact sensitive material. |
| Evidence gap | Missing threshold, falsifier, receipt, null arm, baseline, provenance, stop rule, or confidence interval. | Turns criticism into a fixable object. | No secret-system inference. |
| Proposed repair | Patch, new test, boundary downgrade, stronger baseline, or reproduction command. | Creates forward motion. | Keep it public and safe. |

Required

Claim attacked

Name the exact claim, metric, gate, paper section, README line, or system behavior being challenged.

Required

Public input

Provide a task, state, transcript, toy case, or minimal script that can be shared without private data.

Required

Expected vs observed

State what a safe system should do and what the current artifact actually does.

Required

Evidence gap

Name the missing proof: falsifier, receipt, null arm, baseline, provenance, confidence interval, or stop rule.
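The required fields can be checked before filing. The sketch below models a submission as a dataclass and lists any required field left blank; the class name, field names, and the choice to treat class and repair as optional are assumptions drawn from the "Required" cards above, not a published schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CounterexamplePacket:
    # Fields marked "Required" on this page.
    target_claim: str
    public_input: str
    expected_behavior: str
    observed_failure: str
    evidence_gap: str
    # Treated as optional in this sketch (an assumption).
    counterexample_class: str = ""
    proposed_repair: str = ""


REQUIRED_FIELDS = (
    "target_claim",
    "public_input",
    "expected_behavior",
    "observed_failure",
    "evidence_gap",
)


def missing_required(packet: CounterexamplePacket) -> List[str]:
    """Return the names of required fields that are empty or whitespace only."""
    return [name for name in REQUIRED_FIELDS if not getattr(packet, name).strip()]


# Hypothetical draft with the evidence gap left blank.
draft = CounterexamplePacket(
    target_claim="README line stating the gate blocks unsafe actions",
    public_input="toy state with one pending action",
    expected_behavior="gate rejects the action",
    observed_failure="gate accepts the action",
    evidence_gap=" ",
)
print(missing_required(draft))
```

A non-empty result means the packet is not yet replayable as written; if the blank field can only be filled with private material, the page's guidance is to request a boundary downgrade instead.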

[Figure: counterexample route for claim boundaries and repair routes.]

Where To Submit

Use the narrowest public channel.

  • Proof-carrying action, warrants, receipts, no-credit repair: use the proof-carrying-action issue tracker.
  • Benchmark tasks, scoring, longitudinal metrics: use the WisdomBench issue tracker.
  • Website wording, evidence boundaries, public link errors: use the website source repository or email.
  • Private deployment, customer data, financial execution details, or sensitive operational logs: do not post publicly.

Triage

How reports are handled.

| Status | Meaning | Public outcome | Private boundary |
| --- | --- | --- | --- |
| Accepted | The counterexample changes a claim, metric, or gate. | Patch, repair work order, or benchmark task. | No customer data exposed. |
| Needs reproduction | The case may be valid but is not yet replayable. | Request for smaller public input. | No private logs requested publicly. |
| Boundary issue | The objection attacks a claim the project does not make. | Claim-boundary clarification. | No expansion into unsupported claims. |
| Security private route | The report may be valid but contains sensitive operational details. | Public summary plus private handling if needed. | No exploit or private data published. |
| Declined | The report is unreproducible, unsafe, or outside scope. | Short reason and safer route if possible. | No debate theater. |
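The five statuses can be read as a small decision function. This is a sketch only: the flag names are hypothetical, and the order of checks is an assumption, since the table does not state which status wins when several conditions apply.

```python
from enum import Enum


class Status(Enum):
    ACCEPTED = "Accepted"
    NEEDS_REPRODUCTION = "Needs reproduction"
    BOUNDARY_ISSUE = "Boundary issue"
    SECURITY_PRIVATE_ROUTE = "Security private route"
    DECLINED = "Declined"


def triage(
    *,
    contains_sensitive_details: bool,
    in_scope: bool,
    targets_real_claim: bool,
    replayable: bool,
    changes_claim: bool,
) -> Status:
    """Map report properties to a status. Precedence here is an assumption."""
    if contains_sensitive_details:
        return Status.SECURITY_PRIVATE_ROUTE
    if not in_scope:
        return Status.DECLINED
    if not targets_real_claim:
        return Status.BOUNDARY_ISSUE
    if not replayable:
        return Status.NEEDS_REPRODUCTION
    if changes_claim:
        return Status.ACCEPTED
    # Assumption: a replayable report that changes nothing is declined with a reason.
    return Status.DECLINED


# Hypothetical report: public, in scope, on a real claim, but not yet replayable.
print(triage(
    contains_sensitive_details=False,
    in_scope=True,
    targets_real_claim=True,
    replayable=False,
    changes_claim=False,
).value)
```

Checking for sensitive details first reflects the private boundary column: nothing sensitive should reach a public queue, whatever the report's other merits.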

Validation Thesis

A system that cannot accept structured counterexamples cannot claim reliable action.

This is the public pressure valve: critique becomes evidence, and evidence becomes repair.