Validation Route

Help break the claim, not the project.

This page is the public route for seven useful challenge classes: formula counterexamples, data leakage reports, stronger baselines, reproduction failures, claim-boundary overreach, credit-leak reports, and authority-leak reports. The route is for scientific challenges; it is not a security target and not a request for private data.

Challenge public claims with public evidence, not private people or private systems.

claim -> input -> failure -> repair



Template

Use the smallest reproducible objection.

A good counterexample names one challenge class, identifies the target claim, gives public input, states expected behavior, shows observed failure, and proposes the narrowest repair. Use the local template when filing an issue.

Open the counterexample issue template

See seven minimal example packets

Class 1

Formula counterexample

Give a concrete variable assignment, boundary case, or toy state that violates a published formula or protocol rule.
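A minimal sketch of how a formula counterexample might be replayed. The rule `toy_rule`, its invariant, and the id `toy-rule-1` are hypothetical stand-ins chosen for illustration; they are not formulas published by this project.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class FormulaCounterexample:
    formula_id: str                  # hypothetical id of the published rule
    assignment: Mapping[str, float]  # concrete variable assignment
    expected: str                    # what the rule promises, in words


def replay_assignment(rule: Callable[..., float],
                      invariant: Callable[[float], bool],
                      case: FormulaCounterexample) -> dict:
    """Evaluate the rule on the assignment and report whether the invariant holds."""
    observed = rule(**case.assignment)
    return {
        "formula_id": case.formula_id,
        "expected": case.expected,
        "observed": observed,
        "invariant_holds": invariant(observed),
    }


# Hypothetical rule and invariant, for illustration only.
def toy_rule(exposure: float, probability: float) -> float:
    return exposure * probability


def in_unit_interval(value: float) -> bool:
    return 0.0 <= value <= 1.0


case = FormulaCounterexample(
    formula_id="toy-rule-1",
    assignment={"exposure": 2.0, "probability": 0.75},
    expected="score stays within [0, 1]",
)
result = replay_assignment(toy_rule, in_unit_interval, case)
# The assignment yields 1.5, so the stated invariant fails for this toy rule.
```

Keeping the assignment as plain data means a reviewer can rerun the same case against a repaired rule without editing the report.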

Class 2

Data leakage

Show train/test contamination, future information, duplicate rows, leaked labels, or provenance mismatch using public evidence.
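One way to show duplicate rows across splits, sketched under the assumption that both splits are public lists of records with an `id` field. The field names and toy rows are hypothetical; a real report would point at the project's public artifacts instead.

```python
import hashlib
import json
from typing import Iterable, List, Mapping, Sequence, Tuple


def row_hash(row: Mapping[str, object], feature_keys: Sequence[str]) -> str:
    """Hash only the feature fields so identical inputs collide regardless of row id."""
    payload = json.dumps({k: row[k] for k in sorted(feature_keys)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_cross_split_duplicates(train: Iterable[Mapping[str, object]],
                                test: Iterable[Mapping[str, object]],
                                feature_keys: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (train_id, test_id) pairs whose feature content is identical."""
    keys = list(feature_keys)
    seen = {}
    for row in train:
        seen.setdefault(row_hash(row, keys), []).append(row["id"])
    pairs = []
    for row in test:
        for train_id in seen.get(row_hash(row, keys), []):
            pairs.append((train_id, row["id"]))
    return pairs


# Toy public rows, for illustration only.
train_rows = [
    {"id": "tr-1", "text": "alpha", "label": 0},
    {"id": "tr-2", "text": "beta", "label": 1},
]
test_rows = [
    {"id": "te-1", "text": "beta", "label": 1},
    {"id": "te-2", "text": "gamma", "label": 0},
]
duplicates = find_cross_split_duplicates(train_rows, test_rows, ["text"])
```

The resulting id pairs and hashes are the kind of evidence a leakage report can cite without sharing anything beyond the public rows.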

Class 3

Stronger baseline

Use the same scoring boundary to show a simpler baseline matches, beats, or removes the reported effect.
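A sketch of comparing a simpler baseline against a reported system on the same task set. The percentile bootstrap used here is one common choice, not a method this project prescribes, and the per-task scores are invented toy numbers.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(reported: Sequence[float],
                        baseline: Sequence[float],
                        n_resamples: int = 2000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Mean per-task difference (reported - baseline) with a percentile bootstrap interval."""
    if len(reported) != len(baseline):
        raise ValueError("both systems must be scored on the same task set")
    diffs = [r - b for r, b in zip(reported, baseline)]
    rng = random.Random(seed)  # fixed seed so the interval can be replayed
    n = len(diffs)
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = resampled[int((alpha / 2) * n_resamples)]
    upper = resampled[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Toy per-task scores under the same metric, for illustration only.
reported_scores = [0.80, 0.70, 0.90, 0.60, 0.75]
baseline_scores = [0.80, 0.72, 0.88, 0.61, 0.75]
point, lower, upper = paired_bootstrap_ci(reported_scores, baseline_scores)
# An interval that straddles zero suggests the baseline matches the reported effect.
```

Pairing the scores by task keeps the comparison inside the same scoring boundary; a baseline evaluated on a different task set would be rejected by the length check or, worse, silently compare unlike things.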

Class 4

Repro failure

Give task id, command, environment, expected output, observed output, and the smallest public replay case.
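A minimal sketch of a replayable reproduction case. The task id and command are hypothetical; the command runs the current Python interpreter so the example stays self-contained. The toy case expects `3` from `round(2.5)`, while Python 3 rounds halves to the nearest even integer and prints `2`.

```python
import platform
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReproCase:
    task_id: str
    command: List[str]
    expected_stdout: str


def replay_command(case: ReproCase, timeout_s: float = 30.0) -> dict:
    """Run the command once and record environment, expected output, and observed output."""
    completed = subprocess.run(
        case.command, capture_output=True, text=True, timeout=timeout_s
    )
    observed = completed.stdout.strip()
    return {
        "task_id": case.task_id,
        "command": case.command,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "returncode": completed.returncode,
        "expected": case.expected_stdout,
        "observed": observed,
        "mismatch": observed != case.expected_stdout,
    }


# Hypothetical task: the expectation assumes round-half-up behaviour.
case = ReproCase(
    task_id="toy-task-1",
    command=[sys.executable, "-c", "print(round(2.5))"],
    expected_stdout="3",
)
report = replay_command(case)
```

The returned dictionary carries the fields the class asks for, so it can be pasted into a report as text rather than as a screenshot.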

Class 5

Claim-boundary overreach

Point to exact wording where a public claim exceeds the evidence envelope and propose a narrower boundary.

Class 6

Credit leak

Show where repair intent, bootstrap notes, semantic guesses, or paper-only artifacts accidentally become metric, reward, denominator, or clean-learning credit.
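A sketch of tracing source tags to credit fields. The no-credit categories and credit fields come from the description above, but the tag spellings, record layout, and toy records are assumptions made for illustration.

```python
from typing import Iterable, List, Mapping

# Sources that should never earn credit; the exact tag names are hypothetical.
NO_CREDIT_SOURCES = {"repair_intent", "bootstrap_note", "semantic_guess", "paper_only"}
# Fields where credit could accumulate; the exact field names are hypothetical.
CREDIT_FIELDS = ("metric", "reward", "denominator", "clean_learning")


def find_credit_leaks(records: Iterable[Mapping[str, object]]) -> List[dict]:
    """List every non-zero credit field attached to a no-credit source."""
    leaks = []
    for record in records:
        if record["source_tag"] not in NO_CREDIT_SOURCES:
            continue
        for field in CREDIT_FIELDS:
            if record.get(field):  # any truthy value counts as credit in this sketch
                leaks.append({
                    "record_id": record["id"],
                    "source_tag": record["source_tag"],
                    "credit_field": field,
                })
    return leaks


# Toy public records, for illustration only.
records = [
    {"id": "r1", "source_tag": "measured_run", "metric": 0.8, "denominator": 1},
    {"id": "r2", "source_tag": "repair_intent", "reward": 0.0, "denominator": 1},
    {"id": "r3", "source_tag": "paper_only", "metric": 0.0, "denominator": 0},
]
leaks = find_credit_leaks(records)
# Record r2 is counted in the denominator even though it is only a repair intent.
```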

Class 7

Authority leak

Show where research-only, shadow, suggestion, or no-go output is presented as action permission, deployment readiness, or gate authority.
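A sketch of scanning public labels for implied permission. The mode names mirror the description above; the phrase list, the `warrant_closed` flag, and the toy outputs are hypothetical, and a real report would quote the exact public wording instead.

```python
from typing import Iterable, List, Mapping, Tuple

# Output modes that carry no authority to act; spellings are hypothetical.
NON_AUTHORITATIVE_MODES = {"research_only", "shadow", "suggestion", "no_go"}
# Wording that a reader could take as permission; an illustrative, incomplete list.
PERMISSION_PHRASES = ("approved", "ready to deploy", "cleared to act", "gate passed")


def find_authority_leaks(outputs: Iterable[Mapping[str, object]]) -> List[Tuple[str, List[str]]]:
    """Flag non-authoritative outputs whose public label implies permission to act."""
    leaks = []
    for item in outputs:
        if item["mode"] not in NON_AUTHORITATIVE_MODES:
            continue
        if item.get("warrant_closed", False):
            continue  # a closed warrant is assumed to justify action wording
        label = str(item["public_label"]).lower()
        hits = [phrase for phrase in PERMISSION_PHRASES if phrase in label]
        if hits:
            leaks.append((str(item["id"]), hits))
    return leaks


# Toy public labels, for illustration only.
outputs = [
    {"id": "o1", "mode": "shadow", "public_label": "Shadow run: Ready to deploy"},
    {"id": "o2", "mode": "suggestion", "public_label": "Suggestion only, no action permission"},
    {"id": "o3", "mode": "live", "public_label": "Approved", "warrant_closed": True},
]
authority_leaks = find_authority_leaks(outputs)
```

A phrase match is only a pointer to wording worth reviewing; whether the label must be downgraded is still a judgment made against the warrant requirement.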

Rule

A useful objection must be reproducible.

The strongest critique is not a slogan. It names a claim, provides a public input or state, states the expected safe behavior, shows the observed failure, and identifies the missing evidence or receipt.

Accepted counterexamples create repair work orders, benchmark tasks, claim-boundary edits, or new gate tests. Repeated counterexamples become part of the evidence ladder.

Allowed Scope

Seven useful ways to attack the work.

Submit only public, non-sensitive examples. Do not include credentials, customer data, sensitive instructions, live operational logs, unpublished review material, or operational harm instructions.

| Class | Submit | Invalid if | Minimal verification |
| --- | --- | --- | --- |
| Formula counterexample | Formula/protocol id, variable assignment, expected result, observed contradiction. | It only says the formula feels wrong or needs private assumptions. | Replay the assignment; check whether the stated rule, inequality, gate, or invariant fails. |
| Data leakage | Dataset row ids, split ids, hashes, duplicate evidence, future timestamp, or provenance mismatch. | It requires private data, protected logs, or guesses about hidden pipelines. | Recompute split/provenance checks on public artifacts; confirm whether label, future, or duplicate leakage exists. |
| Stronger baseline | Baseline description, public code/command, same task set, same metric, seeds, and result table. | The baseline changes the task, metric, data boundary, or allowed information. | Run the baseline under the same scoring contract; compare effect size and confidence interval. |
| Repro failure | Command, environment, artifact version, expected output, observed output, and minimal failing case. | It omits the command, uses private dependencies, or reports only a screenshot without replay details. | Run the stated command from a clean checkout or artifact package; confirm the mismatch. |
| Claim-boundary overreach | Exact sentence, claimed scope, supporting evidence, missing evidence, and proposed narrower wording. | It attacks a claim the project does not make or asks for private disclosure. | Trace claim -> artifact -> metric -> limitation; decide whether wording, evidence, or boundary must change. |
| Credit leak | Metric/reward/denominator/clean-learning field, forbidden source, public artifact or wording, and affected claim. | It requires private logs or only says the system may have rewarded itself. | Trace source tag -> credit field -> claim; confirm whether a no-credit or quarantine rule was bypassed. |
| Authority leak | Route, UI label, API field, README wording, or public claim that implies permission to act without a closed warrant. | It asks for private execution details or attacks an action claim the project does not make. | Trace wording -> warrant requirement -> action boundary; confirm whether the public label must be downgraded. |

Not In Scope

This is not an open-ended challenge surface.

Do not submit exploit instructions, credential tests, harassment, private deployment guesses, live trading log requests, customer-data requests, or claims that require protected commercial orchestration to verify.

A valid challenge should be public, minimal, reproducible, and connected to a named claim. If the challenge needs non-public material, the correct public outcome is usually a claim-boundary downgrade, not disclosure.

Report Template

Make the objection useful enough to replay.

A strong challenge should fit this structure. If a field cannot be filled with public information, do not publish the non-public material; request a boundary downgrade instead.

| Field | What to provide | Why it matters | Safety boundary |
| --- | --- | --- | --- |
| Target claim | Exact claim, card, paper section, metric, or README line. | Prevents vague criticism. | No personal claims. |
| Counterexample class | Benchmark error, proof gap, false no-go, credit leak, authority leak, or boundary mismatch. | Routes it to the right repair queue. | No exploit class. |
| Public input | Minimal task, transcript, toy state, code snippet, or public dataset row. | Makes replay possible. | No credentials or private logs. |
| Expected behavior | What the protocol should have done under its own rules. | Tests the stated boundary. | No demand for private execution. |
| Observed failure | What actually happened, with artifact link or screenshot if safe. | Separates evidence from opinion. | Redact sensitive material. |
| Evidence gap | Missing threshold, falsifier, receipt, null arm, baseline, provenance, stop rule, or confidence interval. | Turns criticism into a fixable object. | No secret-system inference. |
| Proposed repair | Patch, new test, boundary downgrade, stronger baseline, or reproduction command. | Creates forward motion. | Keep it public and safe. |

Required

Claim attacked

Name the exact claim, metric, gate, paper section, README line, or system behavior being challenged.

Required

Public input

Provide a task, state, transcript, toy case, or minimal script that can be shared without private data.

Required

Expected vs observed

State what a safe system should do and what the current artifact actually does.

Required

Evidence gap

Name the missing proof: falsifier, receipt, null arm, baseline, provenance, confidence interval, or stop rule.
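The required fields above can be checked before filing. This is a minimal sketch with hypothetical field names and a toy draft; it is not the project's issue template or an official validator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChallengeReport:
    target_claim: str = ""
    public_input: str = ""
    expected_behavior: str = ""
    observed_failure: str = ""
    evidence_gap: str = ""
    proposed_repair: str = ""  # treated as optional in this sketch


# Fields marked "Required" above; expected vs observed is split into two fields.
REQUIRED_FIELDS = (
    "target_claim",
    "public_input",
    "expected_behavior",
    "observed_failure",
    "evidence_gap",
)


def missing_fields(report: ChallengeReport) -> List[str]:
    """Return the required fields that are empty or whitespace-only."""
    return [name for name in REQUIRED_FIELDS if not getattr(report, name).strip()]


# Toy draft, for illustration only.
draft = ChallengeReport(
    target_claim="README line stating the toy rule stays within [0, 1]",
    public_input="exposure=2.0, probability=0.75",
    expected_behavior="score within [0, 1]",
    observed_failure="score is 1.5",
)
todo = missing_fields(draft)
# The draft still lacks an evidence gap, so it is not ready to file.
```

If a field can only be filled with non-public material, the report should leave it out and ask for a boundary downgrade rather than publish that material.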


Where To Submit

Use the narrowest public channel.

Proof-carrying action, warrants, receipts, no-credit repair: use the proof-carrying-action issue tracker.
Benchmark tasks, scoring, longitudinal metrics: use the WisdomBench issue tracker.
Website wording, evidence boundaries, public link errors: use the website source repository or email.
Private deployment, customer data, financial execution details, or sensitive operational logs: do not post publicly.


Triage

How reports are handled.

| Status | Meaning | Public outcome | Private boundary |
| --- | --- | --- | --- |
| Accepted | The counterexample changes a claim, metric, or gate. | Patch, repair work order, or benchmark task. | No customer data exposed. |
| Needs reproduction | The case may be valid but is not yet replayable. | Request for smaller public input. | No private logs requested publicly. |
| Boundary issue | The objection attacks a claim the project does not make. | Claim-boundary clarification. | No expansion into unsupported claims. |
| Security private route | The report may be valid but contains sensitive operational details. | Public summary plus private handling if needed. | No exploit or private data published. |
| Declined | The report is unreproducible, unsafe, or outside scope. | Short reason and safer route if possible. | No debate theater. |

Validation Thesis

A system that cannot accept structured counterexamples cannot claim reliable action.

This is the public pressure valve: critique becomes evidence, and evidence becomes repair.