A model score is meaningful only within the conditions that produced it: the benchmark distribution, task definition, scorer, sampling procedure, contamination control, and evaluation assumptions. Removing those conditions turns evidence into a slogan. The review packet therefore records the score and its conditions together.
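One way to picture "score and conditions together" is a record type that cannot be built without its conditions. The following is a minimal sketch, not the packet's actual schema: only `claim_id` comes from the field list; the class names `EvaluationConditions` and `ReviewPacketEntry`, the `score` field, and the six condition fields (named after the items in the paragraph above) are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EvaluationConditions:
    """Hypothetical container for the conditions a score depends on."""
    benchmark_distribution: str
    task_definition: str
    scorer: str
    sampling_procedure: str
    contamination_control: str
    evaluation_assumption: str


@dataclass(frozen=True)
class ReviewPacketEntry:
    """Hypothetical packet entry: a score is stored only with its conditions."""
    claim_id: str                      # the public claim under review
    score: float                       # assumed numeric; the source does not say
    conditions: EvaluationConditions

    def __post_init__(self):
        # Reject entries where any condition is blank, so a score can
        # never be recorded stripped of its context.
        missing = [
            name
            for name, value in asdict(self.conditions).items()
            if not value.strip()
        ]
        if missing:
            raise ValueError(
                f"claim {self.claim_id!r} lacks conditions: {', '.join(missing)}"
            )
```

Under these assumptions, constructing a `ReviewPacketEntry` with an empty `scorer` or `sampling_procedure` raises `ValueError`, which is the point of the design: the score and its conditions travel as one unit or not at all.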
claim_id
The public claim under review.