docs: add eval evidence guide by YOMXXX · Pull Request #1598 · obra/superpowers · GitHub

YOMXXX · 2026-05-21T12:21:34Z

What problem are you trying to solve?

The PR template asks contributors to show evaluation, rigor, and adversarial evidence, and writing-skills explains RED/GREEN pressure testing for skill work. But contributors still have to infer how to package that evidence for reviewers across different change types.

That makes review harder than it needs to be: useful evidence can be buried in prose, a baseline can be missing, or a docs-only PR can claim behavioral evidence from evals it never ran. RFC #1597 proposes standardizing this evidence before attempting larger workflow-state or preference features.

What does this PR change?

Adds docs/eval-evidence.md, a contributor-facing guide for packaging PR evidence. It defines a reusable evidence packet, a change-type evidence matrix, and short templates for runtime bugfixes, skill behavior changes, and docs-only guidance. It also links the guide from the README contribution steps and docs/testing.md.

Is this change appropriate for the core library?

Yes. This is contributor infrastructure for all Superpowers changes. It is not project-specific, harness-specific, or tied to a third-party service, and it does not change runtime or skill behavior.

What alternatives did you consider?

Put this directly in the PR template. Rejected because the PR template is already long and should stay focused on required fields. A separate guide can include examples without making every PR body heavier.
Put this inside writing-skills. Rejected because the guidance applies to runtime bugs, hook/installer fixes, harness support, and docs-only contributor guidance, not only skill authorship.
Wait for a workflow-state proposal. Rejected because evaluation packaging is useful immediately and is a lower-risk first step before larger state/preference work.
Do nothing. Rejected because current evidence expectations exist, but the shape of a good evidence packet is still implicit.

Does this PR contain multiple unrelated changes?

No. All changes support one concern: helping contributors present test and eval evidence in a reviewer-friendly format.

Existing PRs

I have reviewed all open AND closed PRs for duplicates or prior art
Related PRs: none found that add an eval evidence guide or docs/eval-evidence.md

Related prior art and nearby work:

"feat: add plan review cycle skill" (#1473) contains a detailed evaluation section for a plan-review skill, but it does not add reusable contributor guidance.
"writing-skills: add Script vs Prose guidance" (#1274) and "fix(brainstorming): ground recommendations in named comparison dimensions" (#1512) are examples of PRs where evaluation/rigor disclosure matters, but they do not standardize the format for future contributors.
"RFC: standardize eval evidence before adding workflow state and preferences" (#1597) is the RFC issue opened to sequence eval evidence work before workflow state and preferences.

Searches run included exact terms for docs/eval-evidence.md, Eval Evidence for Superpowers PRs, and evidence packet; no direct duplicate was found.

Environment tested

| Harness (e.g. Claude Code, Cursor) | Harness version | Model | Model version/ID |
| --- | --- | --- | --- |
| Local shell documentation checks | macOS zsh/bash | N/A | N/A |

New harness support (required if this PR adds a new harness)

Not applicable. This PR does not add or modify harness support.

Clean-session transcript for "Let's make a react todo list"

N/A - this PR does not add a new harness.

Evaluation

Initial trigger: my human partner asked me to plan and execute the next useful feature direction for Superpowers. Discussions are disabled on the repository, so I opened the issue "RFC: standardize eval evidence before adding workflow state and preferences" (#1597) and then started with the lowest-risk first phase: contributor evidence guidance.
Eval sessions after making the change: 0 live model eval sessions. This is docs-only contributor guidance, not behavior-shaping skill text.
Before: README pointed contributors to writing-skills, the eval harness, and the PR template, but there was no focused guide that explained how to package baseline, after-change, adversarial, verification, and limits evidence for reviewers.
After: docs/eval-evidence.md gives a standard evidence packet, change-type matrix, templates, and common mistakes; README and docs/testing.md link to it from the contribution/test flow.
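
As a rough illustration of the evidence types listed above (baseline, after-change, adversarial, verification, limits), the sketch below prints a blank packet with one heading per type. The function name `print_evidence_packet`, the heading spellings, and the `TODO` placeholders are assumptions for illustration only; the actual layout in docs/eval-evidence.md is not reproduced here and may differ.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: print a blank evidence packet with one heading per
# evidence type named in the PR description. This is NOT the template from
# docs/eval-evidence.md; names and layout here are assumptions.

print_evidence_packet() {
  # $1: free-text change type label, e.g. "docs-only" (not validated)
  printf 'Evidence packet (%s)\n' "${1:-unspecified}"
  for section in Baseline After-change Adversarial Verification Limits; do
    # Each section starts empty; the contributor replaces TODO with evidence
    # or an explicit statement that none was collected.
    printf '\n%s:\n  TODO\n' "$section"
  done
}

# Example usage:
#   print_evidence_packet docs-only > evidence.txt
```

Keeping every heading present, even when a section only says that no evidence was collected, is one way to make a missing baseline visible to a reviewer rather than silently absent.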

Verification commands run:

git diff --cached --check
git diff --check HEAD~1 HEAD
test -f docs/eval-evidence.md
rg -n "docs/eval-evidence.md|Eval Evidence for Superpowers PRs|Reporting Evidence in PRs" README.md docs/testing.md docs/eval-evidence.md

All commands exited 0.
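
For anyone who wants to reproduce a claim like "all commands exited 0" mechanically, the commands above could be wrapped in a small helper that records each exit status. This is a minimal sketch, not part of the PR: the helper name `run_check`, the `failures` counter, and the PASS/FAIL output format are all hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of the PR): run a verification command,
# print its exit status, and count failures.

failures=0

run_check() {
  # Usage: run_check <command> [args...]
  rc_status=0
  # Command output is discarded so the summary stays short; drop the
  # redirection if the full output should be pasted as evidence.
  "$@" >/dev/null 2>&1 || rc_status=$?
  if [ "$rc_status" -eq 0 ]; then
    printf 'PASS (exit 0): %s\n' "$*"
  else
    printf 'FAIL (exit %d): %s\n' "$rc_status" "$*"
    failures=$((failures + 1))
  fi
  return 0
}

# Example usage from the repository root, with commands from this PR:
#   run_check git diff --check HEAD~1 HEAD
#   run_check test -f docs/eval-evidence.md
#   run_check rg -n "docs/eval-evidence.md|Eval Evidence for Superpowers PRs|Reporting Evidence in PRs" README.md docs/testing.md docs/eval-evidence.md
#   [ "$failures" -eq 0 ] && echo "All commands exited 0."
```

Note that `rg` exits non-zero when it finds no match, so the last check passes only if at least one of the searched strings appears in the listed files.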

Rigor

If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
This change was tested adversarially, not just on the happy path
I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

This is a docs-only contributor guide. It does not modify skill behavior, Red Flags tables, rationalization guidance, or any prompt used by agents. The unchecked boxes are intentional because no live adversarial model eval was run or needed for a docs-only guide.

Human review

A human has reviewed the COMPLETE proposed diff before submission

The complete staged diff was shown before submission. The human partner previously instructed me to treat shown diffs as reviewed for these PRs unless they say otherwise.

Refs #1597.

YOMXXX · 2026-05-21T12:22:02Z

Reviewer note: this is intentionally the smallest first step from RFC #1597.

What changed:

added docs/eval-evidence.md with a reusable PR evidence packet, change-type matrix, and short templates
linked it from README contributing steps
linked it from docs/testing.md

What did not change:

no skill behavior
no runtime code
no harness support
no eval harness behavior

Verification run after the commit:

git diff --check HEAD~1 HEAD
test -f docs/eval-evidence.md
rg -n "docs/eval-evidence.md|Eval Evidence for Superpowers PRs|Reporting Evidence in PRs" README.md docs/testing.md docs/eval-evidence.md

All exited 0.

docs: add eval evidence guide

6cf5df1

YOMXXX mentioned this pull request on May 22, 2026, in "RFC: standardize eval evidence before adding workflow state and preferences" (#1597; open, 1 task).

YOMXXX wants to merge 1 commit into obra:dev from YOMXXX:docs/eval-evidence-kit
