chi-square-analysis

Here is 1 public repository matching this topic...

rozetyp / win95stack-benchmark

LLM behavioral benchmark from 25-month narrative gameplay. 540 runs, 6 models, pre-registered statistical analysis. GPT-4o-mini shows a perfect binary switch on a social decision from prompt framing alone.

Topics: gemini, claude, narrative-game, chi-square-analysis, open-dataset, prompt-engineering, llm-evaluation, llm-agents, gpt-4o, llm-benchmark, behavioral-benchmark

Updated Apr 21, 2026
TypeScript
