The Self-Evolving Agent Ecosystem — Trading agents that evolve through Darwinian selection and adversarial self-play
Updated Apr 13, 2026 - Python
AI Robustness Evaluation System
Open-source AI agent red-team engine, SDK, and CLI. Run offline or against the Humanbound Platform.
bili-core is an open-source framework for LLM benchmarking using LangChain, LangGraph, Streamlit, and Flask. It supports LLM comparisons, Retrieval-Augmented Generation (RAG), and customizable decision workflows. Part of MSU Denver’s Sustainability Hub, bili-core promotes data democracy and transparent, reproducible AI research. 🚀
Red-team your AI agents from any coding IDE. Adversarial security testing skills for Claude Code, Cursor, Codex, and 40+ agents.
A marketplace of Claude Code plugins for adversarial security and architectural code review.
Elenchus MCP Server - Adversarial verification system for code review
AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Adversarial testing of LLMs on constraint satisfaction deadlocks
Context engineering toolkit for LLMs — pack, cache, debug, red-team, and orchestrate context windows. Council of Experts, adversarial testing, immune system, context compiler, drift detection, multi-agent entanglement. TypeScript + Python.
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Agent-driven adversarial paper audit framework
Cross-model orchestration for Claude Code — Claude builds, Codex validates. Blind TDD, adversarial stress testing, mixed-model teams, and automatic fallback. Two AI models enter, better code leaves.
API for generating LLM bot/agent personalities based on the Big Five personality model.
Mechanism-grounded taxonomy of 40 LLM jailbreak patterns across 10 categories. Full evaluation harness for 4 frontier models. AI safety research with responsible disclosure.
Go toolkit + library: structured adversarial corpora for LLM/RAG safety + quality testing. Prompt injection, KB exfiltration, jailbreak, system-prompt probing. CI/CD-ready.
CLI for Audn.ai — CI/CD security gate and developer workflows for AI agent red-teaming
Adversarial testing and red-teaming framework for enterprise LLM deployments. Covers OWASP LLM Top 10 across 11 attack modules, RAG poisoning, tool-call abuse, PII leakage, credential harvesting, hallucination, and more. Built to run in CI/CD pipelines.
9-stage enterprise development pipeline for Claude Code. TDD, adversarial testing, mechanical verification. Any stack.