Test how well LLM agents use your MCP tools, compare different models, and track quality over time with automated testing and detailed reports.
Rich visual reports, detailed traces, and interactive dashboards.
Track pass rates, latency trends, and recent runs at a glance.
Up and running in under a minute.
1. Install
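A minimal sketch of the install step, assuming MCPLab is distributed as an npm package named `mcplab` (the package name is an assumption; check the docs for the exact command):

```bash
# Hypothetical package name -- verify against the MCPLab docs
npm install -g mcplab
```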
2. Create eval config
```yaml
servers:
  my-server:
    transport: "http"
    url: "http://localhost:3000/mcp"

agents:
  claude:
    provider: "anthropic"
    model: "claude-haiku-4-5-20251001"
    temperature: 0

scenarios:
  - id: "basic-test"
    agent: "claude"
    servers: ["my-server"]
    prompt: "Use the tools to complete this task..."
    eval:
      tool_constraints:
        required_tools: ["my_tool"]
      response_assertions:
        - type: "regex"
          pattern: "success|completed"
```

3. Run evaluation
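A hedged sketch of the run step, assuming the CLI entry point is `mcplab` and the config above is saved as `eval.yaml` (both the command name and the filename are assumptions, not confirmed by this page):

```bash
# Hypothetical invocation -- command name and argument are assumptions
mcplab run eval.yaml
```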
Built-in AI assistants to supercharge your workflow.
AI chat to help design and refine evaluation scenarios. Describe what you want to test and get ready-to-use YAML configurations.
AI chat to analyze and explain completed run results. Understand failures, spot patterns, and get actionable improvement suggestions.
Automated review of your MCP tool definitions for quality, safety, and LLM-friendliness. Get recommendations before testing.
Agent Workflows
Install the `mcplab-assistant` skill and reuse the same prompts across Claude, OpenAI Codex, and similar coding agents.
Use the Skills CLI installation flow documented in the MCPLab docs.
These prompts are agent-neutral and work as reusable starting points.
Generate a minimal, valid starter config before scaling up scenarios and agents.
Run one config across multiple agents and summarize performance differences clearly (a config sketch follows this list).
Analyze run artifacts and return concrete fixes tied to failed scenarios.
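As a sketch of the multi-agent comparison above, using only the config schema from the quickstart: define a second agent and mirror the scenario for each one. The `gpt` agent, its provider and model values, and the scenario IDs here are illustrative assumptions.

```yaml
agents:
  claude:
    provider: "anthropic"
    model: "claude-haiku-4-5-20251001"
    temperature: 0
  gpt: # hypothetical second agent for comparison
    provider: "openai" # provider and model values are assumptions
    model: "gpt-4o-mini"
    temperature: 0

scenarios:
  # Same prompt and server, one scenario per agent, so results are comparable
  - id: "basic-test-claude"
    agent: "claude"
    servers: ["my-server"]
    prompt: "Use the tools to complete this task..."
  - id: "basic-test-gpt"
    agent: "gpt"
    servers: ["my-server"]
    prompt: "Use the tools to complete this task..."
```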
Documentation
Start quickly, then dive deeper with guides for setup, scenario design, app workflows, debugging, and advanced evaluation analysis.