Scenario Setup in the App
Create and manage evaluation scenarios directly in the MCPLab app UI.
Open the Config Editor
In the app sidebar, open MCP Evaluations, then click Create New (or edit an existing evaluation).
The editor opens with Scenarios and Agents tabs. Use the Scenarios tab to build the test cases in your evaluation config.
- Path: Lab -> MCP Evaluations -> Create New.
- Existing configs: open a row and click Edit.
- Use the Name and Description fields to document the purpose of the evaluation.
Add Scenarios (Reference or Inline)
The Scenarios tab supports three entry methods, so you can mix reusable library scenarios with config-specific inline scenarios.
- Add Ref: reference a scenario from the scenario library.
- Import Inline: copy a library scenario into this config for local customization.
- Add scenario: create a brand-new inline scenario from scratch.
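The practical difference between the three entry methods is whether the config holds a pointer to a library scenario or its own copy. The minimal Python sketch below illustrates that distinction. It assumes, hypothetically, that each config entry is either a reference (a library scenario ID) or an inline scenario object; the class names, field names, and the toy `LIBRARY` are illustrative only and are not MCPLab's actual config schema.

```python
import copy
from dataclasses import dataclass, field
from typing import Union


# Hypothetical shapes for illustration; not MCPLab's real config schema.
@dataclass
class InlineScenario:
    name: str
    prompt: str
    assertions: list[str] = field(default_factory=list)


@dataclass
class ScenarioRef:
    ref: str  # ID of a scenario in the shared scenario library


Entry = Union[ScenarioRef, InlineScenario]

# A toy scenario library keyed by scenario ID.
LIBRARY: dict[str, InlineScenario] = {
    "weather-basic": InlineScenario(
        name="weather-basic",
        prompt="What is the weather in Paris?",
        assertions=["tool_called:get_weather"],
    ),
}


def add_ref(entries: list[Entry], scenario_id: str) -> None:
    """Add Ref: store only a pointer, so later library edits apply here too."""
    entries.append(ScenarioRef(ref=scenario_id))


def import_inline(entries: list[Entry], scenario_id: str) -> None:
    """Import Inline: deep-copy the library scenario so local edits stay local."""
    entries.append(copy.deepcopy(LIBRARY[scenario_id]))


def add_scenario(entries: list[Entry], name: str, prompt: str) -> None:
    """Add scenario: create a brand-new inline scenario from scratch."""
    entries.append(InlineScenario(name=name, prompt=prompt))
```

Because `import_inline` makes a deep copy, appending an assertion to the imported entry leaves the library scenario unchanged, which is the behavior you want when customizing a scenario for one config.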
Edit Inline Scenario Details
Expand an inline scenario row to edit prompt, server bindings, tool constraints, assertions, and extraction rules through the scenario form.
Inline scenarios require a name before saving. Keep names unique and descriptive for easier run analysis.
- Click the chevron on an inline row to expand details.
- Start with concise, deterministic prompts, then tighten assertions.
- Save after each meaningful scenario change to keep diffs reviewable.
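If you also maintain scenarios outside the app, a pre-save check along these lines can catch naming problems early. This is a hypothetical sketch, not the app's own validation: the app requires a name, while uniqueness is the recommendation above, and treating names that differ only by letter case as duplicates is an assumption made here for readability of run results.

```python
from dataclasses import dataclass


# Hypothetical scenario shape for illustration only.
@dataclass
class InlineScenario:
    name: str
    prompt: str


def validate_names(scenarios: list[InlineScenario]) -> list[str]:
    """Return readable problems; an empty list means all names look fine."""
    problems: list[str] = []
    seen: dict[str, int] = {}  # normalized name -> index of first occurrence
    for index, scenario in enumerate(scenarios):
        name = scenario.name.strip()
        if not name:
            problems.append(f"scenario #{index + 1}: name is required")
            continue
        key = name.lower()  # assumption: case-insensitive uniqueness
        if key in seen:
            problems.append(
                f"scenario #{index + 1}: name '{name}' duplicates scenario #{seen[key] + 1}"
            )
        else:
            seen[key] = index
    return problems
```

For example, a list containing "Weather", a blank name, and "weather" yields two problems: a missing name for the second entry and a duplicate for the third.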
Organize and Normalize Scenario Entries
- Use the up/down arrows to change the scenario execution order.
- Use Convert to inline to copy a referenced scenario into editable inline form.
- Use Remove to drop scenarios you no longer need.
- Resolve any reference flagged with a Missing badge before running.
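The organizing actions above amount to simple list operations on the config's scenario entries. The Python sketch below models them under the same hypothetical assumptions as a plain list of references and inline scenarios; the function names `move`, `convert_to_inline`, `remove`, and `missing_refs` are illustrative and do not correspond to a documented MCPLab API. In this model, a reference counts as "missing" when its ID is absent from the library.

```python
import copy
from dataclasses import dataclass, field
from typing import Union


# Hypothetical entry shapes for illustration only.
@dataclass
class InlineScenario:
    name: str
    prompt: str
    assertions: list[str] = field(default_factory=list)


@dataclass
class ScenarioRef:
    ref: str  # ID of a library scenario


Entry = Union[ScenarioRef, InlineScenario]


def move(entries: list[Entry], index: int, delta: int) -> None:
    """Up/down arrows: swap with the neighbour (delta -1 or +1); no-op at the ends."""
    target = index + delta
    if 0 <= target < len(entries):
        entries[index], entries[target] = entries[target], entries[index]


def convert_to_inline(
    entries: list[Entry], index: int, library: dict[str, InlineScenario]
) -> None:
    """Convert to inline: replace a reference with an editable deep copy."""
    entry = entries[index]
    if not isinstance(entry, ScenarioRef):
        return  # already inline, nothing to do
    if entry.ref not in library:
        raise KeyError(f"cannot convert: '{entry.ref}' is missing from the library")
    entries[index] = copy.deepcopy(library[entry.ref])


def remove(entries: list[Entry], index: int) -> None:
    """Remove: drop the entry at the given position."""
    del entries[index]


def missing_refs(entries: list[Entry], library: dict[str, InlineScenario]) -> list[int]:
    """Positions of references whose target no longer exists in the library."""
    return [
        i
        for i, entry in enumerate(entries)
        if isinstance(entry, ScenarioRef) and entry.ref not in library
    ]
```

Checking `missing_refs` before a run mirrors the advice to clear Missing badges first: a broken reference cannot be converted or executed, so it has to be repointed or removed.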
Run and Validate from the App
After saving, open Run Evaluation, choose your config, select scenarios and agents, then execute a baseline run.
Start with one scenario and one run to validate setup before scaling to more scenarios or higher variance.
- Use scenario selection to isolate failures quickly.
- Use Results and Result Detail to verify assertions and tool usage.
- Iterate in the Config Editor, save, and rerun until the baseline is stable.
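When isolating failures, it helps to know exactly which scenarios to re-select for the next run. The Python sketch below assumes, hypothetically, that you have per-scenario, per-agent pass/fail outcomes from a baseline run (for example, transcribed from Results); the `RunResult` record and `failing_scenarios` helper are illustrative and are not an export format or API provided by the app.

```python
from dataclasses import dataclass


# Hypothetical result record; not an MCPLab export format.
@dataclass(frozen=True)
class RunResult:
    scenario: str
    agent: str
    passed: bool


def failing_scenarios(results: list[RunResult]) -> list[str]:
    """Scenario names with at least one failed result, in first-seen order."""
    failing: list[str] = []
    for result in results:
        if not result.passed and result.scenario not in failing:
            failing.append(result.scenario)
    return failing
```

Re-selecting only the returned scenarios keeps each rerun small, so fixes in the Config Editor can be verified quickly before the full set is run again.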