CLI

Running Evaluations

The mcplab run command and all its options.

Basic Run

Point mcplab at your eval config to run all scenarios.

run all scenarios

npx @inspectr/mcplab run -c eval.yaml

Filter Scenarios

Run a single scenario by its ID using -s. Pass the flag multiple times to run several.

single scenario

npx @inspectr/mcplab run -c eval.yaml -s basic-test

multiple scenarios

npx @inspectr/mcplab run -c eval.yaml -s test-one -s test-two
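
When the scenario list comes from a script or CI variable, the repeated -s flags can be generated rather than typed by hand. The helper below is a minimal sketch: scenario_flags is a hypothetical name, not part of mcplab, and it assumes scenario IDs contain no whitespace.

```shell
# scenario_flags ID...: print one "-s <id>" pair per scenario ID.
# Hypothetical helper, not part of mcplab; assumes IDs contain no whitespace.
scenario_flags() {
  for id in "$@"; do
    printf '%s %s ' -s "$id"
  done
}

# Possible usage (the unquoted expansion is intentional so the flags split into words):
#   npx @inspectr/mcplab run -c eval.yaml $(scenario_flags test-one test-two)
```

If IDs could contain spaces, a shell array would be the safer way to build the argument list.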

Select Agents

By default, all agents defined in the config are used. Narrow the selection with --agents, which takes a comma-separated list of agents, or widen it with --agents-all to also include every agent defined in the library.

specific agents

npx @inspectr/mcplab run -c eval.yaml --agents claude,gpt4o

all agents (config + library)

npx @inspectr/mcplab run -c eval.yaml --agents-all

Variance Runs

Run each scenario multiple times to measure consistency. The -n flag sets the number of runs per scenario. Results include a pass rate across all runs.

5 runs per scenario

npx @inspectr/mcplab run -c eval.yaml -n 5
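
As a rough illustration of what a pass rate means for a variance run, the sketch below assumes the rate is simply passing runs divided by total runs. How mcplab actually computes and formats the figure is not stated here, and pass_rate is a hypothetical helper.

```shell
# pass_rate PASSED TOTAL: print the share of passing runs as a whole percentage.
# Assumes pass rate = passed / total; TOTAL must be greater than zero.
pass_rate() {
  echo "$(( $1 * 100 / $2 ))%"
}

pass_rate 4 5   # 4 of 5 runs passed: prints 80%
```

Under that assumption, a scenario that passes 4 of its 5 runs has a pass rate of 80%, which points to flaky behaviour rather than a consistent failure.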

Interactive Mode

Interactive mode prompts you to pick a config and scenarios at the terminal instead of passing them as flags. This is useful for ad-hoc runs during development.

interactive

npx @inspectr/mcplab run --interactive

Annotate and Organise Runs

Add a human-readable note to a run with --run-note so it is easier to identify in reports and the App. Change the output directory with --runs-dir.

annotated run

npx @inspectr/mcplab run -c eval.yaml --run-note "after refactor"

custom output dir

npx @inspectr/mcplab run -c eval.yaml --runs-dir ./my-runs

Batch Runs — Directory Mode

Pass a directory path to -c/--config and MCPLab will recursively discover and run every .yaml and .yml file in that directory. This is useful for running an entire suite of eval configs in one command.

Use --bail to stop the batch after the first config that has any failing scenario (fail-fast mode). Without --bail, all configs run regardless of individual failures.

run all configs in a directory

npx @inspectr/mcplab run -c ./evals/

stop on first failure

npx @inspectr/mcplab run -c ./evals/ --bail
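
To make the fail-fast behaviour concrete, the sketch below emulates directory mode with a plain shell loop. It is an illustration of the semantics described above, not mcplab's implementation: run_batch and its RUNNER argument are hypothetical, and it assumes config paths contain no whitespace.

```shell
# run_batch RUNNER DIR [--bail]: call RUNNER once per .yaml/.yml file under DIR.
# RUNNER is any command that takes a config path and exits non-zero on failure.
# Hypothetical emulation of the batch semantics; paths must not contain whitespace.
run_batch() {
  runner=$1 dir=$2 bail=${3:-}
  failed=0
  # Sort the discovered files so the run order is stable.
  for cfg in $(find "$dir" -type f \( -name '*.yaml' -o -name '*.yml' \) | sort); do
    if "$runner" "$cfg"; then
      echo "PASS $cfg"
    else
      echo "FAIL $cfg"
      failed=1
      # Fail-fast: stop at the first failing config when --bail is given.
      [ "$bail" = "--bail" ] && break
    fi
  done
  return "$failed"
}
```

With --bail the loop stops at the first failing config, so later configs are never run. Without it every config runs and the function still returns non-zero if any of them failed.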

Exit Codes

mcplab run exits 0 when all scenarios pass and non-zero when any scenario fails. Use this in CI to fail a pipeline on a regression.
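
A CI step can rely on that exit status directly. The wrapper below is a minimal sketch: ci_gate is a hypothetical helper name, and it works with any command that follows the same exit-code convention.

```shell
# ci_gate CMD...: run the eval command and propagate its exit status to the caller.
# Hypothetical helper, not part of mcplab.
ci_gate() {
  if "$@"; then
    echo "evals passed"
  else
    status=$?   # exit status of the command that just failed
    echo "evals failed (exit $status)" >&2
    return "$status"
  fi
}

# Possible CI step:
#   ci_gate npx @inspectr/mcplab run -c eval.yaml || exit 1
```

Most CI systems already fail a step when a command exits non-zero, so the wrapper mainly adds a readable log line. Calling the run command on its own is enough when that is all the pipeline needs.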