AgentEval Documentation

Welcome to the AgentEval documentation. AgentEval is the first .NET-native AI agent testing, evaluation, and benchmarking framework.

Quick Install

dotnet add package AgentEval --prerelease

NuGet: https://www.nuget.org/packages/AgentEval

Getting Started

Guide	Description
Installation	Install AgentEval and verify setup
Quick Start	Run your first agent test in 5 minutes
Walkthrough	Step-by-step tutorial with examples

Features

Tool Usage Assertions

Assert on tool calls, order, arguments, results, errors, and duration with a fluent API.

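// Passes only if SearchFlights was called before BookFlight with
// destination "Paris", and no tool call reported an error.
result.ToolUsage!
    .Should()
    .HaveCalledTool("SearchFlights")
        .BeforeTool("BookFlight")
        .WithArgument("destination", "Paris")
    .And()
    .HaveNoErrors();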

Performance Metrics

Track latency, TTFT (Time To First Token), tokens, estimated cost, and per-tool timing.

// Fails the test if the run exceeds any of these latency or cost budgets.
result.Performance!
    .Should()
    .HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
    .HaveTimeToFirstTokenUnder(TimeSpan.FromSeconds(2))
    .HaveEstimatedCostUnder(0.10m);

Multi-Turn Conversation Testing

Test complex multi-turn conversations with the ConversationalTestCase builder and ConversationRunner. See Conversations.

Workflow Testing

Test multi-agent orchestration with edge assertions, conditional routing, and Mermaid diagram export. See Workflow Testing.

Snapshot Testing

Compare agent responses against saved baselines with JSON diff, field ignoring, pattern scrubbing, and semantic similarity. See Snapshots.
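
To illustrate what pattern scrubbing means in general, here is a minimal, framework-independent sketch. It does not use AgentEval's snapshot API: the `SnapshotScrubSketch` class, the `Scrub` method, and the `<guid>` / `<timestamp>` placeholders are hypothetical names chosen for this example. The idea is to replace volatile values with stable placeholders before comparing a response against a saved baseline.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of AgentEval: replaces volatile values
// (GUIDs, ISO 8601 timestamps) with placeholders so two runs compare equal.
public static class SnapshotScrubSketch
{
    private static readonly Regex GuidPattern = new Regex(
        @"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b");

    private static readonly Regex TimestampPattern = new Regex(
        @"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?");

    public static string Scrub(string text)
    {
        // Scrub GUIDs first, then timestamps; the two patterns do not overlap.
        string withoutGuids = GuidPattern.Replace(text, "<guid>");
        return TimestampPattern.Replace(withoutGuids, "<timestamp>");
    }
}
```

With this sketch, two responses that differ only in request ID and timestamp scrub to the same string, so a plain string or JSON comparison against the baseline can succeed.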

RAG Metrics

Evaluate faithfulness, relevance, context precision/recall, and answer correctness.
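
As a rough illustration of context precision and recall, the following sketch uses the simplest set-based definitions over document IDs. This is an assumption made for the example, not a description of how AgentEval computes these metrics (evaluation frameworks often use rank-weighted or LLM-judged variants). `ContextMetricsSketch` and `Score` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AgentEval: set-based context metrics.
public static class ContextMetricsSketch
{
    // Precision = relevant retrieved / all retrieved.
    // Recall    = relevant retrieved / all relevant.
    public static (double Precision, double Recall) Score(
        IEnumerable<string> retrievedIds,
        IEnumerable<string> relevantIds)
    {
        var retrieved = new HashSet<string>(retrievedIds);
        var relevant = new HashSet<string>(relevantIds);

        int hits = retrieved.Count(id => relevant.Contains(id));

        // Return 0 rather than dividing by zero when a set is empty.
        double precision = retrieved.Count == 0 ? 0.0 : (double)hits / retrieved.Count;
        double recall = relevant.Count == 0 ? 0.0 : (double)hits / relevant.Count;
        return (precision, recall);
    }
}
```

For example, retrieving four documents of which two are among three relevant ones gives a precision of 2/4 and a recall of 2/3.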

Agentic Metrics

Measure tool selection accuracy, tool arguments, tool success, task completion, and efficiency.
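
One simple way to think about tool selection accuracy is the fraction of expected tools that the agent actually called. The sketch below assumes that definition purely for illustration; AgentEval's own metric may be defined differently, and `ToolSelectionSketch` and `Accuracy` are hypothetical names, not library APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AgentEval.
public static class ToolSelectionSketch
{
    // Returns the share of expected tool names found among the called tools.
    public static double Accuracy(
        IEnumerable<string> expectedTools,
        IEnumerable<string> calledTools)
    {
        var expected = new HashSet<string>(expectedTools, StringComparer.Ordinal);
        if (expected.Count == 0)
        {
            return 1.0; // Nothing was expected, so nothing was missed.
        }

        var called = new HashSet<string>(calledTools, StringComparer.Ordinal);
        int matched = expected.Count(name => called.Contains(name));
        return (double)matched / expected.Count;
    }
}
```

Note that this definition ignores call order, arguments, and extra calls; those concerns correspond to the other metrics listed above.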

Benchmarks

Run latency, throughput, cost, and agentic benchmarks with percentile statistics (p50/p90/p95/p99). See Benchmarks.
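
For readers unfamiliar with percentile statistics, the sketch below computes p50/p90/p95/p99 from a set of latency samples using the nearest-rank method. The choice of nearest-rank is an assumption for this example; the source does not say which percentile method AgentEval uses, and `PercentileSketch` is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of AgentEval.
public static class PercentileSketch
{
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samples, double p)
    {
        if (samples.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p), "p must be in (0, 100].");

        double[] sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With ten samples of 10, 20, ..., 100 ms, this gives p50 = 50 and p90 = 90, while p95 and p99 both land on the slowest sample (100). That is why tail percentiles only become informative with a larger number of runs.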

CLI Tool

Full command-line interface for CI/CD integration with multiple output formats (JSON, JUnit XML, Markdown) and dataset loaders (JSON, JSONL, CSV, YAML). See CLI Reference.

Guides

Guide	Description
Architecture	Component diagrams and metric hierarchy
Benchmarks	BFCL, GAIA, ToolBench guides
CLI Reference	Command-line tool usage
Conversations	Multi-turn testing guide
Embedding Metrics	Semantic similarity metrics
Extensibility	Custom metrics, plugins, adapters
Snapshots	Snapshot testing guide
Tracing & Record/Replay	Deterministic testing with trace capture
Workflow Testing	Multi-agent orchestration testing
Roadmap	Future development plans

API Reference

API documentation is auto-generated from XML comments. Browse the API Reference section in the navigation menu for detailed type documentation.

Test Coverage

AgentEval has 1,000+ tests (3,000+ test runs when executed across its 3 target frameworks) covering all major features.

Community

GitHub: https://github.com/joslat/AgentEval
NuGet: https://www.nuget.org/packages/AgentEval
Issues: https://github.com/joslat/AgentEval/issues
Discussions: https://github.com/joslat/AgentEval/discussions

Contributing

Contributions are welcome! Please read: