AgentEval Documentation

Welcome to the AgentEval documentation. AgentEval is the first .NET-native AI agent testing, evaluation, and benchmarking framework.


Quick Install

dotnet add package AgentEval --prerelease

NuGet: https://www.nuget.org/packages/AgentEval


Getting Started

| Guide | Description |
|-------|-------------|
| Installation | Install AgentEval and verify setup |
| Quick Start | Run your first agent test in 5 minutes |
| Walkthrough | Step-by-step tutorial with examples |

Features

Tool Usage Assertions

Assert on tool calls, order, arguments, results, errors, and duration with a fluent API.

result.ToolUsage!
    .Should()
    .HaveCalledTool("SearchFlights")
        .BeforeTool("BookFlight")
        .WithArgument("destination", "Paris")
    .And()
    .HaveNoErrors();

Performance Metrics

Track latency, TTFT (Time To First Token), tokens, estimated cost, and per-tool timing.

result.Performance!
    .Should()
    .HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
    .HaveTimeToFirstTokenUnder(TimeSpan.FromSeconds(2))
    .HaveEstimatedCostUnder(0.10m);

Multi-Turn Conversation Testing

Test complex multi-turn conversations with the ConversationalTestCase builder and ConversationRunner. See Conversations.

Workflow Testing

Test multi-agent orchestration with edge assertions, conditional routing, and Mermaid diagram export. See Workflow Testing.

Snapshot Testing

Compare agent responses against saved baselines with JSON diff, field ignoring, pattern scrubbing, and semantic similarity. See Snapshots.
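To illustrate what pattern scrubbing means in general, here is a minimal, self-contained sketch that masks values that change on every run (GUIDs and ISO 8601 UTC timestamps) before two responses are compared. It does not use the AgentEval snapshot API: the `Scrub` helper, the regular expressions, and the sample strings are all assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an AgentEval API): replaces volatile values with
// fixed placeholders so that two runs can be compared as plain text.
static string Scrub(string text)
{
    // GUIDs such as 3f2504e0-4f89-11d3-9a0c-0305e82c3301
    text = Regex.Replace(
        text,
        @"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b",
        "<guid>");

    // ISO 8601 UTC timestamps such as 2024-05-01T12:30:45Z, with optional fraction
    text = Regex.Replace(
        text,
        @"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z",
        "<timestamp>");

    return text;
}

// Illustrative sample data: the same response captured on two different runs.
string baseline = "Booking 3f2504e0-4f89-11d3-9a0c-0305e82c3301 confirmed at 2024-05-01T12:30:45Z";
string current = "Booking 9b2d6c1a-7e4f-4c3b-8a1d-2f5e6a7b8c9d confirmed at 2025-01-15T08:05:10.250Z";

// After scrubbing, both strings read "Booking <guid> confirmed at <timestamp>".
Console.WriteLine(Scrub(baseline) == Scrub(current)); // prints True
```

Scrubbing before comparison keeps a baseline stable across runs without hiding the whole field, which is the trade-off against ignoring a field entirely.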

RAG Metrics

Evaluate faithfulness, relevance, context precision/recall, and answer correctness.

Agentic Metrics

Measure tool selection accuracy, tool arguments, tool success, task completion, and efficiency.

Benchmarks

Run latency, throughput, cost, and agentic benchmarks with percentile statistics (p50/p90/p95/p99). See Benchmarks.
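For readers unfamiliar with percentile statistics, the sketch below shows one common way to compute p50/p90/p95/p99 from a set of latency samples, using the nearest-rank method. The documentation does not say which percentile method AgentEval uses, so treat the method, the `Percentile` helper, and the sample data as assumptions for illustration only.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an AgentEval API): nearest-rank percentile.
// Returns the smallest sample such that at least `percent` percent of all
// samples are less than or equal to it.
static double Percentile(double[] samples, double percent)
{
    if (samples.Length == 0)
        throw new ArgumentException("At least one sample is required.", nameof(samples));
    if (percent <= 0 || percent > 100)
        throw new ArgumentOutOfRangeException(nameof(percent));

    double[] sorted = samples.OrderBy(x => x).ToArray();

    // 1-based rank into the sorted samples, rounded up.
    int rank = (int)Math.Ceiling(percent * sorted.Length / 100.0);
    return sorted[rank - 1];
}

// Illustrative latency samples in milliseconds (made-up data).
double[] latenciesMs = { 120, 95, 310, 101, 150, 98, 2050, 130, 110, 105 };

foreach (double p in new[] { 50.0, 90.0, 95.0, 99.0 })
{
    Console.WriteLine($"p{p}: {Percentile(latenciesMs, p)} ms");
}
```

With only ten samples, p95 and p99 both resolve to the largest value, so the high percentiles become informative only when a benchmark collects many more samples than this.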

CLI Tool

Full command-line interface for CI/CD integration with multiple output formats (JSON, JUnit XML, Markdown) and dataset loaders (JSON, JSONL, CSV, YAML). See CLI Reference.


Guides

| Guide | Description |
|-------|-------------|
| Architecture | Component diagrams and metric hierarchy |
| Benchmarks | BFCL, GAIA, ToolBench guides |
| CLI Reference | Command-line tool usage |
| Conversations | Multi-turn testing guide |
| Embedding Metrics | Semantic similarity metrics |
| Extensibility | Custom metrics, plugins, adapters |
| Snapshots | Snapshot testing guide |
| Tracing & Record/Replay | Deterministic testing with trace capture |
| Workflow Testing | Multi-agent orchestration testing |
| Roadmap | Future development plans |

API Reference

API documentation is auto-generated from XML comments. Browse the API Reference section in the navigation menu for detailed type documentation.


Test Coverage

AgentEval has 1,000+ tests covering all major features. Counted across its 3 target frameworks, that comes to 3,000+ tests.


Community


Contributing

Contributions are welcome! Please read: