Ranking evaluation tells you how well a system orders results relative to what users actually find relevant. Use these metrics when evaluating search engines, recommendation feeds, or retrieval stages in RAG pipelines.
Relevance judgments
All ranking metrics in reval accept a map[string]int relevance map that associates each item ID with an integer relevance score:
0: not relevant
1: relevant (binary) or minimally relevant (graded)
2, 3, …: increasingly relevant (graded judgments)
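As a minimal sketch of what such a map can look like in Go: the item IDs, the contents of predicted, and the gradeOf helper below are made up for illustration and are not part of reval. Looking up an ID with no entry returns Go's zero value for int, which is 0.

```go
package main

import "fmt"

// gradeOf is a hypothetical helper (not part of reval). It returns the
// relevance grade for id; a missing key yields Go's zero value for int,
// so an unjudged item reads as 0.
func gradeOf(relevance map[string]int, id string) int {
	return relevance[id]
}

func main() {
	// Hypothetical graded judgments for a single query.
	relevance := map[string]int{
		"doc-a": 3, // highly relevant
		"doc-b": 1, // minimally relevant
		"doc-c": 0, // judged and found not relevant
	}

	// Hypothetical ranked output of the system under evaluation.
	predicted := []string{"doc-b", "doc-x", "doc-a"}

	for rank, id := range predicted {
		fmt.Printf("rank %d: %s -> relevance %d\n", rank+1, id, gradeOf(relevance, id))
	}
}
```

Here doc-x has no judgment, so its lookup yields 0 without any special handling.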
You only need to include items you have judgments for. Items in predicted that are absent from the map are treated as relevance 0.
Precision@K
Precision@K measures what fraction of the top-K retrieved items are relevant.
Recall@K
Recall@K measures what fraction of all relevant items appear in the top-K results.
NDCG@K
NDCG (Normalized Discounted Cumulative Gain) accounts for both relevance and position. Highly relevant items appearing lower in the ranking are penalized.
Multi-query evaluation with MAP
For a system evaluated across multiple queries, use Mean Average Precision (MAP) to aggregate the per-query results into a single number.
QueryResult pairs a ranked list of retrieved IDs with the relevance map for that query. MAP is the mean of the Average Precision scores of all queries.
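The aggregation can be sketched from the textbook definition. This is not reval's implementation: the QueryResult struct below only mirrors the pairing described above, its field names are assumptions, and averagePrecision and meanAveragePrecision are hypothetical helpers. The sketch treats any grade above 0 as relevant, divides by the total number of relevant items for the query, and assumes each ranked list contains no duplicate IDs.

```go
package main

import "fmt"

// QueryResult here is a stand-in for this sketch only; the field names
// are assumptions, not reval's actual definition.
type QueryResult struct {
	Predicted []string       // ranked list of retrieved IDs
	Relevance map[string]int // judgments for the same query
}

// averagePrecision sums precision at every rank that holds a relevant item
// and divides by the total number of relevant items for the query.
func averagePrecision(q QueryResult) float64 {
	totalRelevant := 0
	for _, grade := range q.Relevance {
		if grade > 0 {
			totalRelevant++
		}
	}
	if totalRelevant == 0 {
		return 0
	}

	hits := 0
	sum := 0.0
	for i, id := range q.Predicted {
		if q.Relevance[id] > 0 {
			hits++
			sum += float64(hits) / float64(i+1) // precision at rank i+1
		}
	}
	return sum / float64(totalRelevant)
}

// meanAveragePrecision is the arithmetic mean of averagePrecision over
// all queries; an empty input yields 0.
func meanAveragePrecision(queries []QueryResult) float64 {
	if len(queries) == 0 {
		return 0
	}
	total := 0.0
	for _, q := range queries {
		total += averagePrecision(q)
	}
	return total / float64(len(queries))
}

func main() {
	queries := []QueryResult{
		{Predicted: []string{"a", "b", "c"}, Relevance: map[string]int{"a": 1, "c": 1}},
		{Predicted: []string{"x", "y"}, Relevance: map[string]int{"y": 1}},
	}
	fmt.Printf("MAP = %.4f\n", meanAveragePrecision(queries))
}
```

In this example the first query scores (1/1 + 2/3) / 2 = 5/6 and the second scores 1/2, so the mean is 2/3.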
Choosing the right metric
When to use Precision@K
Use when users look at only the top K results and you care about result quality more than coverage. Common for web search and featured recommendations.
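To make the quantity concrete, here is a minimal sketch of Precision@K written from its definition. The function precisionAtK is hypothetical and is not reval's API. Two choices in it are assumptions: any grade above 0 counts as relevant, and the denominator is always K, even when fewer than K items were returned.

```go
package main

import "fmt"

// precisionAtK (hypothetical) returns the fraction of the first k predicted
// items whose relevance grade is above zero. It divides by k, not by the
// number of items actually returned; that convention is an assumption.
func precisionAtK(predicted []string, relevance map[string]int, k int) float64 {
	if k <= 0 {
		return 0
	}
	limit := k
	if len(predicted) < limit {
		limit = len(predicted)
	}
	hits := 0
	for _, id := range predicted[:limit] {
		if relevance[id] > 0 {
			hits++
		}
	}
	return float64(hits) / float64(k)
}

func main() {
	predicted := []string{"a", "b", "c", "d", "e"}
	relevance := map[string]int{"a": 1, "c": 1, "f": 1}

	fmt.Printf("P@1 = %.2f\n", precisionAtK(predicted, relevance, 1))
	fmt.Printf("P@5 = %.2f\n", precisionAtK(predicted, relevance, 5))
}
```

With these inputs the top result is relevant, so P@1 is 1.00, while only two of the top five are relevant, so P@5 is 0.40.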
When to use Recall@K
Use when missing relevant items is costly, for example in legal document retrieval or medical record search, where completeness matters.
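The completeness that Recall@K captures can be sketched from its definition as follows. The function recallAtK is hypothetical and is not reval's API. It assumes that any grade above 0 counts as relevant and that the ranked list has no duplicate IDs.

```go
package main

import "fmt"

// recallAtK (hypothetical) returns the share of all relevant items, meaning
// those with a grade above zero in the relevance map, that appear among the
// first k predicted items.
func recallAtK(predicted []string, relevance map[string]int, k int) float64 {
	totalRelevant := 0
	for _, grade := range relevance {
		if grade > 0 {
			totalRelevant++
		}
	}
	if totalRelevant == 0 || k <= 0 {
		return 0
	}
	limit := k
	if len(predicted) < limit {
		limit = len(predicted)
	}
	hits := 0
	for _, id := range predicted[:limit] {
		if relevance[id] > 0 {
			hits++
		}
	}
	return float64(hits) / float64(totalRelevant)
}

func main() {
	predicted := []string{"a", "b", "c", "d", "e"}
	relevance := map[string]int{"a": 1, "c": 1, "f": 1}

	fmt.Printf("R@1 = %.2f\n", recallAtK(predicted, relevance, 1))
	fmt.Printf("R@5 = %.2f\n", recallAtK(predicted, relevance, 5))
}
```

Item f is relevant but never retrieved, so recall tops out at 2/3 here no matter how large K gets.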
When to use MAP
Use when evaluating across many queries at once with binary relevance labels. MAP is a standard offline benchmark metric in information retrieval research.
When to use NDCG
Use when relevance is graded (not just relevant/not-relevant) or when the ranking position of highly relevant items matters to your product. NDCG is often preferred for e-commerce and recommendation systems.
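To show how graded relevance and position interact, here is a minimal sketch of NDCG@K. The functions dcg and ndcgAtK are hypothetical and are not reval's API. The sketch assumes the linear-gain form, grade / log2(rank + 1); another common variant uses 2^grade - 1 as the gain, and which one reval uses is not stated here.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// dcg (hypothetical) sums grade / log2(rank + 1) over grades in rank order,
// where rank starts at 1. This is the linear-gain variant.
func dcg(grades []int) float64 {
	sum := 0.0
	for i, g := range grades {
		sum += float64(g) / math.Log2(float64(i+2))
	}
	return sum
}

// ndcgAtK (hypothetical) divides the DCG of the top-k predicted items by the
// DCG of the ideal ordering: all judged grades sorted descending, cut at k.
func ndcgAtK(predicted []string, relevance map[string]int, k int) float64 {
	if k <= 0 {
		return 0
	}

	// Grades of the predicted items in rank order; unjudged items read as 0.
	actual := []int{}
	for i, id := range predicted {
		if i >= k {
			break
		}
		actual = append(actual, relevance[id])
	}

	// Best achievable ordering of the judged items.
	ideal := make([]int, 0, len(relevance))
	for _, g := range relevance {
		ideal = append(ideal, g)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(ideal)))
	if len(ideal) > k {
		ideal = ideal[:k]
	}

	idealDCG := dcg(ideal)
	if idealDCG == 0 {
		return 0
	}
	return dcg(actual) / idealDCG
}

func main() {
	relevance := map[string]int{"a": 3, "b": 2, "c": 1}

	fmt.Printf("ideal order:    %.4f\n", ndcgAtK([]string{"a", "b", "c"}, relevance, 3))
	fmt.Printf("reversed order: %.4f\n", ndcgAtK([]string{"c", "b", "a"}, relevance, 3))
}
```

Both rankings contain the same three items, so Precision@3 and Recall@3 cannot tell them apart. NDCG@3 scores the reversed order lower because the most relevant item sits at the bottom.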