Evaluating Predictions: MRR and Hits@k

When you evaluate a link prediction model for knowledge graphs, you want to measure how well the model ranks the correct triples among all possible candidates. Two widely used ranking metrics are Mean Reciprocal Rank (MRR) and Hits@k.

Mean Reciprocal Rank (MRR) measures the average of the reciprocal ranks of the correct answers. For each test query, you rank all possible candidate entities by their predicted scores. The reciprocal rank is $\frac{1}{\mathrm{rank}}$, where $\mathrm{rank}$ is the position of the correct entity in the sorted list (with rank 1 being the highest score). You then average the reciprocal ranks over all queries:

$$\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{rank}_i}$$

where $N$ is the number of queries and $\mathrm{rank}_i$ is the rank of the correct entity for the $i$-th query.
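
To make the formula concrete, here is a minimal sketch that computes MRR directly from a list of ranks (the ranks below are made-up values, with rank 1 being the best position):

import numpy as np

# Made-up ranks of the correct entity for four queries (rank 1 is best)
ranks = np.array([3, 1, 5, 2])

# MRR is the mean of the reciprocal ranks
mrr = np.mean(1.0 / ranks)
print("MRR:", mrr)  # (1/3 + 1/1 + 1/5 + 1/2) / 4 ≈ 0.508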

Hits@k measures the proportion of test queries for which the correct entity appears in the top $k$ predictions:

$$\mathrm{Hits@}k = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(\mathrm{rank}_i \leq k)$$

where $\mathbb{I}$ is the indicator function.
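
The indicator simply checks whether each rank falls within the cutoff, so Hits@k can be computed the same way; a minimal sketch, reusing the same made-up ranks as above:

import numpy as np

# Made-up ranks of the correct entity for four queries (rank 1 is best)
ranks = np.array([3, 1, 5, 2])
k = 3

# The comparison plays the role of the indicator function I(rank <= k)
hits_k = np.mean(ranks <= k)
print(f"Hits@{k}:", hits_k)  # 3 of the 4 ranks are <= 3, so 0.75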

Let's walk through a step-by-step calculation. Suppose you have a test set with three queries, and for each query, your model assigns scores to candidate entities. The correct entity's rank for each query is as follows:

  • Query 1: Correct entity is ranked 2;
  • Query 2: Correct entity is ranked 1;
  • Query 3: Correct entity is ranked 4.

First, calculate the reciprocal ranks:

  • Query 1: $\frac{1}{2} = 0.5$;
  • Query 2: $\frac{1}{1} = 1.0$;
  • Query 3: $\frac{1}{4} = 0.25$.

To get the MRR, take the mean:

$$\mathrm{MRR} = \frac{0.5 + 1.0 + 0.25}{3} \approx 0.583$$

For Hits@1, count how many times the correct entity is ranked first: only once (Query 2), so

$$\mathrm{Hits@}1 = \frac{1}{3} \approx 0.333$$

For Hits@3, count how many times the correct entity is ranked in the top 3: Query 1 and Query 2 both qualify, so

$$\mathrm{Hits@}3 = \frac{2}{3} \approx 0.667$$
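
You can verify the hand calculation above with a few lines of NumPy, starting from the ranks of the three queries:

import numpy as np

# Ranks of the correct entity for the three queries in the walk-through
ranks = np.array([2, 1, 4])

mrr = np.mean(1.0 / ranks)
hits1 = np.mean(ranks <= 1)
hits3 = np.mean(ranks <= 3)

print("MRR:", round(mrr, 3))       # 0.583
print("Hits@1:", round(hits1, 3))  # 0.333
print("Hits@3:", round(hits3, 3))  # 0.667

In practice you usually start from raw prediction scores rather than precomputed ranks. The code below derives the rank of the correct entity for each query by sorting the scores, then computes MRR and Hits@3 from those ranks.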
import numpy as np

# Example: predicted scores for 3 queries, each with 4 candidate entities
# Rows: queries, Columns: candidate entities
scores = np.array([
    [0.2, 0.9, 0.3, 0.5],  # Query 1
    [0.8, 0.1, 0.4, 0.7],  # Query 2
    [0.6, 0.2, 0.9, 0.1]   # Query 3
])

# The index of the correct entity for each query
correct_indices = np.array([1, 0, 2])

def compute_mrr_and_hitsk(scores, correct_indices, k=3):
    mrr = 0
    hits_k = 0
    num_queries = scores.shape[0]
    for i in range(num_queries):
        # Sort scores descending, get ranking
        ranking = np.argsort(-scores[i])
        # Find rank (1-based) of the correct entity
        rank = np.where(ranking == correct_indices[i])[0][0] + 1
        mrr += 1.0 / rank
        if rank <= k:
            hits_k += 1
    mrr /= num_queries
    hits_k /= num_queries
    return mrr, hits_k

mrr, hits3 = compute_mrr_and_hitsk(scores, correct_indices, k=3)
print("MRR:", mrr)
print("Hits@3:", hits3)
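
Continuing from the code above, the same function also gives Hits@1: pass k=1 so that only rank-1 predictions count as hits. In this particular score matrix every correct entity happens to be the top-scored candidate, so MRR, Hits@1, and Hits@3 all come out to 1.0.

# Reuse the function above with k=1 to get Hits@1
mrr, hits1 = compute_mrr_and_hitsk(scores, correct_indices, k=1)
print("Hits@1:", hits1)  # 1.0 here, since every correct entity is ranked first
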
Note
Study More

Other evaluation metrics for knowledge graphs include Mean Rank (MR), Area Under the Curve (AUC), and precision/recall at various cutoffs. These metrics can provide additional perspectives on model performance, especially in different application scenarios.
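
As a quick illustration, Mean Rank is simply the average of the raw ranks rather than their reciprocals (lower is better, and it is not bounded to [0, 1]); a minimal sketch using the ranks from the earlier walk-through:

import numpy as np

# Mean Rank (MR): average of the raw ranks of the correct entities
ranks = np.array([2, 1, 4])
mean_rank = np.mean(ranks)
print("Mean Rank:", mean_rank)  # (2 + 1 + 4) / 3 ≈ 2.33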
