Evaluating Predictions: MRR and Hits@k
When you evaluate a link prediction model for knowledge graphs, you want to measure how well the model ranks the correct triples among all possible candidates. Two widely used ranking metrics are Mean Reciprocal Rank (MRR) and Hits@k.
Mean Reciprocal Rank (MRR) measures the average of the reciprocal ranks of the correct answer. For each test query, you rank all possible candidate entities by their predicted scores. The reciprocal rank is $\frac{1}{\text{rank}}$, where rank is the position of the correct entity in the sorted list (with rank 1 being the highest score). You then average the reciprocal ranks over all queries:
$$\text{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\text{rank}_i}$$
where $N$ is the number of queries and $\text{rank}_i$ is the rank of the correct entity for the $i$-th query.
Hits@k measures the proportion of test queries for which the correct entity appears in the top $k$ predictions:
$$\text{Hits@}k = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(\text{rank}_i \leq k)$$
where $\mathbb{I}$ is the indicator function.
Let's walk through a step-by-step calculation. Suppose you have a test set with three queries, and for each query, your model assigns scores to candidate entities. The correct entity's rank for each query is as follows:
- Query 1: Correct entity is ranked 2;
- Query 2: Correct entity is ranked 1;
- Query 3: Correct entity is ranked 4.
First, calculate the reciprocal ranks:
- Query 1: $\frac{1}{2} = 0.5$;
- Query 2: $\frac{1}{1} = 1.0$;
- Query 3: $\frac{1}{4} = 0.25$.
To get the MRR, take the mean:
$$\text{MRR} = \frac{0.5 + 1.0 + 0.25}{3} \approx 0.583$$
For Hits@1, count how many times the correct entity is ranked first: only once (Query 2), so
$$\text{Hits@1} = \frac{1}{3} \approx 0.333$$
For Hits@3, count how many times the correct entity is ranked in the top 3: Query 1 and Query 2 both qualify, so
$$\text{Hits@3} = \frac{2}{3} \approx 0.667$$

The following NumPy implementation computes MRR and Hits@k from a matrix of predicted scores:

```python
import numpy as np

# Example: predicted scores for 3 queries, each with 4 candidate entities
# Rows: queries, Columns: candidate entities
scores = np.array([
    [0.2, 0.9, 0.3, 0.5],  # Query 1
    [0.8, 0.1, 0.4, 0.7],  # Query 2
    [0.6, 0.2, 0.9, 0.1]   # Query 3
])

# The index of the correct entity for each query
correct_indices = np.array([1, 0, 2])

def compute_mrr_and_hitsk(scores, correct_indices, k=3):
    mrr = 0
    hits_k = 0
    num_queries = scores.shape[0]
    for i in range(num_queries):
        # Sort scores descending, get ranking
        ranking = np.argsort(-scores[i])
        # Find rank (1-based) of the correct entity
        rank = np.where(ranking == correct_indices[i])[0][0] + 1
        mrr += 1.0 / rank
        if rank <= k:
            hits_k += 1
    mrr /= num_queries
    hits_k /= num_queries
    return mrr, hits_k

mrr, hits3 = compute_mrr_and_hitsk(scores, correct_indices, k=3)
print("MRR:", mrr)
print("Hits@3:", hits3)
```

Note that in this score matrix the correct entity happens to receive the highest score for every query, so both MRR and Hits@3 come out to 1.0.
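If you already have the rank of the correct entity for each query, you can compute the same metrics directly from the ranks. The snippet below is a minimal sketch (using only NumPy) that reproduces the hand calculation above from the ranks 2, 1, and 4:

```python
import numpy as np

# Ranks of the correct entity from the worked example: 2, 1, 4
ranks = np.array([2, 1, 4])

# MRR: mean of the reciprocal ranks
mrr = np.mean(1.0 / ranks)        # (0.5 + 1.0 + 0.25) / 3 ≈ 0.583

# Hits@k: fraction of queries with the correct entity in the top k
hits_at_1 = np.mean(ranks <= 1)   # 1/3 ≈ 0.333
hits_at_3 = np.mean(ranks <= 3)   # 2/3 ≈ 0.667

print(f"MRR: {mrr:.3f}, Hits@1: {hits_at_1:.3f}, Hits@3: {hits_at_3:.3f}")
```

Hits@1 can also be obtained from the function above by calling `compute_mrr_and_hitsk(scores, correct_indices, k=1)`.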
Other evaluation metrics for knowledge graph link prediction include Mean Rank (MR), Area Under the Curve (AUC), and precision/recall at various cutoffs. These provide complementary perspectives on model performance: MR, for instance, is simply the average rank of the correct entity and is more sensitive than MRR to a few badly ranked answers.
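As a small illustration, here is a minimal sketch of Mean Rank computed from the same ranks as above (lower is better):

```python
import numpy as np

# Mean Rank (MR): average 1-based rank of the correct entity; lower is better
ranks = np.array([2, 1, 4])   # ranks from the worked example
mean_rank = np.mean(ranks)    # (2 + 1 + 4) / 3 ≈ 2.33
print("Mean Rank:", mean_rank)
```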