
Scoring Functions and Triple Plausibility

When working with knowledge graph embeddings, you often need a way to determine how likely a given triple β€” composed of a head entity, a relation, and a tail entity β€” is to be true within the graph. Scoring functions are mathematical formulas that take vector representations (embeddings) of the head, relation, and tail, and output a numerical score reflecting the plausibility of the triple.

A typical scoring function is written as:

$$f(h, r, t) = -\|h + r - t\|$$

where $h$, $r$, and $t$ are the embedding vectors for the head, relation, and tail, and $\|\cdot\|$ denotes a vector norm (such as L1 or L2). Because the distance is negated, scores are at most zero: a higher score (closer to zero) means the triple is more likely to be valid, while a lower (more negative) score suggests it is less plausible. These functions are central to training and evaluating embedding models, as they guide the model to assign higher scores to true triples and lower scores to false ones.

import numpy as np

def tr_score_L2(head_emb, rel_emb, tail_emb):
    """
    TransE-style L2 scoring function for a triple (h, r, t).
    Higher (less negative) scores indicate higher plausibility.
    """
    return -np.linalg.norm(head_emb + rel_emb - tail_emb, ord=2)

# Example usage:
head = np.array([0.3, 0.7, 0.5])
rel = np.array([0.2, -0.1, 0.4])
tail = np.array([0.5, 0.6, 0.9])

score = tr_score_L2(head, rel, tail)
print("Triple plausibility score (L2):", score)
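Scores become most useful when compared across candidate triples, as in link prediction. As a minimal sketch of this idea (the candidate names and embedding vectors below are invented for illustration), the same function can rank possible tails for a fixed head and relation:

```python
import numpy as np

def tr_score_L2(head_emb, rel_emb, tail_emb):
    # TransE-style score: negative L2 distance; higher means more plausible
    return -np.linalg.norm(head_emb + rel_emb - tail_emb, ord=2)

head = np.array([0.3, 0.7, 0.5])
rel = np.array([0.2, -0.1, 0.4])

# Hypothetical candidate tail embeddings (made up for this sketch)
candidates = {
    "tail_A": np.array([0.5, 0.6, 0.9]),
    "tail_B": np.array([-0.4, 0.2, 0.1]),
    "tail_C": np.array([0.6, 0.5, 0.8]),
}

# Sort candidates from most to least plausible
ranked = sorted(candidates.items(),
                key=lambda kv: tr_score_L2(head, rel, kv[1]),
                reverse=True)

for name, emb in ranked:
    print(name, tr_score_L2(head, rel, emb))
```

Here `tail_A` ranks first because it lies closest to `head + rel`, so its translation error is smallest and its (negative-distance) score is highest.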
Note
Study More

Besides the L2 (Euclidean) distance, you can use other distance metrics as scoring functions, such as the L1 (Manhattan) distance or even more complex similarity measures. The choice of metric can significantly influence how the model learns and which patterns it captures in the data.
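As a quick sketch of how the choice of norm changes the scores (reusing the example vectors from above), the `ord` argument of `np.linalg.norm` switches between the L1 and L2 variants:

```python
import numpy as np

def tr_score(head_emb, rel_emb, tail_emb, ord=2):
    # Negative L-p norm of the translation error h + r - t;
    # ord=1 gives the L1 (Manhattan) variant, ord=2 the L2 (Euclidean) one.
    return -np.linalg.norm(head_emb + rel_emb - tail_emb, ord=ord)

head = np.array([0.3, 0.7, 0.5])
rel = np.array([0.2, -0.1, 0.4])
tail = np.array([0.5, 0.6, 0.9])

print("L1 score:", tr_score(head, rel, tail, ord=1))
print("L2 score:", tr_score(head, rel, tail, ord=2))
```

Since the L1 norm of a vector is always at least its L2 norm, the L1 score is never higher than the L2 score for the same triple; what matters in practice is not the absolute values but how each norm ranks triples relative to one another.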


SectionΒ 2. ChapterΒ 2

