
Node Similarity and Clustering

Node embeddings turn graph nodes into vectors, allowing you to use mathematical operations to compare nodes and discover patterns. Node similarity is typically measured with a metric such as cosine similarity, which compares the directions of two embedding vectors and so captures how "close" or "related" two nodes are based on their feature representations, rather than just their direct graph connections. Once you have embeddings for all nodes, you can build a similarity matrix that quantifies the relationships throughout the graph.

Clustering is another powerful tool enabled by embeddings. By grouping nodes with similar embeddings, clustering methods like k-means can reveal communities or modules within the graph. These clusters might correspond to groups of users with similar interests in a social network, or proteins with related functions in a biological network. Unlike node similarity, which focuses on pairs of nodes, clustering considers the global structure and seeks to partition the graph into meaningful subsets.

import numpy as np
from sklearn.cluster import KMeans

# Example node embeddings (each row is a node embedding)
embeddings = np.array([
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.8, 0.2],
    [0.8, 0.9, 0.1]
])

# Compute pairwise cosine similarity
def cosine_similarity_matrix(X):
    norm = np.linalg.norm(X, axis=1, keepdims=True)
    X_normalized = X / norm
    return np.dot(X_normalized, X_normalized.T)

similarity_matrix = cosine_similarity_matrix(embeddings)
print("Pairwise Cosine Similarity Matrix:")
print(similarity_matrix)

# Perform k-means clustering (e.g., 2 clusters)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(embeddings)
print("Cluster assignments for each node:")
print(labels)
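The example above rolls its own cosine similarity for clarity. As a sanity check, you can compare it against scikit-learn's built-in cosine_similarity (scikit-learn is already a dependency here, since KMeans comes from it); this is a minimal sketch reusing the same toy embeddings.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same toy embeddings as in the example above
embeddings = np.array([
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.8, 0.2],
    [0.8, 0.9, 0.1]
])

# scikit-learn computes the same pairwise matrix as cosine_similarity_matrix above
sklearn_matrix = cosine_similarity(embeddings)
print(np.round(sklearn_matrix, 3))

# Reading the output: values near 1 mean the embeddings point in almost the same
# direction (very similar nodes), while values near 0 mean unrelated directions.
# Here nodes 0 and 1 are nearly identical (~0.99), as are nodes 2 and 3,
# while cross-pairs score only around 0.3-0.4.
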
Node similarity

Measures how alike two specific nodes are, based on their embeddings; it answers the question, "Are these two nodes similar?"
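To answer that question for a single pair of nodes, you only need the cosine of the angle between their two embedding vectors. A minimal sketch, reusing two of the toy embeddings from the example above:

import numpy as np

# Embeddings of the two nodes being compared (toy values from the example above)
node_a = np.array([0.1, 0.2, 0.9])
node_b = np.array([0.9, 0.8, 0.2])

# Cosine similarity = dot product divided by the product of the vector norms
similarity = np.dot(node_a, node_b) / (np.linalg.norm(node_a) * np.linalg.norm(node_b))
print(f"Cosine similarity: {similarity:.3f}")  # roughly 0.38, so these two nodes are not very similar
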

Node clustering

Groups all nodes into subsets (clusters) so that nodes in the same cluster are more similar to each other than to those in other clusters; it answers, "Which nodes form natural groups or communities?"
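In practice you usually want to see which concrete nodes ended up together, not just the raw label array. The sketch below groups node identifiers by their k-means label; the node names are made up for illustration, and the embeddings are the same toy values as above.

from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans

# Toy embeddings from the example above, paired with hypothetical node names
node_names = ["alice", "bob", "carol", "dave"]
embeddings = np.array([
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.8, 0.2],
    [0.8, 0.9, 0.1]
])

labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(embeddings)

# Collect node names per cluster label
clusters = defaultdict(list)
for name, label in zip(node_names, labels):
    clusters[label].append(name)

for label in sorted(clusters):
    print(f"Cluster {label}: {clusters[label]}")
# Expected grouping: alice and bob in one cluster, carol and dave in the other
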

1. What is the goal of clustering nodes in a graph?

2. How does cosine similarity help in finding similar nodes?




