We use similarity distances.

Let d(i,j) be the length of the shortest path between i and j.

Similarity DistancesFor the maximum linkage, at each step, the two clusters separated by the shortest distance are combined.

The similarity distances can be illustrated as follows :LinkageBack to our Karate example.

Then, before applying hierarchical clustering, we need to define the matrix of distances between each node.

pcc_longueurs=list(nx.

all_pairs_shortest_path_length(G_karate))distances=np.

zeros((n,n))# distances[i, j] is the length of the shortest path between i and jfor i in range(n): for j in range(n): distances[i, j] = pcc_longueurs[i][1][j]Now, we’ll use the AgglomerativeClustering function of sklearnto identify hierarchical clustering.

from sklearn.

cluster import AgglomerativeClusteringclustering = AgglomerativeClustering(n_clusters=2,linkage='average',affinity='precomputed').

fit_predict(distances)And finally, draw the resulting graph with colors depending on the cluster :nx.

draw(G_karate, node_color = clustering)Hierarchical Clustering7.

Clustering CoefficientThe clustering coefficient measures how well two nodes tend to cluster together.

A local clustering coefficient measures how close a node i and its neighbors are to being a complete graph.

Clustering CoefficientThe local clustering coefficient is a ratio of the number of triangles centered at node i over the number of triples centered at node i.

I have tried to illustrate this computation of clustering coefficients for the following graph :Clustering CoefficientA global coefficient measures the density of triangles (local clusters) in the graph :Global Clustering CoefficientIn the graph shown above, the clustering coefficient is equal to :For Erdos-Rényi random graphs, E[CC]=E[Ci]=p where p the probability defined in the previous article.

For Baràbasi-Albert random graphs, the global clustering coefficient follows a power law depending on the number of nodes.

The average clustering coefficient of nodes with degree k is proportional to the inverse of k:Nodes with a low degree are connected to other nodes in their community.

Nodes with high degrees are linked to nodes in different communities.

For a given graph, in networkx, the clustering coefficient can be easily computed.

First, let’s begin with the local clustering coefficients :# List of local clustering coefficientslist(nx.

clustering(G_barabasi).

values())This should return something quite similar to :0.

13636363636363635,0.

2,0.

07602339181286549,0.

04843304843304843,0.

09,0.

055384615384615386,0.

07017543859649122,.

And average the results to find the global clustering coefficient of the graph :# Global clustering coefficientnp.

mean(list(nx.

clustering(G_barabasi).

values()))Which heads :0.

0965577637155059III.

Centrality algorithmsIt is hard to give a universal measure of centrality.

Centrality measures express by how important a node is when we want to identify important web pages, bottlenecks in transportation networks…A walk is a path which can go through the same node several times.

Centrality measures vary with the type of walk considered and the way of counting them.

1.

PageRank AlgorithmPageRank estimates a current node’s importance from its linked neighbors and then again from their respective neighbors.

Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.

It is for example used to suggest connections on social networks.

PageRank is computed by either iteratively distributing one node’s rank (originally based on the degree) over its neighbors or by randomly traversing the graph and counting the frequency of hitting each node during these walks.

Neo4J summary of the Page Rank AlgorithmPageRank is usually computed on directed graphs.

However, it will also execute on undirected graphs by converting each edge in the directed graph to two edges.

For example, the PageRank of the Karate graph can be accessed by :nx.

pagerank(G_karate, alpha=0.

9)Where alpha is the damping parameter (by default 0.

85).

It should give you a list of rankings in return :{0: 0.

09923208031303203, 1: 0.

0543403155825792, 2: 0.

05919704684187155, 3: 0.

036612460562853694,.

2.

Degree CentralityDegree Centrality measures the number of incoming and outgoing relationships from a node based on the number of walks of length 1 ending at node i.

It is therefore given by C(Xi)=di.

Degree Centrality is used to identify the most influential persons on a social network for example.

c_degree = nx.

degree_centrality(G_karate)c_degree = list(c_degree.

values())3.

Eigenvector CentralityEigenvector Centrality represents the number of walks of infinite length ending at node i.

This gives more importance to nodes with well-connected neighbors.

Eigenvector Centralityc_eigenvector = nx.

eigenvector_centrality(G_karate)c_eigenvector = list(c_eigenvector.

values())4.

Closeness CentralityCloseness Centrality is a way of detecting nodes that are able to spread information efficiently through a graph.

It is used to research organizational networks where individuals with high closeness centrality are in a favorable position to control and acquire vital information and resources with the organization, for example, networks of fake news accounts or terrorist cells.

Closeness Centrality is inversely proportional to the sum of lengths of the shortest paths to other nodes.

c_closeness = nx.

closeness_centrality(G_karate)c_closeness = list(c_closeness.

values())5.

Betweenness CentralityBetweenness Centrality is a way of detecting the amount of influence a node has over the flow of information in a graph.

It is often used to find nodes that serve as a bridge from one part of a graph to another, for example in the package delivery processor in a telecommunication network, or in the propagation of fake news.

Where :σjk the number of shortest paths between j and kσjk(i) the number of shortest paths between j and k going through iThe betweenness centrality measures the number of times a node acts as a bridge between two nodes.

For example :Centrality Measurec_betweenness = nx.

betweenness_centrality(G_karate)c_betweenness = list(c_betweenness.

values())In Python, the implementation relies on the built-in functions of networkx :# Plot the centrality of the nodesplt.

figure(figsize=(18, 12))# Degree Centralityf, axarr = plt.

subplots(2, 2, num=1)plt.

sca(axarr[0,0])nx.

draw(G_karate, cmap = plt.

get_cmap('inferno'), node_color = c_degree, node_size=300, pos=pos, with_labels=True)axarr[0,0].

set_title('Degree Centrality', size=16)# Eigenvalue Centralityplt.

sca(axarr[0,1])nx.

draw(G_karate, cmap = plt.

get_cmap('inferno'), node_color = c_eigenvector, node_size=300, pos=pos, with_labels=True)axarr[0,1].

set_title('Eigenvalue Centrality', size=16)# Proximity Centralityplt.

sca(axarr[1,0])nx.

draw(G_karate, cmap = plt.

get_cmap('inferno'), node_color = c_closeness, node_size=300, pos=pos, with_labels=True)axarr[1,0].

set_title('Proximity Centrality', size=16)# Betweenness Centralityplt.

sca(axarr[1,1])nx.

draw(G_karate, cmap = plt.

get_cmap('inferno'), node_color = c_betweenness, node_size=300, pos=pos, with_labels=True)axarr[1,1].

set_title('Betweenness Centrality', size=16)The different centrality measuresWe observe that the different nodes highlighted by the centrality measures are quite distinct.

Betweenness centrality, for example, produces results far from the other methods, since they don’t measure the same thing.

IV.

ConclusionWe have now covered the introduction to graphs, the main types of graphs, the different graph algorithms and their implementation in Python with Networkx.

In the next article, we’ll cover graph learning which provides ways to predict nodes and edges in a graph to handle missing values or predict new relations.

Feel free to comment if you have any question or remark.

Stay tuned, the last article of this series is coming out next week :)Sources :A Comprehensive Guide to Graph Algorithms in Neo4j, Mark Needham & Amy E.

HodlerNetworkx documentation, https://networkx.

github.

io/documentation/stable/If you’d like to read more from me, my previous articles can be found here:Introduction to Graphs (Part 1)Main concepts, properties, and applications in Pythontowardsdatascience.

comMarkov Chains and HMMsIn this article, we’ll focus on Markov Models, where an when they should be used, and Hidden Markov Models.

This…towardsdatascience.

comA guide to Face Detection in PythonIn this tutorial, we’ll see how to create and launch a face detection algorithm in Python using OpenCV and Dlib.

We’ll…towardsdatascience.

comBoosting and AdaBoost clearly explainedA Visual Explanationtowardsdatascience.

com.. More details