Suppose we have a graph $G=(V,E)$ with $n$ non-overlapping subgraphs, the clusters $C_1, C_2, \dots, C_n$, which cover the graph: $C_1 \cup \dots \cup C_n = G$.

I'm looking for a good metric to measure the quality of these clusters.

Let's call it $m(C)$. I would like $m(C)$ to be

  • high when most of the neighbours of the nodes within $C$ are also in $C$
  • the highest when $C$ is a complete graph without links outside of $C$
  • low when most of the neighbours of the nodes within $C$ are in other clusters
  • the lowest when $C$ has no internal edges and each node in $C$ is linked to every node of $G$ outside $C$
  • and zero when there are exactly as many neighbours inside $C$ as outside it.

(All this taking into account that an edge between two nodes of $C$ should not be counted twice.)

Does such a metric exist? What is its name? Thanks.


The modularity measure proposed by M. Girvan and M.E.J. Newman [1] is defined as: \begin{eqnarray} Q & = & (\mbox{fraction of edges within communities}) \\ & & - (\mbox{expected fraction of such edges}) \\ & = & \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \end{eqnarray} where $m=\frac{1}{2} \sum_{ij} A_{ij}$ is the total edge weight, $k_i = \sum_j A_{ij}$ is the (weighted) degree of node $i$, and $A_{ij}$ is the weight of the edge connecting $i$ and $j$ (0 if there is none).
$\delta(c_i, c_j)$ is the Kronecker delta, which equals 1 if the community assignment $c_i$ is equal to the community assignment $c_j$, and 0 otherwise.
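As a hedged illustration, the formula can be evaluated directly with NumPy. This is a minimal sketch, assuming an undirected graph given as a symmetric adjacency matrix and one integer community label per node; the function name `modularity` and the two-triangle test graph are illustrative choices, not part of the original answer.

```python
import numpy as np

def modularity(A, labels):
    """Modularity Q of a partition, following the formula above.

    A      : symmetric (N, N) adjacency matrix; A[i, j] is the weight of edge i-j
    labels : length-N sequence; labels[i] is the community c_i of node i
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                           # k_i = sum_j A_ij
    two_m = A.sum()                             # 2m = sum_ij A_ij
    same = labels[:, None] == labels[None, :]   # delta(c_i, c_j) as a boolean matrix
    B = A - np.outer(k, k) / two_m              # A_ij - k_i k_j / (2m)
    return float(B[same].sum() / two_m)         # keep only same-community pairs

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

print(modularity(A, [0, 0, 0, 1, 1, 1]))  # ≈ 0.357 (= 5/14)
print(modularity(A, [0] * 6))             # ≈ 0: one community containing everything
```

Putting every node in a single community always gives $Q = 0$, since the observed and expected fractions are then both 1.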

This measure is positive when the fraction of edges inside communities is higher than the fraction expected if edges were placed at random while preserving each node's degree (the $k_i k_j / 2m$ term). It satisfies your first four points.
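Because $\delta(c_i, c_j)$ keeps only same-community pairs, $Q$ splits into one term per community, $Q = \sum_c \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, where $L_c$ is the total weight of edges inside community $c$ and $d_c$ is the sum of the degrees of its nodes. The sketch below computes these per-community terms, which is one way to get a per-cluster score like the $m(C)$ asked for. It assumes an undirected graph without self-loops; the name `community_terms` and the example graph are hypothetical.

```python
import numpy as np

def community_terms(A, labels):
    """Split Q into per-community terms q_c = L_c/m - (d_c/(2m))**2.

    L_c: total weight of edges inside community c (each edge counted once)
    d_c: sum of the degrees k_i of the nodes in c
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)
    two_m = A.sum()
    terms = {}
    for c in np.unique(labels):
        inside = labels == c
        L_c = A[np.ix_(inside, inside)].sum() / 2   # A is symmetric, so halve the sum
        d_c = k[inside].sum()
        terms[c.item()] = L_c / (two_m / 2) - (d_c / two_m) ** 2
    return terms

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

good = community_terms(A, [0, 0, 0, 1, 1, 1])  # each term 3/7 - 1/4 = 5/28 > 0
bad = community_terms(A, [0, 1, 0, 1, 0, 1])   # each term 1/7 - 1/4 = -3/28 < 0
print(good, bad)
```

Splitting along the triangles gives positive terms, while mixing them leaves each community with fewer internal edges than expected, so its term turns negative.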

[1] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

Arnfred