An inspiring article shows how networks may exhibit heterogeneous mixing patterns that cannot be unveiled by a single, averaged score, such as the one provided so far by Newman's traditional assortativity coefficient.
We wanted to contribute to this problem, so we defined Conformity, a path-aware measure built on the idea that the $n$-neighborhoods of a node have different impacts on its homophilic embeddedness.
In this note, I want to let you play with Conformity. Let's start!
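In the paper, we start by defining a few support functions.
Considering a node $u\in V$, we define $N_{u,d}$ as the set of nodes at distance $d$ from $u$:
$$N_{u,d} = \{v \in V \mid \mathrm{dist}(u,v) = d\}$$
where $\mathrm{dist}(u,v)$ denotes the shortest-path length between $u$ and $v$.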
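Because a breadth-first search visits nodes in order of shortest-path distance, grouping nodes by BFS depth yields every $N_{u,d}$ at once. Below is a minimal sketch, assuming an undirected, unweighted graph stored as a plain adjacency dictionary. The function name `neighborhoods_by_distance` and the toy graph are hypothetical illustrations, not the paper's implementation.

```python
from collections import deque


def neighborhoods_by_distance(adj, u):
    """Return {d: N_{u,d}} for every d >= 1, using BFS shortest-path distances.

    adj maps each node to an iterable of its neighbors (undirected, unweighted graph).
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:  # BFS reaches each node first along a shortest path
                dist[y] = dist[x] + 1
                queue.append(y)
    layers = {}
    for v, d in dist.items():
        if d > 0:  # the source u itself (d = 0) is not part of any N_{u,d}
            layers.setdefault(d, set()).add(v)
    return layers


# Hypothetical toy path graph x - y - z - w, not the 13-node graph in the figure.
toy = {"x": ["y"], "y": ["x", "z"], "z": ["y", "w"], "w": ["z"]}
print(neighborhoods_by_distance(toy, "x"))  # {1: {'y'}, 2: {'z'}, 3: {'w'}}
```

On larger graphs, NetworkX's `single_source_shortest_path_length` computes the same distances, which can then be grouped by value in the same way.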
The figure below shows a graph of $13$ nodes whose edges are placed to emphasize the role of neighbors at different distances. A color attribute specifies whether a node is red or blue.
Our goal is to assign a conformity score to each node with respect to this attribute. Intuitively, $A$ seems more homophilic than $O$: $A$ is embedded in a context of neighbors of the same color, while $O$ is surrounded by nodes with different labels, and the contribution of a similar node appears only in its third-level neighborhood.
Let's compute the conformity of a node influenced by nodes with both similar and different labels, for instance $B$. We assume a linear decrease with distance, i.e., $\alpha=1$.
Let's start by identifying the set of neighbors of $B$ at distance $1$: $N_{B,1} = \{A, C, E, I\}$.
For each node in $N_{B,1}$, we first check whether its label equals the label of $B$, our target node, through the indicator $I(u,v)$, which is $1$ if $u$ and $v$ share the same label and $-1$ otherwise. Then we measure $f_{v,l_v}$, the fraction of nodes in the first neighborhood of each $v \in N_{B,1}$ that share the label of $v$. For instance, $I(B,A) = 1$ and $f_{A, l_A} = 1$: $A$ has the same color as $B$, and the whole neighborhood of $A$ has the same color as $A$. The indicator and the similar-neighbors ratio take the same values for $C$ and $E$. $I(B,I)$ is still $1$, but $f_{I, l_I} = 3/4$.
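The two support functions can be sketched as follows, assuming the same adjacency-dictionary representation plus a `labels` dictionary mapping each node to its color. The names `indicator` and `similar_ratio` and the star graph are hypothetical. The sketch takes $I(u,v) = -1$ for differing labels, which matches the negative terms in the computations that follow. Returning $0$ for an isolated node is our own assumption, not a convention taken from the paper.

```python
def indicator(labels, u, v):
    """I(u, v): +1 if u and v share the same label, -1 otherwise."""
    return 1 if labels[u] == labels[v] else -1


def similar_ratio(adj, labels, v):
    """f_{v, l_v}: fraction of v's direct neighbors whose label equals v's own label."""
    neigh = list(adj[v])
    if not neigh:
        return 0.0  # assumption: an isolated node has no similar neighbors
    same = sum(1 for w in neigh if labels[w] == labels[v])
    return same / len(neigh)


# Hypothetical star graph: hub "i" with three red leaves and one blue leaf.
adj = {"i": ["a", "b", "c", "d"], "a": ["i"], "b": ["i"], "c": ["i"], "d": ["i"]}
labels = {"i": "red", "a": "red", "b": "red", "c": "red", "d": "blue"}
print(indicator(labels, "i", "d"))      # -1
print(similar_ratio(adj, labels, "i"))  # 0.75
```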
The figure below indicates the value of $f_{v,l_v}$ for each $v\in N_{B,1}$:
Moreover, $|N_{B,1}| = 4$ and $d^\alpha = 1^1$. It follows that this component is equal to $\frac{3 \cdot 1 + 0.75}{4} = 0.9375$.
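The per-distance component can be written as a small helper that takes one $(I, f)$ pair per node of $N_{u,d}$. This is a hypothetical sketch (the name `distance_component` is ours), and the pairs for $N_{B,1}$ are read off the text above: three nodes with $(1, 1)$ and node $I$ with $(1, 0.75)$.

```python
def distance_component(terms, d, alpha):
    """Contribution of distance d: the sum of I(u,v) * f_{v,l_v} over N_{u,d},
    divided by |N_{u,d}| * d**alpha.

    terms: list of (I, f) pairs, one per node v in N_{u,d}.
    """
    total = sum(i * f for i, f in terms)
    return total / (len(terms) * d ** alpha)


# N_{B,1} = {A, C, E, I}: A, C and E give (1, 1.0); I gives (1, 0.75).
n_b1 = [(1, 1.0), (1, 1.0), (1, 1.0), (1, 0.75)]
print(distance_component(n_b1, d=1, alpha=1))  # 0.9375
```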
The set of neighbors of $B$ at distance $2$ is:
$N_{B,2} = \{D, F, H, O\}$
The figure below indicates the value of $f_{v,l_v}$ for each $v\in N_{B,2}$:
Moreover, $|N_{B,2}| = 4$ and $d^\alpha = 2^1$. It follows that this component is equal to $\frac{1 + 0.75 \cdot 2 - 1}{4 \cdot 2} = 0.1875$.
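The same computation can be checked with exact rationals via Python's `fractions.Fraction`, which avoids any floating-point doubt. The $(I, f)$ pairs below are inferred from the numerator $1 + 0.75 \cdot 2 - 1$: since $I \in \{1, -1\}$ and $0 \le f \le 1$, a term equal to $-1$ requires $I = -1$ and $f = 1$. The helper name is hypothetical.

```python
from fractions import Fraction


def distance_component(terms, d, alpha):
    """Sum of I * f over N_{u,d}, divided by |N_{u,d}| * d**alpha (exact arithmetic)."""
    return sum(i * f for i, f in terms) / (len(terms) * d ** alpha)


# N_{B,2}: one same-color node with f = 1, two same-color nodes with f = 3/4,
# and one differently colored node (I = -1) with f = 1.
n_b2 = [(1, Fraction(1)), (1, Fraction(3, 4)), (1, Fraction(3, 4)), (-1, Fraction(1))]
print(distance_component(n_b2, d=2, alpha=1))  # 3/16
```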
The sets of neighbors of $B$ at distances $3$ and $4$ are $N_{B,3} = \{G, L, N\}$ and $N_{B,4} = \{M\}$.
The figure below indicates the value of $f_{v,l_v}$ for each $v\in N_{B,3}$ (left) and $v\in N_{B,4}$ (right):
Moreover, $|N_{B,3}| = 3$ with $d^\alpha = 3^1$ (left), and $|N_{B,4}| = 1$ with $d^\alpha = 4^1$ (right).
These components are equal to $\frac{-1 \cdot 2 + 0.75}{3 \cdot 3} \approx -0.139$ (left) and $\frac{-1}{1 \cdot 4} = -0.25$ (right).
The whole numerator is equal to $0.9375 + 0.1875 - 0.139 - 0.25 \approx 0.736$.
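Here is an exact check of the distance-3 and distance-4 components, again with a hypothetical helper. The pairs are inferred from the numerators above: two terms of $-1$ (so $I = -1$, $f = 1$) plus one term of $0.75$ (so $I = 1$, $f = 3/4$) at distance $3$, and a single $-1$ term at distance $4$. The exact distance-3 value is $-5/36 \approx -0.1389$.

```python
from fractions import Fraction


def distance_component(terms, d, alpha):
    """Sum of I * f over N_{u,d}, divided by |N_{u,d}| * d**alpha (exact arithmetic)."""
    return sum(i * f for i, f in terms) / (len(terms) * d ** alpha)


n_b3 = [(-1, Fraction(1)), (-1, Fraction(1)), (1, Fraction(3, 4))]
n_b4 = [(-1, Fraction(1))]
c3 = distance_component(n_b3, d=3, alpha=1)
c4 = distance_component(n_b4, d=4, alpha=1)
print(c3, round(float(c3), 4))  # -5/36 -0.1389
print(c4)                       # -1/4
```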
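Dividing by the normalization factor $\sum_{d\in D} d^{-\alpha} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \approx 2.08$, where $D = \{1, 2, 3, 4\}$ is the set of distances considered, we obtain $\psi(B, 1) \approx 0.35$.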
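Putting the pieces together, a minimal sketch of $\psi(u, \alpha)$ divides the sum of the per-distance components by $\sum_{d \in D} d^{-\alpha}$. It assumes that $D$ is exactly the set of distances supplied (here $1$ to $4$) and that every $N_{u,d}$ is non-empty. The function name `conformity` and the `node_b` dictionary of $(I, f)$ pairs are hypothetical, with pairs taken from the worked example. For $B$ the printed value is $0.353$.

```python
def conformity(components, alpha):
    """psi(u, alpha) for one node u.

    components maps each distance d in D to the list of (I, f) pairs of the nodes in N_{u,d};
    each list is assumed non-empty.
    """
    numerator = sum(
        sum(i * f for i, f in terms) / (len(terms) * d ** alpha)
        for d, terms in components.items()
    )
    normalizer = sum(d ** -alpha for d in components)
    return numerator / normalizer


# (I, f) pairs for B at distances 1-4, as in the worked example.
node_b = {
    1: [(1, 1.0), (1, 1.0), (1, 1.0), (1, 0.75)],
    2: [(1, 1.0), (1, 0.75), (1, 0.75), (-1, 1.0)],
    3: [(-1, 1.0), (-1, 1.0), (1, 0.75)],
    4: [(-1, 1.0)],
}
print(round(conformity(node_b, alpha=1), 3))  # 0.353
```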
Let's look more closely at each component of the numerator $\sum_{d\in D} \frac{\sum_{v \in N_{u,d}} I(u,v)\, f_{v,l_v}}{|N_{u,d}|\, d^\alpha}$.
The only node in $B$'s fourth-level neighborhood contributes $-0.25$, rather than the $-1$ it would contribute if no decrease were imposed (i.e., with $\alpha=0$, $\frac{-1}{|N_{B,4}|\, d^0}=-1$).
Likewise, any $\alpha>1$ imposes a steeper, faster-than-linear decrease on the contribution of neighbors at larger distances (e.g., from $-0.25$ to $-0.0625$ for $d=4$ and $\alpha=2$). For comparison, $\psi(B, 0) \approx -0.03$ and $\psi(B, 2) \approx 0.65$.
In other words, the role of $\alpha$ is to account for the distance separating the source node from each of the nodes it considers.
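To see the effect of $\alpha$ directly, the sketch below re-evaluates a hypothetical `conformity` helper on $B$'s per-distance $(I, f)$ pairs for $\alpha = 0, 1, 2$. Under the same assumptions as before ($D = \{1, 2, 3, 4\}$, non-empty neighborhoods), the printed values are $\psi(B,0) \approx -0.026$, $\psi(B,1) \approx 0.353$ and $\psi(B,2) \approx 0.648$.

```python
def conformity(components, alpha):
    """psi(u, alpha): components maps d -> list of (I, f) pairs for the nodes in N_{u,d}."""
    numerator = sum(
        sum(i * f for i, f in terms) / (len(terms) * d ** alpha)
        for d, terms in components.items()
    )
    return numerator / sum(d ** -alpha for d in components)


node_b = {
    1: [(1, 1.0), (1, 1.0), (1, 1.0), (1, 0.75)],
    2: [(1, 1.0), (1, 0.75), (1, 0.75), (-1, 1.0)],
    3: [(-1, 1.0), (-1, 1.0), (1, 0.75)],
    4: [(-1, 1.0)],
}
for alpha in (0, 1, 2):
    # alpha = 0 weighs every distance equally; larger alpha discounts far neighbors more.
    print(alpha, round(conformity(node_b, alpha), 3))
```

With $\alpha = 0$ the distant, differently colored nodes pull $B$'s score slightly below zero. As $\alpha$ grows, the homophilic first neighborhood dominates.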