Notes on Conformity

PUBLISHED ON MAR 08, 2021

An inspiring article shows how networks may exhibit heterogeneous mixing patterns that an averaged, single score, such as Newman's traditional assortativity coefficient, would not reveal.

We wanted to contribute to this problem, so we defined Conformity, a path-aware measure built on the idea that the $n$-neighborhoods of a node have different impacts on its homophilic embeddedness.

In this note, I want to let you play with Conformity. Let's start!

In the paper, we started by defining a few support functions.
Given a node $u\in V$, we define $N_{u,d}$ as the set of nodes at distance $d$ from $u$:

$ N_{u,d} = \{v| dist(u,v)=d\}$
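As a rough illustration of this definition, the sets $N_{u,d}$ can be collected with a breadth-first search. The sketch below is not the code from the paper: the function name `neighbors_at_distance`, the adjacency-dict representation, and the toy path graph are assumptions made only for this example.

```python
from collections import deque


def neighbors_at_distance(adj, u):
    """Return {d: N_{u,d}} for every distance d >= 1 reachable from u."""
    dist = {u: 0}
    queue = deque([u])
    layers = {}
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                # First visit in BFS order gives the shortest-path distance.
                dist[nxt] = dist[node] + 1
                layers.setdefault(dist[nxt], set()).add(nxt)
                queue.append(nxt)
    return layers


# Hypothetical toy path graph a - b - c - d (not the graph used later on).
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
layers = neighbors_at_distance(toy_adj, "a")
print(layers)
```

On this path, each node other than `a` ends up in the layer matching its hop count from `a`.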

Moreover, $I_{u,v}$ is the indicator function that compares the attribute values of two nodes $u,v \in V$:
$I_{u,v} = \left\{ \begin{array}{ll} 1 & \mbox{if } l_u=l_v \\ -1 & \mbox{otherwise} \end{array}\right.$

and $f_{u,l_u}$ is the function that computes the ratio of $u$'s neighbors sharing its attribute value $l_u$, provided at least one neighbor does:
$f_{u,l_u} = \frac{|\{v|v\in \Gamma(u) \land l_u=l_v\}|}{|\Gamma(u)|}$

where $\Gamma(u)$ is the first order neighborhood of node $u$, i.e., the set of nodes adjacent to it.
Moreover, to ensure a consistent interpretation of Conformity, we force $f_{u,l_u}$ to take values in $(0, 1]$ by setting it to 1 when its numerator is zero.
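The two support functions can be sketched in a few lines. This is only an illustration under assumptions: the names `indicator` and `same_label_ratio`, the dict-based graph, and the star-shaped toy graph are hypothetical and chosen for this example.

```python
def indicator(labels, u, v):
    """I_{u,v}: 1 if u and v share the attribute value, -1 otherwise."""
    return 1 if labels[u] == labels[v] else -1


def same_label_ratio(adj, labels, u):
    """f_{u,l_u}: fraction of u's neighbors sharing its label.

    When no neighbor shares the label (numerator equal to zero),
    the value is forced to 1 so that the result stays in (0, 1].
    """
    neighbors = adj[u]
    same = sum(1 for v in neighbors if labels[v] == labels[u])
    if same == 0:
        return 1.0
    return same / len(neighbors)


# Hypothetical star graph: a red center with three red leaves and one blue leaf.
toy_adj = {
    "c": ["r1", "r2", "r3", "b1"],
    "r1": ["c"], "r2": ["c"], "r3": ["c"], "b1": ["c"],
}
toy_labels = {"c": "red", "r1": "red", "r2": "red", "r3": "red", "b1": "blue"}

print(same_label_ratio(toy_adj, toy_labels, "c"))   # 0.75
print(same_label_ratio(toy_adj, toy_labels, "b1"))  # 1.0 (forced, numerator is zero)
```

The blue leaf shows the forced case: its only neighbor is red, so the numerator is zero and the ratio is set to 1.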
Finally, we define the Conformity score for a node $u\in V$ and a given real number $\alpha$ in $[0, +\infty)$ as:
$\psi(u,\alpha) = \frac{\sum_{d\in D} \frac{\sum_{v \in N_{u,d}} I_{u,v} f_{v,l_v}}{|N_{u,d}| d^\alpha}}{\sum_{d\in D} d^{-\alpha}}$

where $D$ is the set of distances $\{1, \dots, max(\{dist(i,j)|i,j \in V\})\}$ and $\alpha$ is the parameter controlling the level of interaction between nodes at different degrees of separation.
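Putting the pieces together, a minimal sketch of $\psi(u,\alpha)$ could look like the following. It is not the reference implementation. The function names and the toy graph are hypothetical, and it assumes that distances at which $u$ has no neighbors are skipped in both the numerator and the normalizer, and that a node with no reachable neighbors gets a score of 0; neither choice is stated in the definition above.

```python
from collections import deque


def layers_from(adj, u):
    """Map each distance d >= 1 to the set N_{u,d}, using breadth-first search."""
    dist = {u: 0}
    queue = deque([u])
    layers = {}
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                layers.setdefault(dist[nxt], set()).add(nxt)
                queue.append(nxt)
    return layers


def same_label_ratio(adj, labels, v):
    """f_{v,l_v}, forced to 1 when no neighbor of v shares its label."""
    same = sum(1 for w in adj[v] if labels[w] == labels[v])
    return same / len(adj[v]) if same else 1.0


def conformity(adj, labels, u, alpha):
    """Sketch of psi(u, alpha); empty distance layers are skipped (assumption)."""
    numerator = 0.0
    normalizer = 0.0
    for d, nodes in layers_from(adj, u).items():
        signed = sum(
            (1 if labels[v] == labels[u] else -1) * same_label_ratio(adj, labels, v)
            for v in nodes
        )
        numerator += signed / (len(nodes) * d ** alpha)
        normalizer += d ** (-alpha)
    # Assumption: an isolated node has no layers and is scored 0.
    return numerator / normalizer if normalizer else 0.0


# Hypothetical toy path a - b - c - d with two red and two blue nodes.
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_labels = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
psi_a = conformity(toy_adj, toy_labels, "a", alpha=1)
print(psi_a)
```

For node `a` the three layers contribute $0.5$, $-0.25$ and $-1/3$, so the score is slightly negative: the single similar neighbor is outweighed by the two dissimilar nodes further away.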

The figure below shows a graph of $13$ nodes whose edges are arranged to emphasize the role of neighbors at different distances. A color attribute specifies whether a node is $red$ or $blue$.

Our goal is to assign a conformity score to each node with respect to this attribute. Intuitively, $A$ seems more homophilic than $O$ because it is embedded within a context of neighbors of the same color, while $O$ is surrounded by nodes with different labels, and the contribution of a similar node appears only in its third neighborhood level.



Let's calculate the conformity of a node that is influenced by nodes with both similar and different labels, for instance $B$. Let's consider a linear decrease, thus $\alpha=1$.


Let's start by identifying the set of neighbors of $B$ at distance $1$: $N_{B,1} = \{A, C, E, I\}$

For each node in $N_{B,1}$ we need to check whether its label equals the label of $B$, our target node, and then measure the fraction of similar nodes in the first neighborhood of each neighbor in $N_{B,1}$. For instance, $I_{B,A} = 1$ and $f_{A, l_A} = 1$: $A$ has the same color as $B$, and the whole neighborhood of $A$ has the same color as $A$. For $C$ and $E$, the indicator and the similar-neighbor ratio take the same values. $I_{B,I}$ is still 1, but $f_{I, l_I} = 3/4$.

The figure below indicates the value of $f_{v,l_v}$ for each $v\in N_{B,1}$:



Moreover, $|N_{B,1}| = 4$ and $d^\alpha = 1^1$. It follows that this component is equal to $ \frac{3*1 + 0.75}{4} = 0.9375$.

The set of neighbors of $B$ at distance $2$ is:

$N_{B,2} = \{D, F, H, O\}$

The figure below indicates the value of $f_{v,l_v}$ for each $v\in N_{B,2}$:



Moreover, $|N_{B,2}| = 4$ and $d^\alpha = 2^1$. It follows that this component is equal to $\frac{1 + 0.75*2 - 1}{8} = 0.1875$


The sets of neighbors of $B$ at distances $3$ and $4$ are $N_{B,3} = \{G, L, N\}$ and $N_{B,4} = \{M\}$

The figure below indicates the value of $f_{v,l_v}$ for each $v\in N_{B,3}$ (left) and $v\in N_{B,4}$ (right):


Moreover, $|N_{B,3}| = 3$ and $d^\alpha = 3^1$ (left) and $|N_{B,4}| = 1$ and $d^\alpha = 4^1$ (right)

These components are equal to $\frac{-1*2 + 0.75}{9} \approx -0.139$ (left) and $\frac{-1}{4} = -0.25$ (right)

The whole numerator is equal to $0.9375 + 0.1875 - 0.139 - 0.25 \approx 0.736$

Normalizing by $\sum_{d\in D} d^{-\alpha} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \approx 2.08$, we obtain $\psi(B, 1) \approx 0.35$
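The arithmetic above can be checked with a short script. The signed terms $I_{B,v} f_{v,l_v}$ below are read off the numerators of the four components in this walkthrough; the name `psi_from_terms` and the dict layout are assumptions made for this sketch, which does not rebuild the graph itself.

```python
# Signed terms I_{B,v} * f_{v,l_v}, grouped by distance from B,
# taken from the numerators of the components computed above.
terms_by_distance = {
    1: [1, 1, 1, 0.75],
    2: [1, 0.75, 0.75, -1],
    3: [-1, -1, 0.75],
    4: [-1],
}


def psi_from_terms(terms, alpha):
    """Combine per-distance terms into psi: weighted averages over a normalizer."""
    numerator = sum(sum(t) / (len(t) * d ** alpha) for d, t in terms.items())
    normalizer = sum(d ** (-alpha) for d in terms)
    return numerator / normalizer


psi_b = psi_from_terms(terms_by_distance, alpha=1)
print(round(psi_b, 2))  # 0.35
```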


Let's pay more attention to each component in $\sum_{d\in D} \frac{\sum_{v \in N_{u,d}} I_{u,v} f_{v,l_v}}{|N_{u,d}| d^\alpha}$:

The only neighbor of $B$ in the fourth-level neighborhood contributes $-0.25$, rather than the $-1$ it would contribute if no decrease were imposed (i.e., $\alpha=0$, thus $\frac{-1}{|N_{B,4}|d^0}=-1$).

It follows that any $\alpha>1$ imposes a faster-than-linear decrease on the contribution of neighbors at higher distances (e.g., from $-0.25$ to $-0.0625$ for $d=4$ and $\alpha=2$).

For comparison, $\psi(B, 0) = -0.02$ and $\psi(B, 2) = 0.64$.

It follows that the role of $\alpha$ is to account for the distance that separates each considered node from the source.
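To see the effect of $\alpha$ on $B$ more concretely, the sketch below reuses the signed terms from the walkthrough and sweeps $\alpha$ over $0$, $1$ and $2$. As before, the helper names and the dict layout are assumptions made for this example only.

```python
# Signed terms I_{B,v} * f_{v,l_v} by distance from B, as in the walkthrough.
terms_by_distance = {
    1: [1, 1, 1, 0.75],
    2: [1, 0.75, 0.75, -1],
    3: [-1, -1, 0.75],
    4: [-1],
}


def component(terms, d, alpha):
    """Contribution of the distance-d layer to the numerator of psi."""
    return sum(terms[d]) / (len(terms[d]) * d ** alpha)


def psi_from_terms(terms, alpha):
    """psi as the sum of layer components divided by the sum of d^(-alpha)."""
    numerator = sum(component(terms, d, alpha) for d in terms)
    normalizer = sum(d ** (-alpha) for d in terms)
    return numerator / normalizer


# For each alpha: the contribution of the distance-4 neighbor and the final score.
sweep = {
    alpha: (component(terms_by_distance, 4, alpha),
            psi_from_terms(terms_by_distance, alpha))
    for alpha in (0, 1, 2)
}
for alpha, (far_component, psi) in sweep.items():
    print(alpha, far_component, round(psi, 3))
```

Under these assumptions the distance-4 contribution shrinks from $-1$ to $-0.25$ to $-0.0625$, and the score comes out at roughly $-0.026$, $0.353$ and $0.648$, in line with the values quoted above once truncated to two decimals.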