
I want to find the expression for the PageRank of a webpage as defined in the original paper by Sergey Brin and Larry Page (*The Anatomy of a Large-Scale Hypertextual Web Search Engine*).

Consider a directed graph of $n$ vertices with adjacency matrix $A$. Choose $\alpha\in[0,1]$ and obtain the matrix $A'$ as follows:

If row $i$ of $A$ consists only of zeros, replace each entry of that row by $\frac{1}{n}$. Otherwise, divide each entry of the row by $C_{T_{i}}$, the number of nonzero entries of the row (i.e. the out-degree of vertex $i$). Define

$$ P = (1-\alpha)A' + \frac{\alpha}{n}\,1_{n} $$

where $1_{n}$ is the $n\times n$ matrix in which every entry is $1$.

Now it is easy to prove that $P$ is a stochastic matrix. One can also show that $P$ has a unique stationary distribution $\Pi_{0}$, which satisfies the equation

$$ \Pi_0 = \Pi_0 P $$
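The construction of $P$ and the fixed-point equation $\Pi_0 = \Pi_0 P$ can be checked numerically. Below is a minimal sketch using power iteration; the 3-node graph, function names, and the value of $\alpha$ are illustrative, not taken from the question.

```python
import numpy as np

def build_P(A, alpha):
    """Build P = (1 - alpha) * A' + (alpha / n) * 1_n from an adjacency
    matrix A, following the construction in the question."""
    n = A.shape[0]
    Ap = np.zeros_like(A, dtype=float)
    for i in range(n):
        out = A[i].sum()
        if out == 0:              # row of zeros: replace every entry by 1/n
            Ap[i] = 1.0 / n
        else:                     # otherwise divide the row by C_{T_i}
            Ap[i] = A[i] / out
    return (1 - alpha) * Ap + (alpha / n) * np.ones((n, n))

def stationary(P, iters=200):
    """Power iteration for the row-vector fixed point pi = pi P."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

# Toy 3-node graph: 0 -> 1, 1 -> 2, and node 2 is dangling (no out-links).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
P = build_P(A, alpha=0.15)
pi = stationary(P)
print(pi, pi.sum())  # pi sums to 1 and satisfies pi = pi P
```

Each row of $P$ sums to $(1-\alpha) + n\cdot\frac{\alpha}{n} = 1$, which is exactly the stochasticity claim above.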

Then each entry $\pi_i$ of $\Pi_0$ satisfies

$$ \begin{equation}\label{eq1} \pi_i = \sum_{j=1}^{n}{\pi_j\left(\frac{1-\alpha}{C_{T_j}}+\frac{\alpha}{n}\right)} \end{equation} $$

Considering the previous expression, is it possible to get the following equation?

$$ PR(A) = \alpha + (1-\alpha)\sum_{j=1}^{n}{\frac{PR(W_j)}{C_{T_j}}} $$

Substituting the expression derived above, and using $\sum_{j=1}^{n}\pi_j = 1$ in the last step, we get

$$ \begin{align}\label{eq2} \pi_i =& \sum_{j=1}^{n}\left((1-\alpha)\frac{\pi_j}{C_{T_j}}+\frac{\pi_j\alpha}{n}\right) \\ =&\sum_{j=1}^{n}\frac{\pi_j\alpha}{n} + (1-\alpha)\sum_{j=1}^{n}{\frac{\pi_j}{C_{T_j}}} \\ =& \frac{\alpha}{n} + (1-\alpha)\sum_{j=1}^{n}{\frac{\pi_j}{C_{T_j}}} \end{align} $$

Is there any way to get rid of that $\frac{1}{n}$?

Thank you

1 Answer


Is there any way to get rid of that $\frac{1}{n}$?

No; the formula you started with and the formula that you are citing are using different bias/personalization vectors (though I agree that the paper you are looking into is somewhat confusing about this).

To be more precise, let $\alpha\in[0,1]$ be the damping factor (this corresponds to $1-\alpha$ in your notation), let $S$ be an $n\times n$ column-stochastic matrix (possibly obtained from a network by the procedure you describe, in which case $n$ is the number of webpages in the network), and let $B = (B_1,B_2,\dots,B_n)$ be an $n\times 1$ probability vector. Denote by $\underline{1}$ the $n\times 1$ vector $(1,1,\dots,1)$. Then the Google matrix with damping factor $\alpha$ and bias vector $B$ is by definition the stochastic matrix

$$ G(\alpha;B) = \alpha S + (1-\alpha) B \underline{1}^T. $$

The formula you started with uses as $B$ the uniform probability vector, which models the random surfer teleporting to a uniformly random webpage with probability $(1-\alpha)$. The formula you cite instead uses as $B$ a Dirac (point-mass) probability vector, which models the random surfer teleporting to a fixed "homepage" with probability $(1-\alpha)$; in the formula you cite, the homepage is declared to be the webpage marked "$A$".
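The effect of the two choices of $B$ can be seen side by side in a small numerical sketch. The 3-page matrix $S$, the damping value, and the choice of page $0$ as the "homepage" below are all illustrative assumptions, not from the answer.

```python
import numpy as np

# Column-stochastic link matrix S for a toy 3-page network (illustrative).
S = np.array([[0.0, 0.5, 1/3],
              [0.5, 0.0, 1/3],
              [0.5, 0.5, 1/3]])
alpha = 0.85
n = S.shape[0]
one = np.ones((n, 1))

def stationary(G, iters=200):
    """Power iteration for the column-vector fixed point v = G v."""
    v = np.full((G.shape[0], 1), 1.0 / G.shape[0])
    for _ in range(iters):
        v = G @ v
    return v.ravel()

B_uniform = np.full((n, 1), 1.0 / n)           # teleport anywhere uniformly
B_dirac = np.array([[1.0], [0.0], [0.0]])      # teleport only to "homepage" 0

# G(alpha; B) = alpha * S + (1 - alpha) * B * 1^T, as in the answer.
G_u = alpha * S + (1 - alpha) * B_uniform @ one.T
G_d = alpha * S + (1 - alpha) * B_dirac @ one.T

print(stationary(G_u))
print(stationary(G_d))  # the Dirac bias shifts extra mass onto page 0
```

Both choices give a stochastic matrix and hence a stationary distribution, but the distributions differ: only with the Dirac bias does the extra $(1-\alpha)$ mass flow to a single distinguished page.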

For further details on the bias/personalization vector it might be good to look into the follow-up paper (cited at page rank algorithm iterative conversion) as well as the patent (https://patents.google.com/patent/US6285999).

Alp Uzman