The Google PageRank, named after Larry Page, one of the co-founders of Google, is an algorithm that ranks the webpages based on the hyperlink connections of the webpages on relevant topics, hence the Google search can return the list of the webpages in an appropriate order of the PageRank indices.
According to Google, the PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page.
Suppose there are $n$ webpages on a specific topic, where some of them have hyperlinks to others. Then the PageRank of the webpage $i$, $p_i$, can be represented by
$$ p_i = \frac{1-\zeta}{n} + \sum_{j\in{N}_i} \zeta\frac{p_j}{d_j} $$where
The above relation can be compactly written by the following matrix form,
$$ p = \frac{1-\zeta}{n} {\bf 1} + \zeta M p $$where $p = \bmat{p_1 & p_2 & \cdots & p_n}^T$, and ${\bf 1}\in\R^{n}$ is the one-vector whose elements are all 1's.
(Problem 1) What is $M_{ij}$, for $i,j \in \{1,\dots,n\}$?
$$ M_{ij} = \begin{cases} ? &\quad \text{if $j\in{N}_i$ ($j$ has outbound link to $i$)} \\ ? &\quad \text{otherwise} \end{cases} $$
# Your answer here
Once you obtained $M\in\R^{n\times n}$, you can solve the above linear equations for $p$. In other words, you can obtain the PageRank for all the $n$ webpages by
$$ p = \frac{1-\zeta}{n}\left(I - \zeta M \right)^{-1} {\bf 1} $$However in most cases where the number of webpages that contain somethings you are interested in is extremely large, computing the inverse appearing above is practically impossible. For example googling "BTS" returns approximately billions of documents; you won't be able to compute the inverse of billions times billions matrix.
Instead of this direct method, you can also solve for $p$ by using the following iterative method.
$$ p^{k+1} = \frac{1-\zeta}{n} {\bf 1} + \zeta M p^{k} $$where $p^k$ stands for the page rank $p$ at the $k$-th iteration step. From a random initial condition $p^0$, you can keep updating $p^k$ until the update amount is sufficiently small. It turns out that this update rule converges for $0<\zeta<1$.
Now a toy example. Suppose we have the small space of 11 webpages whose list of outbound link connections are given by
(Problem 2) Build $M$. In other words, define a $11\times 11$ matrix $M$ and assign the explicit numeric value for each $M_{ij}$.
# your code here
[[1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] [1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0.] [1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1.] [1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]] [[0.091 0. 0. 0.5 0. 0. 0. 0. 0. 0. 0. ] [0.091 0. 1. 0.5 0.333 0.5 0.5 0.5 0.5 0. 0. ] [0.091 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0.333 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0. 0.5 0.5 0.5 0.5 1. 1. ] [0.091 0. 0. 0. 0.333 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ] [0.091 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]
(Problem 3) Randomly generate initial $p^0>0$ such that $\|p^0\|_1=1$, and use the update rule
$$ p^{k+1} = \frac{1-\zeta}{n} {\bf 1} + \zeta M p^{k} $$$$\|p^{K+1}-p^{K}\|_2<10^{-6}$$with $\zeta=0.85$ until the update amount is sufficiently small. For example you could keep iterating until $k=K$ when the following convergence is achieved.
a) Which webpage has the highest PageRank?
b) Plot the history of $p_1^k, \dots, p_n^k$, for $k=0,\dots,K$.
# your code here
convergence in 79 iterations [0.033 0.384 0.343 0.039 0.081 0.039 0.016 0.016 0.016 0.016 0.016]
By the way, the solution, $p_1,\dots,p_n$, turns out to be an eigenvector of some matrix.
From the definition, the column sum of $M$ is equal to 1 for all columns,
$$ {\bf 1}^TM={\bf 1}^T $$and recall that the PageRank solution $p_1,\dots,p_n$ satisfies the following.
$$ p = \frac{1-\zeta}{n} {\bf 1} + \zeta M p $$Now left-multiplying ${\bf{1}}^T$ to the above gives
$$ \begin{aligned} {\bf{1}}^Tp &= \frac{1-\zeta}{n} {\bf{1}}^T{\bf 1} + \zeta {\bf{1}}^TM p \\ &= (1-\zeta) + \zeta {\bf{1}}^Tp \end{aligned} $$which implies $ {\bf{1}}^Tp=1$, therefore the PageRank values sum to 1.
Now consider
$$ G = \frac{1-\zeta}{n} {\bf 1} {\bf 1}^T + \zeta M $$and right-multiplying $p$ to the above gives
$$ \begin{align*} Gp &= \frac{1-\zeta}{n} {\bf 1} {\bf 1}^Tp + \zeta Mp \\ &= \frac{1-\zeta}{n} {\bf 1} + \zeta Mp \\ &= p \end{align*} $$This implies that the PageRank $p$ is the eigenvector of $G$ with the associated eigenvalue 1. The matrix $G$ is s also known as "the Google matrix".
(Problem 4) Compare the solution you obtained in (Problem 3) with the eigenvalue of the Google matrix. Check if they coincide.
#your code here