The power of network science, the beauty of network visualization. Network Science, a textbook for network science, is freely available under the Creative Commons license. http://networksciencebook.com/
随机网络模型
Definition: A random graph is a graph of N nodes where each pair of nodes is connected by probability p.
N lableled nodes are connected with L randomly placed links. Erdös-Rényi(1959)
Each pair of N labeled nodes is connected with probability p. Gilbert (1959)
If the number exceeds $p$, connect the selected node pair with a link, otherwise leave them disconnected.
The probability that a random network has exactly $L$ links and $N$ nodes:
$$p_L = C_{\frac{N(N-1)}{2}}^L p^L (1-p)^{\frac{N(N-1)}{2} -L}$$二项分布即重复n次独立的*伯努利试验
If we toss a fair coin N times, tails and heads occur with the same probability p = 1/2. The binomial distribution provides the probability $p_x$ that we obtain exactly x heads in a sequence of N throws. In general, the binomial distribution describes the number of successes in N independent experiments with two possible outcomes, in which the probability of one outcome is p, and of the other is 1-p.
$$ p_x = \sum_{x}^N p^x (1-p)^{N-x}$$The mean of the distribution (first moment) is
$$<x> = \sum_{x = 0}^N xp_x = Np$$$P_L$is a binomial distribution, the expected number of links in a random graph is
$$ <L> = \sum_{L = 0}^{\frac{N(N-1)}{2}} L p_L = \frac{N(N-1)}{2} p$$$$L_{max} = \frac{N(N-1)}{2}$$The average degree of a random network is :
$$<k> = \frac{2<L>}{N} = (N-1)p$$$$k_{max} = N-1$$In a given realization of a random network some nodes gain numerous links, while others acquire only a few or no links. These differences are captured by the degree distribution, $p_k$.
$p_k$ is the probability that a randomly chosen node has degree $k$.
In a random network the probability that node i has exactly $k$ links is the product of three terms:
$k_{max} = N-1$, 节点i的边的最大数量是N-1
Consequently the degree distribution of a random network is:
$p_k = C_{N-1}^k p^k (1-p)^{N-1-k}$
which follows the binomial distribution. The shape of this distribution depends on the system size $N$ and the probability $p$.
$p_k = e^{-<k>} \frac{<k>^k}{k!}$
大于平均度的节点数量随着度的增加指数下降
The evolution of random networks The relative size of the giant component in function of the average degree $<k>$ in the Erdős-Rényi model. The figure illustrates the phase tranisition at $<k> = 1$, responsible for the emergence of a giant component with nonzero $N_G$. REAL NETWORKS ARE SUPERCRITICAL
also known as six degrees of separation, has long fascinated the general public. It states that if you choose any two individuals anywhere on Earth, you will find a path of at most six acquaintances between them.
Consider a random network with average degree $<k>$.
A node i in this network has on average:
从For循环的角度理解
for $<k> ≈ 1,000$, which is the estimated number of acquaintences an individual has, we expect $10^6$ individuals at distance two and about a billion, i.e. almost the whole earth’s population, at distance three from us.
从一个开始的节点出发走d步,共有节点数量:
$N(d) = 1 + <k> + <k>^2 + <k>^3 + ...+ <k>^d = \frac{<k>^{d+1}-1}{<k>-1}$
The expected number of nodes up to distance d from our starting node is $N(d)$
$N(d)$比N小,当$d = d_{max}$的时候: $N(d_{max}) \sim N$
我们知道: $N(d) = 1 + <k> + <k>^2 + <k>^3 + ...+ <k>^d = \frac{<k>^{d+1}-1}{<k>-1}$
若$<k>$ 远大于1,那么: $N(d_{max}) = <k>^{d_{max}} \sim N $,
因此,$d_{max} = \frac{ln N}{ln <k>}$ The diameter depends logarithmically on the system size.
The diameter depends logarithmically on the system size.
The $\frac{1}{ln <k>}$ term implies that the denser the network, the smaller is the distance between the nodes.
Let us illustrate the implications of (3.19) for social networks. Using $N ≈ 7 ×10^9$ and $<k> ≈ 10^3$, we obtain
$$d_{max} = \frac{ln 7 ×10^9}{ln 10^3} = 3.28$$I. de Sola Pool and M. Kochen. Contacts and Influence. Social Networks, 1: 5-51, 1978.
The first empirical study of the small world phenomena took place in 1967
He found that the median number of intermediates was 5.2,
Using Facebook’s social graph of May 2011, consisting of 721 million active users and 68 billion symmetric friendship links, researchers found an average distance 4.74 between the users.
L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four degrees of separation. In ACM Web Science 2012: Conference Proceedings, pages 45−54. ACM Press, 2012.
In lattices the path lengths are significantly longer than in a random network.
Local clustering coefficient
We need to estimate the expected number of links $L_i$ between the node’s $k_i$ neighbors.
In a random network the probability that two of i’s neighbors link to each other is $p$.
As there are $\frac{k_i(k_i - 1)}{2}$ possible links between the $k_i$ neighbors of node i, the expected value of $L_i$ is
$$<L> = p\frac{k_i(k_i - 1)}{2}$$Numerical simulations indicate that for a range of rewiring parameters the model's average path length is low but the clustering coefficient is high.
D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393: 409–10, 1998.
The dependence of the average path length $d(p)$ and clustering coefficient $<C(p)>$ on the rewiring parameter $p$.
Note that $d(p)$ and $<C(p)>$ have been normalized by $d(0)$ and $<C(0)>$ obtained for a regular lattice.
The rapid drop in $d(p)$ signals the onset of the small-world phenomenon.
D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393: 409–10, 1998.
REAL NETWORKS ARE NOT POISSON
度分布的n阶矩被定义为:
$ <k^n> = \sum_{k_{min}}^{k_{max}}k^np_k = \int_{k_{min}}^{k_{max}}k^np_kdk $ (1)
低阶矩具有明确的统计意义:
对于无标度网络而言,满足幂律分布:
$p(k) = Ck^{-\gamma}$ (2)
由公式(1)和(2)可以得到:
$ <k^n> = \int_{k_{min}}^{k_{max}}k^np_kdk = C \frac{k_{max}^{n- \gamma +1} - k_{min}^{n - \gamma +1}}{n - \gamma + 1} $ (3)
可以使用wolframalpha的积分计算器积分来进行简单验证,例如x^(n-r)dx从10到100积分 网页链接:http://www.wolframalpha.com/input/?i=integrate+x%5E%28n-r%29+dx+from+10+to+100
显然:
对于无标度网络而言,一般幂参数$2 < \gamma < 3$,所以:
对于n = 1的情况,即一阶矩平均度$<k^{}>$是有限的。
但对于n >= 2的情况,即$k^2$或$k^3$是无极限的。二阶和高阶矩无穷大是“无标度”的来源
在网络中随机抽取一个节点的度可以显著的不同于平均度$<k>$
上图最为直接的描绘出了这种特点,即与正态分布等相比,无标度网络下降的慢
。
对于任何指数类型的分布,如泊松分布或高斯分布,随机选取一个节点的度在平均度附近,因此平均度就是这些网络的尺度
。
对于一个幂律分布而言,因为二阶矩发散,在网络中随机抽取一个节点的度可以显著的不同于平均度$<k^{}>$, 因此平均度不再是网络的尺度,称之为无标度。
对于不同的网络,平均度与标准差的取值
"Mean-Field": many -> one
The complex interaction between a large number of components can be simplified into a single averaged effect of all the other individuals on any given one.
"Continuum": discrete -> Continuous
A discrete variables that is "smooth enough", i.e. increase by one unit in each period, such as time steps [1], spatial jumps [2-3], and population, can be viewed as a continuous geometric structure.
Any measure on this structure, like degree [1], number of unique locations visited [2], and distance [3], can be described by differential equations.
其中老节点分到的度数是m,新加入的那一个节点分到的度数为m. 那么到时间t的时候,网络的总节点数是$m_0 + t$,网络的总度数为$2mt$。
那么显然,加入的越早($t_i$越小)越容易获得更多的链接数。 从时间0开始,每一个时间步系统中的节点度$k_i$是不断增加的。
$k_i$在一个时间步获得一个度的概率表示为$\prod (k_i) $, 那么有:
$$\prod (k_i) = \frac{k_i}{\sum k_i} = \frac{k_i}{2mt}$$也就是说,在时间点t,节点i获得一个度的概率(能力)是节点i的度占网络总度数的比值。一个时间步,$k_i$随t的变化率可以表达为: $$\frac{\partial k_i}{\partial t} = \Delta k \prod (k_i) = m \frac{k_i}{2mt} = \frac{k_i}{2t}$$
$$\frac{\partial k_i}{k_i} = \frac{\partial t}{2t}$$$$\int \frac{1}{k_i} d k_i = \int \frac{1}{2t} dt$$https://www.wolframalpha.com/input/?i=integrate+1%2F%282x%29+dx+
积分结果
$$k_i = C (t) ^ {0.5} (公式1)$$此时,根据模型的初始条件,每个新加入节点获得的度是m:
$k_i(t_i) = m $ 代入公式(1)
可以得到$C =\frac{m}{(t_i)^{0.5}}$ 公式(2)
代入公式(1),得到:
$k_i = m (\frac{t}{t_i})^{0.5}$ 公式(3)
对于一个节点i,其加入网络的时间$t_i$是固定的,我们可以观察其度$k_i$随着时间的幂律关系。
当我们思考一个累积概率分布的时候,我们想要的是$k_i(t) < k$的概率:$P(k_i(t) < k) $
由公式$k_i = m (\frac{t}{t_i})^{0.5}$ 公式(3),可以知道:
$P(k_i(t) < k) = P( m (\frac{t}{t_i})^{0.5} < k ) = P( t_i > \frac{m^2 t}{k^2} ) = 1 - P(t_i \leqslant \frac{m^2 t}{k^2} )$(4)
在初始状态$ t = 0$, 有$m_0$个节点,那么$t_{m_0} = 0$
假设我们将节点加入的时间步是均匀的,那么$t_i$的概率是一个常数:
$P(t_i) = \frac{1}{m_0 + t}$ 公式(5)
根据均匀分布的性质,将公式(5)代入公式(4)得到:
$P(k_i(t) < k) = 1- \frac{m^2 t}{k^2} P(t_i) = 1 - \frac{m^2 t}{k^2 (m_0 + t)} $ 公式(6)
对累积概率函数求微分,就可以到达概率密度函数:
$P( k ) = \frac{\partial P(k_i(t) < k)}{\partial k} = \frac{2m^2 t}{m_0 + t} \frac{1}{k^3}$ 公式(7)
也就是说:$\gamma = 3$, 与m无关。