Category Archives: networks

Community structure in social and biological networks– most networks have community structure

From the Sante Fe institute, a classic from 2002.

The largest component of the Santa Fe Institute collaboration network, with the primary divisions detected by the author’s algorithm indicated by different vertex shapes.

This work proposed identifying community structure by focusing on the edges that are most “between” communities. They identify such edges using the betweenness centrality measure, under the insight that if two communities are connected by a single edge, all shortest paths between the two communities go along that edge. Thus that edge will be in many of the “shortest paths” in the graph.

The betweenness for all m edges can be found in time O(mn) by Newman’s algorithm (Phys Rev E 64, 2001)

Open access!
Community structure in social and biological networks. Girvan M, Newman ME. Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):7821-6.

A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known–a collaboration network and a food web–and find that it detects significant and informative community divisions in both cases.

Advertisements

a model of the internet based on the k-shell

Visualization of our data of the Internet at the AS level. (Upper) A plot of all nodes, ordered by their k-shell indices, using the program of ref. 13. The legend to the left denotes degree, and the legend to the right denotes k-shell index. (Lower) A schematic plot of the suggested Medusa model decomposition of the AS level Internet into three components.

The authors apply the k-shell to the internet aththe autonomous system level of 20,000 nodes (see DIMES project, www.netdimes.com). The analysis divides the net into three layers:

  1. a nucleus of ~100 tightly connected nodes (all nodes with k-shell 43, the max)
  2. A peer-connected component of ~15,000 nodes. These can connect without the nucleus, with a 42% increase in the number of hops needed.
  3. Isolated tendrils (~5,000 nodes).

This structure is not observed in all networks, specifically the actor network has disconnected k-cores and no tendrils.

Open access!

A model of Internet topology using k-shell decomposition.
Carmi S, Havlin S, Kirkpatrick S, Shavitt Y, Shir E.
Proc Natl Acad Sci U S A. 2007 Jul 3;104(27):11150-4.

We study a map of the Internet (at the autonomous systems level), by introducing and using the method of k-shell decomposition and the methods of percolation theory and fractal geometry, to find a model for the structure of the Internet. In particular, our analysis uses information on the connectivity of the network shells to separate, in a unique (no parameters) way, the Internet into three subcomponents: (i) a nucleus that is a small ( approximately 100 nodes), very well connected globally distributed subgraph; (ii) a fractal subcomponent that is able to connect the bulk of the Internet without congesting the nucleus, with self-similar properties and critical exponents predicted from percolation theory; and (iii) dendrite-like structures, usually isolated nodes that are connected to the rest of the network through the nucleus only. We show that our method of decomposition is robust and provides insight into the underlying structure of the Internet and its functional consequences. Our approach of decomposing the network is general and also useful when studying other complex networks.

See also Scale-free models for the structure of business firm networks., which claims

the sizes of the nucleus and the tendrils in scale-free networks decrease as the exponent of the power-law degree distribution lambda increases, and disappear for \lambda \geq 3

Stability criteria for complex ecosystems

Allesina and Tang do a more detailed analysis of May’s seminal work on stability matrices.

May’s approach defines a community matrix M of size SxS, where S is the number of species, and Mi,j is the effect of species j on species i. Entries Mi,j are N(0,\sigma^2) with prob C, zero otherwise. May showed that when the complexity  \sigma \sqrt{SC} >1 the probability of stability is near null. Thus rich (high S) or highly connected (high C) communities should be rare.

The current authors allow the community matrix to have structure.  Preditor-prey networks are structured so that Mi,j and Mj,i have opposite signs. A mixture of competition and mutulalism arises when Mi,j and Mj,i are constrained to have the same sign.

a, Plot of the eigenvalues of 10 matrices following the random, predator–prey or mixture prescriptions. The black ellipses are derived analytically. b, Numerical simulations for the corresponding stability profiles. The phase transition between stability and instability is accurately predicted by our derivation.

Stability arises when the eigen values have negative real parts.  Random matrices constrain their eigenvalues to a circle, pred-prey to a vertical ellipse, and comp/mut to a horizontal ellipse. More formally, the difference in stability is driven exclusively by the arrangement of the coefficients into pairs with  random, opposite, and same signs. Intermediate cases can be formed by linear combinations.

Imposing realistic food webs decreases stability.

What if Mi,j are not normally distributed? Imagine many weak interactions: preditor-prey networks become less stable, competition/mutulalism networks become more stable, and random are unchanged.

Very interesting work.

Stability criteria for complex ecosystems“, Stefano Allesina,Si Tang. Nature (2012)
doi:10.1038/nature10832

The million user fallacy (measuring Twitter influence)

Notes on Measuring User Influence in Twitter: The Million Follower Fallacy, Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, Krishna P. Gummadi, Proc. International AAAI Conference on Weblogs and Social Media (ICWSM), May 2010.

The million follower fallacy states that having lots of followers translates into having lots of influence.  This paper is one of several disproving that hypothesis. The term comes from an article posted by Avnit in 2009, but which seems to have been taken down.

Traditional communication theory claims that a minority of people, informed, respected, and well connected, shape the opinion of the society. These are the hubs, mavens, influence leaders, …

A more modern view says that the important factor is how open the society is. The Watts 2007 paper on this found that in a homogeneous degree network, anyone could start a long cascade. not having read the paper, my criticism could be off, but I wonder if this finding isn’t just a result of the network structure. If degree is homogeneous, then by definition you don’t have hubs/mavens.

The current work investigates three measures of influence: followers, retweets, and mentions. The most followed twitterers are public figures and news media. The most retweeted twitterers are content aggregation sites. The most mentioned users were celebrities. The three categories do not overlap well.

Influence follows a power law.

Influence runs across topics; a user who is influential in one topic is often influential in other topics as well. “Most influential users hold significant influence over a variety of topics.”

One can gain influence by focusing on a single topic, and posting creative and insightful tweets on that topic.

See also “What is Twitter, a social network or a news media”, by Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. Proceedings of the 19th International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC (USA), which notes:

[Twitter has] a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks~\cite{Newman03}… Ranking by retweets indicats a gap in influence inferred from the number of followers and that from the popularity of one’s tweets… [We]show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.

Assessing the relevance of node features for network structure (chance or necessity?)

The topology of a (real-world) network is a combination of chance and necessity. Community structure represents necessity, yet the evolution of the network is (often) stochastic. This work presents an indicator of how node characteristics affect network topology.

Their indicator measures the specificity of a network g for a particular assignment q by comparing to random assignment. Specificity is computed (roughly) as the total number of links between pairs of communities.

Structural preferential attachment: network organization beyond the link (balancing order and chaos)

Preferential attachment is ubiquitous. It predicts power-law distributions where the probability of an event of size k decreases as an inverse power of k: P_k \propto k^{-\gamma}.

Preferential attachment can create either order, i.e. distinct growing structures (here, Simon’s balls-and-bins, more generally, a Dirichlet process model) or chaos, i.e. random networks (here BA).

Preferential attachment can form order (balls in bins) or chaos (random network). The real world falls in between (community structure).

Girvan and Newman showed that most real world networks have community structure. (PNAS, 2002, non-paywall here). They mix order and chaos.

This work presents the Structural Preferential Attachment (SPA) algorithm for creating networks. The algorithm is as follows:

At every time step, a node joins a community. The node is either new with probability q or existing. Similarly, the community is either new with probability p or existing. Choice between existing nodes is with probability proportional to its membership number. Between existing communities is proportional to their size.

The model allows closed-form solutions for the membership, community size, and community degree distributions. Further, fitting the model to real-world data shows that it accurately re-creates their structure.

Builds network connections by growing communities shows that scale-free properties observed at the node level are inherited from preferential attachment at the level of the community.

Abstract:

We introduce a mechanism which models the emergence of the universal properties of complex networks, such as scale independence, modularity and self-similarity, and unifies them under a scale-free organization beyond the link. This brings a new perspective on network organization where communities, instead of links, are the fundamental building blocks of complex systems. We show how our simple model can reproduce social and information networks by predicting their community structure and more importantly, how their nodes or communities are interconnected, often in a self-similar manner.

Effect of coagulation of nodes in an evolving complex network

The authors focus on scale-free networks,

with reference to Moore et al’s model which includes annihilation of nodes. They analyze a Japanese business relationship network. Companies are born, grow, die, and merge. Firms have (observed and measurable) preferential attachment to firms with higher degree.

(a) The three basic processes. (b) Time evolution of the number of links. (c) LIfetime and average degree. (d) Cumulative distribution of the number of links, with c=0 (orange) to c=0.5 (blue).

The cumulative distribution of company lifetimes is exponential P(\geq t) \propto \exp(-t/\tau), where \tau=18.8 years. (we see similar patterns in speciation). A semi-log plot shows degree grows exponentially with lifetime, where expected degree is \exp(At) and $A=0.017$.

This suggests the degree distribution should be power-law with exponent 1/A\tau, or 3.1. The observed degree distribution has exponent 1.3. Conclusion: random annihilation and growth by preferential attachment do not explain the real world.

To explain this, the authors turn to coagulation models of aerosols and colloids; the mass of these clouds follows a power law. They propose a model where network nodes can merge. More specifically, starting with N nodes, evolve the network by stochastically choosing annihilation, creation, or coagulation at each time step, with probabilities a, b, c (b=a+c=0.5). With 10^5 nodes, the model converges after 10^7 iterations, suggesting one year in the real world == 10600 iterations.

They show, both analytically and by simulation that the coagulation probability controls the final shape of the degree distribution. Small c gives an exponential distribution, large c gives power law.

Phys. Rev. Lett. 108, 168701 (2012) Effect of Coagulation of Nodes in an Evolving Complex Network. Wataru Miura, Hideki Takayasu, and Misako Takayasu

Controlling edge dynamics in complex networks– “news and views” version

The “news and views” write up summaries the paper by comparing to earlier results on controlling node dynamics.

“Controlling edge dynamics in complex networks” Nepusz and Vicsek, Nature Physics (2012)
doi:10.1038/nphys2327

Under node dynamics, a time-evolving state variable is associated with each node, with the evolution of the state is dependent on the state of the neighbours. Think metabolite concentrations, gene expression levels, or formation of opinions.

Under edge dynamics, a state variable is associated with each edge. Nodes act as switchboards, reading the state of upstream neighbours, processing it, and passing it to downstream neighbours. Think load-balancing routers (node=router, edge=route, state=load).

Under nodal dynamics, the minimum set of driver nodes can be identified by the minimum-inputs theorem of Liu, Slotine, and Barabasi (2011, Nature). Under edge dynamics, one still wants to find the minimum set of driver nodes (since nodes set the state of their edges). Nepusz and Vicsek’s algorithm solves this.

The two representations are duals in many ways. Most obviously, edge dynamics on a network can be represented by node dynamics on the network’s line graph. More interestingly,

Role of hubs:
Nodal. A hub relays a common signal to a large portion of the network, creating symmetries which restrict the state space.  Thus not in minimal driver set.
Edge. A hub can set each associated edge state separately. The large number of edges thus controlled means they fit in the minimal driver set.

Sparse vs dense networks:
Nodal. Sparse, inhomogeneous networks are the hardest to control.
Edge.  Sparse, inhomogeneous networks are the easiest to control.

Self-edges:
Nodal. Enhances controllability.
Edge. No effect.

Correlated in and out degree:
Nodal. No effect.
Edge. enhances controllability.

The work may be relevant to evolution, under the assumption that evolution is based on ancient, optimized components whose connections are the main target of natural selection.

All scale free networks are sparse

Scale free networks have a degree distribution which follows a power law, P(k) \sim k^{-\gamma} This letter explains why \leq < \gamma \leq 2 cannot occur (for large networks).

Note that \gamma <0 implies a very dense network (and hence not really “scale free” in the traditional sense), and \gamma >2 implies a sparse network.

The core of their argument comes from looking at the scaling of the largest and the lowest degree nodes. For \gamma \leq 2, the number of nodes with the largest and second largest degree is \O(N), while for \gamma >2, the number of such nodes grows sublinearly with N. Also, the number of nodes with degree 1 (or order 1) grows linearly with N. Thus, if \gamma \leq 2, one needs \O(N) degree one nodes to associate with the highest degree node, leaving no way to place edges for the second highest degree node.

Note these results are asymptotic. One could have a finite network with 0 \leq \gamma \leq 2. To grow such networks, however, requires either a cutoff on the allowed node degree, or that \gamma must increase over 2.

http://prl.aps.org/abstract/PRL/v107/i17/e178701

Charo I. Del Genio, Thilo Gross, and Kevin E. Bassler
Phys. Rev. Lett. 107, 178701 (2011)
All Scale-Free Networks Are Sparse

Explosive percolation in random networks

Amazing what Science will publish.

The authors show that percolation transitions can be discontinuous under Erdös-Rényi (ER) networks with choice.

Network evolution. (A) Under the Erdös-Rényi (ER) model, in each step two vertices are chosen at random and connected by an edge (shown as the dashed line). In this example, two components of size 7 and 2 get merged. (B) In models with choice, two random edges {e1,e2} are picked in each step yet only one is added to the network based on some selection rule, whereas the other is discarded. Under the product rule (PR), the edge selected is the one minimizing the product of the sizes of the components it merges. In this example, e1 (with product 2 × 7 = 14) would be chosen and e2 discarded (because 4 × 4 = 16). In contrast, the rule selecting the edge minimizing the sum of the component sizes instead of the product would select e2 rather than e1. (C) Typical evolution of C/n for ER, BF (a bounded size rule with K = 1), and PR, shown for n = 512,000.

The ER network starts with a set of disconnected nodes. Two nodes are selected uniformly at random and linked. Under this protocol, up until the number of edges equals half the number of nodes, the expected size of the large component scales with log n, i.e. it is small. Above the threshold, the size is linear at C approx (4r-2)n, where rn is the number of edges. Thus, a continuous phase transition at r=1/2.

The open question revolves around Achlioptas Processes; i.e. non-random (or less random) edge selection rules. For example, select two edges and add e1 if it connects two components of size 1, otherwise add e2. (Bohman and Frieze algorithm).

The authors of the current work give evidence that unbounded selection rules give discontinuous phase transitions.

Science 13 March 2009:
Vol. 323 no. 5920 pp. 1453-1455
DOI: 10.1126/science.1167782
Explosive Percolation in Random Networks
Dimitris Achlioptas
Raissa M. D’Souza, and
Joel Spencer