# Category Archives: networks

## Community structure in social and biological networks – most networks have community structure

From the Santa Fe Institute, a classic from 2002.

This work proposed identifying community structure by focusing on the edges that are most “between” communities. They identify such edges using the edge-betweenness centrality measure, based on the insight that if two communities are connected by a single edge, all shortest paths between the two communities must run along that edge. That edge will therefore lie on many of the shortest paths in the graph.

The betweenness of all $m$ edges can be found in $O(mn)$ time by Newman’s algorithm (Phys. Rev. E 64, 2001).
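A minimal sketch of the edge-betweenness idea, using networkx’s implementation of the Girvan-Newman procedure (not the authors’ original code) on a toy graph:

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Two 5-cliques joined by one bridge edge. Every shortest path between
# the cliques runs over the bridge, so it has maximal edge betweenness
# and is removed first, splitting the graph into the two communities.
G = nx.barbell_graph(5, 0)          # nodes 0-4 and 5-9, bridge (4, 5)
first_split = next(girvan_newman(G))
print(sorted(sorted(c) for c in first_split))
# → [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

`girvan_newman` yields successively finer partitions; taking only the first corresponds to cutting at the highest-betweenness edges.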

Open access!

Community structure in social and biological networks. Girvan M, Newman ME. Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):7821-6.

A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known–a collaboration network and a food web–and find that it detects significant and informative community divisions in both cases.

## a model of the internet based on the k-shell

The authors apply k-shell decomposition to the Internet at the autonomous-system level, about 20,000 nodes (see the DIMES project, www.netdimes.com). The analysis divides the net into three layers:

- A nucleus of ~100 tightly connected nodes (all nodes with the maximum k-shell index, 43).
- A peer-connected component of ~15,000 nodes. These can connect to one another without the nucleus, at the cost of a 42% increase in the number of hops needed.
- Isolated tendrils (~5,000 nodes).

This structure is not observed in all networks; the actor network, for example, has disconnected k-cores and no tendrils.
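A small sketch of the decomposition itself (on a toy graph, not the real AS-level map): networkx’s `core_number` gives each node’s core index, and the k-shell is the set of nodes whose core number is exactly k.

```python
import networkx as nx

# Two 4-cliques joined by a 3-node path. Iteratively stripping nodes of
# degree <= k leaves the cliques as the innermost (k = 3) shell and the
# path nodes in the k = 2 shell.
G = nx.barbell_graph(4, 3)
shells = {}
for node, k in nx.core_number(G).items():
    shells.setdefault(k, set()).add(node)
print(shells)
```

For the real Internet map, the same grouping by core number would recover the nucleus as the maximum-index shell.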

Open access!

A model of Internet topology using k-shell decomposition.

Carmi S, Havlin S, Kirkpatrick S, Shavitt Y, Shir E.

Proc Natl Acad Sci U S A. 2007 Jul 3;104(27):11150-4.

We study a map of the Internet (at the autonomous systems level), by introducing and using the method of k-shell decomposition and the methods of percolation theory and fractal geometry, to find a model for the structure of the Internet. In particular, our analysis uses information on the connectivity of the network shells to separate, in a unique (no parameters) way, the Internet into three subcomponents: (i) a nucleus that is a small ( approximately 100 nodes), very well connected globally distributed subgraph; (ii) a fractal subcomponent that is able to connect the bulk of the Internet without congesting the nucleus, with self-similar properties and critical exponents predicted from percolation theory; and (iii) dendrite-like structures, usually isolated nodes that are connected to the rest of the network through the nucleus only. We show that our method of decomposition is robust and provides insight into the underlying structure of the Internet and its functional consequences. Our approach of decomposing the network is general and also useful when studying other complex networks.

See also *Scale-free models for the structure of business firm networks*, which claims that the sizes of the nucleus and the tendrils in scale-free networks decrease as the exponent $\lambda$ of the power-law degree distribution increases, and disappear once $\lambda$ exceeds a critical value.

## Stability criteria for complex ecosystems

Allesina and Tang present a more detailed analysis of May’s seminal work on the stability of community matrices.

May’s approach defines a community matrix $M$ of size $S \times S$, where $S$ is the number of species and $M_{i,j}$ is the effect of species $j$ on species $i$. Off-diagonal entries $M_{i,j}$ are drawn from a distribution with mean zero and variance $\sigma^2$ with probability $C$, and are zero otherwise. May showed that when the complexity $\sigma\sqrt{SC}$ exceeds 1, the probability of stability is near zero. Thus rich (high $S$) or highly connected (high $C$) communities should be rare.

The current authors allow the community matrix to have structure. Predator-prey networks are structured so that $M_{i,j}$ and $M_{j,i}$ have opposite signs. A mixture of competition and mutualism arises when $M_{i,j}$ and $M_{j,i}$ are constrained to have the same sign.

Stability requires that all eigenvalues have negative real parts. The eigenvalues of random matrices fall within a circle, those of predator-prey matrices within a vertically elongated ellipse, and those of competition/mutualism matrices within a horizontally elongated ellipse. More formally, the difference in stability is driven exclusively by the arrangement of the coefficients into pairs with random, opposite, and same signs. Intermediate cases can be formed by linear combinations.
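A minimal numpy sketch of this comparison (my own toy parameters, not the paper’s): build a random and a predator-prey community matrix with the same $S$, $C$, $\sigma$ and compare the largest real part of their eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
S, C, sigma = 250, 0.25, 0.25   # species, connectance, interaction scale

def community_matrix(kind):
    """S x S matrix: -1 on the diagonal (self-regulation); each
    off-diagonal pair (i, j) interacts with probability C."""
    M = np.zeros((S, S))
    for i in range(S):
        for j in range(i + 1, S):
            if rng.random() < C:
                a, b = rng.normal(0, sigma, 2)
                if kind == "predator-prey":
                    a, b = abs(a), -abs(b)   # opposite-sign pairing
                M[i, j], M[j, i] = a, b
    np.fill_diagonal(M, -1.0)
    return M

lead = {k: np.linalg.eigvals(community_matrix(k)).real.max()
        for k in ("random", "predator-prey")}
print(lead)
```

With these values $\sigma\sqrt{SC} \approx 2$, so the random matrix is unstable by May’s criterion, while the opposite-sign pairing squeezes the spectrum horizontally and stabilizes the predator-prey matrix.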

Imposing realistic food webs decreases stability.

What if the $M_{i,j}$ are not normally distributed? Imagine many weak interactions: predator-prey networks become less stable, competition/mutualism networks become more stable, and random networks are unchanged.

Very interesting work.

“Stability criteria for complex ecosystems”, Stefano Allesina and Si Tang. Nature (2012).

doi:10.1038/nature10832

## The million follower fallacy (measuring Twitter influence)

Notes on **Measuring User Influence in Twitter: The Million Follower Fallacy**, Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, Krishna P. Gummadi, *Proc. International AAAI Conference on Weblogs and Social Media (ICWSM)*, May 2010.

The million follower fallacy states that having lots of followers translates into having lots of influence. This paper is one of several disproving that hypothesis. The term comes from an article posted by Avnit in 2009, which seems to have since been taken down.

Traditional communication theory claims that a minority of people, informed, respected, and well connected, shape the opinion of society. These are the hubs, mavens, influence leaders, …

A more modern view says that the important factor is how open the society is. The Watts 2007 paper on this found that in a homogeneous-degree network, anyone could start a long cascade. *Not having read the paper, my criticism could be off, but I wonder if this finding isn’t just a result of the network structure: if degree is homogeneous, then by definition you don’t have hubs/mavens.*

The current work investigates three measures of influence: **followers, retweets, and mentions.** The **most followed** twitterers are public figures and news media. The **most retweeted** twitterers are content aggregation sites. The **most mentioned** users were celebrities. The three categories do not overlap well.

Influence follows a power law.

Influence runs across topics; a user who is influential in one topic is often influential in other topics as well. “Most influential users hold significant influence over a variety of topics.”

One can gain influence by focusing on a single topic, and posting creative and insightful tweets on that topic.

See also **“What is Twitter, a social network or a news media”**, by Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. *Proceedings of the 19th International World Wide Web (WWW) Conference,* April 26-30, 2010, Raleigh NC (USA), which notes:

[Twitter has] a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [Newman03]… Ranking by retweets indicates a gap in influence inferred from the number of followers and that from the popularity of one’s tweets… [We] show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.

## Assessing the relevance of node features for network structure (chance or necessity?)

The topology of a (real-world) network is a combination of chance and necessity. Community structure represents necessity, yet the evolution of the network is (often) stochastic. This work presents an indicator of how node characteristics affect network topology.

Their indicator measures the specificity of a network *g* for a particular assignment *q* of node features by comparing against random assignments. Specificity is computed (roughly) from the total number of links between pairs of communities.

## Effect of coagulation of nodes in an evolving complex network

The authors focus on scale-free networks, with reference to Moore *et al.*’s model, which includes annihilation of nodes. They analyze a Japanese business relationship network. Companies are born, grow, die, and merge. Firms show (observed and measurable) preferential attachment to firms with higher degree.

The cumulative distribution of company lifetimes is exponential, with a decay time measured in years (we see similar patterns in speciation). A semi-log plot shows that degree grows exponentially with lifetime, with expected degree $\propto e^{At}$ and $A=0.017$.

This suggests the degree distribution should be a power law; with these parameters the predicted exponent is 3.1. The observed degree distribution has exponent 1.3. **Conclusion: random annihilation and growth by preferential attachment do not explain the real world.**
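The formula behind that predicted exponent was lost in formatting; it can be reconstructed from the standard argument (a sketch, assuming exponentially distributed lifetimes with mean $\tau$ and degree growth $k(t) \propto e^{At}$):

$$P(k > K) = P\!\left(t > \frac{\ln K}{A}\right) \propto e^{-\ln K/(A\tau)} = K^{-1/(A\tau)},$$

so the degree density falls off as $k^{-(1 + 1/(A\tau))}$; the quoted 3.1 would correspond to $1/(A\tau) = 2.1$.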

To explain this, the authors turn to coagulation models of aerosols and colloids; the masses of such particle clusters follow a power law. They propose a model in which network nodes can merge. More specifically, starting with $N$ nodes, evolve the network by stochastically choosing *annihilation*, *creation*, or *coagulation* at each time step, with probabilities $a$, $b$, $c$ (where $a+b+c=1$). With $10^5$ nodes, the model converges after $10^7$ iterations, suggesting one year in the real world corresponds to roughly 10,600 iterations.

They show, both analytically and by simulation, that the **coagulation probability controls the final shape of the degree distribution**: small $c$ gives an exponential distribution, large $c$ gives a power law.
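A toy version of such a model (my own sketch, not the authors’ exact rules; the probabilities and step count are illustrative):

```python
import random
import networkx as nx

random.seed(1)
a, b = 0.1, 0.6            # annihilation, creation; coagulation c = 0.3
G = nx.complete_graph(5)   # small seed graph
next_id = 5

for _ in range(2000):
    r, nodes = random.random(), list(G)
    if r < a and len(nodes) > 3:
        G.remove_node(random.choice(nodes))            # annihilation
    elif r < a + b:
        # creation: a new node attaches preferentially by degree
        target = random.choices(
            nodes, weights=[G.degree(n) + 1 for n in nodes])[0]
        G.add_edge(next_id, target)
        next_id += 1
    elif len(nodes) > 3:
        u, v = random.sample(nodes, 2)                 # coagulation:
        G = nx.contracted_nodes(G, u, v, self_loops=False)  # merge v into u

print(G.number_of_nodes(), max(d for _, d in G.degree()))
```

Merging nodes pools their edges, which is what drives the heavy tail when $c$ is large.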

*Phys. Rev. Lett.* 108, 168701 (2012) Effect of Coagulation of Nodes in an Evolving Complex Network. Wataru Miura, Hideki Takayasu, and Misako Takayasu

## Controlling edge dynamics in complex networks – “news and views” version

### The “news and views” write-up summarizes the paper by comparing it to earlier results on controlling node dynamics.

“Controlling edge dynamics in complex networks”, Nepusz and Vicsek, *Nature Physics* (2012).

doi:10.1038/nphys2327

Under **node dynamics**, a time-evolving state variable is associated with each node, with the evolution of each state dependent on the states of the node’s neighbours. Think metabolite concentrations, gene expression levels, or the formation of opinions.

Under **edge dynamics**, a state variable is associated with each edge. Nodes act as switchboards, reading the state of upstream neighbours, processing it, and passing it to downstream neighbours. Think load-balancing routers (node=router, edge=route, state=load).

Under nodal dynamics, the minimum set of driver nodes can be identified by the minimum-inputs theorem of Liu, Slotine, and Barabasi (2011, Nature). Under edge dynamics, one still wants to find the minimum set of driver nodes (since nodes set the state of their edges). Nepusz and Vicsek’s algorithm solves this.

The two representations are duals in many ways. Most obviously, edge dynamics on a network can be represented by node dynamics on the network’s line graph. More interestingly:

**Role of hubs:**

*Nodal.* A hub relays a common signal to a large portion of the network, creating symmetries that restrict the reachable state space. Thus hubs are not in the minimal driver set.

*Edge.* A hub can set the state of each of its edges separately. The large number of edges thus controlled means hubs do appear in the minimal driver set.

**Sparse vs dense networks:**

*Nodal*. Sparse, inhomogeneous networks are the *hardest* to control.

*Edge.* Sparse, inhomogeneous networks are the *easiest* to control.

**Self-edges:**

*Nodal*. Enhances controllability.

*Edge.* No effect.

**Correlated in and out degree:**

*Nodal.* No effect.

*Edge.* Enhances controllability.

The work may be relevant to evolution, under the assumption that evolution is based on ancient, optimized components whose connections are the main target of natural selection.
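The line-graph duality mentioned above is easy to see concretely (a minimal networkx sketch): each edge of a graph becomes a node of its line graph, and two such nodes are adjacent exactly when the original edges share an endpoint.

```python
import networkx as nx

# Edge dynamics on G correspond to node dynamics on its line graph L(G).
G = nx.path_graph(4)        # edges (0,1), (1,2), (2,3)
L = nx.line_graph(G)
print(sorted(L.nodes()))    # one node per edge of G
print(L.number_of_nodes(), L.number_of_edges())
```

For the 3-edge path, the line graph has 3 nodes and 2 edges, since consecutive edges share an endpoint.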

## All scale-free networks are sparse

Scale-free networks have a degree distribution that follows a power law, $P(k) \propto k^{-\lambda}$. This Letter explains why $\lambda < 2$ cannot occur (for large networks).

Note that $\lambda < 2$ implies a very dense network (and hence not really “scale free” in the traditional sense), and $\lambda > 2$ implies a sparse network.

The core of their argument comes from looking at the scaling of the largest- and lowest-degree nodes. For $\lambda < 2$, the degrees of the nodes with the largest and second-largest degree grow linearly with $N$, while for $\lambda > 2$ they grow sublinearly with $N$. Also, the number of nodes with degree 1 (or of order 1) grows linearly with $N$. Thus, if $\lambda < 2$, nearly all of the degree-one nodes are needed to attach to the highest-degree node, leaving no way to place the edges of the second-highest-degree node.

Note these results are asymptotic. One could have a finite network with $\lambda < 2$. To grow such a network, however, requires either a cutoff on the allowed node degree, or that $\lambda$ increase above 2.

All Scale-Free Networks Are Sparse. Charo I. Del Genio, Thilo Gross, and Kevin E. Bassler. Phys. Rev. Lett. 107, 178701 (2011). http://prl.aps.org/abstract/PRL/v107/i17/e178701

## Explosive percolation in random networks

Amazing what Science will publish.

The authors show that percolation transitions can be discontinuous in Erdős-Rényi (ER)-style random graph processes with choice.

The ER process starts with a set of disconnected nodes; at each step, two nodes are selected uniformly at random and linked. Under this protocol, up until the number of edges equals half the number of nodes, the expected size of the largest component scales with $\log n$, i.e. it is small. Above the threshold, the size is linear, approximately $(4r-2)n$, where $rn$ is the number of edges. Thus, **a continuous phase transition** at $r = 1/2$.

The open question revolves around *Achlioptas processes*, i.e. non-random (or less random) edge-selection rules. For example (the Bohman-Frieze rule): select two candidate edges $e_1$ and $e_2$, and add $e_1$ if it connects two components of size 1, otherwise add $e_2$.

The authors of the current work give evidence that unbounded selection rules give discontinuous phase transitions.
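A toy comparison (my own sketch, not the paper’s analysis) of plain ER edge addition against the product rule, one of the Achlioptas processes studied here: at $r = 1/2$ the ER process is exactly critical, while the product rule suppresses large components.

```python
import random

random.seed(0)
n = 100_000

class DSU:
    """Union-find tracking component sizes."""
    def __init__(self, n):
        self.parent, self.size = list(range(n)), [1] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

er, pr = DSU(n), DSU(n)
for _ in range(n // 2):                      # rn edges with r = 1/2
    # plain ER: link a uniformly random pair
    er.union(random.randrange(n), random.randrange(n))
    # product rule: of two candidate edges, keep the one whose
    # endpoint components have the smaller size product
    a1, b1 = random.randrange(n), random.randrange(n)
    a2, b2 = random.randrange(n), random.randrange(n)
    p1 = pr.size[pr.find(a1)] * pr.size[pr.find(b1)]
    p2 = pr.size[pr.find(a2)] * pr.size[pr.find(b2)]
    if p1 <= p2:
        pr.union(a1, b1)
    else:
        pr.union(a2, b2)

def biggest(d):
    return max(d.size[d.find(i)] for i in range(n))

print(biggest(er), biggest(pr))
```

The ER largest component at the critical point scales like $n^{2/3}$; the product rule keeps it far smaller, which is the delay that makes its eventual transition so abrupt.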

Explosive Percolation in Random Networks. Dimitris Achlioptas, Raissa M. D’Souza, and Joel Spencer. Science, 13 March 2009: Vol. 323, no. 5920, pp. 1453-1455. DOI: 10.1126/science.1167782