Monthly Archives: June 2012

Community structure in social and biological networks – most networks have community structure

From the Santa Fe Institute, a classic from 2002.

The largest component of the Santa Fe Institute collaboration network, with the primary divisions detected by the authors’ algorithm indicated by different vertex shapes.

This work proposed identifying community structure by focusing on the edges that are most “between” communities. Such edges are identified using the betweenness centrality measure, based on the insight that if two communities are connected by a single edge, all shortest paths between the two communities must pass along that edge. That edge will therefore lie on many of the shortest paths in the graph.

The betweenness of all m edges can be found in time O(mn), where n is the number of vertices, using Newman’s algorithm (Phys Rev E 64, 2001).
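The edge-removal procedure can be sketched with networkx (a minimal illustration on a toy graph, not the authors’ implementation). Two cliques joined by one bridge make the “between” edge obvious: every inter-clique shortest path crosses it, so it has the highest edge betweenness, and removing highest-betweenness edges first splits the graph into the two communities.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Two 5-cliques joined by a single bridge edge (4, 5): all shortest
# paths between the cliques traverse that bridge, so it has the
# highest edge betweenness centrality.
G = nx.barbell_graph(5, 0)

betweenness = nx.edge_betweenness_centrality(G)
bridge = max(betweenness, key=betweenness.get)
print(bridge)  # (4, 5)

# Girvan-Newman: repeatedly remove the highest-betweenness edge.
# The first split recovers the two cliques.
communities = next(girvan_newman(G))
print(sorted(map(sorted, communities)))
```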

Open access!
Community structure in social and biological networks. Girvan M, Newman ME. Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):7821-6.

A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known–a collaboration network and a food web–and find that it detects significant and informative community divisions in both cases.

a model of the internet based on the k-shell

Visualization of our data of the Internet at the AS level. (Upper) A plot of all nodes, ordered by their k-shell indices, using the program of ref. 13. The legend to the left denotes degree, and the legend to the right denotes k-shell index. (Lower) A schematic plot of the suggested Medusa model decomposition of the AS level Internet into three components.

The authors apply the k-shell decomposition to the Internet at the autonomous system level, ~20,000 nodes (see the DIMES project). The analysis divides the net into three layers:

  1. A nucleus of ~100 tightly connected nodes (all nodes with k-shell index 43, the maximum).
  2. A peer-connected component of ~15,000 nodes. These can connect to each other without going through the nucleus, at the cost of a 42% increase in the number of hops needed.
  3. Isolated tendrils (~5,000 nodes).

This structure is not observed in all networks; the actor network, for example, has disconnected k-cores and no tendrils.
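The k-shell decomposition itself is easy to sketch: the k-core is the maximal subgraph where every node has degree at least k, and the k-shell is the set of nodes in the k-core but not the (k+1)-core. A toy sketch with networkx (illustrative only, not the AS-level data):

```python
import networkx as nx
from collections import defaultdict

# Toy graph: a 4-clique "nucleus" with a pendant "tendril" chain attached.
G = nx.complete_graph(4)              # nodes 0-3, all pairs connected
G.add_edges_from([(0, 4), (4, 5)])    # a chain hanging off the clique

# core_number[v] is the largest k such that v belongs to the k-core.
core = nx.core_number(G)

# The k-shell is the set of nodes with core number exactly k.
shells = defaultdict(set)
for node, k in core.items():
    shells[k].add(node)

print(dict(shells))  # clique nodes land in the 3-shell, the chain in the 1-shell
```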

Open access!

A model of Internet topology using k-shell decomposition.
Carmi S, Havlin S, Kirkpatrick S, Shavitt Y, Shir E.
Proc Natl Acad Sci U S A. 2007 Jul 3;104(27):11150-4.

We study a map of the Internet (at the autonomous systems level), by introducing and using the method of k-shell decomposition and the methods of percolation theory and fractal geometry, to find a model for the structure of the Internet. In particular, our analysis uses information on the connectivity of the network shells to separate, in a unique (no parameters) way, the Internet into three subcomponents: (i) a nucleus that is a small ( approximately 100 nodes), very well connected globally distributed subgraph; (ii) a fractal subcomponent that is able to connect the bulk of the Internet without congesting the nucleus, with self-similar properties and critical exponents predicted from percolation theory; and (iii) dendrite-like structures, usually isolated nodes that are connected to the rest of the network through the nucleus only. We show that our method of decomposition is robust and provides insight into the underlying structure of the Internet and its functional consequences. Our approach of decomposing the network is general and also useful when studying other complex networks.

See also “Scale-free models for the structure of business firm networks”, which claims

the sizes of the nucleus and the tendrils in scale-free networks decrease as the exponent of the power-law degree distribution \lambda increases, and disappear for \lambda \geq 3

Are Llamas the key to stopping HIV?

Two recent papers suggest that llamas produce antibodies with broad neutralizing ability against HIV. This raises the potential for a vaccine. Given HIV’s (relatively) slow rate of transmission, an effective vaccine could eliminate this plague.

The proud llama

Llama antibody fragments recognizing various epitopes of the CD4bs neutralize a broad range of HIV-1 subtypes A, B and C.
Strokappe N, Szynol A, Aasa-Chapman M, Gorlani A, Forsman Quigley A, Hulsik DL, Chen L, Weiss R, de Haard H, Verrips T.
PLoS One. 2012;7(3):e33298.

Many of the neutralising antibodies, isolated to date, display limited activities against the globally most prevalent HIV-1 subtypes A and C. Therefore, those subtypes are considered to be an important target for antibody-based therapy. Variable domains of llama heavy chain antibodies (VHH) have some superior properties compared with classical antibodies. Therefore we describe the application of trimeric forms of envelope proteins (Env), derived from HIV-1 of subtype A and B/C, for a prolonged immunization of two llamas. A panel of VHH, which interfere with CD4 binding to HIV-1 Env were selected with use of panning. The results of binding and competition assays to various Env, including a variant with a stabilized CD4-binding state (gp120(Ds2)), cross-competition experiments, maturation analysis and neutralisation assays, enabled us to classify the selected VHH into three groups. The VHH of group I were efficient mainly against viruses of subtype A, C and B’/C. The VHH of group II resemble the broadly neutralising antibody (bnmAb) b12, neutralizing mainly subtype B and C viruses, however some had a broader neutralisation profile. A representative of the third group, 2E7, had an even higher neutralization breadth, neutralizing 21 out of the 26 tested strains belonging to the A, A/G, B, B/C and C subtypes. To evaluate the contribution of certain amino acids to the potency of the VHH a small set of the mutants were constructed. Surprisingly this yielded one mutant with slightly improved neutralisation potency against 92UG37.A9 (subtype A) and 96ZM651.02 (subtype C). These findings and the well-known stability of VHH indicate the potential application of these VHH as anti-HIV-1 microbicides.

J Exp Med. 2012 Jun 4;209(6):1091-103. Potent and broad neutralization of HIV-1 by a llama antibody elicited by immunization. McCoy LE, Quigley AF, Strokappe NM, Bulmer-Thomas B, Seaman MS, Mortier D, Rutten L, Chander N, Edwards CJ, Ketteler R, Davis D, Verrips T, Weiss RA.

Llamas (Lama glama) naturally produce heavy chain-only antibodies (Abs) in addition to conventional Abs. The variable regions (VHH) in these heavy chain-only Abs demonstrate comparable affinity and specificity for antigens to conventional immunoglobulins despite their much smaller size. To date, immunizations in humans and animal models have yielded only Abs with limited ability to neutralize HIV-1. In this study, a VHH phagemid library generated from a llama that was multiply immunized with recombinant trimeric HIV-1 envelope proteins (Envs) was screened directly for HIV-1 neutralization. One VHH, L8CJ3 (J3), neutralized 96 of 100 tested HIV-1 strains, encompassing subtypes A, B, C, D, BC, AE, AG, AC, ACD, CD, and G. J3 also potently neutralized chimeric simian-HIV strains with HIV subtypes B and C Env. The sequence of J3 is highly divergent from previous anti-HIV-1 VHH and its own germline sequence. J3 achieves broad and potent neutralization of HIV-1 via interaction with the CD4-binding site of HIV-1 Env. This study may represent a new benchmark for immunogens to be included in B cell-based vaccines and supports the development of VHH as anti-HIV-1 microbicides.


A zombie star is an apparently dead white dwarf that comes back to life in a huge explosion. Credit: NASA/CXC/Chinese Academy of Sciences/F. Lu et al.

A recent article in The Astrophysical Journal (Astrophys. J. 746, 179 (2012)) presents evidence that SN 1961V may indeed be a zombie:

Reports of the death of the precursor of supernova (SN) 1961V in NGC 1058 are exaggerated. Consideration of the best astrometric data shows that the star, known as “Object 7,” lies at the greatest proximity to SN 1961V and is the likely survivor of the “SN impostor” super-outburst. SN 1961V does not coincide with a neighboring radio source and is therefore not a radio SN. Additionally, the current properties of Object 7, based on data obtained with the Hubble Space Telescope, are consistent with it being a quiescent luminous blue variable (LBV). Furthermore, post-explosion non-detections by the Spitzer Space Telescope do not necessarily and sufficiently rule out a surviving LBV. We therefore consider, based on the available evidence, that it is still a bit premature to reclassify SN 1961V as a bona fide SN. The inevitable demise of this star, though, may not be too far off.

Stability criteria for complex ecosystems

Allesina and Tang present a more detailed analysis of May’s seminal work on stability matrices.

May’s approach defines a community matrix M of size S×S, where S is the number of species and M_{ij} is the effect of species j on species i. Each entry M_{ij} is drawn from N(0, \sigma^2) with probability C and is zero otherwise. May showed that when the complexity \sigma \sqrt{SC} > 1, the probability of stability is near zero. Thus rich (high S) or highly connected (high C) communities should be rare.
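May’s criterion can be checked numerically: by the circular law, the eigenvalues of such a random matrix fill a disk of radius \sigma\sqrt{SC}, so once that radius exceeds the self-regulation term on the diagonal, some eigenvalue has positive real part and stability is lost. A quick numpy sketch (parameter values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

S, C, sigma = 500, 0.2, 1.0   # species count, connectance, interaction spread

# May's random community matrix: M_ij ~ N(0, sigma^2) with probability C,
# zero otherwise (off-diagonal only; May puts -1 on the diagonal for
# self-regulation, which shifts the whole spectrum left by 1).
M = rng.normal(0.0, sigma, (S, S)) * (rng.random((S, S)) < C)
np.fill_diagonal(M, 0.0)

# The circular law predicts the eigenvalues fill a disk of radius
# sigma * sqrt(S*C); the rightmost eigenvalue sits near that radius.
radius = sigma * np.sqrt(S * C)
max_re = np.max(np.linalg.eigvals(M).real)
print(radius, max_re)  # max_re should be close to radius
```

With a -1 diagonal, stability thus requires \sigma\sqrt{SC} < 1, matching May’s bound.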

The current authors allow the community matrix to have structure. Predator-prey networks are structured so that M_{ij} and M_{ji} have opposite signs. A mixture of competition and mutualism arises when M_{ij} and M_{ji} are constrained to have the same sign.

a, Plot of the eigenvalues of 10 matrices following the random, predator–prey or mixture prescriptions. The black ellipses are derived analytically. b, Numerical simulations for the corresponding stability profiles. The phase transition between stability and instability is accurately predicted by our derivation.

Stability requires that all eigenvalues have negative real parts. Random matrices confine their eigenvalues to a circle, predator-prey matrices to a vertically elongated ellipse, and competition/mutualism matrices to a horizontally elongated ellipse. More formally, the difference in stability is driven exclusively by the arrangement of the coefficients into pairs with random, opposite, or same signs. Intermediate cases can be formed by linear combinations.
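The effect of the sign pairing can be seen directly by simulation. The sketch below (my own illustration, with arbitrary parameter choices, not the authors’ code) builds community matrices with opposite-sign, unconstrained, and same-sign pairs and compares the rightmost eigenvalue: opposite signs squeeze the spectrum horizontally (more stable), same signs stretch it (less stable).

```python
import numpy as np

rng = np.random.default_rng(1)
S, C, sigma = 250, 0.25, 1.0

def community_matrix(pair_sign):
    """Random community matrix with the pair (M_ij, M_ji) constrained:
    pair_sign=-1 -> opposite signs (predator-prey),
    pair_sign=+1 -> same signs (competition/mutualism mixture),
    pair_sign= 0 -> unconstrained (May's random case)."""
    M = np.zeros((S, S))
    for i in range(S):
        for j in range(i + 1, S):
            if rng.random() < C:
                a, b = rng.normal(0, sigma, 2)
                if pair_sign == -1:
                    a, b = abs(a), -abs(b)        # predator gains, prey loses
                elif pair_sign == 1:
                    a, b = abs(a), abs(b)         # mutualism: both positive
                    if rng.random() < 0.5:        # half the pairs competitive
                        a, b = -a, -b
                M[i, j], M[j, i] = a, b
    return M

# Rightmost eigenvalue (widest real extent) under each sign arrangement.
spread = {s: np.max(np.linalg.eigvals(community_matrix(s)).real)
          for s in (-1, 0, 1)}
# Predator-prey (vertical ellipse) < random (circle) < same-sign (horizontal).
print(spread[-1] < spread[0] < spread[1])
```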

Imposing realistic food webs decreases stability.

What if the M_{ij} are not normally distributed? Consider many weak interactions: predator-prey networks become less stable, competition/mutualism networks become more stable, and random networks are unchanged.

Very interesting work.

“Stability criteria for complex ecosystems”, Stefano Allesina, Si Tang. Nature (2012)

America’s declining trust in institutions / Chris Hayes on why meritocracy is just a myth

From Joshua Holland’s alternet interview with Chris Hayes on his new book:

One of the toxic aspects of our politics is that the gap between the American Dream and the reality is something people feel viscerally.

Photo by the amazing Sarah Shatz

The data is really clear — when you look across the landscape, American trust in pillar institutions, like the financial sector, big business, media, science and academia, and even religion are at or near all-time lows.

They’re at all-time lows even compared to when this polling was initiated in the 1970s, in the wake of Watergate… The irony is that the polling was initiated in the 1970s, and what was then viewed as the nadir of public trust in institutions turns out to have been the high watermark…

The project of the book started with trying to get to the bottom of why this was the case… the argument in the book that I assert is if you take a step back and look at the record of the last 10 years in American life — what I call in the book the “fail decade” — it is a cascade of incompetence and corruption.

You start with the Bush v Gore decision where a slim majority on the court hands the election over to the favored candidate even though he doesn’t win the popular vote, and even though the legal logic is tortured. Then there’s the failure of the largest security apparatus in the history of human civilization, the American security state, when it couldn’t stop 19 men with box cutters. Then you go to Iraq and the fallout there. You go to the botched Katrina rescue. Then you go to a financial crisis. Running through there you have Enron, Major League Baseball’s steroids issue, the child abuse scandal in the Catholic Church. So there’s a long-winded answer to the question you set up, which is we are less trusting of our institutions because they appear less trustworthy. They’ve had a very poor record of institutional performance over the last decade.

The million follower fallacy (measuring Twitter influence)

Notes on Measuring User Influence in Twitter: The Million Follower Fallacy, Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, Krishna P. Gummadi, Proc. International AAAI Conference on Weblogs and Social Media (ICWSM), May 2010.

The million follower fallacy states that having lots of followers translates into having lots of influence. This paper is one of several disproving that hypothesis. The term comes from a 2009 article by Avnit, which seems to have since been taken down.

Traditional communication theory claims that a minority of people, informed, respected, and well connected, shape the opinion of the society. These are the hubs, mavens, influence leaders, …

A more modern view says that the important factor is how open the society is. Watts’s 2007 paper on this found that in a homogeneous-degree network, anyone could start a long cascade. Not having read the paper, my criticism could be off, but I wonder if this finding isn’t just a result of the network structure: if degree is homogeneous, then by definition you don’t have hubs/mavens.

The current work investigates three measures of influence: followers, retweets, and mentions. The most followed twitterers are public figures and news media. The most retweeted twitterers are content aggregation sites. The most mentioned users are celebrities. The three categories do not overlap well.

Influence follows a power law.

Influence runs across topics; a user who is influential in one topic is often influential in other topics as well. “Most influential users hold significant influence over a variety of topics.”

One can gain influence by focusing on a single topic, and posting creative and insightful tweets on that topic.

See also “What is Twitter, a social network or a news media?”, by Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. Proceedings of the 19th International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC (USA), which notes:

[Twitter has] a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [Newman03]… Ranking by retweets indicates a gap in influence inferred from the number of followers and that from the popularity of one’s tweets… [We] show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.

Why the experts are always wrong

Berlin’s hedgehog-and-fox model has proved highly influential. The hedgehog knows one big thing. An expert has a model of how the world works. Chances are this is a very good model: it has strong theoretical grounding and copious empirical support. It is, however, only a model. It does not contain things extraneous to the model. And it is those extraneous factors which lead to revolutionary regime change.

It has been shown that both speciation and business extinction rates follow exponential distributions. This is probably true of many more evolutionary systems.

The implication is that the causes of regime change are many, non-cumulative, and non-causal. Non-causal in the sense that an event will cause a change one time, yet the same event will not cause a change the next time. Perhaps the system has adapted to the previous shock. “They’ll never hit us with an aircraft again!” (alt.: we will never again have an exploding aircraft…) Perhaps the larger environment has shifted. The last poor employment report coincided with a fall in the stock market; the current poor report was ignored.

We need experts. We need to know how things work. And often, the experts are right. We just can never know when they will be wrong, when the world has shifted such that their model loses its explanatory value.

Now, dear reader, the thoughts above are not new. I read them more elegantly expressed some months ago, and again before that. If you could track down another version of these sentiments, I would be most grateful.

The 2012 Connecticut Primary Project

The CPP website is an organizing hub to bring the Connecticut legislature into line with the clear and obvious will of the people on the topic of marijuana legalization.

According to the most recent Gallup poll, 50% of Americans now support the legalization of marijuana – nearly double the level of support from 2000. Moreover, support for legalization has risen among all major voting groups, with increasing numbers of Democrats, Republicans, independents, liberals, and conservatives now calling for an end to marijuana prohibition.

Since as far back as 2002, polls of Connecticut voters have shown support for medical marijuana in excess of 70%, yet to this day marijuana remains illegal in our state, even for the terminally ill. In 2011, despite majority support in every single voter group, Connecticut’s marijuana decriminalization bill was nearly blocked by the Senate.

Opponents of reform have not suffered any consequences for their votes.

Time to change that.

Assessing the relevance of node features for network structure (chance or necessity?)

The topology of a (real-world) network is a combination of chance and necessity. Community structure represents necessity, yet the evolution of the network is (often) stochastic. This work presents an indicator of how node characteristics affect network topology.

Their indicator measures the specificity of a network g for a particular assignment q of node characteristics by comparing it to random assignments. Specificity is computed (roughly) from the total number of links between pairs of communities.
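A rough sketch of that comparison idea (my own hypothetical reconstruction, not the authors’ exact indicator: the function names and the permutation-null design are illustrative assumptions): count the links falling between communities under the real assignment q, then compare to the same count when node labels are shuffled at random.

```python
import random
import networkx as nx

def between_links(G, q):
    """Number of edges whose endpoints fall in different communities."""
    return sum(1 for u, v in G.edges() if q[u] != q[v])

def specificity(G, q, trials=1000, seed=0):
    """Observed between-community link count vs. its permutation-null mean.
    (Hypothetical sketch of the comparison-to-random-assignment idea.)"""
    rng = random.Random(seed)
    observed = between_links(G, q)
    labels = list(q.values())
    null = []
    for _ in range(trials):
        rng.shuffle(labels)
        null.append(between_links(G, dict(zip(q, labels))))
    return observed, sum(null) / trials

# Two 5-cliques joined by one bridge, with the true community labels:
G = nx.barbell_graph(5, 0)
q = {n: (0 if n < 5 else 1) for n in G}
obs, null_mean = specificity(G, q)
print(obs, null_mean)  # far fewer between-links than chance: q is specific
```

When the observed count is far below the null mean, the node characteristic clearly shapes the topology; when the two are comparable, the assignment is irrelevant to the structure.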