Dynamical Systems Seminar: Tiago de Paula Peixoto
Hierarchical Block Structures and High-Resolution Model Selection in Large Networks
,Ìý
Date and time:Ìý
Thursday, May 1, 2014 - 2:00pm
³¢´Ç³¦²¹³Ù¾±´Ç²Ô:Ìý
ECCR 257
´¡²ú²õ³Ù°ù²¹³¦³Ù:Ìý
Many social, technological, and biological networks are composed of modules, representing groups of nodes that are assumed to have a similar role in the functioning of the network. The use of statistical generative models that formally characterize this modular structure is a powerful tool that allows the development of well-defined and principled techniques to extract such information from empirical data. What is perhaps the most popular example is the stochastic block model, which allows the detection of arbitrary mixing patterns, and has been extended to include many other properties such as arbitrary degree variability, multiple group memberships, edge weights, etc. However, even such well-motivated techniques may possess pitfalls, specially for very large networks. In particular, when performing model selection for the stochastic block model via minimum description length, a strong resolution limit is encountered if simple non-informative prior assumptions are made, where the minimum size of detectable groups scales with the square root of the total number of nodes, regardless of how well-defined the modular structures appear to be. In this talk, I present a hierarchical generalization of the stochastic block model which lifts this limitation, while at the same time remaining fully non-parametric.Ìý Even with this increased resolution, the method is based on the principle of parsimony, and is capable of separating signal from noise, and thus will not lead to the identification of spurious modules even on sparse networks.Ìý Besides serving as a solution to this practical problem, the model provides the entire multilevel structure of the data, through a complete description of the entire network hierarchy at multiple scales. Furthermore, it fully generalizes other approaches in that it is not restricted to purely assortative mixing patterns, directed or undirected graphs, and ad hoc hierarchical structures such as binary trees. Despite its general character, the approach is tractable, and yields an efficient algorithm which scales well for very large networks.
I illustrate the application of the method on several empirical networks, including, among others, the whole internet at the autonomous systems level, and a bipartite network of actors and films with over 106Ìýedges.