Community detection is one of the fundamental problems in network analysis, with many diverse applications, and a lot of work has been done on models and algorithms that find communities. Perhaps the most commonly used probabilistic model for a network with communities is the stochastic block model, and many algorithms for fitting it have been proposed. Since finding communities involves optimizing over all possible assignments of discrete labels, most existing algorithms do not scale well to large networks, and many fail on sparse networks. In this talk, we propose a pseudo-likelihood approach for fitting the stochastic block model to address these shortcomings. Pseudo-likelihood is a general statistical principle that involves trading off some of the model complexity against computational efficiency. We also derive a variant that allows for arbitrary degree distributions in the network, making it suitable for fitting the more flexible degree-corrected stochastic block model. The pseudo-likelihood algorithm scales easily to networks with millions of nodes, performs well empirically under a range of settings, including on very sparse networks, and is asymptotically consistent under reasonable conditions. If times allows, I will also discuss spectral clustering with perturbations, a new method of independent interest we use to initialize pseudo-likelihood, which works well on sparse networks where regular spectral clustering fails.
Joint work with Arash Amini, Aiyou Chen, and Peter Bickel.