Amir Ghasemian presentation

Tuesday, May 10, 2022, 10:00 - 11:00 AM

Amir Ghasemian, postdoctoral research associate at Yale University, will present "Limits of model selection, link prediction, and community detection."

Abstract: Networks have attracted wide interest in the scientific community as a powerful tool for studying complex systems, one that provides a trade-off between reality and generality. A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Link prediction (predicting missing links), meanwhile, has become a standard for evaluating and comparing models of network structure, playing a role in networks similar to that of cross-validation in traditional statistical learning. Although many algorithms exist for both tasks, the No Free Lunch theorems for community detection and link prediction imply that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over- or under-fit to real networks, or how to reliably assess such behavior across algorithms. It has remained unknown whether a single best link predictor exists, how link predictability varies across methods and networks from different domains, and how close to optimality current methods are. In this talk, I answer these questions on the limitations of model selection, link prediction, and community detection on complex networks. I present a broad investigation of over- and under-fitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. I answer questions on optimal link prediction by systematically evaluating 203 individual link prediction algorithms, representing three popular families of methods, applied to a large corpus of networks. Finally, using arguments from statistical physics, I will present the detectability limit for community detection, a threshold below which no efficient algorithm can identify the communities better than chance, and propose two algorithms that are optimal in the sense that they succeed all the way down to this threshold.

Amir Ghasemian is a postdoctoral research associate at Yale University's Human Nature Lab, led by Nicholas Christakis, and an affiliate member of the Computational Social Science Lab at the University of Pennsylvania. He received his PhD from the University of Colorado Boulder, working with Aaron Clauset. His research interests lie in network science, statistical inference, causal inference, statistical physics, and machine learning. He is the recipient of the NSF/CRA/CCC Computing Innovation Award (CIFellow 2020), and his research has been published in top physics, computer science, and multidisciplinary scientific journals such as Physical Review X, TKDE, and PNAS.