In this talk we describe our efforts to collect knowledge for products of thousands of types. We describe how we nailed down the most important first step for delivering the data business: training high-precision models that generate accurate data. We then describe how we scaled up the models by learning from limited labels, and how we increased yield with multi-modal models and web extraction. We share the many lessons learned in building this product graph and applying it to support customer-facing applications.

First, we connect information theory to graph neural network (GNN) interpretation. The goal of GNN interpretation is to find the subgraphs of the input graphs that are most indicative of the prediction labels. We discuss the drawbacks of the post-hoc interpretation methods that have typically been used in the community. We demonstrate that the information bottleneck principle is the only prior assumption needed for GNN interpretation, whereas the previously used sparsity and connectivity constraints may bias the interpretation results. Our theory naturally yields a novel architecture, termed the graph stochastic attention mechanism, which inherently provides reliable interpretation results.
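To make the idea concrete, here is a minimal numpy sketch of the stochastic-attention-with-information-bottleneck principle described above. It is an illustrative toy, not the talk's actual model: the names (`sample_edge_mask`, `ib_regularizer`), the Bernoulli edge masks, and the prior rate `r` are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_edge_mask(logits, rng):
    """Stochastic attention: keep each edge with a learned probability."""
    p = 1.0 / (1.0 + np.exp(-logits))              # edge-keep probabilities
    mask = (rng.random(p.shape) < p).astype(float)  # sampled subgraph mask
    return p, mask

def ib_regularizer(p, r=0.5):
    """KL(Bernoulli(p) || Bernoulli(r)): pushes the attention toward a
    fixed prior r, which bounds the information the selected subgraph can
    carry (the information-bottleneck constraint)."""
    eps = 1e-9
    return np.sum(p * np.log((p + eps) / r)
                  + (1 - p) * np.log((1 - p + eps) / (1 - r)))

# Toy graph with 5 edges; logits favour keeping the first two edges only.
logits = np.array([3.0, 2.5, -3.0, -2.5, -4.0])
p, mask = sample_edge_mask(logits, rng)
reg = ib_regularizer(p)
# A full training objective would look like:
#   task_loss(GNN(masked_graph)) + beta * reg
```

The point of the sketch is that no sparsity or connectivity penalty appears: the KL term alone constrains which edges survive, and the surviving subgraph is the interpretation.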

Second, we introduce how to properly leverage traditional network embedding approaches in graph neural network models, improving GNN expressive power while keeping the model generalizable across different networks (i.e., preserving the inductive property). This work responds to the previous criticism that using traditional network embeddings loses the inductive property unless random augmentation is used. We address this problem in both theory and practice.
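One simple way to picture the inductive issue is the following numpy sketch: instead of looking up pre-trained, node-identity-bound embeddings, a structural embedding is recomputed from each input graph and concatenated with the node features before message passing. The function names and the choice of leading adjacency eigenvectors are illustrative assumptions, not the talk's actual construction.

```python
import numpy as np

def structural_embedding(adj, k=2):
    """Per-graph structural features: the k leading eigenvectors of the
    symmetric adjacency matrix. Recomputing these for every input graph
    keeps the model inductive (no memorised node identities)."""
    vals, vecs = np.linalg.eigh(adj)
    order = np.argsort(-np.abs(vals))[:k]
    emb = vecs[:, order]
    # Resolve the per-eigenvector sign ambiguity so features are deterministic.
    emb = emb * np.sign(emb[np.abs(emb).argmax(axis=0), range(emb.shape[1])])
    return emb

def augment_features(x, adj, k=2):
    """Concatenate raw node features with structural embeddings, giving a
    message-passing GNN access to positional/structural information."""
    return np.concatenate([x, structural_embedding(adj, k)], axis=1)

# Toy 4-node path graph with 1-dimensional node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.ones((4, 1))
h = augment_features(x, adj)   # shape (4, 3): original feature + 2 structural dims
```

Because the embedding is a deterministic function of the graph itself, the same pipeline transfers unchanged to an unseen network, which is the inductive property the paragraph refers to.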

Third, time permitting, we will introduce a novel graph auto-encoder. Previous GAE/VGAE and many other self-supervised GNN models apply only to proximity-driven node-level tasks. Our novel graph auto-encoder, however, can also be applied to structure-driven node-level tasks. Our key idea is to leverage optimal transport to reconstruct the representations of neighboring nodes.
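The reconstruction idea above can be sketched as follows: treat a node's true and decoded neighbor representations as two point sets, and use an entropy-regularized optimal-transport (Sinkhorn) cost as the reconstruction loss, which is invariant to the ordering of neighbors. This is a minimal illustration under assumed names (`sinkhorn_distance`, `neighbor_reconstruction_loss`) and uniform neighbor weights, not the talk's actual loss.

```python
import numpy as np

def sinkhorn_distance(x, y, reg=0.1, n_iter=100):
    """Entropy-regularised OT cost between two point sets with uniform
    weights, computed by Sinkhorn iterations on squared-Euclidean cost."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / reg)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return (plan * cost).sum()

def neighbor_reconstruction_loss(decoded, neighbors):
    """Auto-encoder loss: OT cost between the decoder's predicted neighbor
    representations and the true ones (order-invariant by construction)."""
    return sinkhorn_distance(decoded, neighbors)

# Toy check: a permuted but otherwise perfect reconstruction costs ~0.
true_nb = np.array([[0.0, 0.0], [1.0, 1.0]])
pred_nb = true_nb[::-1].copy()
loss = neighbor_reconstruction_loss(pred_nb, true_nb)
```

The order-invariance is the key design point: a node's neighbors form a set, so an elementwise reconstruction loss would penalize harmless permutations that the OT cost ignores.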