In this talk we describe our efforts to collect knowledge for products of thousands of types. We describe how we nailed down the most important first step for delivering the data business: training high-precision models that generate accurate data. We then describe how we scaled up the models by learning from limited labels, and how we increased yield with multi-modal models and web extraction. We share the many lessons learned in building this product graph and applying it to support customer-facing applications.

First, we connect information theory to graph neural network (GNN) interpretation. The goal of GNN interpretation is to find the subgraphs of the input graphs that are most indicative of the prediction labels. We discuss the drawbacks of the post-hoc interpretation methods that have typically been used in the community. We demonstrate that the information bottleneck principle is the only prior assumption needed for GNN interpretation, whereas the previously used sparsity and connectivity constraints may bias the interpretation results. Our theory naturally yields a novel architecture, termed the graph stochastic attention mechanism, which inherently provides reliable interpretation results.
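To make the idea concrete, here is a minimal numpy sketch of the stochastic-attention-with-information-bottleneck principle described above. It is an illustrative toy, not the talk's actual model: the names (`sample_edge_mask`, `ib_regularizer`), the Bernoulli edge masks, and the prior rate `r` are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_edge_mask(logits, rng):
    """Stochastic attention: keep each edge with a learned probability."""
    p = 1.0 / (1.0 + np.exp(-logits))              # edge-keep probabilities
    mask = (rng.random(p.shape) < p).astype(float)  # sampled subgraph mask
    return p, mask

def ib_regularizer(p, r=0.5):
    """KL(Bernoulli(p) || Bernoulli(r)): pushes the attention toward a
    fixed prior r, which bounds the information the selected subgraph can
    carry (the information-bottleneck constraint)."""
    eps = 1e-9
    return np.sum(p * np.log((p + eps) / r)
                  + (1 - p) * np.log((1 - p + eps) / (1 - r)))

# Toy graph with 5 edges; logits favour keeping the first two edges only.
logits = np.array([3.0, 2.5, -3.0, -2.5, -4.0])
p, mask = sample_edge_mask(logits, rng)
reg = ib_regularizer(p)
# A full training objective would look like:
#   task_loss(GNN(masked_graph)) + beta * reg
```

The point of the sketch is that no sparsity or connectivity penalty appears: the KL term alone constrains which edges survive, and the surviving subgraph is the interpretation.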

Second, we introduce how to properly leverage traditional network embedding approaches in graph neural network models, improving GNN expressive power while keeping the model generalizable across different networks (i.e., preserving the inductive property). This work responds to the previous criticism that using traditional network embeddings loses the inductive property unless random augmentation is used. We address this problem in both theory and practice.
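One simple way to picture the inductive issue is the following numpy sketch: instead of looking up pre-trained, node-identity-bound embeddings, a structural embedding is recomputed from each input graph and concatenated with the node features before message passing. The function names and the choice of leading adjacency eigenvectors are illustrative assumptions, not the talk's actual construction.

```python
import numpy as np

def structural_embedding(adj, k=2):
    """Per-graph structural features: the k leading eigenvectors of the
    symmetric adjacency matrix. Recomputing these for every input graph
    keeps the model inductive (no memorised node identities)."""
    vals, vecs = np.linalg.eigh(adj)
    order = np.argsort(-np.abs(vals))[:k]
    emb = vecs[:, order]
    # Resolve the per-eigenvector sign ambiguity so features are deterministic.
    emb = emb * np.sign(emb[np.abs(emb).argmax(axis=0), range(emb.shape[1])])
    return emb

def augment_features(x, adj, k=2):
    """Concatenate raw node features with structural embeddings, giving a
    message-passing GNN access to positional/structural information."""
    return np.concatenate([x, structural_embedding(adj, k)], axis=1)

# Toy 4-node path graph with 1-dimensional node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.ones((4, 1))
h = augment_features(x, adj)   # shape (4, 3): original feature + 2 structural dims
```

Because the embedding is a deterministic function of the graph itself, the same pipeline transfers unchanged to an unseen network, which is the inductive property the paragraph refers to.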

Third, time permitting, we will introduce a novel graph auto-encoder. Previous GAE/VGAE and many other self-supervised GNN models apply only to proximity-driven node-level tasks. Our novel graph auto-encoder, however, can also be applied to structure-driven node-level tasks. Our key idea is to leverage optimal transport to reconstruct the representations of neighboring nodes.
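The reconstruction idea above can be sketched as follows: treat a node's true and decoded neighbor representations as two point sets, and use an entropy-regularized optimal-transport (Sinkhorn) cost as the reconstruction loss, which is invariant to the ordering of neighbors. This is a minimal illustration under assumed names (`sinkhorn_distance`, `neighbor_reconstruction_loss`) and uniform neighbor weights, not the talk's actual loss.

```python
import numpy as np

def sinkhorn_distance(x, y, reg=0.1, n_iter=100):
    """Entropy-regularised OT cost between two point sets with uniform
    weights, computed by Sinkhorn iterations on squared-Euclidean cost."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / reg)
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]   # approximate transport plan
    return (plan * cost).sum()

def neighbor_reconstruction_loss(decoded, neighbors):
    """Auto-encoder loss: OT cost between the decoder's predicted neighbor
    representations and the true ones (order-invariant by construction)."""
    return sinkhorn_distance(decoded, neighbors)

# Toy check: a permuted but otherwise perfect reconstruction costs ~0.
true_nb = np.array([[0.0, 0.0], [1.0, 1.0]])
pred_nb = true_nb[::-1].copy()
loss = neighbor_reconstruction_loss(pred_nb, true_nb)
```

The order-invariance is the key design point: a node's neighbors form a set, so an elementwise reconstruction loss would penalize harmless permutations that the OT cost ignores.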