Hi, I am XING LI, a researcher from Sansan DSOC.
Graphs are a more general form of data for describing our world. Sansan DSOC is creatively exploiting graph data to mine new value for the benefit of our customers. As the saying goes, sharp tools make good work. Deep Graph Library (DGL) is exactly the sharp tool you need to explore deep graph learning. I am planning to post a series of hands-on guides to the DGL library, using graph learning topics as examples. This is the first post in the series, starting with a brief introduction and one core feature of DGL.
Introduction
In the last few years, deep learning has solved, or nearly solved, many challenging machine learning tasks with extraordinary success. These tasks lie mainly in CV (computer vision) and NLP (natural language processing), and most of them share one common feature: their data is Euclidean. Image data usually has a two-dimensional coordinate structure, sometimes with several layers of information (such as color channels), and human language carries sequential, time-series information. In the real world, however, non-Euclidean data is a more general form of data structure than Euclidean data or tensors. Learning from this more general structured data, and more specifically from graph data, is therefore naturally and widely regarded as an important task.
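To make the Euclidean versus non-Euclidean contrast concrete, here is a small plain-Python sketch (no DGL involved; the function names and the toy graph are hypothetical, chosen only for illustration). In a 2-D image, every interior pixel has the same four neighbours fixed by the grid, while nodes in a graph can have any number of neighbours, which is why grid-based operators such as standard convolutions do not carry over directly.

```python
from collections import defaultdict


def grid_neighbors(row, col, height, width):
    """4-connected neighbours of a pixel: the layout is fixed by the 2-D grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < height and 0 <= c < width]


def graph_neighbors(edges):
    """Adjacency list of an undirected graph given as (u, v) pairs (hypothetical helper)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


# Every interior pixel of a 3x3 image has exactly 4 neighbours.
print(len(grid_neighbors(1, 1, 3, 3)))  # 4

# In a small, made-up social graph the neighbour count varies from node to node.
toy_edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
adj = graph_neighbors(toy_edges)
print({n: len(v) for n, v in adj.items()})  # {'alice': 3, 'bob': 2, 'carol': 2, 'dave': 1}
```

The grid version never needs to store its structure because position alone determines the neighbours; the graph version must store the edges explicitly, and that explicit structure is exactly what graph learning libraries are built to handle.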
Recently, the Graph Neural Network (GNN) family has emerged from a multitude of models as a versatile, fundamental model structure for tackling graph data across many domains, such as chemical molecules, social networks, bioinformatics, knowledge graphs and recommendation systems.
An obvious trend in deep learning is the increasing level of integration offered by deep learning frameworks, from Theano and Caffe to today's mainstream TensorFlow and PyTorch. These frameworks give us off-the-shelf tools to deploy neural networks conveniently and quickly, while keeping the flexibility needed to customise specific architectures. Graph neural networks also have several frameworks of their own; the two most popular are Deep Graph Library (DGL) and PyTorch Geometric (PyG).
Name | GitHub Stars | Support Team | Supported Frameworks |
---|---|---|---|
DGL | 4.7K | Amazon Web Services | PyTorch, TensorFlow and MXNet |
PyG | 7.5K | TU Dortmund University | PyTorch |
(All information in this table was retrieved on 12 May 2020; DGL 0.4, latest PyG.)
Beyond the table above, DGL can train on graphs with up to 0.5 billion nodes and 25 billion edges on a single machine with sufficient memory, while PyG does not report comparable figures. PyG trains slightly faster on small graphs, but DGL is faster as graphs grow larger. Both implement most mainstream graph neural network models, but DGL supports more tools and features, such as additional sampling methods, heterogeneous graphs and knowledge graphs.
Here we use only DGL as the example for exploring deep graph learning, but PyG is also a popular framework and worth attention. The following content touches on some concepts from Graph Convolutional Networks and assumes a basic understanding of graph learning.
DGL works with the following operating systems:
- Ubuntu 16.04
- macOS X
- Windows 10
Python 3.5 or later is required to run DGL. More information on installing DGL can be found here.
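A minimal sketch for checking the Python requirement above before installing, using only the standard library. The helper names are hypothetical, and the check only tests whether a package named `dgl` is importable; it says nothing about which DGL version or deep learning backend is installed.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 5)  # minimum Python version stated in this post


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of the interpreter meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def dgl_installed():
    """Return True if a top-level package named 'dgl' is on the import path."""
    return importlib.util.find_spec("dgl") is not None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("DGL found:", dgl_installed())
```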
DGL contains many sophisticated features, such as the Message Passing API, the NodeFlow data structure, sampling methods, heterogeneous graphs, scaling to giant graphs and knowledge graphs. They play different roles for different requirements and are still being actively updated. However, some features are fundamental to understanding the overall framework, and mastering these cornerstones makes the other features easier to learn. In this blog we discuss one main feature: message passing.
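Before looking at DGL's own API, the idea behind message passing can be sketched in plain Python. This is a from-scratch illustration, not DGL code: each edge sends a message computed from its source node's feature, and each destination node reduces (here, sums) the messages it receives. DGL's Message Passing API is organised around the same two user-supplied pieces, a message function and a reduce function (passed, for example, to `update_all`), but the helper names below are hypothetical, and letting nodes without incoming edges keep their old feature is a choice made for this sketch that may differ from DGL's behaviour.

```python
from collections import defaultdict


def message_passing_step(edges, features, message_fn, reduce_fn):
    """One round of message passing over a directed graph (hypothetical helper).

    edges: list of (src, dst) pairs; features: dict mapping node -> value.
    Nodes with no incoming edges keep their current feature in this sketch.
    """
    mailbox = defaultdict(list)
    # Message phase: every edge carries a message computed from its source node.
    for src, dst in edges:
        mailbox[dst].append(message_fn(features[src]))
    # Reduce phase: each destination node aggregates the messages it received.
    new_features = dict(features)
    for node, messages in mailbox.items():
        new_features[node] = reduce_fn(messages)
    return new_features


# Toy directed graph 0->1, 0->2, 1->2, 2->0 with scalar node features.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
h = {0: 1.0, 1: 2.0, 2: 3.0}
h_next = message_passing_step(edges, h, message_fn=lambda x: x, reduce_fn=sum)
print(h_next)  # {0: 3.0, 1: 1.0, 2: 3.0}
```

With an identity message and a `sum` reduce, each node's new feature is the sum of its in-neighbours' features; swapping `reduce_fn` for `max` or a mean (`lambda ms: sum(ms) / len(ms)`) gives other common aggregations, and stacking several rounds lets information travel several hops.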