This repository was archived by the owner on Mar 14, 2024. It is now read-only.

Commit 46c45e6

adamlerer authored and facebook-github-bot committed
Add slack channel to README, and clean up. (#244)
Summary:

## Types of changes
- [x] Docs change / refactoring / dependency upgrade
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Motivation and Context / Related issue
Cleans up the README and adds a link to our new Slack workspace for devs/users.

## Checklist
- [x] The documentation is up-to-date with the changes I made.
- [x] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [x] All tests passed, and additional code has been covered with new tests.

Pull Request resolved: #244
Reviewed By: lw
Differential Revision: D33638675
Pulled By: adamlerer
fbshipit-source-id: 77da759c7c673c4730f5f4b0a3e5bf7befbda2c2
1 parent e3de4a3 commit 46c45e6

1 file changed

Lines changed: 32 additions & 6 deletions

File tree

README.md

````diff
@@ -4,10 +4,31 @@
 
 PyTorch-BigGraph (PBG) is a distributed system for learning graph embeddings for large graphs, particularly big web interaction graphs with up to billions of entities and trillions of edges.
 
-**Update:** *PBG now supports GPU training. Check out the [GPU Training](#gpu-training) section below!*
 
 PBG was introduced in the [PyTorch-BigGraph: A Large-scale Graph Embedding Framework](https://mlsys.org/Conferences/2019/doc/2019/71.pdf) paper, presented at the [SysML conference](https://mlsys.org/) in 2019.
 
+**Update:** *PBG now supports GPU training. Check out the [GPU Training](#gpu-training) section below!*
+
+<!-- toc -->
+- [Overview](#overview)
+- [Requirements](#requirements)
+- [Installation](#installation)
+- [Getting Started](#getting-started)
+- [Downloading the data](#downloading-the-data)
+- [Preparing the data](#preparing-the-data)
+- [Training](#training)
+- [GPU Training](#gpu-training)
+- [Evaluation](#evaluation)
+- [Converting the output](#converting-the-output)
+- [Pre-trained embeddings](#pre-trained-embeddings)
+- [Documentation](#documentation)
+- [Communication](#communication)
+- [Citation](#citation)
+- [License](#license)
+
+<!-- tocstop -->
+
+## Overview
 PBG trains on an input graph by ingesting its list of edges, each identified by its source and target entities and, possibly, a relation type. It outputs a feature vector (embedding) for each entity, trying to place adjacent entities close to each other in the vector space, while pushing unconnected entities apart. Therefore, entities that have a similar distribution of neighbors will end up being nearby.
 
 It is possible to configure each relation type to calculate this "proximity score" in a different way, with the parameters (if any) learned during training. This allows the same underlying entity embeddings to be shared among multiple relation types.
````
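The overview paragraph kept by this diff describes PBG's input as a list of edges, each a source entity, a target entity and an optional relation type. A minimal sketch of such an edge list in TSV form (all entity and relation names here are made up for illustration; this is not an official PBG helper):

```python
# Build a toy edge list in a tab-separated (source, relation, target)
# shape like the one the overview describes. Names are hypothetical.
edges = [
    ("alice", "follows", "bob"),
    ("bob", "likes", "post_1"),
    ("carol", "follows", "alice"),
]
with open("edges_train.tsv", "w") as f:
    for lhs, rel, rhs in edges:
        f.write(f"{lhs}\t{rel}\t{rhs}\n")
```

PBG would then learn one embedding per distinct entity appearing in such a file, with the relation type selecting how the proximity score is computed.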
````diff
@@ -22,7 +43,6 @@ PBG is designed with scale in mind, and achieves it through:
 
 PBG is not optimized for small graphs. If your graph has fewer than 100,000 nodes, consider using [KBC](https://github.com/facebookresearch/kbc) with the ComplEx model and N3 regularizer. KBC produces state-of-the-art embeddings for graphs that can fit on a single GPU. Compared to KBC, PyTorch-BigGraph enables learning on very large graphs whose embeddings wouldn't fit in a single GPU or a single machine, but may not produce high-quality embeddings for small graphs without careful tuning.
 
-
 ## Requirements
 
 PBG is written in Python (version 3.6 or later) and relies on [PyTorch](https://pytorch.org/) (at least version 1.0) and a few other libraries.
````
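A quick sanity check for the version requirements stated in that context line (the PyTorch probe is only an illustration, not part of PBG's tooling):

```python
# Verify the interpreter meets the stated minimum (Python 3.6 or later).
import sys

assert sys.version_info >= (3, 6), "PBG requires Python 3.6 or later"

# PyTorch >= 1.0 is also required; this optional probe just reports it.
try:
    import torch
    print("PyTorch version:", torch.__version__)
except ImportError:
    print("PyTorch not found; install it from https://pytorch.org/")
```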
````diff
@@ -47,8 +67,6 @@ PBG_INSTALL_CPP=1 pip install .
 Everything will work identically except that you will be able to run GPU training (`torchbiggraph_train_gpu`).
 
 
-## Getting started
-
 The results of [the paper](https://mlsys.org/Conferences/2019/doc/2019/71.pdf) can easily be reproduced by running the following command (which executes [this script](torchbiggraph/examples/fb15k.py)):
 ```bash
 torchbiggraph_example_fb15k
 ```
````
````diff
@@ -57,6 +75,8 @@ This will download the Freebase 15k knowledge base dataset, put it into the righ
 
 To learn how to use PBG, let us walk through what the FB15k script does.
 
+## Getting started
+
 ### Downloading the data
 
 First, it [retrieves the dataset](https://dl.fbaipublicfiles.com/starspace/fb15k.tgz) and unpacks it, obtaining a directory with three edge sets as TSV files, for training, validation and testing.
````
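For intuition, here is a hypothetical stand-in for those three edge sets: a tiny synthetic graph split into train/validation/test TSV files. The file names and split sizes are illustrative only and do not reflect the actual FB15k archive layout:

```python
import random

# Synthetic edges standing in for a downloaded dataset (hypothetical names).
edges = [(f"e{i}", "linked_to", f"e{i + 1}") for i in range(10)]
random.seed(0)
random.shuffle(edges)

# Three edge sets, as in the FB15k archive: training, validation, testing.
splits = {"train.tsv": edges[:8], "valid.tsv": edges[8:9], "test.tsv": edges[9:]}
for name, rows in splits.items():
    with open(name, "w") as f:
        for lhs, rel, rhs in rows:
            f.write("\t".join((lhs, rel, rhs)) + "\n")
```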
````diff
@@ -192,13 +212,19 @@ bar rhs complex_diagonal real 200 -2.350617170 0.529571176 0.521403074 ...
 bar rhs complex_diagonal imag 200 0.692483306 0.446569800 0.235914066 ...
 ```
 
+## Pre-trained embeddings
+
+We trained a PBG model on the full [Wikidata](https://www.wikidata.org/) graph, using a [translation operator](https://torchbiggraph.readthedocs.io/en/latest/scoring.html#operators) to represent relations. It can be downloaded [here](https://dl.fbaipublicfiles.com/torchbiggraph/wikidata_translation_v1.tsv.gz) (36GiB, gzip-compressed). We used the truthy version of data from [here](https://dumps.wikimedia.org/wikidatawiki/entities/) to train our model. The model file is in TSV format as described in the above section. Note that the first line of the file contains the number of entities, the number of relations and the dimension of the embeddings, separated by tabs. The model contains 78 million entities, 4,131 relations and the dimension of the embeddings is 200.
+
+
 ## Documentation
 
 More information can be found in [the full documentation](https://torchbiggraph.readthedocs.io/).
 
-## Pre-trained embeddings
+## Communication
 
-We trained a PBG model on the full [Wikidata](https://www.wikidata.org/) graph, using a [translation operator](https://torchbiggraph.readthedocs.io/en/latest/scoring.html#operators) to represent relations. It can be downloaded [here](https://dl.fbaipublicfiles.com/torchbiggraph/wikidata_translation_v1.tsv.gz) (36GiB, gzip-compressed). We used the truthy version of data from [here](https://dumps.wikimedia.org/wikidatawiki/entities/) to train our model. The model file is in TSV format as described in the above section. Note that the first line of the file contains the number of entities, the number of relations and the dimension of the embeddings, separated by tabs. The model contains 78 million entities, 4,131 relations and the dimension of the embeddings is 200.
+- GitHub Issues: Bug reports, feature requests, install issues, etc.
+- The [PyTorch-BigGraph Slack](https://join.slack.com/t/pytorchbiggraph/shared_invite/zt-yxy7zl41-37ypKwOqLHhmMSac5XOh2w) is a forum for online discussion between developers and users, discussing features, collaboration, etc.
 
 ## Citation
 
````