This repository was archived by the owner on Mar 14, 2024. It is now read-only.
Summary:
## Types of changes
- [x] Docs change / refactoring / dependency upgrade
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Motivation and Context / Related issue
Cleans up the README and adds a link to our new Slack workspace for devs/users.
## Checklist
- [x] The documentation is up-to-date with the changes I made.
- [x] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [x] All tests passed, and additional code has been covered with new tests.
Pull Request resolved: #244
Reviewed By: lw
Differential Revision: D33638675
Pulled By: adamlerer
fbshipit-source-id: 77da759c7c673c4730f5f4b0a3e5bf7befbda2c2
README.md: 32 additions & 6 deletions
@@ -4,10 +4,31 @@
 
 PyTorch-BigGraph (PBG) is a distributed system for learning graph embeddings for large graphs, particularly big web interaction graphs with up to billions of entities and trillions of edges.
 
-**Update:** *PBG now supports GPU training. Check out the [GPU Training](#gpu-training) section below!*
 
 PBG was introduced in the [PyTorch-BigGraph: A Large-scale Graph Embedding Framework](https://mlsys.org/Conferences/2019/doc/2019/71.pdf) paper, presented at the [SysML conference](https://mlsys.org/) in 2019.
 
+**Update:** *PBG now supports GPU training. Check out the [GPU Training](#gpu-training) section below!*
 
 PBG trains on an input graph by ingesting its list of edges, each identified by its source and target entities and, possibly, a relation type. It outputs a feature vector (embedding) for each entity, trying to place adjacent entities close to each other in the vector space, while pushing unconnected entities apart. Therefore, entities that have a similar distribution of neighbors will end up being nearby.
 
 It is possible to configure each relation type to calculate this "proximity score" in a different way, with the parameters (if any) learned during training. This allows the same underlying entity embeddings to be shared among multiple relation types.
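The "proximity score" described above can be sketched with a toy example (plain Python, not PBG's actual API): a per-relation operator, here a translation vector as used for the Wikidata embeddings mentioned later in this diff, transforms one side's embedding before a comparator scores the pair, so the same entity embeddings serve multiple relation types. All names and values below are illustrative.

```python
# Toy illustration of a relation-specific proximity score:
# operator (translation) applied to the source embedding,
# then a dot-product comparator against the target embedding.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(src, dst, rel_translation):
    # Per-relation translation operator: shift the source embedding,
    # then compare with the destination via a dot product.
    shifted = [s + t for s, t in zip(src, rel_translation)]
    return dot(shifted, dst)

src = [0.1, 0.4]   # illustrative 2-d embeddings
dst = [0.3, 0.2]
rel = [0.2, -0.2]  # learned per relation type during training
print(score(src, dst, rel))  # higher score = entities placed closer
```

Different relation types would carry different `rel_translation` parameters while sharing the same entity embeddings.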
@@ -22,7 +43,6 @@ PBG is designed with scale in mind, and achieves it through:
 
 PBG is not optimized for small graphs. If your graph has fewer than 100,000 nodes, consider using [KBC](https://github.com/facebookresearch/kbc) with the ComplEx model and N3 regularizer. KBC produces state-of-the-art embeddings for graphs that can fit on a single GPU. Compared to KBC, PyTorch-BigGraph enables learning on very large graphs whose embeddings wouldn't fit in a single GPU or a single machine, but may not produce high-quality embeddings for small graphs without careful tuning.
-
 
 ## Requirements
 
 PBG is written in Python (version 3.6 or later) and relies on [PyTorch](https://pytorch.org/) (at least version 1.0) and a few other libraries.
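The Python version floor stated in the requirements can be guarded at runtime; a minimal sketch (the PyTorch version check is omitted since `torch` may not be installed in every environment):

```python
# Guard against interpreters older than the Python 3.6 minimum
# stated in the README's Requirements section.
import sys

if sys.version_info < (3, 6):
    raise RuntimeError("PBG requires Python 3.6 or later")
print("Python %d.%d is supported" % sys.version_info[:2])
```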
@@ -47,8 +67,6 @@ PBG_INSTALL_CPP=1 pip install .
 
 Everything will work identically except that you will be able to run GPU training (`torchbiggraph_train_gpu`).
 
-## Getting started
-
 The results of [the paper](https://mlsys.org/Conferences/2019/doc/2019/71.pdf) can easily be reproduced by running the following command (which executes [this script](torchbiggraph/examples/fb15k.py)):
 ```bash
 torchbiggraph_example_fb15k
@@ -57,6 +75,8 @@ This will download the Freebase 15k knowledge base dataset, put it into the righ
 
 To learn how to use PBG, let us walk through what the FB15k script does.
 
+## Getting started
+
 ### Downloading the data
 
 First, it [retrieves the dataset](https://dl.fbaipublicfiles.com/starspace/fb15k.tgz) and unpacks it, obtaining a directory with three edge sets as TSV files, for training, validation and testing.
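The edge sets are plain TSV, one edge per line. A minimal sketch of parsing one, assuming the common FB15k column order of source, relation, target (the sample line is illustrative, not taken from the actual files):

```python
# Sketch: parse a TSV edge set into (source, relation, target) tuples.
import csv
import io

# Illustrative stand-in for one line of a downloaded edge-set file.
sample = "/m/027rn\t/location/country/form_of_government\t/m/06cx9\n"

edges = [tuple(row) for row in csv.reader(io.StringIO(sample), delimiter="\t")]
print(edges[0])  # (source entity, relation type, target entity)
```

In a real run, `io.StringIO(sample)` would be replaced with an open file handle on one of the three TSV files.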
@@ -192,13 +212,19 @@ bar rhs complex_diagonal real 200 -2.350617170 0.529571176 0.521403074 ...
 bar rhs complex_diagonal imag 200 0.692483306 0.446569800 0.235914066 ...
 ```
 
+## Pre-trained embeddings
+
+We trained a PBG model on the full [Wikidata](https://www.wikidata.org/) graph, using a [translation operator](https://torchbiggraph.readthedocs.io/en/latest/scoring.html#operators) to represent relations. It can be downloaded [here](https://dl.fbaipublicfiles.com/torchbiggraph/wikidata_translation_v1.tsv.gz) (36GiB, gzip-compressed). We used the truthy version of data from [here](https://dumps.wikimedia.org/wikidatawiki/entities/) to train our model. The model file is in TSV format as described in the above section. Note that the first line of the file contains the number of entities, the number of relations and the dimension of the embeddings, separated by tabs. The model contains 78 million entities, 4,131 relations and the dimension of the embeddings is 200.
+
 ## Documentation
 
 More information can be found in [the full documentation](https://torchbiggraph.readthedocs.io/).
 
-## Pre-trained embeddings
+## Communication
 
-We trained a PBG model on the full [Wikidata](https://www.wikidata.org/) graph, using a [translation operator](https://torchbiggraph.readthedocs.io/en/latest/scoring.html#operators) to represent relations. It can be downloaded [here](https://dl.fbaipublicfiles.com/torchbiggraph/wikidata_translation_v1.tsv.gz) (36GiB, gzip-compressed). We used the truthy version of data from [here](https://dumps.wikimedia.org/wikidatawiki/entities/) to train our model. The model file is in TSV format as described in the above section. Note that the first line of the file contains the number of entities, the number of relations and the dimension of the embeddings, separated by tabs. The model contains 78 million entities, 4,131 relations and the dimension of the embeddings is 200.
+- GitHub Issues: Bug reports, feature requests, install issues, etc.
+- The [PyTorch-BigGraph Slack](https://join.slack.com/t/pytorchbiggraph/shared_invite/zt-yxy7zl41-37ypKwOqLHhmMSac5XOh2w) is a forum for online discussion between developers and users, discussing features, collaboration, etc.
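The header line described for the Wikidata model file (entity count, relation count, and embedding dimension, tab-separated) can be parsed as in this sketch; the entity count used below is a rounded, hypothetical stand-in for the "78 million" mentioned in the README, while 4,131 relations and dimension 200 are stated exactly:

```python
# Sketch: parse the first line of wikidata_translation_v1.tsv, which holds
# <num_entities> TAB <num_relations> TAB <embedding_dimension>.
header = "78000000\t4131\t200"  # entity count is an illustrative stand-in

n_entities, n_relations, dim = (int(field) for field in header.split("\t"))
print(n_relations, dim)  # 4131 200
```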