Graph Database Manager

API Documentation

The APIs can be explored and tested through the Swagger UI at http://localhost:8000/docs/
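As a quick way to see which routes the service offers without opening the browser, the sketch below lists the endpoints found in an OpenAPI schema. It assumes the server publishes its schema at http://localhost:8000/openapi.json, which is common for Swagger-backed services but is not confirmed by this README. The names `fetch_schema` and `list_endpoints` are illustrative and not part of this project.

```python
import json
from urllib.request import urlopen

# Assumed schema location; this README only confirms the /docs/ page.
SCHEMA_URL = "http://localhost:8000/openapi.json"

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(url=SCHEMA_URL):
    """Download and parse the OpenAPI schema from a running server."""
    with urlopen(url) as response:
        return json.load(response)


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints)


# Usage (requires the server to be running):
#     for method, path in list_endpoints(fetch_schema()):
#         print(method, path)
```

If the schema lives at a different URL, pass it to `fetch_schema` explicitly.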

Setting Up

Install the dependencies with pip install -r requirements.txt
Run Neo4j and update the .env file
Run python add_data.py to load the extracted scientific knowledge graph into the graph database (this takes quite a long time)
Run python cluster_and_drop.py to drop some semantic/syntactic duplicates
Run python gen_vocab.py to generate a copy of the vocabulary from the graph database
Run python app.py to serve the endpoints (this also generates a set of embedding vectors from the output of the previous step if they do not already exist)
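The steps above can be chained in a small driver script so that a failure in one step stops the rest. This is only a sketch: the script names and their order come from the list above, while `build_commands` and `run_setup` are hypothetical helpers that do not exist in the repository. It assumes Neo4j is already running and the .env file has been updated, since neither can be automated from the information given here.

```python
import subprocess
import sys

# Script names and order are taken from the setup steps above.
SETUP_SCRIPTS = ["add_data.py", "cluster_and_drop.py", "gen_vocab.py"]
SERVER_SCRIPT = "app.py"


def build_commands(python=sys.executable, serve=True):
    """Return the commands to run, in order, as argument lists."""
    commands = [[python, "-m", "pip", "install", "-r", "requirements.txt"]]
    commands += [[python, script] for script in SETUP_SCRIPTS]
    if serve:
        # app.py blocks while serving, so it must come last.
        commands.append([python, SERVER_SCRIPT])
    return commands


def run_setup(serve=True):
    """Run each step in turn, stopping at the first failure."""
    for command in build_commands(serve=serve):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if the step exits non-zero.
        subprocess.run(command, check=True)


# Usage, from the repository root:
#     run_setup()             # full setup, then serve
#     run_setup(serve=False)  # prepare the database and vocabulary only
```

Using `sys.executable` keeps every step inside the same interpreter and virtual environment that launched the driver.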

Initial Dataset

The following files contain the essential fields used to construct the knowledge graph. You can modify the dataset and the script to add more information to the graph.

1. data/csv/kaggle-arxiv-cscl-2020-12-18.csv

Metadata of the arXiv dataset retrieved from Cornell-University/arxiv, filtered to the Computation and Language (CL) category only.

2. data/pickle/kaggle_arxiv_cite_ref.pickle

Citations and references for each publication in the arXiv cs.CL dataset.

3. data/pickle/kaggle_arxiv_cleaned.pickle

The metadata retrieved from Cornell-University/arxiv, combined with additional essential fields.
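Before modifying the dataset, it can help to check what each file actually contains. The sketch below is a minimal inspection aid that uses only the standard library. The file paths are the ones listed above; the helper names `csv_columns` and `describe_pickle` are illustrative. The object types stored in the pickles are not documented here, so the code only reports the type and size. If the pickles hold objects from a third-party library, that library must be installed for loading to succeed.

```python
import csv
import pickle
from pathlib import Path

# File paths as listed above, relative to the repository root.
CSV_PATH = Path("data/csv/kaggle-arxiv-cscl-2020-12-18.csv")
PICKLE_PATHS = [
    Path("data/pickle/kaggle_arxiv_cite_ref.pickle"),
    Path("data/pickle/kaggle_arxiv_cleaned.pickle"),
]


def csv_columns(path):
    """Return the header row of a CSV file without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def describe_pickle(path):
    """Load a pickle and return (type name, length or None)."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)  # only unpickle files you trust
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


# Usage, from the repository root:
#     print(csv_columns(CSV_PATH))
#     for path in PICKLE_PATHS:
#         print(path.name, describe_pickle(path))
```

Unpickling can execute arbitrary code, so run this only on pickle files from a source you trust.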
