TensorFlow implementation of a novel open-source Seq2SeqRegression API for performing a wide range of automatic feature extraction tasks outside of NLP. This general-purpose Sequence-to-Sequence Regression model predicts a sequence of multidimensional vectors from previous observations. The system studied here is the Plouffe Graph, a graph by Canadian mathematician Simon Plouffe dating from 1974-1979. More information about the Plouffe Graph can be found in: Times Tables, Mandelbrot and the Heart of Mathematics.
The Plouffe dataset is already included: a dataset of multidimensional vectors representing the Plouffe Graph is constructed during training. The dataset can be configured in the plouffe.yml file inside the configs folder.
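To make "multidimensional vectors that represent the Plouffe Graph" more concrete, here is a minimal sketch. It assumes the times-table construction from the video referenced above, where node i on a circle of n nodes is joined to node (k * i) mod n. The function names and the choice to encode each graph as a vector of edge targets are illustrative assumptions; the repository's own dataset code may encode the graph differently.

```python
import math


def plouffe_edges(num_nodes, multiplier):
    """Return the edges i -> (multiplier * i) mod num_nodes of a times-table graph."""
    return [(i, (multiplier * i) % num_nodes) for i in range(num_nodes)]


def node_coordinates(num_nodes):
    """Place the nodes evenly on the unit circle, as the graph is usually drawn."""
    return [
        (math.cos(2 * math.pi * i / num_nodes), math.sin(2 * math.pi * i / num_nodes))
        for i in range(num_nodes)
    ]


def edge_vector(num_nodes, multiplier):
    """Encode one graph as a vector holding the target node of every source node.

    This encoding is an assumption made for illustration only.
    """
    return [target for _, target in plouffe_edges(num_nodes, multiplier)]


# A short sequence of graphs over 10 nodes, one vector per multiplier.
sequence = [edge_vector(10, k) for k in (2, 3, 4)]
```

Each element of `sequence` is one vector, so a sequence-to-sequence regression model could take the first few vectors as observations and regress the remaining ones.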
An IPython Notebook for the Seq2Seq Regression model can be found inside the notebooks folder. This notebook complements the paper and walks you through the computational graph. It also provides background on the Plouffe Graph dataset.
To see the interactive graphics of the Seq2Seq Regression model's predictions, download the pre-trained model from the following Google Drive link:
https://drive.google.com/open?id=0B86gEeQqfnjtMERTV2tjLWMwNnc
Create a logs directory in the root of the Seq2Seq_PlouffeRainbows folder.
After downloading, you need to move/copy the lr0002 folder that was downloaded from the Google Drive link into the logs folder.
cd notebooks
jupyter notebook
Note: The default iopub data rate limit is too low for this visualization-heavy project. To fix this, launch the IPython notebook as follows:
jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000
The program requires the following dependencies (easy to install using pip, Anaconda or Docker):
- python 2.7
- tensorflow API (tested with r1.0.0)
- numpy
- scipy
- pandas
- matplotlib
- jupyter
- networkx
- tqdm
- pyyaml
- jupyterthemes
- seaborn
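Before training, it can help to confirm that the dependencies above are importable. The following is a minimal sketch rather than part of the project: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list holds import names, which can differ from package names (pyyaml is imported as `yaml`). The jupyter entry is left out because it is used here as a command-line tool.

```python
import importlib

# Import names for the listed dependencies (an assumption for illustration);
# pyyaml is imported as "yaml".
REQUIRED_MODULES = [
    "tensorflow", "numpy", "scipy", "pandas", "matplotlib",
    "networkx", "tqdm", "yaml", "jupyterthemes", "seaborn",
]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed; any names it returns still need to be installed with pip or conda.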
To install DLFractalSequences in an Anaconda environment:
conda env create -f environment.yml
To activate the Anaconda environment:
source activate dlfractals-env
Train the Seq2Seq Regression model on the local machine using the Plouffe dataset:
python train.py -c configs/plouffe.yml
Note: The training inputs (e.g. dataset parameters and hyperparameters) for training on a local machine can be modified in plouffe.yml inside the configs folder.
Prerequisites: Docker must be installed on your machine. If it is not installed yet, see Docker Setup.
To build Docker image:
docker build -t dlfractals:latest .
To deploy and train in a Docker container:
docker run -it dlfractals:latest python train.py -c configs/plouffe.yml
The Shared Hierarchical Academic Research Computing Network (SHARCNET) is used when you want to run multiple jobs.
Activate Tensorflow Python2.7 environment:
source /opt/sharcnet/testing/tensorflow/tensorflow-cp27-active
Note: If any package is missing, install it with:
pip install <missing_pkg> --user
Example:
pip install /opt/sharcnet/testing/tensorflow/tensorflow-1.0.0-cp27-cp27m-linux_x86_64.whl --user
Train multiple jobs using the Seq2Seq Regression model on the Plouffe dataset:
python train_manyjobs.py -c configs/plouffe_sharcnet.yml
Note: The training inputs (e.g. dataset parameters and hyperparameters) for training on a SHARCNET machine can be modified in the plouffe.yml inside the configs folder. You must set the train option in the YAML config file to either copper or local when training on SHARCNET.
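The requirement on the train option can be checked before submitting jobs. The sketch below assumes the YAML config has already been loaded into a Python dict and that `train` is a top-level key; both are assumptions for illustration, and `check_train_option` is a hypothetical helper, not a function from this repository.

```python
# Values the note above allows for the train option.
ALLOWED_TRAIN_OPTIONS = ("copper", "local")


def check_train_option(config):
    """Return the train option from a loaded config dict, or raise ValueError.

    Assumes "train" is a top-level key, which may not match the real config layout.
    """
    value = config.get("train")
    if value not in ALLOWED_TRAIN_OPTIONS:
        raise ValueError(
            "train must be one of {}, got {!r}".format(ALLOWED_TRAIN_OPTIONS, value)
        )
    return value
```

Failing early with a clear message is cheaper than discovering a misconfigured option after a batch of jobs has been queued.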
- Perform further analysis on the Plouffe Graph. We particularly want to analyze how arithmetic in embedding space corresponds to the group arithmetic in input space, and establish strong baselines in relation to that.
- Add libraries that allow more experimentation with attention and external memory.
- Explore more datasets (e.g. video sequences) that would leverage the automatic feature extraction functionality of the Seq2Seq Regression model.
