NavodPeiris
diff --git a/Artifacts_Size_vs_Vocabulary_Size.png b/Artifacts_Size_vs_Vocabulary_Size.png
100 KB
diff --git a/Inference_Time_vs_Vocabulary_Size.png b/Inference_Time_vs_Vocabulary_Size.png
103 KB
diff --git a/Memory_Usage_vs_Vocabulary_Size.png b/Memory_Usage_vs_Vocabulary_Size.png
103 KB
diff --git a/README.md b/README.md
Lines changed: 102 additions & 25 deletions
diff --git a/examples/infer.py b/examples/infer.py
Lines changed: 6 additions & 5 deletions
diff --git a/examples/output/edges.csv b/examples/output/edges.csv
Lines changed: 0 additions & 133 deletions
diff --git a/examples/output/graph_embeddings.model b/examples/output/graph_embeddings.model
-87.7 KB
diff --git a/examples/train.py b/examples/train.py
Lines changed: 4 additions & 4 deletions
diff --git a/graphsense/__init__.py b/graphsense/__init__.py
Lines changed: 2 additions & 1 deletion
@@ -13,6 +13,10 @@
 
 </p>
 
+## GraphSense
+GraphSense is a framework for training and using code suggestion models with minimal data preprocessing and resource consumption. It uses no transformers: the underlying algorithm is Node2Vec, FAISS serves as the vector index, and RocksDB stores the code-line-to-index and index-to-code-line mappings.
+
+GraphSense is optimized for performance and efficiency: in the comparison below, it averages 0.0079 seconds per inference on CPU with 13.9 MB of artifacts.
 
 ### Requirements
 
@@ -28,45 +32,118 @@ pip install graphsense
 ### Training example:
 
 ```
-from graphsense import GraphSense
-
-g = GraphSense()
-
-g.line_completion(input_path="code_files", output_path="output")
+from graphsense import GraphTrain
 
+g = GraphTrain()
+# train the model
+g.line_completion(directory_path="code_files", language="Python")
 ```
 
 ### Inference example:
 
 ```
-from graphsense import GraphSense
+from graphsense import GraphInfer
 
-g = GraphSense()
+g = GraphInfer()
 
-g.load_model("output/graph_embeddings.model")
-next = g.infer("def factorial(n):")
+g.load_artifacts()  # load the artifacts to memory
+suggestions = g.infer("def factorial(n):")
+g.unload_artifacts()  # clean memory
 
-print("next item predicted: ", next)
+print("top 10 suggestions: ", suggestions)
 ```
 
+### Architecture
+
+#### Training Architecture
+
+![architecture](graphsense_training_architecture.png) 
+
+#### Inference Architecture
+
+![architecture](graphsense_inference_architecture.png) 
+
 ### Performance Comparison with gpt2_medium finetuned model
 Dataset used to train models: https://github.com/TheAlgorithms/Python 
-#### gpt2_medium finetuned model
+
+#### gpt2-medium model (Fine-tuned on Python Algorithms dataset)
 ```
-input: def factorial(n)  
-output: return 1 if n == 1 else n * factorial(n - 1)   
-model size: 1.44GB   
-avg inference time: 10.3302 seconds  
-CPU Usage: 8.3%  
-Memory Usage: 68.54 MB  
+artifacts size: 1.44 GB   
+avg inference time (CPU): 8 seconds 
+avg inference time (GPU): 2.2662 seconds
+avg memory usage: 1800 MB 
 ```
 
-#### graph embedding model 
-```
-input: def factorial(n)  
-output: return 1 if n == 1 else n * factorial(n - 1)   
-model size: 13.2MB
-avg inference time: 2.1870 seconds  
-CPU Usage: 0.2%
-Memory Usage: 4.54 MB  
+#### GraphSense (trained on Python Algorithms dataset)
+```  
+artifacts size: 13.9 MB
+avg inference time (CPU): 0.0079 seconds 
+avg memory usage: 277.8194 MB 
 ``` 
+
+### Performance and Scalability
+
+#### Accuracy of GraphSense (vector size: 128)
+| Dataset               | Top-1 Accuracy | Top-3 Accuracy | Top-10 Accuracy |
+|-----------------------|----------------|----------------|-----------------|
+| TheAlgorithms(Python) | 0.4718         | 0.8012         | 0.8958          |
+
+
+#### Scalability of GraphSense (CPU) (vector size: 128)
+```
+vocabulary = 100,000
+average memory usage: 273.777 MB
+average execution time: 0.0113 seconds
+artifacts size: 61.3 MB
+
+vocabulary = 200,000
+average memory usage: 325.8949 MB
+average execution time: 0.0155 seconds
+artifacts size: 122 MB
+
+vocabulary = 300,000
+average memory usage: 377.1085 MB
+average execution time: 0.0185 seconds
+artifacts size: 168 MB
+
+vocabulary = 400,000
+average memory usage: 428.3011 MB
+average execution time: 0.0227 seconds
+artifacts size: 224 MB
+
+vocabulary = 500,000
+average memory usage: 478.8532 MB
+average execution time: 0.0273 seconds
+artifacts size: 280 MB
+
+vocabulary = 600,000
+average memory usage: 531.0189 MB
+average execution time: 0.0301 seconds
+artifacts size: 368 MB
+
+vocabulary = 700,000
+average memory usage: 581.3494 MB
+average execution time: 0.0333 seconds
+artifacts size: 429 MB
+
+vocabulary = 800,000
+average memory usage: 633.226 MB
+average execution time: 0.038 seconds
+artifacts size: 448 MB
+
+vocabulary = 900,000
+average memory usage: 685.1932 MB
+average execution time: 0.0439 seconds
+artifacts size: 552 MB
+
+vocabulary = 1,000,000
+average memory usage: 734.5819 MB
+average execution time: 0.0444 seconds
+artifacts size: 561 MB
+```
+
+#### Linear Scaling
+
+![scaling](Artifacts_Size_vs_Vocabulary_Size.png)  
+![scaling](Memory_Usage_vs_Vocabulary_Size.png)  
+![scaling](Inference_Time_vs_Vocabulary_Size.png)  
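Review note on the "Linear Scaling" heading: the claim can be checked directly against the figures listed in the scalability block. The sketch below is not part of the PR; it fits an ordinary least-squares line to each metric and reports the coefficient of determination. The names `fit_line`, `VOCAB`, and `results` are illustrative, and the numbers are copied from the README as written.

```python
# Figures copied from the "Scalability of GraphSense (CPU)" block in the README.
VOCAB = [100_000 * k for k in range(1, 11)]
MEMORY_MB = [273.777, 325.8949, 377.1085, 428.3011, 478.8532,
             531.0189, 581.3494, 633.226, 685.1932, 734.5819]
TIME_S = [0.0113, 0.0155, 0.0185, 0.0227, 0.0273,
          0.0301, 0.0333, 0.038, 0.0439, 0.0444]
ARTIFACTS_MB = [61.3, 122, 168, 224, 280, 368, 429, 448, 552, 561]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


results = {
    "memory (MB)": fit_line(VOCAB, MEMORY_MB),
    "time (s)": fit_line(VOCAB, TIME_S),
    "artifacts (MB)": fit_line(VOCAB, ARTIFACTS_MB),
}

for name, (slope, intercept, r2) in results.items():
    # Report the slope per 100,000 vocabulary entries to keep it readable.
    print(f"{name}: {slope * 100_000:.4f} per 100k entries, "
          f"intercept {intercept:.4f}, R^2 {r2:.4f}")
```

An R^2 close to 1 for all three metrics supports the linear-scaling reading of the plots. The fit covers only the measured range of 100,000 to 1,000,000 vocabulary entries and says nothing about behavior outside it.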
@@ -1,8 +1,9 @@
-from graphsense import GraphSense
+from graphsense import GraphInfer
 
-g = GraphSense()
+g = GraphInfer()
 
-g.load_model("output/graph_embeddings.model")
-next = g.infer("def factorial(n):")
+g.load_artifacts()  # load the artifacts to memory
+suggestions = g.infer("def factorial(n):")
+g.unload_artifacts()  # clean memory
 
-print("next item predicted: ", next)
+print("top 10 suggestions: ", suggestions)
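Review note on the new inference example: `unload_artifacts()` is skipped if `infer()` raises. One possible way to guarantee cleanup is a small context manager around the load and unload calls. This is a sketch, not part of the PR. `loaded_artifacts` and `FakeInfer` are hypothetical names, and `FakeInfer` is only a stand-in exposing the three methods shown in the diff (`load_artifacts`, `infer`, `unload_artifacts`) so the snippet runs without graphsense installed.

```python
from contextlib import contextmanager


@contextmanager
def loaded_artifacts(model):
    """Load artifacts on entry and always unload them on exit."""
    model.load_artifacts()
    try:
        yield model
    finally:
        # Runs even if the body of the with-block raises.
        model.unload_artifacts()


class FakeInfer:
    """Hypothetical stand-in for GraphInfer; not the library's implementation."""

    def __init__(self):
        self.is_loaded = False

    def load_artifacts(self):
        self.is_loaded = True

    def unload_artifacts(self):
        self.is_loaded = False

    def infer(self, line):
        if not self.is_loaded:
            raise RuntimeError("artifacts are not loaded")
        # Placeholder result; the real model returns ranked suggestions.
        return [f"<suggestion for {line!r}>"]


with loaded_artifacts(FakeInfer()) as g:
    suggestions = g.infer("def factorial(n):")

print("suggestions:", suggestions)
```

With the real class, the same pattern would presumably read `with loaded_artifacts(GraphInfer()) as g:`, assuming `GraphInfer` keeps the method names used in this PR.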
@@ -1,5 +1,5 @@
-from graphsense import GraphSense
+from graphsense import GraphTrain
 
-g = GraphSense()
-
-g.line_completion(input_path="code_files", output_path="output")
+g = GraphTrain()
+# train the model
+g.line_completion(directory_path="code_files", language="Python")
@@ -1,3 +1,4 @@
 from .graphsense import(
-    GraphSense
+    GraphTrain,
+    GraphInfer
 )