Commit 1d0789d

added optimizations and language support
1 parent 96740dc commit 1d0789d

16 files changed

Lines changed: 563 additions & 349 deletions

README.md

Lines changed: 102 additions & 25 deletions
@@ -13,6 +13,10 @@

 </p>

+## GraphSense
+GraphSense is a framework for training and using code suggestion models with minimal data preprocessing and resource consumption. No transformers are used; the underlying algorithm is Node2Vec. FAISS is used as the vector index, and RocksDB is used to store the code-line-to-index and index-to-code-line mappings.
+
+GraphSense is highly optimized for performance and efficiency.
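The description added here pairs a vector index with two lookup tables (code line to index, index to code line). As a rough illustration of that data layout only, the sketch below uses plain dictionaries in place of RocksDB and a brute-force cosine scan in place of FAISS. The `LineIndex` class, its methods, and the toy vectors are hypothetical and are not taken from the GraphSense code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LineIndex:
    """Toy stand-in: dicts replace RocksDB, a linear scan replaces FAISS."""

    def __init__(self):
        self.line_to_id = {}  # code line -> integer id
        self.id_to_line = {}  # integer id -> code line
        self.vectors = []     # vectors[i] is the embedding stored for id i

    def add(self, line, vector):
        idx = len(self.vectors)
        self.line_to_id[line] = idx
        self.id_to_line[idx] = line
        self.vectors.append(vector)

    def nearest(self, line, k=3):
        """Return the k stored lines most similar to `line`, best first."""
        query_id = self.line_to_id[line]
        query = self.vectors[query_id]
        scored = [
            (cosine(query, vec), idx)
            for idx, vec in enumerate(self.vectors)
            if idx != query_id
        ]
        scored.sort(reverse=True)
        return [self.id_to_line[idx] for _, idx in scored[:k]]


index = LineIndex()
index.add("def factorial(n):", [1.0, 0.0])
index.add("return 1 if n == 1 else n * factorial(n - 1)", [0.9, 0.1])
index.add("import os", [0.0, 1.0])
suggestions = index.nearest("def factorial(n):", k=1)
print(suggestions)
```

The two dictionaries mirror the two mappings named in the README text; a real vector index would replace the linear scan so that lookups stay fast as the vocabulary grows.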

 ### Requirements

@@ -28,45 +32,118 @@ pip install graphsense
 ### Training example:

 ```
-from graphsense import GraphSense
-
-g = GraphSense()
-
-g.line_completion(input_path="code_files", output_path="output")
+from graphsense import GraphTrain

+g = GraphTrain()
+# train the model
+g.line_completion(directory_path="code_files", language="Python")
 ```

 ### Inference example:

 ```
-from graphsense import GraphSense
+from graphsense import GraphInfer

-g = GraphSense()
+g = GraphInfer()

-g.load_model("output/graph_embeddings.model")
-next = g.infer("def factorial(n):")
+g.load_artifacts() # load the artifacts to memory
+suggestions = g.infer("def factorial(n):")
+g.unload_artifacts() # clean memory

-print("next item predicted: ", next)
+print("top 10 suggestions: ", suggestions)
 ```

+### Architecture
+
+#### Training Architecture
+
+![architecture](graphsense_training_architecture.png)
+
+#### Inference Architecture
+
+![architecture](graphsense_inference_architecture.png)
+
 ### Performance Comparison with gpt2_medium finetuned model
 Dataset used to train models: https://github.com/TheAlgorithms/Python
-#### gpt2_medium finetuned model
+
+#### gpt2-medium model (Fine-tuned on Python Algorithms dataset)
 ```
-input: def factorial(n)
-output: return 1 if n == 1 else n * factorial(n - 1)
-model size: 1.44GB
-avg inference time: 10.3302 seconds
-CPU Usage: 8.3%
-Memory Usage: 68.54 MB
+artifacts size: 1.44 GB
+avg inference time (CPU): 8 seconds
+avg inference time (GPU): 2.2662 seconds
+avg memory usage: 1800 MB
 ```

-#### graph embedding model
-```
-input: def factorial(n)
-output: return 1 if n == 1 else n * factorial(n - 1)
-model size: 13.2MB
-avg inference time: 2.1870 seconds
-CPU Usage: 0.2%
-Memory Usage: 4.54 MB
+#### GraphSense (trained on Python Algorithms dataset)
+```
+artifacts size: 13.9 MB
+avg inference time (CPU): 0.0079 seconds
+avg memory usage: 277.8194 MB
 ```
+
+### Performance and Scalability
+
+#### Accuracy of GraphSense (vector size: 128)
+| Dataset | Top-1 Accuracy | Top-3 Accuracy | Top-10 Accuracy |
+|-----------------------|----------------|----------------|-----------------|
+| TheAlgorithms(Python) | 0.4718 | 0.8012 | 0.8958 |
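The README does not say how these accuracy figures were measured. Assuming "Top-k accuracy" has its usual meaning (the fraction of test cases whose expected next line appears among the first k ranked suggestions), it could be computed along these lines. The function name and the toy data are illustrative only.

```python
def top_k_accuracy(ranked_suggestions, expected, k):
    """Fraction of cases where the expected item is among the first k suggestions."""
    hits = sum(
        1
        for ranked, target in zip(ranked_suggestions, expected)
        if target in ranked[:k]
    )
    return hits / len(expected)


# Four toy test cases: each row is a ranked suggestion list, best first.
ranked = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["p", "q", "r"],
    ["m", "n", "o"],
]
expected = ["a", "y", "r", "k"]

top1 = top_k_accuracy(ranked, expected, 1)
top3 = top_k_accuracy(ranked, expected, 3)
print(top1, top3)
```

By construction, Top-k accuracy can only stay level or rise as k increases, which matches the ordering of the three columns in the table.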
+
+
+#### Scalability of GraphSense (CPU) (vector size: 128)
+```
+vocabulary = 100,000
+average memory usage: 273.777 MB
+average execution time: 0.0113 seconds
+artifacts size: 61.3 MB
+
+vocabulary = 200,000
+average memory usage: 325.8949 MB
+average execution time: 0.0155 seconds
+artifacts size: 122 MB
+
+vocabulary = 300,000
+average memory usage: 377.1085 MB
+average execution time: 0.0185 seconds
+artifacts size: 168 MB
+
+vocabulary = 400,000
+average memory usage: 428.3011 MB
+average execution time: 0.0227 seconds
+artifacts size: 224 MB
+
+vocabulary = 500,000
+average memory usage: 478.8532 MB
+average execution time: 0.0273 seconds
+artifacts size: 280 MB
+
+vocabulary = 600,000
+average memory usage: 531.0189 MB
+average execution time: 0.0301 seconds
+artifacts size: 368 MB
+
+vocabulary = 700,000
+average memory usage: 581.3494 MB
+average execution time: 0.0333 seconds
+artifacts size: 429 MB
+
+vocabulary = 800,000
+average memory usage: 633.226 MB
+average execution time: 0.038 seconds
+artifacts size: 448 MB
+
+vocabulary = 900,000
+average memory usage: 685.1932 MB
+average execution time: 0.0439 seconds
+artifacts size: 552 MB
+
+vocabulary = 1,000,000
+average memory usage: 734.5819 MB
+average execution time: 0.0444 seconds
+artifacts size: 561 MB
+```
+
+#### Linear Scaling
+
+![scaling](Artifacts_Size_vs_Vocabulary_Size.png)
+![scaling](Memory_Usage_vs_Vocabulary_Size.png)
+![scaling](Inference_Time_vs_Vocabulary_Size.png)
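The "Linear Scaling" heading can be checked against the memory figures listed in the scalability block. The sketch below fits an ordinary least-squares line to those ten data points. The numbers are copied from the README hunk; the `fit_line` helper is written here for illustration and is not part of GraphSense.

```python
# Memory figures copied from the scalability block (vector size 128, CPU).
vocab_100k = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # vocabulary size / 100,000
memory_mb = [273.777, 325.8949, 377.1085, 428.3011, 478.8532,
             531.0189, 581.3494, 633.226, 685.1932, 734.5819]


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(vocab_100k, memory_mb)
print(f"memory ~= {intercept:.1f} MB + {slope:.1f} MB per 100,000 entries "
      f"(R^2 = {r_squared:.4f})")
```

On these figures the fitted slope is about 51 MB of memory per additional 100,000 vocabulary entries. The same helper can be pointed at the execution-time or artifact-size columns to check those trends.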

examples/infer.py

Lines changed: 6 additions & 5 deletions
@@ -1,8 +1,9 @@
-from graphsense import GraphSense
+from graphsense import GraphInfer

-g = GraphSense()
+g = GraphInfer()

-g.load_model("output/graph_embeddings.model")
-next = g.infer("def factorial(n):")
+g.load_artifacts() # load the artifacts to memory
+suggestions = g.infer("def factorial(n):")
+g.unload_artifacts() # clean memory

-print("next item predicted: ", next)
+print("top 10 suggestions: ", suggestions)
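The updated example calls `load_artifacts()` before `infer()` and `unload_artifacts()` after it. If `infer()` raised an exception, the unload call would be skipped. One possible way to guard against that is a small context manager, sketched below. The `loaded` helper and the `FakeInfer` class are hypothetical: `FakeInfer` only imitates the three method names that appear in this diff, so the sketch runs without GraphSense installed.

```python
from contextlib import contextmanager


@contextmanager
def loaded(model):
    """Load artifacts on entry and always unload them on exit."""
    model.load_artifacts()
    try:
        yield model
    finally:
        model.unload_artifacts()


class FakeInfer:
    """Stand-in exposing the same method names as GraphInfer; illustration only."""

    def __init__(self):
        self.is_loaded = False

    def load_artifacts(self):
        self.is_loaded = True

    def unload_artifacts(self):
        self.is_loaded = False

    def infer(self, line):
        if not self.is_loaded:
            raise RuntimeError("artifacts not loaded")
        return ["return 1 if n == 1 else n * factorial(n - 1)"]


fake = FakeInfer()
with loaded(fake) as g:
    suggestions = g.infer("def factorial(n):")
print("suggestions:", suggestions)
```

Assuming the real `GraphInfer` accepts the same calls with no arguments, as the example in this diff suggests, it could be passed to `loaded()` in place of `FakeInfer`.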

examples/output/edges.csv

Lines changed: 0 additions & 133 deletions
This file was deleted.
-87.7 KB
Binary file not shown.

examples/train.py

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
-from graphsense import GraphSense
+from graphsense import GraphTrain

-g = GraphSense()
-
-g.line_completion(input_path="code_files", output_path="output")
+g = GraphTrain()
+# train the model
+g.line_completion(directory_path="code_files", language="Python")
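The diff does not show what `line_completion` does internally. The README names Node2Vec as the underlying algorithm, and the deleted `examples/output/edges.csv` suggests an edge list was produced at some point. As a loose illustration of the general idea only, the sketch below assumes each code line is a graph node linked to the line that follows it, and then samples a random walk. The function names and the graph shape are assumptions, not the GraphSense implementation.

```python
import random
from collections import defaultdict


def build_edges(source_lines):
    """Link each non-empty line to the next non-empty line (assumed graph shape)."""
    cleaned = [line.strip() for line in source_lines if line.strip()]
    return list(zip(cleaned, cleaned[1:]))


def random_walk(edges, start, length, rng):
    """Uniform random walk over directed edges.

    Node2Vec with return parameter p = 1 and in-out parameter q = 1
    reduces to this unbiased walk.
    """
    neighbours = defaultdict(list)
    for src, dst in edges:
        neighbours[src].append(dst)
    walk = [start]
    while len(walk) < length and neighbours[walk[-1]]:
        walk.append(rng.choice(neighbours[walk[-1]]))
    return walk


code = [
    "def factorial(n):",
    "    if n <= 1:",
    "        return 1",
    "    return n * factorial(n - 1)",
]
edges = build_edges(code)
walk = random_walk(edges, "def factorial(n):", length=3, rng=random.Random(0))
print(walk)
```

In Node2Vec, many such walks are treated as sentences and fed to a skip-gram model, so that lines appearing in similar contexts end up with nearby vectors.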

graphsense/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 from .graphsense import(
-GraphSense
+GraphTrain,
+GraphInfer
 )
