Commit 7fd0f1f

Add project configuration files and update README
1 parent a213ca1 commit 7fd0f1f

8 files changed

Lines changed: 201 additions & 33 deletions

.idea/.gitignore

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

.idea/VulnScan.iml

Lines changed: 14 additions & 0 deletions

.idea/inspectionProfiles/Project_Default.xml

Lines changed: 118 additions & 0 deletions

.idea/inspectionProfiles/profiles_settings.xml

Lines changed: 6 additions & 0 deletions

.idea/misc.xml

Lines changed: 8 additions & 0 deletions

.idea/modules.xml

Lines changed: 8 additions & 0 deletions

.idea/vcs.xml

Lines changed: 6 additions & 0 deletions

README.md

Lines changed: 33 additions & 33 deletions
@@ -1,19 +1,19 @@
 # VulnScan Documentation

-VulnScan is designed to detect sensitive data across various file formats.
-It offers a modular framework to train models using diverse algorithms,
-from traditional ML classifiers to advanced Neural Networks.
+VulnScan is designed to detect sensitive data across various file formats.
+It offers a modular framework to train models using diverse algorithms,
+from traditional ML classifiers to advanced Neural Networks.

 This document outlines the system's naming conventions, lifecycle, and model configuration.

 > [!NOTE]
 > Ported in update 3.5.0 of Logicytics - Latest update from there was 3.4.2
->
+>
 > You can find the main repo and generated files [here](https://github.com/DefinetlyNotAI/Logicytics/tree/main/CODE/vulnScan)

 > [!IMPORTANT]
 > Old documentation is available in the `Archived Models` directory of this [repository](https://github.com/DefinetlyNotAI/VulnScan_Data)
->
+>
 > This documentation is covers test data, metrics and niche features.

 ---
@@ -24,16 +24,16 @@ This document outlines the system's naming conventions, lifecycle, and model con
 `Model {Type of model} .{Version}`

 - **Type of Model**: Describes the training data configuration.
-    - `Sense`: Sensitive data set with 50k files, each 50KB in size.
-    - `SenseNano`: Test set with 5-10 files, each 5KB, used for error-checking.
-    - `SenseMacro`: Large dataset with 1M files, each 10KB. This is computationally intensive, so some corners were cut in training.
-    - `SenseMini`: Dataset with 10K files, each between 10-200KB. Balanced size for effective training and resource efficiency.
+    - `Sense`: Sensitive data set with 50k files, each 50KB in size.
+    - `SenseNano`: Test set with 5-10 files, each 5KB, used for error-checking.
+    - `SenseMacro`: Large dataset with 1M files, each 10KB. This is computationally intensive, so some corners were cut in training.
+    - `SenseMini`: Dataset with 10K files, each between 10-200KB. Balanced size for effective training and resource efficiency.

 - **Version Format**: `{Version#}{c}{Repeat#}`
-    - **Version#**: Increment for major code updates.
-    - **c**: Model identifier (e.g., NeuralNetwork, BERT, etc.). See below for codes.
-    - **Repeat#**: Number of times the same model was trained without significant code changes, used to improve consistency.
-    - **-F**: Denotes a failed model or a corrupted model.
+    - **Version#**: Increment for major code updates.
+    - **c**: Model identifier (e.g., NeuralNetwork, BERT, etc.). See below for codes.
+    - **Repeat#**: Number of times the same model was trained without significant code changes, used to improve consistency.
+    - **-F**: Denotes a failed model or a corrupted model.

 ### Model Identifiers

@@ -52,7 +52,7 @@ This document outlines the system's naming conventions, lifecycle, and model con
 | `x` | XGBoost |

 ### Example
-`Model Sense .1n2`:
+`Model Sense .1n2`:
 - Dataset: `Sense` (50k files, 50KB each).
 - Version: 1 (first major version).
 - Model: `NeuralNetwork` (`n`).
@@ -101,7 +101,7 @@ This document outlines the system's naming conventions, lifecycle, and model con
 ---

 ## Preferred Model
-**NeuralNetwork (`n`)**
+**NeuralNetwork (`n`)**
 - Proven to be the most effective for detecting sensitive data in the project.

 ---
@@ -121,27 +121,27 @@ This document outlines the system's naming conventions, lifecycle, and model con

 # More files

-There is a repository that archived all the data used to make the model,
-as well as previously trained models for you to test out
-(loading scripts and vectorizers are not included).
+There is a repository that archived all the data used to make the model,
+as well as previously trained models for you to test out
+(loading scripts and vectorizers are not included).

 The repository is located [here](https://github.com/DefinetlyNotAI/VulnScan_Data).

 The repository contains the following directories:
 - `Archived Models`: Contains the previously trained models. Is organized by the model type then version.
 - `NN features`: Contains information about the model `.3n3` and the vectorizer used. Information include:
-    - `Documentation_Study_Network.md`: A markdown file that contains more info.
-    - `Neural Network Nodes Graph.gexf`: A Gephi file that contains the model nodes and edges.
-    - `Nodes and edges (GEPHI).csv`: A CSV file that contains the model nodes and edges.
-    - `Statistics`: Directories made by Gephi, containing the statistics of the model nodes and edges.
-    - `Feature_Importance.svg`: A SVG file that contains the feature importance of the model.
-    - `Loss_Landscape_3D.html`: A HTML file that contains the 3D loss landscape of the model.
-    - `Model Accuracy Over Epochs.png` and `Model Loss Over Epochs.png`: PNG files that contain the model accuracy and loss over epochs.
-    - `Model state dictionary.txt`: A text file that contains the model state dictionary.
-    - `Model Summary.txt`: A text file that contains the model summary.
-    - `Model Visualization.png`: A PNG file that contains the model visualization.
-    - `Top_90_Features.svg`: A SVG file that contains the top 90 features of the model.
-    - `Vectorizer features.txt`: A text file that contains the vectorizer features.
-    - `Visualize Activation.png`: A PNG file that contains the visualization of the model activation.
-    - `Visualize t-SNE.png`: A PNG file that contains the visualization of the model t-SNE.
-    - `Weight Distribution.png`: A PNG file that contains the weight distribution of the model.
+    - `Documentation_Study_Network.md`: A markdown file that contains more info.
+    - `Neural Network Nodes Graph.gexf`: A Gephi file that contains the model nodes and edges.
+    - `Nodes and edges (GEPHI).csv`: A CSV file that contains the model nodes and edges.
+    - `Statistics`: Directories made by Gephi, containing the statistics of the model nodes and edges.
+    - `Feature_Importance.svg`: A SVG file that contains the feature importance of the model.
+    - `Loss_Landscape_3D.html`: A HTML file that contains the 3D loss landscape of the model.
+    - `Model Accuracy Over Epochs.png` and `Model Loss Over Epochs.png`: PNG files that contain the model accuracy and loss over epochs.
+    - `Model state dictionary.txt`: A text file that contains the model state dictionary.
+    - `Model Summary.txt`: A text file that contains the model summary.
+    - `Model Visualization.png`: A PNG file that contains the model visualization.
+    - `Top_90_Features.svg`: A SVG file that contains the top 90 features of the model.
+    - `Vectorizer features.txt`: A text file that contains the vectorizer features.
+    - `Visualize Activation.png`: A PNG file that contains the visualization of the model activation.
+    - `Visualize t-SNE.png`: A PNG file that contains the visualization of the model t-SNE.
+    - `Weight Distribution.png`: A PNG file that contains the weight distribution of the model.
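
The naming convention in the README (`Model {Type of model} .{Version}`, with the version laid out as `{Version#}{c}{Repeat#}`) lends itself to a small parser. The following is only a sketch under stated assumptions, not code from the VulnScan repository: `ModelName`, `parse_model_name`, and `MODEL_CODES` are hypothetical names, the code table lists only the two identifiers visible in this diff (`n` and `x`), and `-F` is assumed to be a trailing suffix because the README does not say where it goes.

```python
import re
from dataclasses import dataclass

# Partial mapping: only the identifier codes visible in this excerpt.
# The README's full table contains more entries that are not shown here.
MODEL_CODES = {"n": "NeuralNetwork", "x": "XGBoost"}

# Dataset types listed under "Type of Model" in the README.
DATASETS = {"Sense", "SenseNano", "SenseMacro", "SenseMini"}

# Assumption: "-F" (failed or corrupted model) appears as a trailing suffix.
_NAME_RE = re.compile(
    r"^Model\s+(?P<dataset>\w+)\s*\."
    r"(?P<version>\d+)(?P<code>[a-z]+)(?P<repeat>\d+)(?P<failed>-F)?$"
)


@dataclass(frozen=True)
class ModelName:
    dataset: str
    version: int
    code: str
    repeat: int
    failed: bool

    @property
    def algorithm(self) -> str:
        # Codes missing from the partial table are reported as "unknown".
        return MODEL_CODES.get(self.code, "unknown")


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'Model Sense .1n2' into its components."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a valid model name: {name!r}")
    if match["dataset"] not in DATASETS:
        raise ValueError(f"unknown dataset type: {match['dataset']!r}")
    return ModelName(
        dataset=match["dataset"],
        version=int(match["version"]),
        code=match["code"],
        repeat=int(match["repeat"]),
        failed=match["failed"] is not None,
    )
```

Applied to the README's own example, `parse_model_name("Model Sense .1n2")` yields dataset `Sense`, version 1, code `n` (NeuralNetwork), and repeat 2. Rejecting unknown dataset names is a design choice of this sketch; a more lenient parser could accept any type name.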
