|
2 | 2 |
|
3 | 3 | ## Description |
4 | 4 |
|
5 | | -This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data. It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personal identifiable information (PII), and other confidential data. A key feature of this agent is its use of the fsspec library, allowing it to operate on a wide variety of storage systems, including local directories, cloud storage like AWS S3 and Google Cloud Storage, and even remote sources like GitHub repositories. |
| 5 | +This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data. It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personally identifiable information (PII), and other confidential data. A key feature of this agent is its ability to operate on a wide variety of storage systems, including local directories, cloud storage like AWS S3 and Google Cloud Storage, and even remote sources like GitHub repositories (via [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)). |
6 | 6 |
|
7 | 7 | ## Intended Use |
8 | 8 |
|
9 | | -The Agent is used for performing a thorough search through fileshares and files, then reporting its findings in a structured format, which can then be used for remediation efforts. |
| 9 | +The Agent is used to perform a thorough search through fileshares and files, then report its findings in a structured format that can be used for remediation efforts. |
10 | 10 |
|
11 | 11 | ## Environment |
12 | 12 |
|
13 | | -The environment is simply a filesystem. The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories). For observability, the agent can be connected to a Dreadnode server to log detailed run information, metrics, and findings. |
| 13 | +The environment is simply a filesystem. The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories). For observability, the agent can be [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config) to log detailed run information, metrics, and findings. |
14 | 14 |
|
15 | 15 | ## Tools |
16 | 16 |
|
17 | | -- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems. This is what enables the agent's versatility in accessing different storage backends like s3://, gs://, and github://. |
| 17 | +- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems. This is what enables the agent's versatility in accessing different storage backends like `s3://`, `gs://`, and `github://`. |
18 | 18 |
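
As a rough illustration of what a unified interface buys, the sketch below walks a target path and yields text lines with their line numbers, independent of the storage backend. It is not the agent's actual code: `open_target`, `iter_text_lines`, `find_candidates`, and the `AWS_KEY_ID` pattern are hypothetical names introduced here, and the regular expression only stands in for the LLM-based analysis the agent performs. The sketch assumes `fsspec` is installed and uses only `fsspec.core.url_to_fs`, `fs.find`, and `fs.open`.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical stand-in for the LLM: a pattern shaped like an AWS access key ID.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def open_target(url: str):
    """Resolve a URL such as s3://bucket/prefix into a (filesystem, path) pair."""
    from fsspec.core import url_to_fs  # imported lazily; requires fsspec

    return url_to_fs(url)


def iter_text_lines(fs, root: str, max_bytes: int = 65536) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every file found under root."""
    for path in fs.find(root):
        # Read at most max_bytes so one large object cannot stall the walk.
        with fs.open(path, "rb") as handle:
            data = handle.read(max_bytes)
        text = data.decode("utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            yield path, line_no, line


def find_candidates(fs, root: str) -> List[Tuple[str, int, str]]:
    """Collect (path, line_number, match) for every pattern hit under root."""
    return [
        (path, line_no, match.group(0))
        for path, line_no, line in iter_text_lines(fs, root)
        for match in AWS_KEY_ID.finditer(line)
    ]
```

With credentials configured, usage might look like `fs, path = open_target("s3://my-bucket/configs")` followed by `find_candidates(fs, path)`; the bucket name is a placeholder. Because the functions only rely on `find` and `open`, the same code applies to local paths and other fsspec backends.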
|
19 | 19 | ## Features |
20 | 20 |
|
21 | | -- Multi-Filesystem Support: Can analyze files on local disks, AWS S3, Google Cloud Storage, GitHub repositories, and any other backend supported by fsspec. |
22 | | -- LLM-Powered Data Identification: Employs a language model to intelligently parse file contents and identify a broad range of sensitive data types based on context. |
23 | | -- Structured Data Reporting: Uses a dedicated report_sensitive_data tool that forces the LLM to report findings in a structured format, including the file path, location within the file, data type, the sensitive value itself, and a comment. |
24 | | -- Location-Aware Reporting: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files). |
25 | | -- Autonomous Exploration: The agent can independently navigate the directory structure of the target path to ensure comprehensive coverage. |
26 | | -- Task Control: Includes tools for the agent to explicitly complete_task with a summary or give_up if it gets stuck, providing better insight into its reasoning process. |
| 21 | +- **Multi-Filesystem Support**: Can analyze files on local disks, AWS S3, Google Cloud Storage, GitHub repositories, and any other backend supported by fsspec. |
| 22 | +- **LLM-Powered Data Identification**: Employs a language model to intelligently parse file contents and identify a broad range of sensitive data types based on context. |
| 23 | +- **Structured Data Reporting**: Uses a dedicated `report_sensitive_data` tool that forces the LLM to report findings in a structured format, including the file path, location within the file, data type, the sensitive value itself, and a comment. |
| 24 | +- **Location-Aware Reporting**: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files). |
| 25 | +- **Autonomous Exploration**: The agent can independently navigate the directory structure of the target path to ensure comprehensive coverage. |
| 26 | +- **Task Control**: Includes tools for the agent to explicitly `complete_task` with a summary or `give_up` if it gets stuck, providing better insight into its reasoning process. |
27 | 27 |
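
The structured, location-aware findings described above could be modeled roughly as follows. This is a sketch under stated assumptions, not the schema of the `report_sensitive_data` tool: the `Finding` and `LocationKind` names and every field name are hypothetical, chosen only to mirror the listed fields (file path, location, data type, value, comment) and the three location units.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class LocationKind(str, Enum):
    """Unit used for a finding's location; depends on the file type."""
    LINE = "line"                # text files
    SECONDS = "seconds"          # audio/video files
    BYTE_OFFSET = "byte_offset"  # binary files


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one reported piece of sensitive data."""
    file_path: str
    location: float
    location_kind: LocationKind
    data_type: str
    value: str
    comment: str = ""

    def to_json(self) -> str:
        """Serialize the finding so it can be logged or handed to remediation."""
        record = asdict(self)
        record["location_kind"] = self.location_kind.value
        return json.dumps(record, sort_keys=True)
```

Keeping the unit next to the numeric location avoids ambiguity when findings from text, media, and binary files end up in the same report.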
|
28 | 28 | ## References |
29 | 29 |
|
|