@@ -4,27 +4,21 @@
 
 This agent leverages a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data. It is designed to navigate through a given path, read the contents of various files, and identify information such as passwords, API keys, personally identifiable information (PII), and other confidential data. A key feature of this agent is its ability to operate on a wide variety of storage systems, including local directories, cloud storage like AWS S3 and Google Cloud Storage, and even remote sources like GitHub repositories (via [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)).
 
-## Intended Use
-
-The Agent is used to perform a thorough search through file shares and files, then report its findings in a structured format that can be used for remediation efforts.
 
 ## Environment
 
-The environment is simply a filesystem. The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories). For observability, the agent can be [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config) to log detailed run information, metrics, and findings.
+The environment is simply a filesystem. The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories).
 
 ## Tools
 
 - `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems. This is what enables the agent's versatility in accessing different storage backends like `s3://`, `gs://`, and `github://`.
 
-## Features
-
-- **Multi-Filesystem Support**: Can analyze files on local disks, AWS S3, Google Cloud Storage, GitHub repositories, and any other backend supported by fsspec.
-- **LLM-Powered Data Identification**: Employs a language model to parse file contents and identify a broad range of sensitive data types based on context.
-- **Structured Data Reporting**: Uses a dedicated `report_sensitive_data` tool that forces the LLM to report findings in a structured format, including the file path, location within the file, data type, the sensitive value itself, and a comment.
-- **Location-Aware Reporting**: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files).
-- **Autonomous Exploration**: The agent can independently navigate the directory structure of the target path to ensure comprehensive coverage.
-- **Task Control**: Includes tools for the agent to explicitly call `complete_task` with a summary or `give_up` if it gets stuck, providing better insight into its reasoning process.
 
 ## References
 
 - [fsspec](https://github.com/fsspec/fsspec)
+
+
+## Examples
+
+`uv run main.py --model "" --path ""`
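As a rough illustration of the `fsspec` layer described under Tools, the sketch below resolves a local path or URL to a filesystem object with `fsspec.core.url_to_fs` and walks every file with `find()`. It is not the agent's actual code: the `iter_files` helper and its `max_bytes` cap are hypothetical, and remote protocols need their backend packages installed (for example `s3fs` for `s3://` and `gcsfs` for `gs://`).

```python
from fsspec.core import url_to_fs


def iter_files(target: str, max_bytes: int = 64_000, **storage_options):
    """Hypothetical helper (not part of the agent): yield (path, leading bytes)
    for every file under ``target``.

    ``target`` can be a local path or an fsspec URL such as ``s3://bucket/prefix``;
    ``storage_options`` are passed through to the backend (credentials, etc.).
    """
    # Pick the filesystem implementation from the URL's protocol ("file" if none).
    fs, root = url_to_fs(target, **storage_options)
    # find() recurses through subdirectories and returns file paths only.
    for path in fs.find(root):
        # Read just the head of each file so large blobs stay out of the LLM context.
        with fs.open(path, "rb") as fh:
            yield path, fh.read(max_bytes)
```

The same loop works unchanged whether `target` is a local directory or a bucket URL, which is the point of routing all access through one abstract filesystem interface rather than per-backend clients.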
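The structured, location-aware reporting described in the Features list (removed in this change) can be sketched with a plain dataclass. The field names, the `kind` values, and the `report_finding` helper below are hypothetical stand-ins, not the real `report_sensitive_data` schema; they only show how one record type can carry a line number, a timestamp in seconds, or a byte offset depending on the file type.

```python
from dataclasses import asdict, dataclass
from typing import Literal

# Hypothetical file categories mirroring "text", "audio/video", and "binary".
Kind = Literal["text", "media", "binary"]


@dataclass(frozen=True)
class Finding:
    """One sensitive-data hit. Illustrative fields, not the agent's real schema."""
    path: str
    kind: Kind
    position: float  # line number, seconds, or byte offset, depending on kind
    data_type: str   # e.g. "password", "api_key", "pii"
    value: str
    comment: str = ""

    def location(self) -> str:
        # Express the position in the unit that fits the file type.
        if self.kind == "text":
            return f"line {int(self.position)}"
        if self.kind == "media":
            return f"{self.position:.1f}s"
        if self.kind == "binary":
            return f"offset 0x{int(self.position):x}"
        raise ValueError(f"unknown kind: {self.kind!r}")


def report_finding(findings: list, finding: Finding) -> dict:
    """Hypothetical tool body: validate one finding and append it as a flat record."""
    if not finding.value:
        raise ValueError("a finding must include the sensitive value")
    record = asdict(finding)
    record["location"] = finding.location()
    findings.append(record)
    return record
```

Forcing every report through one typed record is what makes the output usable for remediation: each entry can be filtered by `data_type` or traced back to `path` and `location` without re-parsing free-form model text.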