
Commit 29e9aa7

committed
sensitive-data
1 parent b0d8c1c commit 29e9aa7

3 files changed

Lines changed: 52 additions & 575 deletions

File tree

sensitive_data_extraction/README.md

Lines changed: 6 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -4,27 +4,21 @@
44

55
This agent uses a Large Language Model (LLM) to autonomously explore and analyze file systems for sensitive data. It navigates a given path, reads the contents of the files it finds, and identifies information such as passwords, API keys, personally identifiable information (PII), and other confidential data. A key feature of this agent is its ability to operate on a wide variety of storage systems, including local directories, cloud storage such as AWS S3 and Google Cloud Storage, and remote sources such as GitHub repositories (via [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)).
66

7-
## Intended Use
8-
9-
The Agent performs a thorough search through file shares and files, then reports its findings in a structured format that can be used for remediation efforts.
107

118
## Environment
129

13-
The environment is simply a filesystem. The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories). For observability, the agent can be [connected to a Dreadnode server](https://docs.dreadnode.io/strikes/usage/config) to log detailed run information, metrics, and findings.
10+
The environment is simply a filesystem. The Agent must have the necessary credentials to access the target path specified by the user (e.g., AWS credentials configured for S3 access, or a GitHub token for private repositories).
1411

1512
## Tools
1613

1714
- `fsspec`: The underlying library that provides a unified Pythonic interface to various local and remote file systems. This is what enables the agent's versatility in accessing different storage backends like `s3://`, `gs://`, and `github://`.
1815

19-
## Features
20-
21-
- **Multi-Filesystem Support**: Can analyze files on local disks, AWS S3, Google Cloud Storage, GitHub repositories, and any other backend supported by fsspec.
22-
- **LLM-Powered Data Identification**: Employs a language model to intelligently parse file contents and identify a broad range of sensitive data types based on context.
23-
- **Structured Data Reporting**: Uses a dedicated report_sensitive_data tool that forces the LLM to report findings in a structured format, including the file path, location within the file, data type, the sensitive value itself, and a comment.
24-
- **Location-Aware Reporting**: Can specify the location of findings differently based on the file type (line number for text, seconds for audio/video, or byte offset for binary files).
25-
- **Autonomous Exploration**: The agent can independently navigate the directory structure of the target path to ensure comprehensive coverage.
26-
- **Task Control**: Includes tools for the agent to explicitly `complete_task` with a summary or `give_up` if it gets stuck, providing better insight into its reasoning process.
2716

2817
## References
2918

3019
- [fsspec](https://github.com/fsspec/fsspec)
20+
21+
22+
## Examples
23+
24+
`uv run main.py --model "" --path ""`
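
To illustrate the unified interface that `fsspec` provides (see Tools), here is a minimal sketch of listing and reading files under a target path. It is not taken from the agent's `main.py`; the helper name `iter_text_files` and the `max_bytes` limit are hypothetical and chosen only for illustration. The same call pattern should work for `s3://`, `gs://`, or `github://` paths, assuming the matching backend package and credentials are available.

```python
from fsspec.core import url_to_fs


def iter_text_files(root: str, max_bytes: int = 4096):
    """Yield (path, text) for every file under `root`.

    `root` may be a local path or an fsspec URL such as "memory://demo".
    Hypothetical helper for illustration, not the agent's actual code.
    """
    # Resolve the URL into a filesystem object and a backend-specific path.
    fs, base = url_to_fs(root)
    # find() recursively lists the files below the base path.
    for path in fs.find(base):
        with fs.open(path, "rb") as handle:
            data = handle.read(max_bytes)
        # Decode leniently so binary files do not raise an exception.
        yield path, data.decode("utf-8", errors="replace")
```

Calling `list(iter_text_files("memory://demo"))` against fsspec's in-memory filesystem is a convenient way to try the helper without touching real storage.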
