Implement PII data classification and automated redaction pipeline #668

Description

@Smartdevs17

Context


SubTrackr processes sensitive user data, including names, email addresses, payment details, and usage metadata. Regulations and industry standards (GDPR, CCPA, PCI-DSS) require that PII be identified, classified, and appropriately protected throughout the data pipeline, from ingestion through storage and processing to analytics export.

Current Limitation

  • No systematic PII classification across data flows
  • Analytics exports may contain unredacted PII fields
  • Log aggregation can capture sensitive data in plaintext
  • No automated redaction for API responses, webhook payloads, or audit logs
  • GDPR data export/deletion workflows (#547, "Implement GDPR compliance tools with data export and deletion workflows") lack automated PII discovery

Expected Outcome


A configurable PII classification engine with regex/ML-based pattern detection, automated redaction middleware for API responses and logs, and a data lineage audit trail showing what PII exists, where it flows, and how it is protected.
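
As a starting point for discussion, the regex side of the classification engine could look roughly like the sketch below. It is not an agreed design: every name (`PiiCategory`, `PiiPattern`, `classifyText`, `redactText`) is hypothetical, the patterns are deliberately simple, and the `sk_live_`/`sk_test_` prefix is an assumed API key format. ML-based detection is out of scope for this sketch.

```typescript
// All names here (PiiCategory, PiiPattern, classifyText, redactText) are
// hypothetical and not taken from the SubTrackr codebase.
type PiiCategory = "email" | "ssn" | "api_key";

interface PiiPattern {
  category: PiiCategory;
  source: string; // regex source, compiled with the global flag at scan time
}

interface PiiFinding {
  category: PiiCategory;
  match: string;
  start: number; // offset of the match in the scanned text
}

// Deliberately simple patterns; the api_key prefix is an assumed key format.
const defaultPatterns: PiiPattern[] = [
  { category: "email", source: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/.source },
  { category: "ssn", source: /\b\d{3}-\d{2}-\d{4}\b/.source },
  { category: "api_key", source: /\bsk_(?:live|test)_[A-Za-z0-9]{16,}\b/.source },
];

// Scan free text and report every match, ordered by position.
function classifyText(text: string, patterns: PiiPattern[] = defaultPatterns): PiiFinding[] {
  const findings: PiiFinding[] = [];
  for (const pattern of patterns) {
    const re = new RegExp(pattern.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
      findings.push({ category: pattern.category, match: m[0], start: m.index });
    }
  }
  return findings.sort((a, b) => a.start - b.start);
}

// Replace every match with a category placeholder, e.g. before a log line is written.
function redactText(text: string, patterns: PiiPattern[] = defaultPatterns): string {
  let result = text;
  for (const pattern of patterns) {
    result = result.replace(new RegExp(pattern.source, "g"), "[REDACTED:" + pattern.category + "]");
  }
  return result;
}
```

Storing patterns as source strings rather than `RegExp` objects keeps them serializable, so custom patterns could be loaded from configuration. `classifyText` would feed an audit report, while `redactText` could back a log filter.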

Acceptance Criteria

  • PII classification engine with configurable detection patterns (email, phone, SSN, crypto addresses, API keys)
  • Automated redaction middleware for API response serialization
  • Log redaction filter for structured logging output
  • PII audit report generation showing classified fields per endpoint/module
  • Data lineage tracking for PII fields through processing pipeline
  • Configurable classification levels (strict, standard, permissive)
  • Integration tests verifying redaction does not break API contracts
  • Documentation for adding custom PII patterns and classification rules
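
To make the response-redaction and classification-level criteria above more concrete, here is a minimal sketch of field-based redaction over a JSON payload. The mapping from levels to redacted categories is an assumption (the issue does not define what strict, standard, and permissive mean), and `redactPayload`, `fieldCategories`, and `redactedByLevel` are hypothetical names.

```typescript
// Hypothetical names and level semantics; the issue does not define them.
type Level = "strict" | "standard" | "permissive";
type Category = "email" | "ssn" | "api_key";
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Assumed policy: stricter levels redact more categories.
const redactedByLevel: Record<Level, Category[]> = {
  strict: ["email", "ssn", "api_key"],
  standard: ["ssn", "api_key"],
  permissive: ["api_key"],
};

// Assumed field-name classification; a real table would be configurable.
const fieldCategories: { [field: string]: Category } = {
  email: "email",
  ssn: "ssn",
  apiKey: "api_key",
};

interface RedactionResult {
  payload: Json;
  redactedPaths: string[]; // e.g. "user.email", usable as input to an audit report
}

// Returns a redacted copy; the input payload is not mutated.
function redactPayload(payload: Json, level: Level): RedactionResult {
  const active = redactedByLevel[level];
  const redactedPaths: string[] = [];

  function walk(value: Json, path: string): Json {
    if (Array.isArray(value)) {
      return value.map((item, i) => walk(item, path + "[" + i + "]"));
    }
    if (value !== null && typeof value === "object") {
      const out: { [key: string]: Json } = {};
      for (const key of Object.keys(value)) {
        const childPath = path ? path + "." + key : key;
        const classified = Object.prototype.hasOwnProperty.call(fieldCategories, key);
        if (classified && active.indexOf(fieldCategories[key]) !== -1) {
          // Keep the key so the response shape (the API contract) is unchanged.
          out[key] = "[REDACTED:" + fieldCategories[key] + "]";
          redactedPaths.push(childPath);
        } else {
          out[key] = walk(value[key], childPath);
        }
      }
      return out;
    }
    return value; // primitives pass through untouched
  }

  return { payload: walk(payload, ""), redactedPaths };
}
```

Because keys are preserved and only values are replaced, contract tests can assert that the redacted and unredacted responses have the same shape. In an Express or Fastify setup this function would presumably be called from whatever hook serializes the response body; that wiring is left out here.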

Technical Scope

  • Files: backend/services/shared/piiAudit.ts, backend/services/shared/auditService.ts, backend/services/shared/logging.ts, backend/services/shared/apiResponse.ts, backend/services/billing/, backend/services/analytics/
  • APIs: Express/Fastify response middleware, structured logger (pino/winston), serialization layer
  • Edge cases: False positives (e.g., "example@test.com" in test data), partial PII (last 4 digits), international PII formats, PII in nested JSON blobs, Unicode/normalization differences
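
A few of the edge cases above can be illustrated with a small sketch: Unicode normalization before matching, an allowlist for known test fixtures, and partial masking that keeps the last 4 digits. The helper names (`normalizeForScan`, `findEmails`, `maskCardNumber`), the allowlist contents, and the 13-digit minimum length are assumptions for illustration only.

```typescript
// Hypothetical helpers; none of these names come from the SubTrackr codebase.

// Assumed allowlist of test fixtures that should not be reported as PII.
const allowlistedEmails = ["example@test.com"];

const emailSource = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/.source;

// NFKC folds compatibility characters (such as the fullwidth "@", U+FF20)
// into their plain equivalents so that ASCII-based patterns still match.
function normalizeForScan(text: string): string {
  return text.normalize("NFKC");
}

// Find email addresses after normalization, skipping allowlisted fixtures.
function findEmails(text: string): string[] {
  const normalized = normalizeForScan(text);
  const re = new RegExp(emailSource, "g");
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(normalized)) !== null) {
    const email = m[0].toLowerCase();
    if (allowlistedEmails.indexOf(email) === -1) {
      found.push(email);
    }
  }
  return found;
}

// Partial masking: keep only the last 4 digits of a card-like number.
// Inputs with fewer than 13 digits are returned unchanged (assumed threshold).
function maskCardNumber(value: string): string {
  const digits = value.replace(/\D/g, "");
  if (digits.length < 13) {
    return value;
  }
  return "****" + digits.slice(-4);
}
```

An allowlist like this is probably only safe in test environments, since in production it would give real data a way to bypass detection. The fixed `****` prefix also hides the original length of the number.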

Metadata

Assignees

No one assigned

    Labels

    200-points (200 point issue), Stellar Wave (Issues in the Stellar wave program), drips-wave (Issues in the Drips Wave program), high (High complexity issue)
