Date: 2025-12-22
Vision: Expose every delicate detail of a target with a single click
Current State: 66 / 1036 credential types (6.4% coverage)
Goal: Match or exceed TruffleHog's detection capabilities for web targets
The Problem: X-POSURE currently detects only 6.4% of credential types that TruffleHog can find. For the vision of "exposing everything about a target," this is unacceptable.
The Gap: 970 missing credential types (93.6%)
The Solution: Systematic expansion across 5 priority tiers to reach 1000+ credential types within 12 weeks.
Strengths:
- Good cloud coverage: AWS (3), Azure (5), GCP (3), DigitalOcean (2), Alibaba (2)
- Excellent AI/ML coverage: OpenAI (4), Anthropic, Cohere, HuggingFace, Stability, Replicate, Mistral, Pinecone, Weaviate, WandB, LangSmith
- Solid VCS: GitHub (5), GitLab (3), Bitbucket (1)
- Good communication: Slack (4), Discord (2), Twilio (2), Telegram, SendGrid, Mailgun, Zendesk
- Basic payment: Stripe (3), PayPal, Square (2), Braintree
- Essential databases: MongoDB, Postgres, MySQL, Redis, Supabase, Firebase
Weaknesses:
- Effectively zero monitoring tools (the one entry counted, Zendesk, is a support platform, not monitoring)
- Only 1 CI/CD tool (Terraform)
- Zero secret management tools
- Zero container/orchestration tools
- Zero CMS/e-commerce beyond basic
- Zero security/compliance tools
- Minimal productivity/SaaS tools
TruffleHog has 1036 detector types covering:
- 100+ cloud/infrastructure providers
- 50+ AI/ML services
- 30+ VCS integrations
- 80+ communication platforms
- 40+ payment processors
- 60+ database types
- 50+ CI/CD tools
- 40+ monitoring/observability platforms
- 30+ authentication providers
- 100+ productivity/SaaS tools
- 30+ container/orchestration platforms
- 20+ secret management tools
- 50+ CMS/e-commerce platforms
- 30+ security/compliance tools
- 400+ niche/specialized APIs
We need to add ~900+ credential types, but not all are equal. Strategy:
- Tier 1 (Critical - Week 1): High-impact types commonly found in web apps
- Tier 2 (Essential - Week 2-3): Common enterprise/startup infrastructure
- Tier 3 (Important - Week 4-6): Additional cloud, SaaS, and dev tools
- Tier 4 (Useful - Week 7-9): Niche but valuable services
- Tier 5 (Long-tail - Week 10-12): Comprehensive coverage for "expose everything"
Target: +100 types → 166 total
Effort: 40 hours
Impact: CRITICAL - These are everywhere
- Cloudflare: API tokens, workers tokens, R2 keys, global API keys
- Vercel: API tokens, team tokens
- Netlify: API tokens, build hooks
- IBM Cloud: API keys, service credentials
- Oracle Cloud: API keys, auth tokens
- Render: API tokens
- Fly.io: API tokens, auth tokens
- Railway: API tokens
Why: These are the second-tier cloud providers commonly found in modern web apps.
- Google AI (Gemini): API keys
- xAI (Grok): API keys
- ElevenLabs: API keys
- DeepSeek: API keys
- Groq: API keys
- Together AI: API keys
- Perplexity: API keys
- Anyscale: API keys
- Langfuse: API keys
- Weights & Biases: (Already have wandb_api_key ✓)
Why: AI integration is exploding. Every startup uses these.
- Datadog: API keys, app keys
- New Relic: API keys, license keys, insights keys
- Sentry: DSN, auth tokens, org tokens
- Grafana: API keys, service account tokens
- Splunk: HEC tokens, auth tokens
- Honeycomb: API keys
- Better Stack: API tokens
- Loggly: API tokens
- LogzIO: API tokens
Why: EVERY production web app has monitoring. High-value targets.
- CircleCI: API tokens, project tokens
- TravisCI: API tokens
- Jenkins: API tokens, credentials
- BuildKite: API tokens
- DroneCI: API tokens
- GitHub Actions: secrets (composite detection)
- GitLab CI: tokens, registry tokens
- Azure Pipelines: PATs
- Pulumi: access tokens
- Docker Hub: access tokens, passwords
- Kubernetes: kubeconfig, service account tokens
- Nomad: ACL tokens
- Portainer: API keys
Why: CI/CD pipelines = keys to the kingdom. Always exposed in configs.
- Docker: registry tokens, hub tokens
- Kubernetes: kubeconfig, secrets
- Portainer: API tokens
- Rancher: API keys, tokens
- Nomad: ACL tokens
Why: Modern infra = containers. These are critical.
- HashiCorp Vault: tokens, root tokens
- Doppler: service tokens, personal tokens
- Infisical: API tokens
- 1Password: service account tokens
- AWS Secrets Manager: (detected via AWS keys)
Why: Ironically, secret managers get leaked too. High value.
- Notion: API tokens, integration tokens
- Airtable: personal tokens, OAuth tokens
- Jira: API tokens, personal tokens
- Asana: personal tokens, OAuth tokens
- Linear: API keys, OAuth tokens
- Monday.com: API tokens
- ClickUp: API tokens, personal tokens
- Confluence: API tokens
Why: Every startup uses these. Found in automation scripts.
- Checkout.com: API keys, secret keys
- Adyen: API keys
- Paddle: API keys, vendor ID
- LemonSqueezy: API keys
- Chargebee: API keys
Why: Payment = direct money access. Critical security risk.
Target: +150 types → 316 total
Effort: 80 hours
Impact: HIGH - Common in enterprise/scale-ups
- Microsoft Teams: webhooks, bot tokens
- Webex: bot tokens, access tokens
- RingCentral: API tokens
- MessageBird: API keys
- Postmark: API tokens
- Mailjet: API keys, secrets
- Customer.io: API keys
- Intercom: access tokens
- Drift: API tokens
- Freshdesk: API keys
- Auth0: client secrets, management tokens, M2M tokens
- Okta: API tokens, OAuth tokens
- OneLogin: API tokens
- Clerk: API keys
- WorkOS: API keys
- Supabase Auth: (already have supabase ✓)
- Shopify: API keys, access tokens, webhooks
- WooCommerce: API keys
- Contentful: content delivery keys, preview keys, management tokens
- Strapi: API tokens
- Sanity: API tokens
- WordPress: application passwords
- Ghost: admin API keys, content API keys
- Webflow: API tokens
- Squarespace: API keys
- BigCommerce: API tokens
- Cloudflare: (expand from tier 1 - workers KV, durable objects, etc.)
- Akamai: API tokens, EdgeGrid tokens
- Fastly: API tokens
- BunnyCDN: API keys
- PlanetScale: connection strings, tokens
- Aiven: API tokens, connection strings
- Couchbase: connection strings
- CockroachDB: connection strings
- NPM: tokens, auth tokens
- PyPI: tokens, API tokens
- RubyGems: API keys
- Packagist: tokens
- NuGet: API keys
- GitHub Packages: (covered by GitHub)
- JFrog Artifactory: API keys, tokens
- Sonatype Nexus: tokens
- Cloudsmith: API keys
- PackageCloud: tokens
- BrowserStack: access keys
- Sauce Labs: access keys
- Percy: tokens
- Chromatic: project tokens
- Playwright: (no API keys typically)
- Cypress: record keys
Target: +200 types → 516 total
Effort: 120 hours
Impact: MEDIUM-HIGH - Specialized but common
- Wiz: API tokens
- Snyk: API tokens, org tokens
- Qualys: API credentials
- Tenable: API keys, secret keys
- Detectify: API tokens
- Shodan: API keys
- Censys: API tokens
- VirusTotal: API keys
- AbuseIPDB: API keys
- SecurityTrails: API keys
- Elasticsearch: API keys, passwords
- InfluxDB: tokens
- TimescaleDB: connection strings
- Cassandra: credentials
- Neo4j: credentials
- RavenDB: API keys
- DynamoDB: (AWS keys)
- CosmosDB: (Azure keys)
- FaunaDB: secrets
- Mixpanel: project tokens, service accounts
- Amplitude: API keys, secret keys
- Segment: write keys, API tokens
- Google Analytics: (OAuth, measurement protocol)
- PostHog: personal API keys, project keys
- Heap: app ID (not secret)
- FullStory: API keys
- LogRocket: API tokens
- Snowplow: (various endpoints)
- Hugging Face: (already have ✓)
- Modal: API tokens
- RunPod: API keys
- Lambda Labs: API keys
- CoreWeave: API keys
- Stability AI: (already have ✓)
- Midjourney: (no public API)
- Dall-E: (OpenAI keys)
- HubSpot: API keys, private app tokens
- Salesforce: (already have OAuth ✓)
- Marketo: client ID/secret
- Pardot: API keys
- ActiveCampaign: API keys
- Mailchimp: API keys
- Constant Contact: API keys
- ConvertKit: API keys, secret keys
- Mux: signing keys, access tokens
- Cloudinary: API key/secret
- Imgix: API tokens
- Vimeo: access tokens
- YouTube: API keys (Google)
- Twitch: OAuth tokens, client IDs
- AWS S3: (covered by AWS ✓)
- Google Cloud Storage: (covered by GCP ✓)
- Azure Blob: (covered by Azure ✓)
- Backblaze B2: application keys
- Wasabi: access keys
- Spaces: (DigitalOcean, already have ✓)
Target: +200 types → 716 total
Effort: 120 hours
Impact: MEDIUM - Nice to have, fills gaps
- QuickBooks: OAuth tokens
- Xero: OAuth tokens
- FreshBooks: OAuth tokens
- Wave: API tokens
- Plaid: client ID/secret, access tokens
- Dwolla: API key/secret
- Yodlee: API keys
- BambooHR: API keys
- Greenhouse: API tokens
- Lever: API keys
- Workday: credentials
- ADP: API credentials
- Gusto: API tokens
- Twitter/X: API keys, bearer tokens, access tokens
- Facebook: access tokens, app secrets
- Instagram: access tokens
- LinkedIn: access tokens, client secrets
- TikTok: access tokens
- Reddit: client ID/secret
- Typeform: API tokens
- Google Forms: (OAuth)
- SurveyMonkey: access tokens
- Qualtrics: API tokens
- JotForm: API keys
- Formstack: API keys
- Calendly: API tokens, webhooks
- Cal.com: API keys
- Acuity: API keys
- Chili Piper: API keys
- Savvycal: API keys
- Terraform Cloud: (already have ✓)
- Ansible: vault passwords
- Puppet: API tokens
- Chef: API keys
- SaltStack: credentials
- Etherscan: API keys
- Infura: project IDs, secrets
- Alchemy: API keys
- Moralis: API keys
- Coinbase: API keys, secrets
- Binance: API keys, secrets
- Kraken: API keys
- SendBird: app ID, API tokens
- PubNub: publish/subscribe keys
- Pusher: app keys, secrets
- Ably: API keys
- Stream: API keys, secrets
Target: +300 types → 1000+ total
Effort: 150 hours
Impact: COMPREHENSIVE - "Expose everything"
This tier includes:
- 100+ niche SaaS APIs
- 50+ regional cloud providers
- 50+ legacy systems
- 50+ specialized industry tools
- 50+ misc services
Examples:
- Weather APIs (OpenWeather, WeatherStack, etc.)
- Geo APIs (Google Maps, Mapbox, etc.)
- SMS providers (Twilio covered, but 20+ others)
- Email verification services
- Image processing services
- Document generation services
- Translation services
- OCR services
- Voice/speech APIs
- IoT platforms
- Gaming platforms
- Travel/booking APIs
- Sports data APIs
- Financial data APIs
- AWS → STS GetCallerIdentity
- GitHub → /user API
- Slack → auth.test
- Stripe → /v1/account
- OpenAI → /v1/models
High-ROI verifications:
- GCP → OAuth2 tokeninfo endpoint
- Azure → Microsoft Graph /me
- DigitalOcean → /v2/account
- Cloudflare → /user
- Vercel → /v2/user
- Netlify → /api/v1/user
- Datadog → /api/v1/validate
- Sentry → /api/0/
- GitLab → /api/v4/user
- Bitbucket → /2.0/user
- Docker Hub → /v2/user
- SendGrid → /v3/scopes
- Twilio → API validation
- MongoDB Atlas → connection test
- Supabase → project info API
- Notion → /v1/users/me
- Airtable → /v0/meta/whoami
- Jira → /rest/api/3/myself
- Linear → GraphQL whoami
- HubSpot → /integrations/v1/me
- CircleCI → /me
- Terraform Cloud → /account/details
- New Relic → API key validation
- Grafana → /api/org
- Anthropic → /v1/models (similar to OpenAI)
26-40... (more services)
Focus on payment processors, cloud providers, AI services
For services without specific APIs:
- Generic GET endpoint check
- Generic POST endpoint check
- Response status code analysis
- Error message pattern matching
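The status-code and error-message checks listed above could be sketched as a pure classification step, kept separate from the HTTP call itself. This is only an illustration: the function name `classify_response`, the three outcome labels, and the default code sets and patterns are assumptions, not existing X-POSURE code.

```python
import re
from typing import Iterable

# Hypothetical defaults; real services may need per-provider overrides.
SUCCESS_CODES = frozenset({200, 201})
REJECT_CODES = frozenset({401, 403})


def classify_response(
    status: int,
    body: str,
    error_patterns: Iterable[str] = ("unauthorized", "invalid[_ ]token"),
) -> str:
    """Return 'valid', 'invalid', or 'unknown' for a verification response."""
    # Error message pattern matching goes first: some APIs answer
    # HTTP 200 with an error payload.
    for pattern in error_patterns:
        if re.search(pattern, body, re.IGNORECASE):
            return "invalid"
    # Response status code analysis.
    if status in SUCCESS_CODES:
        return "valid"
    if status in REJECT_CODES:
        return "invalid"
    # Rate limits (429), server errors (5xx), etc. say nothing about the key.
    return "unknown"
```

Keeping an explicit "unknown" outcome means a rate-limited or failing provider does not get reported as a dead credential.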
Goal: 70+ active verifiers by end of expansion
Day 1-2: Infrastructure prep
- Fix pyjsparser dependency issue
- Create rule template generator
- Setup bulk testing framework
Day 3-7: Add Tier 1 types
- Cloud giants (15 types) → 2 days
- AI/ML (12 types) → 1 day
- Monitoring (15 types) → 2 days
- CI/CD (20 types) → 2 days
- Containers (10 types) → 1 day
- Secret mgmt (10 types) → 1 day
- SaaS (15 types) → 1 day
- Payment (8 types) → 0.5 day
Deliverable: 166 total types, +15 verifiers
Focus: Enterprise/common services
- Communication (25 types)
- Authentication (20 types)
- CMS/E-commerce (30 types)
- Cloud services (30 types)
- Dev tools (25 types)
- Testing (20 types)
Deliverable: 316 total types, +25 verifiers
Focus: Specialized but valuable
- Security (40 types)
- Databases (30 types)
- Analytics (35 types)
- AI/ML (25 types)
- Marketing (35 types)
- Media (20 types)
- Storage/CDN (15 types)
Deliverable: 516 total types, +30 verifiers
Focus: Fill category gaps
- Finance (30 types)
- HR (25 types)
- Social (25 types)
- Forms (20 types)
- Scheduling (15 types)
- Infrastructure (30 types)
- Blockchain (30 types)
- Messaging (25 types)
Deliverable: 716 total types, +0 verifiers (focus on detection)
Focus: Long-tail coverage
- 300+ niche APIs across all categories
- Polish existing detectors
- Optimize performance
- Documentation
Deliverable: 1000+ total types, final polish
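As a quick sanity check on the deliverables above, the running totals and coverage figures can be recomputed from the per-tier additions. The sketch below uses only numbers quoted in this plan and treats coverage as a plain count ratio against TruffleHog's 1036 detectors, which is a simplifying assumption (it presumes every added type corresponds to a TruffleHog detector).

```python
TRUFFLEHOG_TYPES = 1036  # detector count quoted in this plan
CURRENT_TYPES = 66

# (tier label, types added) as stated in the tier targets
TIER_ADDITIONS = [
    ("Tier 1", 100),
    ("Tier 2", 150),
    ("Tier 3", 200),
    ("Tier 4", 200),
    ("Tier 5", 300),
]


def cumulative_coverage(start, additions, reference):
    """Yield (label, running total, coverage %) after each tier."""
    total = start
    for label, added in additions:
        total += added
        yield label, total, round(100 * total / reference, 1)


if __name__ == "__main__":
    print(f"Now: {CURRENT_TYPES} types, {100 * CURRENT_TYPES / TRUFFLEHOG_TYPES:.1f}%")
    for label, total, pct in cumulative_coverage(
        CURRENT_TYPES, TIER_ADDITIONS, TRUFFLEHOG_TYPES
    ):
        print(f"{label}: {total} types, {pct}%")
```

The totals come out as 166, 316, 516, 716, and 1016, matching the deliverables for each phase.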
Current structure:
xposure/rules/
├── cloud.yaml (15 rules → expand to 80)
├── ai.yaml (16 rules → expand to 60)
├── vcs.yaml (10 rules → expand to 30)
├── communication.yaml (13 rules → expand to 50)
├── payment.yaml (7 rules → expand to 30)
├── database.yaml (6 rules → expand to 40)
Proposed structure:
xposure/rules/
├── cloud/
│ ├── aws.yaml
│ ├── azure.yaml
│ ├── gcp.yaml
│ ├── cloudflare.yaml
│ ├── vercel.yaml
│ ├── netlify.yaml
│ └── ... (20 files, 150 rules)
├── ai/
│ ├── openai.yaml
│ ├── anthropic.yaml
│ ├── google_ai.yaml
│ └── ... (15 files, 60 rules)
├── devops/
│ ├── cicd.yaml
│ ├── containers.yaml
│ ├── monitoring.yaml
│ ├── secrets.yaml
│ └── ... (10 files, 100 rules)
├── saas/
│ ├── productivity.yaml
│ ├── crm.yaml
│ ├── analytics.yaml
│ └── ... (20 files, 200 rules)
└── ... (total: 100 files, 1000+ rules)
rules:
  - id: service_credential_type
    name: Service Name Credential Type
    type: service_credential_type
    severity: critical|high|medium|low|info

    # Detection pattern
    pattern: 'regex_pattern_here'
    capture_group: 1  # optional

    # Pairing (if applicable)
    pair_with: other_credential_type

    # Context requirements
    context_required: true|false
    context_patterns:
      - 'context_pattern_1'
      - 'context_pattern_2'

    # Exclusions
    exclude_patterns:
      - 'false_positive_pattern_1'
      - 'false_positive_pattern_2'

    # Verification
    verifier: service_verifier_name  # or null

    # Metadata
    metadata:
      provider: service_name
      service: specific_service
      category: cloud|ai|devops|saas|etc
      docs: https://docs.service.com/api-keys
      common_locations:
        - .env
        - config.json
        - source code

    # Remediation
    remediation: |
      1. Immediately revoke the exposed credential
      2. Generate new credentials
      3. Update all services using the old credential
      4. Audit access logs for unauthorized use
      5. Implement proper secret management

1. Rule Generator (scripts/generate_rule.py)
# Interactive CLI to generate new rules
# Inputs: service name, credential pattern, docs URL
# Output: YAML rule file

2. Bulk Importer (scripts/import_trufflehog_rules.py)
# Parse TruffleHog detector definitions
# Convert to X-POSURE YAML format
# Requires manual review but saves 80% of time

3. Rule Validator (scripts/validate_rules.py)
# Validates all YAML files
# Checks for:
# - Valid regex patterns
# - No duplicate IDs
# - Valid severity levels
# - Proper metadata

4. Test Generator (scripts/generate_tests.py)
# Auto-generates test cases from rules
# Creates positive and negative examples

class GenericHTTPVerifier(BaseVerifier):
    """Generic HTTP-based verification for APIs."""

    def __init__(self, config: dict):
        self.endpoint = config['endpoint']
        self.method = config.get('method', 'GET')
        self.auth_header = config.get('auth_header', 'Authorization')
        self.auth_prefix = config.get('auth_prefix', 'Bearer')
        self.success_codes = config.get('success_codes', [200, 201])
        self.success_patterns = config.get('success_patterns', [])
        self.error_patterns = config.get('error_patterns', [])

    async def verify(self, credential: str) -> VerificationResult:
        # Generic HTTP verification logic
        pass

Configuration file:
# xposure/verify/configs/services.yaml
services:
  notion:
    endpoint: https://api.notion.com/v1/users/me
    method: GET
    auth_header: Authorization
    auth_prefix: Bearer
    success_codes: [200]
    success_patterns: ['\"object\":\"user\"']
    error_patterns: ['unauthorized', 'invalid_token']
  airtable:
    endpoint: https://api.airtable.com/v0/meta/whoami
    method: GET
    auth_header: Authorization
    auth_prefix: Bearer
    success_codes: [200]
    success_patterns: ['\"id\":\"usr']
  # ... 100+ more services

This allows adding new verifiers without writing Python code.
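To make the config-driven idea concrete, here is a minimal sketch of how one `services` entry, once parsed into a dict, might be turned into a request and how a response might be judged. The helper names `build_request` and `is_verified` are hypothetical, the request is only constructed and never sent, and the ordering of the checks (codes, then error patterns, then success patterns) is an assumption rather than settled behavior.

```python
import re
import urllib.request


def build_request(service: dict, credential: str) -> urllib.request.Request:
    """Turn one services.yaml entry (already parsed to a dict) into a request."""
    prefix = service.get("auth_prefix", "Bearer")
    # strip() covers services configured with an empty prefix.
    header_value = f"{prefix} {credential}".strip()
    return urllib.request.Request(
        service["endpoint"],
        method=service.get("method", "GET"),
        headers={service.get("auth_header", "Authorization"): header_value},
    )


def is_verified(service: dict, status: int, body: str) -> bool:
    """Apply success_codes, then error_patterns, then success_patterns."""
    if status not in service.get("success_codes", [200, 201]):
        return False
    errors = service.get("error_patterns", [])
    if any(re.search(p, body, re.IGNORECASE) for p in errors):
        return False
    patterns = service.get("success_patterns", [])
    # With no success patterns configured, a good status code is enough.
    return not patterns or any(re.search(p, body) for p in patterns)
```

Some providers need more than an auth header (Notion's API, for example, expects a `Notion-Version` header), so the schema would likely need an extra-headers field before it covers every service.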
- Async HTTP requests
- Concurrent processing
- Rate limiting per provider
1. Rule Matching Optimization
- Current: Loop through all rules for each content chunk
- Proposed: Trie-based prefix matching for 10x speedup
- Estimated gain: 200ms → 20ms per content scan
2. Caching Layer
- Cache compiled regex patterns
- Cache verification results (1 hour TTL)
- Cache decoded content
3. Smart Sampling
- Don't scan every single JS file (too slow with 1000 rules)
- Prioritize: configs > env files > source maps > main JS > vendor JS
- Sample large files instead of scanning entirely
4. Parallel Processing
- Current: Sequential rule matching
- Proposed: Batch process with multiprocessing for CPU-bound regex
- Estimated gain: 4x speedup on multi-core systems
5. Progressive Disclosure
- Stream results as found (already implemented ✓)
- Show high-confidence findings first
- Background verification of medium-confidence findings
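One way to picture the rule-matching and regex-caching ideas above is a keyword prefilter: each rule carries a cheap literal hint, and its precompiled regex only runs when that hint appears in the content. Note that this is a simpler stand-in for the proposed trie (a trie or Aho-Corasick automaton would find all keywords in a single pass); the `Rule` class, the `keywords` field, and the three patterns are simplified placeholders, not the project's actual rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    keywords: tuple  # cheap literal hints, lowercase
    pattern: re.Pattern  # compiled once, reused for every chunk


# Illustrative rules only; these patterns are simplified placeholders.
RULES = [
    Rule("stripe_secret_key", ("sk_live_",), re.compile(r"sk_live_[0-9a-zA-Z]{24,}")),
    Rule("github_pat", ("ghp_",), re.compile(r"ghp_[0-9A-Za-z]{36}")),
    Rule("slack_bot_token", ("xoxb-",), re.compile(r"xoxb-[0-9A-Za-z-]{20,}")),
]


def scan(content: str, rules=RULES):
    """Return (rule_id, match) pairs, running a regex only if a keyword hits."""
    lowered = content.lower()
    findings = []
    for rule in rules:
        if not any(keyword in lowered for keyword in rule.keywords):
            continue  # skip the expensive regex entirely
        findings.extend(
            (rule.rule_id, m.group(0)) for m in rule.pattern.finditer(content)
        )
    return findings
```

With 1000 rules, most content chunks contain none of the keywords, so the bulk of the regex work is skipped; the actual speedup would need to be measured with the benchmark suite.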
# Test each rule individually
pytest tests/rules/test_cloud.py
pytest tests/rules/test_ai.py
# ... etc

# Test end-to-end with synthetic data
pytest tests/integration/test_full_scan.py

# Performance benchmarks
pytest tests/benchmarks/test_rule_matching_speed.py
pytest tests/benchmarks/test_verification_speed.py

# Test against known false positive datasets
pytest tests/fp/test_false_positives.py

# Test against known vulnerable sites (with permission)
python -m xposure testsite.example.com --benchmark

- True Positive Rate: >95%
- False Positive Rate: <5%
- Coverage: 1000+ credential types
- Detection Speed: <1s per file
- Verification Accuracy: >98%
- Verification Coverage: 70+ services (7% of types)
- Verification Speed: <500ms per credential
- Error Handling: Graceful degradation on API failures
- Full Domain Scan: <5 minutes for typical site
- Memory Usage: <500MB for typical scan
- Network Efficiency: <1000 requests per scan
- Result Quality: <10% unverified findings
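The detection targets in these metrics (true positive rate above 95%, false positive rate below 5%) can be checked mechanically against a labeled test corpus. The sketch below makes one assumption that should be stated loudly: since true negatives are hard to count in secret scanning, "false positive rate" is computed here as the share of reported findings that are wrong. The plan does not define the term, so this may not be the intended formula.

```python
def detection_metrics(true_pos: int, false_pos: int, false_neg: int) -> dict:
    """Compute recall-style TPR and the false share of reported findings."""
    reported = true_pos + false_pos   # everything the scanner flagged
    actual = true_pos + false_neg     # everything that was really there
    return {
        "true_positive_rate": true_pos / actual if actual else 0.0,
        # Assumption: FP / (TP + FP), not the classical FP / (FP + TN).
        "false_positive_rate": false_pos / reported if reported else 0.0,
    }


def meets_targets(metrics: dict, min_tpr: float = 0.95, max_fpr: float = 0.05) -> bool:
    """Check the plan's >95% TPR and <5% FPR thresholds."""
    return (
        metrics["true_positive_rate"] > min_tpr
        and metrics["false_positive_rate"] < max_fpr
    )
```

For example, 96 correct findings, 4 spurious ones, and 4 misses would pass both thresholds under this definition.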
Risk 1: Regex Performance Degradation
- Mitigation: Pre-compile all patterns, use atomic grouping, timeout protection
- Backup: Fall back to simple string matching for complex patterns
Risk 2: False Positive Explosion
- Mitigation: Mandatory testing for each new rule, exclusion patterns, entropy filtering
- Backup: Confidence scoring helps users filter noise
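The exclusion-pattern and entropy-filtering mitigations for Risk 2 might look roughly like the following. This is a sketch under assumptions: the exclusion list, the function names, and the 3.0 bits-per-character threshold are illustrative choices that would need tuning against the false positive datasets.

```python
import math
import re
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Bits of entropy per character, from character frequencies."""
    if not value:
        return 0.0
    counts = Counter(value)
    length = len(value)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


# Hypothetical exclusions for placeholder-looking values.
EXCLUDE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"example", r"your[_-]?api[_-]?key", r"^x+$")
]


def looks_like_secret(candidate: str, min_entropy: float = 3.0) -> bool:
    """Reject excluded placeholders and low-entropy strings."""
    if any(p.search(candidate) for p in EXCLUDE_PATTERNS):
        return False
    return shannon_entropy(candidate) >= min_entropy
```

Running the cheap exclusion check before the entropy calculation keeps obvious placeholders such as `YOUR_API_KEY` out of the results regardless of how random they look.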
Risk 3: Verification Rate Limiting
- Mitigation: Exponential backoff, respect rate limits, batch verifications
- Backup: Queue verification for later, allow manual retry
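The exponential backoff mitigation for Risk 3 could be sketched as below. The function names, the defaults (1 second base, 60 second cap, 5 retries), and the injected `send`, `is_rate_limited`, and `sleep` callables are all assumptions made so the logic can be exercised without a network or real delays.

```python
import random


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0,
                   jitter: float = 0.0, rng=random.random):
    """Yield wait times: base * 2**attempt, capped, plus optional jitter."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay + jitter * delay * rng()


def call_with_backoff(send, is_rate_limited, sleep, **backoff_kwargs):
    """Call send() until it is not rate limited or retries run out."""
    response = send()
    for delay in backoff_delays(**backoff_kwargs):
        if not is_rate_limited(response):
            return response
        sleep(delay)  # wait before the next attempt
        response = send()
    # Retries exhausted: hand back the last response for queuing or manual retry.
    return response
```

In real use `sleep` would be `time.sleep` (or an async equivalent), and a non-zero `jitter` helps avoid many workers retrying against the same provider at the same instant.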
Risk 4: Maintenance Burden
- Mitigation: Automated rule updates from TruffleHog, community contributions
- Backup: Focus on high-value detectors first, deprecate low-value ones
Risk 5: API Changes Breaking Verifiers
- Mitigation: Version pinning, graceful degradation, automated testing
- Backup: Fall back to passive verification
✅ 300+ credential types ✅ 40+ active verifiers ✅ <5% false positive rate ✅ All critical services covered
✅ 500+ credential types ✅ 60+ active verifiers ✅ Automated rule generation ✅ Comprehensive testing
✅ 700+ credential types ✅ 70+ active verifiers ✅ Performance optimized ✅ Documentation complete
✅ 1000+ credential types ✅ 70+ active verifiers ✅ Benchmark published ✅ Community adoption
- Total Effort: ~600 hours
- Timeline: 12 weeks
- Team Size: 1-2 developers
- Breakdown:
- Week 1-3: 120h (foundation)
- Week 4-6: 120h (expansion)
- Week 7-9: 120h (completion)
- Week 10-12: 150h (excellence)
- Testing/QA: 90h (ongoing)
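The breakdown above can be cross-checked against the per-tier effort figures, and it is also worth seeing how much time each credential type gets. The sketch uses only hours and type counts quoted in this plan; the function names are illustrative.

```python
# (phase, hours, types added) taken from the tier targets and the breakdown
PHASES = [
    ("Tier 1", 40, 100),
    ("Tier 2", 80, 150),
    ("Tier 3", 120, 200),
    ("Tier 4", 120, 200),
    ("Tier 5", 150, 300),
]
QA_HOURS = 90


def total_hours(phases, qa_hours):
    """Sum development hours across tiers and add the ongoing QA budget."""
    return sum(hours for _, hours, _ in phases) + qa_hours


def minutes_per_type(phases):
    """Average development minutes budgeted per credential type, per tier."""
    return {name: round(60 * hours / types) for name, hours, types in phases}


if __name__ == "__main__":
    print(total_hours(PHASES, QA_HOURS))
    print(minutes_per_type(PHASES))
```

The tiers sum to 510 hours, which with 90 hours of QA gives the stated ~600. The per-type budget works out to roughly 24 to 36 minutes, which suggests the schedule depends heavily on the rule generator and bulk importer doing most of the work.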
- Development: Local machine
- Testing: Cloud sandbox accounts for verification testing
- CI/CD: GitHub Actions for automated testing
- Documentation: GitHub wiki or docs site
- TruffleHog detector repository (reference)
- Provider API documentation
- Secret scanning test datasets
- Community contributions
The Path Forward:
X-POSURE can achieve its vision of "exposing everything about a target with a single click" by systematically expanding from 66 → 1000+ credential types over 12 weeks.
Key Success Factors:
- Systematic approach: Tiered expansion prioritizing high-impact types
- Automation: Template generators, bulk importers, automated testing
- Quality: Mandatory testing, false positive filtering, verification
- Performance: Optimized matching, caching, parallel processing
- Sustainability: Clear documentation, community contribution path
Bottom Line:
- Current: 6.4% coverage, limited utility
- After 3 weeks: 30% coverage, production-ready for most targets
- After 6 weeks: 50% coverage, comprehensive for common services
- After 12 weeks: 95%+ coverage, truly "expose everything"
This is achievable, measurable, and will make X-POSURE the definitive tool for web-based credential discovery.
- Approve this plan ✓
- Fix pyjsparser dependency (blocker)
- Create rule generator script (automation)
- Begin Tier 1 additions (week 1)
- Setup automated testing (quality)
- Start weekly progress tracking (accountability)
Let's make X-POSURE truly expose everything. 🔥