# System-Wide Masking: Advanced Secure Approach (Community Contribution Guide)

## Current State

System-wide masking currently uses an **Accessibility overlay** (`AXObserver` + `NSPanel`). This approach is visual only:
- ✅ The key is not visible on screen
- ❌ Copying with `⌘C` still captures the original key
- ❌ Screen recording software may capture the original key. For example, window-level capture records the target window without the separate overlay panel.

## More Secure Direction: ScreenCaptureKit + Virtual Camera

A more secure approach intercepts at the **frame output layer**. Each captured frame is masked before it is emitted, so the original key never appears in any output routed through this pipeline.

### Architecture

```
[macOS Screen]
    ↓ ScreenCaptureKit
[Captured frame (CMSampleBuffer → IOSurface-backed CVPixelBuffer)]
    ↓
[Detect API key positions]  ← Reuse existing pattern matching
    ↓
[Draw masking blocks on frame]
    ↓
[Output to Virtual Camera / Screen Share]
```
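
The diagram describes a per-frame loop. Below is a hypothetical sketch of it; `MaskingPipeline` is an illustrative name and not an existing DemoSafe type. It models the stages as swappable closures. The only hard rule is that the output stage never sees a frame before the mask stage has run:

```swift
/// Hypothetical sketch of the diagram's stages (not an existing DemoSafe type).
/// In practice Frame would be a CVPixelBuffer and Region a CGRect.
struct MaskingPipeline<Frame, Region> {
    let detect: (Frame) -> [Region]            // AX rects or OCR boxes
    let mask: (inout Frame, [Region]) -> Void  // draw opaque blocks over each region
    let output: (Frame) -> Void                // virtual camera / screen-share sink

    /// Runs once per captured frame; `output` only ever receives the frame after `mask` ran.
    func process(_ frame: Frame) {
        var working = frame
        mask(&working, detect(frame))
        output(working)
    }
}
```

Keeping detection as a closure means the AX-based and OCR-based detectors described below can be swapped in without touching capture or output.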

### Use Case Comparison

| Scenario | Overlay Approach | ScreenCaptureKit Approach |
|----------|------------------|---------------------------|
| OBS livestream | ⚠️ Window capture omits the overlay panel | ✅ Output is already masked |
| Google Meet screen share | ⚠️ Depends on the overlay's window level | ✅ Masked when the virtual camera is shared instead of the screen |
| Screen recording | ⚠️ Same issue as OBS | ✅ Masked when recording the pipeline's output |
| Copy/Paste | ❌ Original key still copyable | ❌ Same (the clipboard is outside the frame pipeline) |

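### Technical Components

#### 1. ScreenCaptureKit Capture

The capturing object must conform to `SCStreamOutput` to receive frames and to `SCStreamDelegate` to be notified of stream errors:

```swift
import ScreenCaptureKit
import CoreMedia

final class FrameCapturer: NSObject, SCStreamOutput, SCStreamDelegate {
    private var activeStream: SCStream?

    func start() async throws {
        // Requires the Screen Recording permission
        let content = try await SCShareableContent.current
        guard let display = content.displays.first else { return }
        let filter = SCContentFilter(display: display, excludingWindows: [])
        let config = SCStreamConfiguration()
        config.width = display.width     // points; multiply by the backing scale for full Retina resolution
        config.height = display.height
        config.pixelFormat = kCVPixelFormatType_32BGRA
        let stream = SCStream(filter: filter, configuration: config, delegate: self)
        try stream.addStreamOutput(self, type: .screen, sampleHandlerQueue: .global())
        try await stream.startCapture()
        activeStream = stream            // keep a strong reference while capturing
    }

    // Called on the sample handler queue for every captured frame
    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer, of type: SCStreamOutputType) {
        guard type == .screen, let pixelBuffer = sampleBuffer.imageBuffer else { return }
        process(pixelBuffer)             // detect key positions, mask, forward (steps 2 and 3)
    }

    func process(_ frame: CVPixelBuffer) { /* steps 2 and 3 */ }
}
```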
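
AX rects and the captured frame's pixels live in different spaces. AX rects are global points with a top-left origin. Each captured frame covers a single display and may be scaled. Below is a minimal mapping under two assumptions. First, the display's top-left corner is known in the same global space (for example, from `CGDisplayBounds`). Second, `scale` equals `config.width / display.width`. The helper name is hypothetical:

```swift
import Foundation  // CGRect / CGFloat

/// Hypothetical helper: maps a rect in global AX coordinates (points, top-left origin)
/// into pixel coordinates of one captured display's frame (also top-left origin).
/// Assumes `displayOrigin` is the display's top-left corner in the same global space.
func axRectToFramePixels(_ axRect: CGRect, displayOrigin: CGPoint, scale: CGFloat) -> CGRect {
    CGRect(
        x: (axRect.minX - displayOrigin.x) * scale,
        y: (axRect.minY - displayOrigin.y) * scale,
        width: axRect.width * scale,
        height: axRect.height * scale
    )
}
```

With `config.width = display.width` as in the capture code, `scale` is 1. On a Retina display captured at full resolution it is typically 2.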
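
#### 2. Key Position Detection

**Option A: Vision Framework OCR**

Vision runs OCR synchronously through `VNImageRequestHandler`. Its bounding boxes are normalized (0 to 1, bottom-left origin), so they must be scaled before drawing:

```swift
import Vision
import CoreGraphics

/// Returns recognized lines that the masking engine flags, with rects in image
/// pixels (bottom-left origin, as Vision reports them).
/// `shouldMask` stands in for MaskingCoordinator's pattern matcher.
func detectKeys(in image: CGImage, shouldMask: (String) -> Bool) -> [(String, CGRect)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .fast          // real-time needs fast mode
    request.usesLanguageCorrection = false    // keys are not dictionary words

    // perform(_:) is synchronous, so results are available when it returns
    try? VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    var found: [(String, CGRect)] = []
    for observation in request.results ?? [] {
        guard let text = observation.topCandidates(1).first?.string, shouldMask(text) else { continue }
        // boundingBox is normalized (0...1); scale it to pixel coordinates
        let rect = VNImageRectForNormalizedRect(observation.boundingBox, image.width, image.height)
        found.append((text, rect))
    }
    return found
}
```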
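
Vision's pixel rects use a bottom-left origin. The frame's rows and the AX-derived rects run top-down. A small flip reconciles them. This is plain coordinate arithmetic: the same idea as Lessons Learned #2, applied to one image instead of the global screen. It is shown here as a sketch with a hypothetical helper name:

```swift
import Foundation  // CGRect / CGFloat

/// Hypothetical helper: flips a rect from Vision's bottom-left-origin pixel space
/// (the output of VNImageRectForNormalizedRect) into the top-left-origin pixel
/// space used by the frame's rows and by AX-derived rects.
func visionRectToTopLeft(_ rect: CGRect, imageHeight: Int) -> CGRect {
    CGRect(x: rect.minX,
           y: CGFloat(imageHeight) - rect.minY - rect.height,
           width: rect.width,
           height: rect.height)
}
```

Converting OCR results into the same space as Option B lets both detectors feed a single drawing routine.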

**Option B: AX API Coordinates (Recommended, No OCR Needed)**

Reuse the `AXBoundsForRange` coordinates that `SystemMaskingService` already computes, and draw masking blocks directly onto captured frames. This skips OCR entirely and is much faster. First, convert the rects from AX space (top-left origin, points) into the frame's coordinate space.

```swift
import CoreGraphics

/// Fills each key rect with an opaque block on a CGContext that wraps the captured frame.
/// `keyRects` must already be in the context's coordinate space.
func drawMasks(_ keyRects: [CGRect], in context: CGContext) {
    context.setFillColor(CGColor.white)
    context.fill(keyRects)
}

// Usage with the rects the overlay service already tracks:
// drawMasks(systemMaskingService.getActiveOverlayRects(), in: context)
```
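
To draw on a captured frame, the `CVPixelBuffer` delivered by ScreenCaptureKit can be wrapped directly in a `CGContext`. The sketch below assumes the `kCVPixelFormatType_32BGRA` format from step 1 and rects already converted to the frame's pixel space with a top-left origin. `maskFrame` is a hypothetical name. Writing into the capture buffer in place is also an assumption; if that proves unsafe, copy the frame into a buffer from a `CVPixelBufferPool` first:

```swift
import CoreGraphics
import CoreVideo

/// Hypothetical helper: paints opaque white blocks into a 32BGRA frame in place.
/// `rects` are in the frame's pixel space with a top-left origin.
func maskFrame(_ pixelBuffer: CVPixelBuffer, rects: [CGRect]) {
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    let height = CVPixelBufferGetHeight(pixelBuffer)
    guard let context = CGContext(
        data: CVPixelBufferGetBaseAddress(pixelBuffer),
        width: CVPixelBufferGetWidth(pixelBuffer),
        height: height,
        bitsPerComponent: 8,
        bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
        space: CGColorSpaceCreateDeviceRGB(),
        // BGRA in memory = 32-bit little-endian with alpha first
        bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue
    ) else { return }

    // CGContext's origin is bottom-left; flip once so top-left rects land where expected
    context.translateBy(x: 0, y: CGFloat(height))
    context.scaleBy(x: 1, y: -1)
    context.setFillColor(red: 1, green: 1, blue: 1, alpha: 1)
    context.fill(rects)
}
```

Flipping the context once, instead of converting every rect, reduces the per-frame work to a lock, a few fills, and an unlock.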

#### 3. Virtual Camera Output

Use a [CoreMediaIO DAL Plugin](https://developer.apple.com/documentation/coremediaio) or [OBS Virtual Camera](https://obsproject.com/). DAL plug-ins are deprecated as of macOS 12.3, and Apple recommends Camera Extensions (`CMIOExtension`) for new virtual cameras.

Third-party options:
- [mac-virtual-camera](https://github.com/pjb/mac-virtual-camera): Swift implementation
- OBS Studio Virtual Camera API
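
Whichever virtual camera route is chosen, the masked `CVPixelBuffer` usually has to be rewrapped as a `CMSampleBuffer` before it is sent. A Camera Extension stream, for instance, sends `CMSampleBuffer`s through `CMIOExtensionStream`. Below is a minimal sketch using standard Core Media calls; the function name is hypothetical:

```swift
import Foundation
import CoreMedia
import CoreVideo

/// Hypothetical helper: wraps a masked frame in a CMSampleBuffer for a virtual camera stream.
/// Returns nil if Core Media rejects the buffer.
func makeSampleBuffer(from pixelBuffer: CVPixelBuffer, presentationTime: CMTime) -> CMSampleBuffer? {
    var formatOut: CMVideoFormatDescription?
    guard CMVideoFormatDescriptionCreateForImageBuffer(
        allocator: kCFAllocatorDefault,
        imageBuffer: pixelBuffer,
        formatDescriptionOut: &formatOut
    ) == noErr, let format = formatOut else { return nil }

    var timing = CMSampleTimingInfo(
        duration: .invalid,                    // unknown for live capture
        presentationTimeStamp: presentationTime,
        decodeTimeStamp: .invalid
    )
    var sampleBuffer: CMSampleBuffer?
    let status = CMSampleBufferCreateReadyWithImageBuffer(
        allocator: kCFAllocatorDefault,
        imageBuffer: pixelBuffer,
        formatDescription: format,
        sampleTiming: &timing,
        sampleBufferOut: &sampleBuffer
    )
    return status == noErr ? sampleBuffer : nil
}
```

Reusing the presentation timestamp of the ScreenCaptureKit sample buffer that produced the frame keeps the output timeline aligned with capture.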

### Lessons Learned (From Overlay Implementation)

1. **AXBoundsForRange is precise**: it returns exact screen coordinates for a text range, including multi-line text, so no estimation is needed
2. **Coordinate conversion**: AX uses a top-left origin, while AppKit (and a default `CGContext`) use a bottom-left origin: `appKitY = primaryScreenHeight - axY - height`
3. **Multi-monitor**: use the primary screen's height as the conversion reference
4. **Performance baseline**: pattern matching takes 0.1-0.3 ms; AX query + overlay takes 2-6 ms. With ScreenCaptureKit, OCR (if used) would be the bottleneck
5. **A 30 ms debounce is sufficient**: the human flicker perception threshold is ~50 ms
6. **Scan immediately on app switch**: don't wait for the debounce, or the key flashes for 500 ms+
7. **Unique overlay IDs**: the same keyId with multiple occurrences needs distinct IDs (keyId + match index)
8. **Reuse NSHostingView**: creating a new view on every update causes views to accumulate
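
Lessons 5 and 6 combine into one scheduling rule: debounce text edits, but scan at once on an app switch and cancel any pending scan. Below is a sketch using `DispatchWorkItem`. The `ScanScheduler` type and its method names are hypothetical. In the real service, `scan` would run the AX query and update the overlays:

```swift
import Foundation

/// Hypothetical scheduler for Lessons 5-6: text edits are debounced,
/// app switches scan immediately and cancel any pending debounced scan.
/// Call both methods from the same thread (e.g. the main thread).
final class ScanScheduler {
    private let delay: DispatchTimeInterval
    private let queue: DispatchQueue
    private let scan: () -> Void
    private var pending: DispatchWorkItem?

    init(delay: DispatchTimeInterval = .milliseconds(30),
         queue: DispatchQueue = .main,
         scan: @escaping () -> Void) {
        self.delay = delay
        self.queue = queue
        self.scan = scan
    }

    /// Text changed: restart the debounce window so only the last edit triggers a scan.
    func textChanged() {
        pending?.cancel()
        let item = DispatchWorkItem { [weak self] in self?.scan() }
        pending = item
        queue.asyncAfter(deadline: .now() + delay, execute: item)
    }

    /// App switched: scan right now so the key is never shown while a timer is running.
    func appSwitched() {
        pending?.cancel()
        pending = nil
        scan()
    }
}
```

Cancelling the pending work item in `appSwitched()` prevents a stale debounced scan from firing right after the immediate one.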

### Suggested Implementation Order

1. **OBS plugin first** (easiest): write an OBS source plugin that gets key coordinates from DemoSafe Core via IPC and draws masking blocks on the OBS scene
2. **Virtual camera next** (moderate): ScreenCaptureKit capture + AX coordinates + virtual camera output (DAL plug-in or Camera Extension)
3. **OCR last** (complex): Vision framework OCR as a fallback when AX API coordinates are unavailable
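
The IPC step in item 1 needs an agreed message format between DemoSafe Core and the plugin. One possible shape is newline-delimited JSON that carries the rects for one scan. This is an assumption, not an existing protocol, and `MaskUpdate` and `encodeLine` are hypothetical names. The OBS plugin (C/C++) would parse each line and draw the blocks over the scene:

```swift
import Foundation

/// Hypothetical wire format for DemoSafe Core -> OBS plugin IPC (not an existing DemoSafe API).
/// One message per scan: the rects to mask, in display-local points, top-left origin.
struct MaskUpdate: Codable, Equatable {
    struct Region: Codable, Equatable {
        let overlayId: String          // keyId + match index, per Lessons Learned #7
        let x, y, width, height: Double
    }
    let displayId: UInt32              // CGDirectDisplayID of the display the rects belong to
    let timestamp: Double              // seconds; lets the plugin drop stale updates
    let regions: [Region]
}

/// Encodes one update as a single JSON line, easy to stream over a Unix domain socket.
func encodeLine(_ update: MaskUpdate) throws -> Data {
    var data = try JSONEncoder().encode(update)
    data.append(0x0A)                  // "\n" delimiter
    return data
}
```

Newline-delimited JSON keeps the plugin side simple: read until `\n`, parse, replace the current set of rects.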

### Reusable Files

| File | Reusable Content |
|------|------------------|
| `Services/Accessibility/SystemMaskingService.swift` | AXObserver setup, focused element scanning, AXBoundsForRange |
| `Services/Masking/MaskingCoordinator.swift` | Pattern matching engine (`shouldMask()`) |
| `Views/Overlay/SystemOverlayController.swift` | Coordinate conversion (`convertAXToAppKit`) |