
Commit ec360f4

copyleftdev and claude committed
docs: update mdBook and repo metadata for stealth features
Updates repo description with stealth capabilities (BoringSSL, JA3/JA4, 17-patch CDP stealth, 288 tests, 15 crates) and sets homepage URL to GitHub Pages.

mdBook updates:
- crate-reference/fetch.md: complete rewrite covering wreq, FetchConfig.emulation, BrowserFetchConfig.stealth, WebdriverValue enum, 17 stealth patches table, BrowserProfile/ProfileMode, stealth regression test results (Rebrowser 10/10, Sannysoft 55/56)
- security/browser-sandbox.md: added CDP stealth mode section with Chrome launch hardening, 17 patches table, determinism guarantee, and verified detection test results
- appendix/glossary.md: new Anti-Detection section with JA3, JA4, BoringSSL, Akamai h2, CDP stealth, BrowserProfile, ProfileMode, WebdriverValue definitions

Rebuilt docs/book/ from updated sources.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 04f1aab commit ec360f4

53 files changed

Lines changed: 526 additions & 116 deletions


book/src/appendix/glossary.md

Lines changed: 18 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -73,3 +73,21 @@
7373
**UniverseGenerator** — The trait for generating deterministic responses. Implementations: LinkMaze, EncodingHell, MalformedDom, RedirectLabyrinth, ContentTrap, TemporalDrift.
7474

7575
**Adversarial Universe** — A simulation universe designed to stress a specific aspect of the crawl kernel (encoding, DOM parsing, redirects, spider traps, temporal changes).
76+
77+
## Anti-Detection
78+
79+
**JA3** — TLS fingerprinting method that hashes five fields from the ClientHello: TLS version, cipher suites, extensions, supported groups, EC point formats. Legacy but still deployed by WAFs.
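As a rough illustration of the five-field layout described above, the sketch below builds the pre-hash JA3 string: fields are comma-separated, values within a field are hyphen-separated decimals, and GREASE values are skipped. The final MD5 step is omitted because the standard library has no MD5. The `ClientHelloFields` struct and `ja3_string` function are hypothetical names for this example and are not part of Palimpsest or `wreq`.

```rust
/// The five ClientHello fields that feed a JA3 fingerprint (hypothetical type).
pub struct ClientHelloFields {
    pub tls_version: u16,
    pub cipher_suites: Vec<u16>,
    pub extensions: Vec<u16>,
    pub supported_groups: Vec<u16>,
    pub ec_point_formats: Vec<u8>,
}

/// GREASE values (RFC 8701) look like 0x0A0A, 0x1A1A, ..., 0xFAFA.
fn is_grease(value: u16) -> bool {
    (value & 0x0f0f) == 0x0a0a && (value >> 8) == (value & 0xff)
}

/// Filter predicate: keep every value that is not GREASE.
fn not_grease(value: &&u16) -> bool {
    !is_grease(**value)
}

/// Join values as decimal numbers separated by hyphens.
fn join<T: ToString>(values: impl Iterator<Item = T>) -> String {
    values.map(|v| v.to_string()).collect::<Vec<_>>().join("-")
}

/// Build the comma-separated JA3 string; hashing it with MD5 yields the fingerprint.
pub fn ja3_string(hello: &ClientHelloFields) -> String {
    [
        hello.tls_version.to_string(),
        join(hello.cipher_suites.iter().filter(not_grease)),
        join(hello.extensions.iter().filter(not_grease)),
        join(hello.supported_groups.iter().filter(not_grease)),
        join(hello.ec_point_formats.iter()),
    ]
    .join(",")
}
```

Because the string preserves the order in which the client sent its extensions, a client that shuffles them produces a different JA3 on each connection. JA4 avoids this by sorting before hashing.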
80+
81+
**JA4** — Current TLS fingerprinting standard (FoxIO). Sorts before hashing to defeat extension randomization. Three sections: header, sorted cipher hash, sorted extension hash.
82+
83+
**BoringSSL** — Google's fork of OpenSSL used by Chrome. Palimpsest uses it (via `wreq`) for full ClientHello control, enabling browser-grade TLS impersonation.
84+
85+
**Akamai h2 Fingerprint** — Passive HTTP/2 fingerprint capturing SETTINGS frame values/order, WINDOW_UPDATE, PRIORITY frames, and pseudo-header ordering. Distinguishes browsers from automation clients.
86+
87+
**CDP Stealth Mode** — Anti-detection suite for headless Chrome. 17 evasion patches covering `navigator.webdriver`, `window.chrome`, plugins, WebGL, canvas noise, AudioContext noise, and more.
88+
89+
**BrowserProfile** — A unified, internally consistent browser identity tying TLS fingerprint + HTTP/2 settings + HTTP headers + JS surface into a single profile. Prevents cross-layer detection mismatches.
90+
91+
**ProfileMode** — Controls how browser profiles are selected: `None` (default), `Fixed` (one profile), `Seeded` (deterministic from CrawlSeed), `RotatePerDomain` (per-domain via BLAKE3).
92+
93+
**WebdriverValue** — Explicit config for `navigator.webdriver` in stealth mode. `False` (matches real Chrome, default) or `Undefined` (property deleted). Auditable, not hidden.

book/src/crate-reference/fetch.md

Lines changed: 101 additions & 12 deletions
@@ -1,6 +1,6 @@
11
# palimpsest-fetch
22

3-
HTTP client + browser capture (CDP) + link extraction + robots.txt parsing. Every fetch wraps an `ExecutionEnvelope`.
3+
HTTP client + browser capture (CDP) + link extraction + robots.txt parsing + TLS/HTTP2 fingerprint impersonation + CDP stealth mode. Every fetch wraps an `ExecutionEnvelope`.
44

55
## HttpFetcher
66

@@ -12,17 +12,22 @@ impl HttpFetcher {
1212
}
1313
```
1414

15+
Uses **wreq** (BoringSSL backend) instead of reqwest. When an emulation profile is set, the TLS ClientHello and HTTP/2 SETTINGS frame match the selected browser.
16+
1517
## FetchConfig
1618

1719
```rust
1820
pub struct FetchConfig {
19-
pub connect_timeout: Duration, // Default: 30s
20-
pub total_timeout: Duration, // Default: 120s
21-
pub max_body_size: u64, // Default: 256 MiB
22-
pub max_redirects: usize, // Default: 10
21+
pub connect_timeout: Duration, // Default: 30s
22+
pub total_timeout: Duration, // Default: 120s
23+
pub max_body_size: u64, // Default: 256 MiB
24+
pub max_redirects: usize, // Default: 10
25+
pub emulation: Option<wreq_util::Emulation>, // Default: None
2326
}
2427
```
2528

29+
When `emulation` is set (e.g., `Emulation::Chrome133`), wreq impersonates the selected browser's TLS fingerprint (JA3/JA4 including post-quantum key shares) and HTTP/2 settings (SETTINGS frame values/order, WINDOW_UPDATE, pseudo-header ordering). 70+ browser profiles available: Chrome 100-137, Firefox 109-139, Safari 15-18.5, Edge, Opera.
30+
2631
## BrowserFetcher
2732

2833
```rust
@@ -33,7 +38,81 @@ impl BrowserFetcher {
3338
}
3439
```
3540

36-
Launches headless Chrome via CDP. Injects JS determinism overrides. Captures DOM snapshot, sub-resources via Network events, and resource dependency graph.
41+
Launches headless Chrome via CDP. Injects determinism overrides and (optionally) 17 stealth evasion patches. Captures DOM snapshot, sub-resources via Network events, and resource dependency graph.
42+
43+
## BrowserFetchConfig
44+
45+
```rust
46+
pub struct BrowserFetchConfig {
47+
pub page_timeout: Duration, // Default: 30s
48+
pub viewport_width: u32, // Default: 1920
49+
pub viewport_height: u32, // Default: 1080
50+
pub js_enabled: bool, // Default: true
51+
pub user_agent: String, // Default: "PalimpsestBot/0.1"
52+
pub stealth: bool, // Default: false
53+
pub webdriver_value: WebdriverValue, // Default: False
54+
}
55+
```
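The documented defaults can be read as a `Default` implementation. The sketch below mirrors the struct locally so that it compiles on its own. Whether the real crate exposes `Default` for `BrowserFetchConfig` is an assumption, and the derives are illustrative only.

```rust
use std::time::Duration;

/// Local mirror of the documented enum; `False` is the documented default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum WebdriverValue {
    #[default]
    False,
    Undefined,
}

/// Local mirror of the documented struct (not the crate's own definition).
#[derive(Debug, Clone, PartialEq)]
pub struct BrowserFetchConfig {
    pub page_timeout: Duration,
    pub viewport_width: u32,
    pub viewport_height: u32,
    pub js_enabled: bool,
    pub user_agent: String,
    pub stealth: bool,
    pub webdriver_value: WebdriverValue,
}

impl Default for BrowserFetchConfig {
    /// Values taken from the "Default:" comments in the struct listing.
    fn default() -> Self {
        Self {
            page_timeout: Duration::from_secs(30),
            viewport_width: 1920,
            viewport_height: 1080,
            js_enabled: true,
            user_agent: "PalimpsestBot/0.1".to_owned(),
            stealth: false,
            webdriver_value: WebdriverValue::default(),
        }
    }
}
```

Under that assumption, opting in to stealth would be a one-field override: `BrowserFetchConfig { stealth: true, ..Default::default() }`.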
56+
57+
## WebdriverValue
58+
59+
```rust
60+
pub enum WebdriverValue {
61+
False, // Matches real non-automated Chrome (default)
62+
Undefined, // Property appears deleted
63+
}
64+
```
65+
66+
Explicit, auditable config choice for `navigator.webdriver`. Default `False` passes Rebrowser Bot Detector (10/10).
67+
68+
## CDP Stealth Mode
69+
70+
When `stealth: true`, the browser fetcher applies:
71+
72+
**Chrome launch hardening:**
73+
- `--disable-blink-features=AutomationControlled`
74+
- `--disable-component-extensions-with-background-pages`
75+
76+
**17 stealth evasion patches** (injected via `addScriptToEvaluateOnNewDocument`):
77+
78+
| # | Patch | What It Does |
79+
|---|---|---|
80+
| 1 | `navigator.webdriver` | Set to `false` (configurable via `WebdriverValue`) |
81+
| 2 | `window.chrome` | Full object mock (app, csi, loadTimes, runtime) |
82+
| 3 | `navigator.plugins` | Chrome PDF Plugin, Chrome PDF Viewer, Native Client |
83+
| 4 | `navigator.mimeTypes` | application/pdf, application/x-nacl |
84+
| 5 | `navigator.permissions` | Fix Notification state inconsistency |
85+
| 6 | `navigator.languages` | `["en-US", "en"]` |
86+
| 7 | `navigator.hardwareConcurrency` | 8 |
87+
| 8 | `navigator.deviceMemory` | 8 |
88+
| 9 | WebGL vendor/renderer | Intel UHD Graphics 630 |
89+
| 10 | Canvas fingerprint | Seeded sub-pixel noise (CrawlSeed) |
90+
| 11 | Window dimensions | outerWidth/outerHeight match viewport + chrome UI |
91+
| 12 | Screen dimensions | width/height/availWidth/availHeight/colorDepth |
92+
| 13 | AudioContext | Seeded oscillator noise (CrawlSeed) |
93+
| 14 | ClientRect | Seeded sub-pixel noise (CrawlSeed) |
94+
| 15 | sourceURL markers | Strip pptr/playwright stack traces |
95+
| 16 | `navigator.userAgent` | Consistent with HTTP header |
96+
| 17 | `navigator.maxTouchPoints` | 0 |
97+
98+
All noise patches use deterministic xorshift PRNGs seeded from `CrawlSeed` (Law 1).
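For readers unfamiliar with xorshift, here is a minimal sketch of a seeded 64-bit generator using the common 13/7/17 shift triple. The actual patches run as injected JavaScript, so this Rust version only illustrates the determinism property. The type name, the zero-seed fallback constant, and the `next_noise` helper are assumptions made for this example.

```rust
/// Minimal xorshift64 generator (shift triple 13, 7, 17).
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// A zero state would stay zero forever, so substitute a fixed non-zero constant.
    pub fn new(seed: u64) -> Self {
        Self {
            state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed },
        }
    }

    /// Advance the state and return it as the next pseudo-random value.
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Noise in the half-open range [-amplitude, amplitude).
    pub fn next_noise(&mut self, amplitude: f64) -> f64 {
        // Use the top 53 bits to build a uniform value in [0, 1).
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        (unit * 2.0 - 1.0) * amplitude
    }
}
```

Two generators built from the same seed emit the same sequence, which is what allows a replay to reproduce the same canvas, audio, and ClientRect perturbations.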
99+
100+
## Browser Emulation Profiles
101+
102+
```rust
103+
pub struct BrowserProfile { /* unified TLS + HTTP/2 + headers + JS identity */ }
104+
105+
pub enum ProfileMode {
106+
None, // No impersonation (default)
107+
Fixed(BrowserProfile), // Same profile for all requests
108+
Seeded, // Generate from CrawlSeed
109+
RotatePerDomain, // Per-domain via BLAKE3(seed + domain)
110+
}
111+
```
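The sketch below shows one way `RotatePerDomain` selection could work: hash the seed together with the domain, then reduce the digest to an index into a list of profiles. To stay dependency-free it substitutes FNV-1a for BLAKE3, so it is not the hash the crate uses. The byte layout of the input, the modulo reduction, and the `profile_index` name are likewise assumptions.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a 64-bit hash, used here only as a stand-in for BLAKE3.
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(FNV_OFFSET, |hash, b| (hash ^ u64::from(*b)).wrapping_mul(FNV_PRIME))
}

/// Pick a profile index for `domain`, deterministically for a given seed.
pub fn profile_index(seed: u64, domain: &str, profile_count: usize) -> usize {
    assert!(profile_count > 0, "need at least one profile to choose from");
    // Assumed input layout: little-endian seed bytes followed by the domain bytes.
    let mut input = seed.to_le_bytes().to_vec();
    input.extend_from_slice(domain.as_bytes());
    (fnv1a(&input) % profile_count as u64) as usize
}
```

With the three pre-built profiles, `profile_index(seed, "example.com", 3)` would always return the same slot for a given seed. Each domain therefore sees a stable identity across requests and across replays.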
112+
113+
Pre-built profiles: `BrowserProfile::chrome_windows()`, `firefox_linux()`, `safari_macos()`.
114+
115+
See [profile module](../crate-reference/fetch.md) for details.
37116

38117
## BrowserFetchResult
39118

@@ -56,14 +135,10 @@ pub fn normalize_url_for_comparison(url: &Url) -> String;
56135

57136
`extract_links` strips `<script>` and `<style>` content before scanning for `href` and `src` attributes. Output is deduplicated and sorted for determinism.
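The "deduplicated and sorted" step can be sketched with a `BTreeSet`, which removes duplicates and iterates in sorted order. The helper name `dedup_sorted` is hypothetical, and the real `extract_links` may implement this differently.

```rust
use std::collections::BTreeSet;

/// Collapse duplicates and return links in lexicographic order.
pub fn dedup_sorted(links: impl IntoIterator<Item = String>) -> Vec<String> {
    links
        .into_iter()
        .collect::<BTreeSet<_>>() // set insertion drops duplicates
        .into_iter() // BTreeSet iterates in sorted order
        .collect()
}
```

Sorting makes the output independent of the order in which attributes appear in the document, so two runs over the same HTML yield byte-identical link lists.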
58137

59-
`normalize_url` removes fragments, strips default ports (80/443), and sorts query parameters.
60-
61138
## Robots.txt
62139

63140
```rust
64-
pub struct RobotsRules {
65-
pub crawl_delay: Option<Duration>,
66-
}
141+
pub struct RobotsRules { pub crawl_delay: Option<Duration> }
67142

68143
impl RobotsRules {
69144
pub fn parse(body: &str) -> Self; // RFC 9309 compliant
@@ -72,6 +147,20 @@ impl RobotsRules {
72147

73148
Per-origin caching in `BTreeMap` (deterministic).
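A minimal sketch of a per-origin cache follows, assuming string origin keys and a caller-supplied closure that produces the rules on a miss. `RobotsCache` and its methods are hypothetical names, and `RobotsRules` is mirrored locally so the block compiles on its own.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Local mirror of the documented struct.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct RobotsRules {
    pub crawl_delay: Option<Duration>,
}

/// Hypothetical per-origin cache backed by a BTreeMap.
#[derive(Default)]
pub struct RobotsCache {
    by_origin: BTreeMap<String, RobotsRules>,
}

impl RobotsCache {
    /// Return the cached rules for `origin`, calling `fetch` only on a miss.
    pub fn get_or_insert_with(
        &mut self,
        origin: &str,
        fetch: impl FnOnce() -> RobotsRules,
    ) -> &RobotsRules {
        self.by_origin.entry(origin.to_owned()).or_insert_with(fetch)
    }

    /// Cached origins in sorted order, regardless of insertion order.
    pub fn origins(&self) -> Vec<&str> {
        self.by_origin.keys().map(String::as_str).collect()
    }
}
```

Unlike `HashMap`, whose iteration order varies between runs because of its randomized hasher, `BTreeMap` always iterates in key order. Any code that walks the cache therefore behaves the same on every run.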
74149

150+
## Stealth Regression Tests
151+
152+
5 integration tests against live public detection sites:
153+
154+
| Site | Score | Key Checks |
155+
|---|---|---|
156+
| Rebrowser Bot Detector | **10/10** | CDP leak, webdriver, viewport, user-agent |
157+
| Sannysoft | **55/56** | webdriver, chrome, plugins, WebGL, canvas, permissions |
158+
| FingerprintJS BotD | **Clean** | 18 detectors, no bot verdict |
159+
| CreepJS | **Clean** | Headless rating, stealth rating, lie detection |
160+
| Infosimples | **Skipped** | Site timeout |
161+
162+
Run with: `cargo test -p palimpsest-fetch --test stealth_test -- --ignored --nocapture --test-threads=1`
163+
75164
## Key Invariant
76165

77-
Every fetch receives an `ExecutionEnvelope`. The envelope seals the context before the network request begins, enabling replay and verification.
166+
Every fetch receives an `ExecutionEnvelope`. The envelope seals the context before the network request begins, enabling replay and verification. Emulation profile and stealth config are deterministic inputs (Law 1).

book/src/security/browser-sandbox.md

Lines changed: 50 additions & 11 deletions
@@ -10,17 +10,7 @@ Headless Chrome runs in a sandboxed process with strict isolation:
1010

1111
## Timeout Enforcement
1212

13-
Every page load has a hard timeout (default: 60 seconds for browser mode, 30 seconds for page load). If the page does not complete loading within the timeout, the browser process is killed.
14-
15-
```rust
16-
pub struct BrowserFetchConfig {
17-
pub page_timeout: Duration, // Default: 30s
18-
pub viewport_width: u32, // Default: 1920
19-
pub viewport_height: u32, // Default: 1080
20-
pub js_enabled: bool, // Default: true
21-
pub user_agent: String,
22-
}
23-
```
13+
Every page load has a hard timeout (default: 30 seconds). If the page does not complete loading within the timeout, the browser process is killed.
2414

2515
## Determinism Overrides
2616

@@ -39,6 +29,55 @@ performance.now = function() { return (__perf_offset += 0.1); };
3929

4030
This prevents JavaScript on the page from introducing non-determinism. Same seed = same execution.
4131

32+
## CDP Stealth Mode
33+
34+
When `stealth: true` is set on `BrowserFetchConfig`, a comprehensive anti-detection suite is applied on top of the determinism overrides.
35+
36+
### Chrome Launch Hardening
37+
38+
```
39+
--disable-blink-features=AutomationControlled
40+
--disable-component-extensions-with-background-pages
41+
--no-first-run
42+
--no-default-browser-check
43+
```
44+
45+
### 17 Stealth Evasion Patches
46+
47+
All patches injected via `Page.addScriptToEvaluateOnNewDocument` before navigation:
48+
49+
| Patch | Purpose |
50+
|---|---|
51+
| navigator.webdriver | Set to `false` (configurable via `WebdriverValue` enum) |
52+
| window.chrome | Full Chrome object mock (app, csi, loadTimes, runtime) |
53+
| navigator.plugins | 3 plugins matching real Chrome |
54+
| navigator.mimeTypes | PDF + NaCl mime types |
55+
| navigator.permissions | Fix Notification permission inconsistency |
56+
| navigator.languages | `["en-US", "en"]` |
57+
| navigator.hardwareConcurrency | 8 cores |
58+
| navigator.deviceMemory | 8 GB |
59+
| WebGL vendor/renderer | Intel UHD Graphics 630 |
60+
| Canvas fingerprint | Seeded sub-pixel noise |
61+
| Window dimensions, screen dimensions (two patches) | Match viewport + chrome UI offset |
62+
| AudioContext | Seeded oscillator noise |
63+
| ClientRect | Seeded sub-pixel noise |
64+
| sourceURL markers | Strip automation stack traces |
65+
| navigator.userAgent | Consistent with HTTP User-Agent header |
66+
| navigator.maxTouchPoints | 0 |
67+
68+
### Determinism Guarantee
69+
70+
All noise patches (canvas, audio, ClientRect) use deterministic xorshift PRNGs with sub-seeds derived from `CrawlSeed`. Same seed = same noise = same fingerprint. This is Law 1 compliant.
71+
72+
### Verified Results
73+
74+
Tested against 5 public bot detection sites (the fifth, Infosimples, was skipped because the site timed out):
75+
76+
- **Rebrowser Bot Detector**: 10/10 pass
77+
- **Sannysoft**: 55/56 pass (only PluginArray prototype)
78+
- **FingerprintJS BotD**: No bot verdict
79+
- **CreepJS**: No hard failures
80+
4281
## Sub-Resource Capture
4382

4483
Chrome DevTools Protocol (CDP) network event listeners capture all sub-resources:

docs/book/404.html

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
3737
const path_to_root = "";
3838
const default_light_theme = "light";
3939
const default_dark_theme = "coal";
40-
window.path_to_searchindex_js = "searchindex-dc01b304.js";
40+
window.path_to_searchindex_js = "searchindex-4f269067.js";
4141
</script>
4242
<!-- Start loading toc.js asap -->
4343
<script src="toc-7980dfd2.js"></script>

docs/book/appendix/api-reference.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
3636
const path_to_root = "../";
3737
const default_light_theme = "light";
3838
const default_dark_theme = "coal";
39-
window.path_to_searchindex_js = "../searchindex-dc01b304.js";
39+
window.path_to_searchindex_js = "../searchindex-4f269067.js";
4040
</script>
4141
<!-- Start loading toc.js asap -->
4242
<script src="../toc-7980dfd2.js"></script>

docs/book/appendix/error-taxonomy.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
3636
const path_to_root = "../";
3737
const default_light_theme = "light";
3838
const default_dark_theme = "coal";
39-
window.path_to_searchindex_js = "../searchindex-dc01b304.js";
39+
window.path_to_searchindex_js = "../searchindex-4f269067.js";
4040
</script>
4141
<!-- Start loading toc.js asap -->
4242
<script src="../toc-7980dfd2.js"></script>

docs/book/appendix/glossary.html

Lines changed: 10 additions & 1 deletion
@@ -36,7 +36,7 @@
3636
const path_to_root = "../";
3737
const default_light_theme = "light";
3838
const default_dark_theme = "coal";
39-
window.path_to_searchindex_js = "../searchindex-dc01b304.js";
39+
window.path_to_searchindex_js = "../searchindex-4f269067.js";
4040
</script>
4141
<!-- Start loading toc.js asap -->
4242
<script src="../toc-7980dfd2.js"></script>
@@ -216,6 +216,15 @@ <h2 id="simulation"><a class="header" href="#simulation">Simulation</a></h2>
216216
<p><strong>SimulatedWeb</strong> — A virtual internet for testing. Hosts multiple <code>UniverseGenerator</code> instances, each responding to URLs on its domain.</p>
217217
<p><strong>UniverseGenerator</strong> — The trait for generating deterministic responses. Implementations: LinkMaze, EncodingHell, MalformedDom, RedirectLabyrinth, ContentTrap, TemporalDrift.</p>
218218
<p><strong>Adversarial Universe</strong> — A simulation universe designed to stress a specific aspect of the crawl kernel (encoding, DOM parsing, redirects, spider traps, temporal changes).</p>
219+
<h2 id="anti-detection"><a class="header" href="#anti-detection">Anti-Detection</a></h2>
220+
<p><strong>JA3</strong> — TLS fingerprinting method that hashes five fields from the ClientHello: TLS version, cipher suites, extensions, supported groups, EC point formats. Legacy but still deployed by WAFs.</p>
221+
<p><strong>JA4</strong> — Current TLS fingerprinting standard (FoxIO). Sorts before hashing to defeat extension randomization. Three sections: header, sorted cipher hash, sorted extension hash.</p>
222+
<p><strong>BoringSSL</strong> — Google’s fork of OpenSSL used by Chrome. Palimpsest uses it (via <code>wreq</code>) for full ClientHello control, enabling browser-grade TLS impersonation.</p>
223+
<p><strong>Akamai h2 Fingerprint</strong> — Passive HTTP/2 fingerprint capturing SETTINGS frame values/order, WINDOW_UPDATE, PRIORITY frames, and pseudo-header ordering. Distinguishes browsers from automation clients.</p>
224+
<p><strong>CDP Stealth Mode</strong> — Anti-detection suite for headless Chrome. 17 evasion patches covering <code>navigator.webdriver</code>, <code>window.chrome</code>, plugins, WebGL, canvas noise, AudioContext noise, and more.</p>
225+
<p><strong>BrowserProfile</strong> — A unified, internally consistent browser identity tying TLS fingerprint + HTTP/2 settings + HTTP headers + JS surface into a single profile. Prevents cross-layer detection mismatches.</p>
226+
<p><strong>ProfileMode</strong> — Controls how browser profiles are selected: <code>None</code> (default), <code>Fixed</code> (one profile), <code>Seeded</code> (deterministic from CrawlSeed), <code>RotatePerDomain</code> (per-domain via BLAKE3).</p>
227+
<p><strong>WebdriverValue</strong> — Explicit config for <code>navigator.webdriver</code> in stealth mode. <code>False</code> (matches real Chrome, default) or <code>Undefined</code> (property deleted). Auditable, not hidden.</p>
219228

220229
</main>
221230

docs/book/architecture/crate-map.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
3636
const path_to_root = "../";
3737
const default_light_theme = "light";
3838
const default_dark_theme = "coal";
39-
window.path_to_searchindex_js = "../searchindex-dc01b304.js";
39+
window.path_to_searchindex_js = "../searchindex-4f269067.js";
4040
</script>
4141
<!-- Start loading toc.js asap -->
4242
<script src="../toc-7980dfd2.js"></script>

docs/book/architecture/data-flow.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
3636
const path_to_root = "../";
3737
const default_light_theme = "light";
3838
const default_dark_theme = "coal";
39-
window.path_to_searchindex_js = "../searchindex-dc01b304.js";
39+
window.path_to_searchindex_js = "../searchindex-4f269067.js";
4040
</script>
4141
<!-- Start loading toc.js asap -->
4242
<script src="../toc-7980dfd2.js"></script>

docs/book/architecture/overview.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
3636
const path_to_root = "../";
3737
const default_light_theme = "light";
3838
const default_dark_theme = "coal";
39-
window.path_to_searchindex_js = "../searchindex-dc01b304.js";
39+
window.path_to_searchindex_js = "../searchindex-4f269067.js";
4040
</script>
4141
<!-- Start loading toc.js asap -->
4242
<script src="../toc-7980dfd2.js"></script>
