book/src/appendix/glossary.md
Lines changed: 18 additions & 0 deletions
@@ -73,3 +73,21 @@
 **UniverseGenerator** — The trait for generating deterministic responses. Implementations: LinkMaze, EncodingHell, MalformedDom, RedirectLabyrinth, ContentTrap, TemporalDrift.
 
 **Adversarial Universe** — A simulation universe designed to stress a specific aspect of the crawl kernel (encoding, DOM parsing, redirects, spider traps, temporal changes).
+
+## Anti-Detection
+
+**JA3** — TLS fingerprinting method that hashes five fields from the ClientHello: TLS version, cipher suites, extensions, supported groups, EC point formats. Legacy but still deployed by WAFs.
+
+**JA4** — Current TLS fingerprinting standard (FoxIO). Sorts before hashing to defeat extension randomization. Three sections: header, sorted cipher hash, sorted extension hash.
+
+**BoringSSL** — Google's fork of OpenSSL used by Chrome. Palimpsest uses it (via `wreq`) for full ClientHello control, enabling browser-grade TLS impersonation.
+**CDP Stealth Mode** — Anti-detection suite for headless Chrome. 17 evasion patches covering `navigator.webdriver`, `window.chrome`, plugins, WebGL, canvas noise, AudioContext noise, and more.
+
+**BrowserProfile** — A unified, internally consistent browser identity tying TLS fingerprint + HTTP/2 settings + HTTP headers + JS surface into a single profile. Prevents cross-layer detection mismatches.
+
+**ProfileMode** — Controls how browser profiles are selected: `None` (default), `Fixed` (one profile), `Seeded` (deterministic from CrawlSeed), `RotatePerDomain` (per-domain via BLAKE3).
+
+**WebdriverValue** — Explicit config for `navigator.webdriver` in stealth mode. `False` (matches real Chrome, default) or `Undefined` (property deleted). Auditable, not hidden.
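
Reviewer sketch for the JA3 and JA4 entries above: the difference between the two is easiest to see on the pre-hash strings. The code below is a minimal illustration, not Palimpsest code. The `ClientHello` struct and all function names are hypothetical, and the hashing steps are left out (JA3 applies MD5 to its string, JA4 uses truncated SHA-256), since those need a hashing crate. The JA4-style function is also simplified: it ignores the header section, the SNI/ALPN exclusion, and the signature algorithms.

```rust
/// Hypothetical, simplified view of the ClientHello fields that JA3 reads.
struct ClientHello {
    version: u16,
    ciphers: Vec<u16>,
    extensions: Vec<u16>,
    groups: Vec<u16>,
    point_formats: Vec<u8>,
}

/// GREASE values (RFC 8701) have the form 0x?A?A with both bytes equal.
/// JA3 ignores them so that they do not perturb the fingerprint.
fn is_grease(v: u16) -> bool {
    (v & 0x0f0f) == 0x0a0a && (v >> 8) == (v & 0xff)
}

/// Join values as decimal numbers separated by '-', skipping GREASE.
fn join_decimal(values: &[u16]) -> String {
    values
        .iter()
        .filter(|v| !is_grease(**v))
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join("-")
}

/// JA3 pre-hash string: five comma-separated fields in wire order.
/// The published JA3 fingerprint is the MD5 of this string.
fn ja3_string(hello: &ClientHello) -> String {
    let formats = hello
        .point_formats
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join("-");
    format!(
        "{},{},{},{},{}",
        hello.version,
        join_decimal(&hello.ciphers),
        join_decimal(&hello.extensions),
        join_decimal(&hello.groups),
        formats
    )
}

/// JA4-style canonical form of the extension list: sort first, so that a
/// client which shuffles its extensions still yields the same string.
fn sorted_extension_list(hello: &ClientHello) -> String {
    let mut exts: Vec<u16> = hello
        .extensions
        .iter()
        .copied()
        .filter(|v| !is_grease(*v))
        .collect();
    exts.sort_unstable();
    exts.iter()
        .map(|v| format!("{:04x}", v))
        .collect::<Vec<_>>()
        .join(",")
}
```

Two hellos that carry the same extensions in a different order produce different `ja3_string` values but the same `sorted_extension_list` value, which is the property the JA4 entry describes as defeating extension randomization.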
-HTTP client + browser capture (CDP) + link extraction + robots.txt parsing. Every fetch wraps an `ExecutionEnvelope`.
+HTTP client + browser capture (CDP) + link extraction + robots.txt parsing + TLS/HTTP2 fingerprint impersonation + CDP stealth mode. Every fetch wraps an `ExecutionEnvelope`.
 
 ## HttpFetcher
 
@@ -12,17 +12,22 @@ impl HttpFetcher {
 }
 ```
 
+Uses **wreq** (BoringSSL backend) instead of reqwest. When an emulation profile is set, the TLS ClientHello and HTTP/2 SETTINGS frame match the selected browser.
When `emulation` is set (e.g., `Emulation::Chrome133`), wreq impersonates the selected browser's TLS fingerprint (JA3/JA4 including post-quantum key shares) and HTTP/2 settings (SETTINGS frame values/order, WINDOW_UPDATE, pseudo-header ordering). 70+ browser profiles available: Chrome 100-137, Firefox 109-139, Safari 15-18.5, Edge, Opera.
+
 ## BrowserFetcher
 
 ```rust
@@ -33,7 +38,81 @@ impl BrowserFetcher {
 }
 ```
 
-Launches headless Chrome via CDP. Injects JS determinism overrides. Captures DOM snapshot, sub-resources via Network events, and resource dependency graph.
+Launches headless Chrome via CDP. Injects determinism overrides and (optionally) 17 stealth evasion patches. Captures DOM snapshot, sub-resources via Network events, and resource dependency graph.
`extract_links` strips `<script>` and `<style>` content before scanning for `href` and `src` attributes. Output is deduplicated and sorted for determinism.
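
Reviewer sketch of the behaviour described for `extract_links`: strip the content of `<script>` and `<style>` elements, scan for `href` and `src` values, then deduplicate and sort. This is a minimal standard-library illustration under assumptions, not the crate's implementation. All function names here are hypothetical, only quoted attribute values are handled, and the opening `<script ...>` tag is deliberately kept so that its own `src` stays visible (whether the real function does this is not stated in the diff).

```rust
use std::collections::BTreeSet;

/// Drop the text between `<tag ...>` and `</tag>` (ASCII case-insensitive),
/// keeping the opening tag itself.
fn strip_element_content(html: &str, tag: &str) -> String {
    // ASCII lowercasing preserves byte offsets, so indices are shared.
    let lower = html.to_ascii_lowercase();
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let mut out = String::with_capacity(html.len());
    let mut pos = 0;
    while let Some(rel) = lower[pos..].find(&open) {
        let start = pos + rel;
        let body_start = match lower[start..].find('>') {
            Some(gt) => start + gt + 1,
            None => break, // malformed opening tag: keep the rest untouched
        };
        out.push_str(&html[pos..body_start]);
        pos = match lower[body_start..].find(&close) {
            Some(end) => body_start + end,
            None => html.len(), // unterminated element: drop the remainder
        };
    }
    out.push_str(&html[pos..]);
    out
}

/// Collect quoted values of `attr="..."` or `attr='...'`.
fn attr_values(html: &str, attr: &str) -> Vec<String> {
    let lower = html.to_ascii_lowercase();
    let needle = format!("{}=", attr);
    let mut found = Vec::new();
    let mut pos = 0;
    while let Some(rel) = lower[pos..].find(&needle) {
        let at = pos + rel;
        let value_start = at + needle.len();
        pos = value_start;
        // Require whitespace before the name so `data-src=` is not matched.
        if at == 0 || !lower.as_bytes()[at - 1].is_ascii_whitespace() {
            continue;
        }
        let rest = &html[value_start..];
        let quote = match rest.chars().next() {
            Some(q @ ('"' | '\'')) => q,
            _ => continue, // unquoted values are skipped in this sketch
        };
        if let Some(end) = rest[1..].find(quote) {
            found.push(rest[1..1 + end].to_string());
            pos = value_start + end + 2;
        }
    }
    found
}

/// Deduplicated, sorted link list; a BTreeSet gives both properties.
fn extract_links_sketch(html: &str) -> Vec<String> {
    let cleaned = strip_element_content(&strip_element_content(html, "script"), "style");
    let mut links = BTreeSet::new();
    for attr in ["href", "src"] {
        links.extend(attr_values(&cleaned, attr));
    }
    links.into_iter().collect()
}
```

Using an ordered set rather than a hash set is what makes the output order independent of insertion order and hasher state, which matters for the determinism claim.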
Run with: `cargo test -p palimpsest-fetch --test stealth_test -- --ignored --nocapture --test-threads=1`
+
 ## Key Invariant
 
-Every fetch receives an `ExecutionEnvelope`. The envelope seals the context before the network request begins, enabling replay and verification.
+Every fetch receives an `ExecutionEnvelope`. The envelope seals the context before the network request begins, enabling replay and verification. Emulation profile and stealth config are deterministic inputs (Law 1).
book/src/security/browser-sandbox.md
Lines changed: 50 additions & 11 deletions
@@ -10,17 +10,7 @@ Headless Chrome runs in a sandboxed process with strict isolation:
 
 ## Timeout Enforcement
 
-Every page load has a hard timeout (default: 60 seconds for browser mode, 30 seconds for page load). If the page does not complete loading within the timeout, the browser process is killed.
-
-```rust
-pub struct BrowserFetchConfig {
-    pub page_timeout: Duration,  // Default: 30s
-    pub viewport_width: u32,     // Default: 1920
-    pub viewport_height: u32,    // Default: 1080
-    pub js_enabled: bool,        // Default: true
-    pub user_agent: String,
-}
-```
+Every page load has a hard timeout (default: 30 seconds). If the page does not complete loading within the timeout, the browser process is killed.
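
Reviewer sketch of the kill-on-timeout behaviour stated above, using only `std::process`. This is an illustration under assumptions, not how `BrowserFetcher` is implemented: the function name `wait_or_kill` and the 10 ms polling interval are hypothetical, and the caller would pass the 30 second default as `timeout`.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Wait for `child` to exit, killing it once `timeout` has elapsed.
/// Returns `Ok(Some(status))` if it exited on its own and `Ok(None)`
/// if the deadline passed and the process was killed.
fn wait_or_kill(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check: has the process already exited?
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Ignore the error in case the process exited in the meantime.
            let _ = child.kill();
            // Reap the process so it does not linger as a zombie.
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller might spawn the browser with `std::process::Command`, call `wait_or_kill(&mut child, Duration::from_secs(30))`, and treat `Ok(None)` as a timed-out fetch.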
| navigator.userAgent | Consistent with HTTP User-Agent header |
+| navigator.maxTouchPoints | 0 |
+
+### Determinism Guarantee
+
+All noise patches (canvas, audio, ClientRect) use deterministic xorshift PRNGs with sub-seeds derived from `CrawlSeed`. Same seed = same noise = same fingerprint. This is Law 1 compliant.
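
Reviewer sketch of the "same seed = same noise" property. The diff does not show how sub-seeds are derived from `CrawlSeed` or which xorshift variant is used, so everything below is an assumption: `CrawlSeed` is modelled as a plain `u64`, the sub-seed derivation uses a SplitMix64-style mixer over a patch label, and the generator is Marsaglia's 64-bit xorshift with shifts 13, 7, 17.

```rust
/// SplitMix64-style mixer; a stand-in for the real sub-seed derivation.
fn mix(z: u64) -> u64 {
    let mut z = z.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Derive an independent sub-seed per noise patch ("canvas", "audio", ...).
fn sub_seed(crawl_seed: u64, label: &str) -> u64 {
    let mut h = crawl_seed;
    for b in label.bytes() {
        h = mix(h ^ u64::from(b));
    }
    h | 1 // xorshift state must never be zero
}

/// Marsaglia xorshift64 (13, 7, 17).
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        Self { state: if seed == 0 { 1 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Hypothetical noise source: `n` offsets in {-1, 0, 1}, fully determined
/// by the crawl seed and the patch label.
fn noise_offsets(crawl_seed: u64, label: &str, n: usize) -> Vec<i8> {
    let mut rng = XorShift64::new(sub_seed(crawl_seed, label));
    (0..n).map(|_| (rng.next_u64() % 3) as i8 - 1).collect()
}
```

Because no clock, thread ID, or OS randomness enters the computation, two runs with the same seed and label return the same offsets, while separate labels keep the canvas and audio noise streams from mirroring each other.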
 <p><strong>SimulatedWeb</strong> — A virtual internet for testing. Hosts multiple <code>UniverseGenerator</code> instances, each responding to URLs on its domain.</p>
 <p><strong>UniverseGenerator</strong> — The trait for generating deterministic responses. Implementations: LinkMaze, EncodingHell, MalformedDom, RedirectLabyrinth, ContentTrap, TemporalDrift.</p>
 <p><strong>Adversarial Universe</strong> — A simulation universe designed to stress a specific aspect of the crawl kernel (encoding, DOM parsing, redirects, spider traps, temporal changes).</p>
+<p><strong>JA3</strong> — TLS fingerprinting method that hashes five fields from the ClientHello: TLS version, cipher suites, extensions, supported groups, EC point formats. Legacy but still deployed by WAFs.</p>
+<p><strong>JA4</strong> — Current TLS fingerprinting standard (FoxIO). Sorts before hashing to defeat extension randomization. Three sections: header, sorted cipher hash, sorted extension hash.</p>
+<p><strong>BoringSSL</strong> — Google’s fork of OpenSSL used by Chrome. Palimpsest uses it (via <code>wreq</code>) for full ClientHello control, enabling browser-grade TLS impersonation.</p>
+<p><strong>CDP Stealth Mode</strong> — Anti-detection suite for headless Chrome. 17 evasion patches covering <code>navigator.webdriver</code>, <code>window.chrome</code>, plugins, WebGL, canvas noise, AudioContext noise, and more.</p>
+<p><strong>BrowserProfile</strong> — A unified, internally consistent browser identity tying TLS fingerprint + HTTP/2 settings + HTTP headers + JS surface into a single profile. Prevents cross-layer detection mismatches.</p>
+<p><strong>ProfileMode</strong> — Controls how browser profiles are selected: <code>None</code> (default), <code>Fixed</code> (one profile), <code>Seeded</code> (deterministic from CrawlSeed), <code>RotatePerDomain</code> (per-domain via BLAKE3).</p>
+<p><strong>WebdriverValue</strong> — Explicit config for <code>navigator.webdriver</code> in stealth mode. <code>False</code> (matches real Chrome, default) or <code>Undefined</code> (property deleted). Auditable, not hidden.</p>
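
Reviewer sketch of the four `ProfileMode` variants from the glossary entry. Only the variant names come from the diff. The payload on `Fixed`, the function `select_profile`, the `u64` seed, and the choice to hash the seed together with the domain are all hypothetical. FNV-1a is used purely as a dependency-free stand-in for BLAKE3, which the glossary names for `RotatePerDomain`.

```rust
/// Variant names follow the glossary; the `Fixed` payload is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProfileMode {
    None,
    Fixed(usize),
    Seeded,
    RotatePerDomain,
}

/// 64-bit FNV-1a, standing in for BLAKE3 so the sketch needs no crates.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Pick an index into a list of `profile_count` browser profiles.
/// Returns `None` when no impersonation profile should be applied.
fn select_profile(
    mode: ProfileMode,
    crawl_seed: u64,
    domain: &str,
    profile_count: usize,
) -> Option<usize> {
    if profile_count == 0 {
        return None;
    }
    let pick = |hash: u64| (hash % profile_count as u64) as usize;
    match mode {
        ProfileMode::None => None,
        ProfileMode::Fixed(index) => Some(index % profile_count),
        // One profile for the whole crawl, determined by the seed alone.
        ProfileMode::Seeded => Some(pick(fnv1a(&crawl_seed.to_le_bytes()))),
        // One profile per domain: hash the seed together with the domain.
        ProfileMode::RotatePerDomain => {
            let mut input = crawl_seed.to_le_bytes().to_vec();
            input.extend_from_slice(domain.to_ascii_lowercase().as_bytes());
            Some(pick(fnv1a(&input)))
        }
    }
}
```

In every mode the result depends only on the arguments, so replaying a crawl with the same seed assigns the same profile to the same domain.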