Commit 8906e88

updated blog
1 parent e18d76c commit 8906e88

1 file changed

Lines changed: 33 additions & 5 deletions

demo/BLOG_POST.md

@@ -95,9 +95,13 @@ Predicate Secure wraps your existing agent code in **3-5 lines** - no rewrites n
 The demo executes a simple but complete browser task:
 
 ✓ Navigate to https://www.example.com with policy check
+
 ✓ Take snapshot with visual element overlay
+
 ✓ Find and click "Learn more" link using semantic query
+
 ✓ Verify URL contains "example-domains" after navigation
+
 ✓ Upload trace to Predicate Studio (if API key provided)
 
 Each action goes through the full authorization + verification loop.
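
The authorization + verification loop named in the last line of this hunk could look roughly like the sketch below. It is not Predicate Secure's API: `Rule`, `is_allowed`, and `run_step` are hypothetical names, and the glob-based rule matching is an assumption made only to illustrate a fail-closed check followed by a post-action verification.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: an action name plus a glob over the target."""
    action: str
    target_glob: str


def is_allowed(rules: list[Rule], action: str, target: str) -> bool:
    # Fail-closed: only an explicit match allows the action; with no
    # matching rule (or no rules at all) the result is a denial.
    return any(
        r.action == action and fnmatchcase(target, r.target_glob)
        for r in rules
    )


def run_step(
    rules: list[Rule],
    action: str,
    target: str,
    execute: Callable[[], None],
    verify: Callable[[], bool],
) -> None:
    # 1. Authorize before anything touches the browser.
    if not is_allowed(rules, action, target):
        raise PermissionError(f"denied by policy: {action} {target}")
    # 2. Perform the action.
    execute()
    # 3. Verify the outcome with a plain true/false check.
    if not verify():
        raise AssertionError("Post-execution verification failed")


# Usage: one allowed navigation, with stand-in execute/verify callables.
rules = [Rule("navigate", "https://www.example.com*")]
log: list[str] = []
run_step(
    rules,
    "navigate",
    "https://www.example.com",
    execute=lambda: log.append("navigated"),
    verify=lambda: True,
)
```

Putting the policy check first means a denied action never executes, so there is nothing to verify or roll back afterwards.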
@@ -174,17 +178,19 @@ Authorization rules are declarative YAML:
 
 > **Note:** The policy is fail-closed: any action not explicitly allowed is denied. This prevents agents from taking unexpected actions.
 
-### 3. Verification with Local LLM
+### 3. LLM-Generated Verification Predicates
 
-After each action, the local LLM generates verification predicates:
+After each action, the local LLM analyzes the state changes and generates **deterministic verification predicates** (assertions to check):
+
+> **Important:** The LLM is NOT doing visual verification. Instead, it generates structured assertions (like `url_contains`, `element_exists`) based on observed state changes. The actual verification execution is **deterministic** - predicates are evaluated as true/false checks.
 
 ```python
 # Capture pre and post snapshots
 pre_snapshot = await get_page_summary()
 result = await execute_action()
 post_snapshot = await get_page_summary()
 
-# LLM generates verification plan
+# LLM generates verification plan (what to check, not the check itself)
 verification_plan = verifier.generate_verification_plan(
     action="click",
     action_target="element#6",
@@ -193,7 +199,7 @@ verification_plan = verifier.generate_verification_plan(
     context={"task": "Find and click Learn more link"}
 )
 
-# Execute generated verifications
+# Execute generated predicates deterministically
 for verification in verification_plan.verifications:
     passed = execute_predicate(
         verification.predicate, # e.g., "url_contains"
@@ -204,7 +210,7 @@ for verification in verification_plan.verifications:
         raise AssertionError("Post-execution verification failed")
 ```
 
-The LLM sees both snapshots and generates appropriate checks:
+The LLM sees both snapshots and generates a structured verification plan:
 
 ```json
 {
@@ -222,6 +228,28 @@ The LLM sees both snapshots and generates appropriate checks:
 }
 ```
 
+**For Production Workflows:**
+
+For well-understood web flows (like QA testing flows or regular business processes), you can skip LLM generation and use **human-defined predicates** directly:
+
+```python
+# Predefined verification for known workflows
+verification_plan = VerificationPlan(
+    action="click",
+    verifications=[
+        VerificationSpec(predicate="url_contains", args=["example-domains"]),
+        VerificationSpec(predicate="element_exists", args=["h1"]),
+        VerificationSpec(predicate="snapshot_changed", args=[]),
+    ],
+    reasoning="Predefined checks for 'Learn more' click flow",
+)
+
+# Execute the same way - deterministic evaluation
+all_passed = execute_verifications(verification_plan)
+```
+
+This approach is **faster** (no LLM inference), **more predictable** (explicit assertions), and **ideal for regression testing** of known workflows. Use LLM-generated predicates for exploratory tasks or novel scenarios.
+
 ### 4. Visual Element Overlay
 
 Enable visual debugging with snapshot overlays:
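
As a note on section 3 of the diff above: the claim that predicates are "evaluated as true/false checks" can be illustrated with a small registry of pure functions. This is a minimal sketch under stated assumptions, not the project's implementation. `PageState`, `Spec`, `PREDICATES`, and `evaluate` are hypothetical stand-ins for whatever sits behind `VerificationSpec` and `execute_verifications`, and the snapshot fields (URL, a set of selectors, an opaque snapshot ID) as well as the URLs in the usage lines are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PageState:
    """Hypothetical page summary captured before or after an action."""
    url: str
    selectors: set[str] = field(default_factory=set)
    snapshot_id: str = ""


@dataclass
class Spec:
    """Hypothetical stand-in for a verification spec: name plus arguments."""
    predicate: str
    args: list[str]


# Each predicate is a pure function of (pre, post, args) returning a bool,
# so the same inputs always produce the same verdict.
PREDICATES: dict[str, Callable[[PageState, PageState, list[str]], bool]] = {
    "url_contains": lambda pre, post, args: args[0] in post.url,
    "element_exists": lambda pre, post, args: args[0] in post.selectors,
    "snapshot_changed": lambda pre, post, args: pre.snapshot_id != post.snapshot_id,
}


def evaluate(specs: list[Spec], pre: PageState, post: PageState) -> bool:
    """Return True only if every spec names a known predicate and passes."""
    for spec in specs:
        check = PREDICATES.get(spec.predicate)
        # Unknown predicate names count as failures rather than being skipped.
        if check is None or not check(pre, post, spec.args):
            return False
    return True


# Usage: the same three checks as the human-defined plan in the diff.
pre = PageState("https://www.example.com", {"h1", "a"}, "snap-1")
post = PageState("https://www.iana.org/help/example-domains", {"h1"}, "snap-2")
specs = [
    Spec("url_contains", ["example-domains"]),
    Spec("element_exists", ["h1"]),
    Spec("snapshot_changed", []),
]
all_passed = evaluate(specs, pre, post)
```

Because the evaluation never calls a model, it makes no difference whether the specs were written by hand or generated by the LLM: both paths end in the same table lookup.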
