
Commit 24d38d6

fix(configs): tighten 4 configs & add release & security advisory configs (#306)
* fix(configs): tighten 4 feed configs with stable selectors
* feat: add release and security advisory configs
* chore: drop unreliable avherald config
* chore: agents + style
* fix: move unstable changed configs to browserless strategy
1 parent 6631881 commit 24d38d6

8 files changed

Lines changed: 96 additions & 28 deletions


AGENTS.md

Lines changed: 30 additions & 0 deletions
@@ -68,6 +68,13 @@ Recommended sequence:
 4. Confirm the title and URL live inside that boundary.
 5. Record the final URL if the page redirects by locale or renders a different surface than expected.
 
+If Chrome MCP is unavailable (`Transport closed` or page-lock errors), do this recovery sequence:
+
+1. Kill stale Chrome MCP processes (`pkill -9 -f 'chrome-devtools-mcp|Chrome for Testing'`).
+2. Retry Chrome MCP once before continuing.
+3. If still unavailable, continue with `curl -I -L`, runtime `feed`, and HTML inspection in a temporary file.
+4. Explicitly report Chrome MCP outage in the final handoff.
+
 ## Browserless
 
 Use Browserless when:
@@ -158,6 +165,20 @@ bundle exec rspec --tag fetch --example 'example.com/feed.yml' spec/html2rss/con
 - the chosen surface is too noisy or too dynamic
 - the candidate should be downgraded or dropped
 
+7. Cross-runtime mismatch check (required when core feed works but fetch specs fail):
+
+- confirm canonical URL with redirect tracing:
+
+```bash
+curl -I -L -s https://example.com | sed -n '1,20p'
+```
+
+- compare behavior in both runtimes:
+- core repo (`../html2rss`) via `html2rss feed`
+- configs repo fetch lane (`bundle exec rspec --tag fetch --example ...`)
+- if selectors are valid in core but fetch lane still returns zero items, treat this as request-strategy/runtime mismatch, not selector success.
+- in that case: prefer Browserless-backed verification if available; otherwise mark as downgraded/deferred with evidence.
+
 ## Runtime Debugging
 
 Use the core CLI as the authority for single-config debugging. The quickest loop is:
@@ -170,6 +191,13 @@ Use the core CLI as the authority for single-config debugging. The quickest loop
 
 If Browserless works but Faraday does not, keep the config narrow and classify it as Browserless-backed instead of trying to rescue it with brittle tweaks.
 
+Additional high-value checks:
+
+- Always normalize `channel.url` to the final canonical host/path (`www` vs non-`www`, retired legacy paths).
+- Prefer selectors anchored to content links (`h3 a`, `a[href*='/article/']`) over container-only selectors.
+- Remove optional fields first when quality drops (`categories`, synthetic IDs, weak descriptions) before adding selector complexity.
+- Set `enhance: false` early if enhancement starts pulling nav/hero/market widgets.
+
 ## Auto-Source
 
 Use `auto` for reconnaissance, not as proof that a config is ready.
@@ -211,3 +239,5 @@ When finishing config work, report:
 - dropped or deferred candidates and why
 - commands actually run
 - residual risks, especially selector drift, localization dependence, or Browserless dependence
+- whether Chrome MCP was available during validation
+- whether focused fetch specs matched core runtime behavior
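
Reviewer note: the cross-runtime mismatch check added to AGENTS.md (step 7) amounts to a small decision rule. The sketch below is one possible reading of it, not code from this commit. `RuntimeResult`, `classify`, and the returned labels are hypothetical names, and the branch for a core run that yields zero items is an assumption, since step 7 only covers the case where the core feed works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeResult:
    """Item counts observed in each runtime (hypothetical structure)."""

    core_items: int        # items produced by `html2rss feed` in the core repo
    fetch_lane_items: int  # items produced by the configs repo fetch spec


def classify(result: RuntimeResult, browserless_available: bool) -> str:
    """Map the two observations to a next action, following step 7."""
    if result.core_items > 0 and result.fetch_lane_items > 0:
        return "consistent"
    if result.core_items > 0:
        # Selectors work in core but the fetch lane is empty: this is a
        # request-strategy/runtime mismatch, not a selector success.
        if browserless_available:
            return "verify-with-browserless"
        return "downgrade-or-defer-with-evidence"
    # Assumption: step 7 does not cover this case; zero items in core is
    # treated here as a selector problem to debug with the core CLI.
    return "needs-selector-work"


print(classify(RuntimeResult(core_items=12, fetch_lane_items=0), True))
```

The label returned here would feed the final handoff item "whether focused fetch specs matched core runtime behavior".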
Lines changed: 5 additions & 10 deletions
@@ -1,21 +1,16 @@
 # yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
 channel:
 title: "deraktionaer.de: meistgelesen"
-url: https://deraktionaer.de/
+url: https://www.deraktionaer.de/
 time_zone: Europe/Berlin
 ttl: 360
 language: de
+enhance: false
+strategy: browserless
 selectors:
 items:
-selector: "#most-viewed ol > li"
+selector: "section#top-articles article.top-article a.top-article-content[href^='/artikel/']"
 title:
-selector: "> a"
+extractor: "text"
 url:
-selector: "> a"
 extractor: "href"
-isin:
-selector: ".stock-info"
-extractor: attribute
-attribute: "data-quote"
-categories:
-- isin
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+channel:
+url: https://www.elastic.co/docs/release-notes/elasticsearch
+language: en
+time_zone: UTC
+ttl: 360
+strategy: browserless
+selectors:
+items:
+selector: 'a[href^="#elasticsearch-"][href$="-release-notes"]'
+enhance: false
+title:
+extractor: text
+url:
+extractor: href
Lines changed: 6 additions & 5 deletions
@@ -1,14 +1,15 @@
 # yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
----
 channel:
-url: https://avherald.com/
+url: https://go.dev/doc/devel/release
 language: en
-ttl: 120
 time_zone: UTC
+ttl: 360
+strategy: browserless
 selectors:
 items:
-selector: "table table a"
+selector: 'a[href^="/doc/go1."]'
+enhance: false
 title:
-selector: span
+extractor: text
 url:
 extractor: href
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+channel:
+url: https://grafana.com/docs/grafana/latest/whatsnew/
+language: en
+time_zone: UTC
+ttl: 360
+strategy: browserless
+selectors:
+items:
+selector: 'a.docs__menu-a[href^="/docs/grafana/latest/whatsnew/whats-new-in-v"]'
+enhance: false
+title:
+extractor: text
+url:
+extractor: href
Lines changed: 3 additions & 3 deletions
@@ -1,15 +1,15 @@
 # yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
 channel:
-url: https://www.iaapa.org/news
+url: https://iaapa.org/news-funworld
 time_zone: UTC
 ttl: 720
+enhance: false
+strategy: browserless
 selectors:
 items:
 selector: ".views-row > article"
 title:
 selector: h3
-description:
-selector: ".event-card__summary"
 url:
 selector: "a"
 extractor: "href"
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+channel:
+url: https://www.mozilla.org/en-US/security/advisories/
+language: en
+time_zone: UTC
+ttl: 360
+strategy: browserless
+selectors:
+items:
+selector: "main li"
+enhance: false
+title:
+selector: 'a[href*="/security/advisories/mfsa"]'
+url:
+selector: 'a[href*="/security/advisories/mfsa"]'
+extractor: href
Lines changed: 6 additions & 10 deletions
@@ -1,20 +1,16 @@
 # yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
 channel:
-url: https://www.tourismusnetzwerk-brandenburg.de/nc/aktuelle-nachrichten/
+url: https://tourismusnetzwerk-brandenburg.de/
 time_zone: Europe/Berlin
 ttl: 720
 language: de
+enhance: false
+strategy: browserless
 selectors:
 items:
-selector: "article.article"
+selector: "article.node.article.wall-floating"
 title:
-selector: "h3"
+selector: "h3.title a[rel='bookmark']"
 url:
-selector: "a"
+selector: "h3.title a[rel='bookmark']"
 extractor: "href"
-topic:
-selector: ".field--item"
-categories:
-- topic
-description:
-selector: "p"
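
Reviewer note: several configs in this commit change `channel.url` to a different host or path (`www` vs non-`www`, retired paths), in line with the AGENTS.md advice to trace redirects with `curl -I -L -s`. The sketch below shows one way the final URL could be read out of such a header dump. `final_url` and the sample headers are hypothetical and not taken from any site touched here.

```python
from urllib.parse import urljoin


def final_url(start_url: str, header_dump: str) -> str:
    """Follow every Location header in a `curl -I -L -s` dump, in order."""
    current = start_url
    for line in header_dump.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; status lines have no colon.
        if sep and name.strip().lower() == "location":
            # Relative redirects are resolved against the current URL.
            current = urljoin(current, value.strip())
    return current


# Hypothetical header dump: a host redirect followed by a relative path redirect.
SAMPLE = """HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/

HTTP/1.1 302 Found
location: /news

HTTP/1.1 200 OK
Content-Type: text/html
"""

print(final_url("https://example.com/", SAMPLE))  # https://www.example.com/news
```

If the result differs from the configured `channel.url`, that is a hint the config still points at a pre-redirect address.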
