
Commit e9b0def

feat(config): add E1 browserless configs and fetch lane
1 parent 2d64c47 commit e9b0def

11 files changed

Lines changed: 208 additions & 15 deletions

Makefile

Lines changed: 4 additions & 1 deletion
@@ -19,7 +19,10 @@ test-fetch-changed-configs:
 	bin/rspec_changed_configs
 
 test-fetch-all-configs:
-	bundle exec rspec --tag fetch spec/html2rss/configs
+	bundle exec rspec --tag fetch spec/html2rss/configs_dynamic_spec.rb
+
+test-fetch-browserless-configs:
+	bin/rspec_browserless_configs
 
 test-all: test test-fetch-all-configs

README.md

Lines changed: 13 additions & 0 deletions
@@ -76,10 +76,23 @@ make test-config CONFIG=github.com/releases.yml
 
 # Test domain
 make test-domain DOMAIN=github.com
+
+# Run live fetch tests for the full corpus
+make test-fetch-all-configs
+
+# Run the Browserless-backed fetch subset
+BROWSERLESS_IO_WEBSOCKET_URL=ws://127.0.0.1:4002 \
+BROWSERLESS_IO_API_TOKEN=... \
+make test-fetch-browserless-configs
 ```
 
 **Adding new configs**: Create the YAML file, run `make validate`, then run the generated tests. No dedicated spec file is needed.
 
+The fetch suite has two lanes:
+
+- `make test-fetch-all-configs` runs all `:fetch` examples. Configs marked as Browserless-backed are skipped unless Browserless env vars are configured.
+- `make test-fetch-browserless-configs` runs only the Browserless-backed config subset and requires `BROWSERLESS_IO_WEBSOCKET_URL`. Custom endpoints also require `BROWSERLESS_IO_API_TOKEN`.
+
 **Config folder convention**: Place configs under the registrable domain folder (e.g., `example.com/` or `bbc.co.uk/`). Legacy subdomain folders (e.g., `news.example.com/`) are allowed but not preferred.
 
 ## Editor Setup (JSON Schema)

bin/rspec_browserless_configs

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require_relative '../spec/support/browserless_fetch_configs'
+
+unless BrowserlessFetchConfigs.browserless_env_configured?
+  warn 'BROWSERLESS_IO_WEBSOCKET_URL is required for browserless fetch tests.'
+  warn 'Set BROWSERLESS_IO_API_TOKEN as well when using a custom websocket endpoint.'
+  exit 1
+end
+
+args = ['bundle', 'exec', 'rspec', '--tag', 'fetch']
+BrowserlessFetchConfigs::CONFIGS.each do |config|
+  args << '--example'
+  args << config
+end
+args << 'spec/html2rss/configs_dynamic_spec.rb'
+
+exec(*args)
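
For readers skimming the script above, here is a minimal sketch of the command line it assembles before calling `exec`. The helper name `browserless_rspec_command` is hypothetical and not part of the commit; the sketch also assumes that the examples generated by `configs_dynamic_spec.rb` carry the config file name in their descriptions, which is what would make the repeated `--example` filters select them.

```ruby
# Sketch only: `browserless_rspec_command` is a hypothetical name used for
# illustration; the commit builds the same array inline in the script.
def browserless_rspec_command(configs)
  args = ['bundle', 'exec', 'rspec', '--tag', 'fetch']
  # One `--example <config>` pair per config; RSpec accepts the option repeatedly.
  configs.each { |config| args.push('--example', config) }
  args << 'spec/html2rss/configs_dynamic_spec.rb'
end

# Print the command for two of the listed configs instead of exec-ing it.
command = browserless_rspec_command(%w[apple.com/newsroom.yml notion.com/blog.yml])
puts command.join(' ')
```

Passing the arguments to `exec` as separate strings, as the script does, avoids shell interpretation of the config names.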
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+strategy: browserless
+
+channel:
+  url: https://www.apple.com/newsroom/
+  language: en
+  time_zone: UTC
+  ttl: 360
+selectors:
+  items:
+    selector: 'li.tile-item a.tile-hero, li.tile-item a.tile-2up, li.tile-item a.tile-3up, li.tile-item a.tile-list'
+    enhance: false
+  title:
+    selector: .tile__headline
+  url:
+    extractor: href
+  published_at:
+    selector: .tile__timestamp
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+strategy: browserless
+
+channel:
+  url: https://deepmind.google/blog/
+  language: en
+  time_zone: UTC
+  ttl: 360
+selectors:
+  items:
+    selector: .card__inner
+    enhance: false
+  title:
+    selector: h3
+  url:
+    selector: .card__overlay-link
+    extractor: href
+  published_at:
+    selector: time
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+strategy: browserless
+
+channel:
+  url: https://www.notion.com/blog
+  language: en
+  time_zone: UTC
+  ttl: 360
+selectors:
+  items:
+    selector: article.post-preview
+    enhance: false
+  title:
+    selector: h3 a[href*="/blog/"]
+  url:
+    selector: h3 a[href*="/blog/"]
+    extractor: href
+  description:
+    selector: '> a[href*="/blog/"]:not([title])'
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+strategy: browserless
+
+channel:
+  url: https://www.shopify.com/blog/latest
+  time_zone: UTC
+  ttl: 360
+selectors:
+  items:
+    selector: article.article--index
+    enhance: false
+  title:
+    selector: '.blogPost a[href*="/blog/"]:not([href*="/topics/"])'
+  url:
+    selector: '.blogPost a[href*="/blog/"]:not([href*="/topics/"])'
+    extractor: href
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# yaml-language-server: $schema=https://raw.githubusercontent.com/html2rss/html2rss/refs/heads/master/schema/html2rss-config.schema.json
+strategy: browserless
+
+channel:
+  url: https://newsroom.spotify.com/
+  language: en
+  time_zone: UTC
+  ttl: 360
+selectors:
+  items:
+    selector: '.post-box.v2'
+    enhance: false
+  title:
+    selector: 'h3 a[href*="/20"]'
+  url:
+    selector: 'h3 a[href*="/20"]'
+    extractor: href
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+
+RSpec.describe BrowserlessFetchConfigs do
+  describe '.browserless_env_configured?' do
+    around do |example|
+      original_ws_url = ENV['BROWSERLESS_IO_WEBSOCKET_URL']
+      original_api_token = ENV['BROWSERLESS_IO_API_TOKEN']
+
+      example.run
+    ensure
+      ENV['BROWSERLESS_IO_WEBSOCKET_URL'] = original_ws_url
+      ENV['BROWSERLESS_IO_API_TOKEN'] = original_api_token
+    end
+
+    it 'accepts the documented local Browserless websocket URL without a token' do
+      ENV['BROWSERLESS_IO_WEBSOCKET_URL'] = 'ws://127.0.0.1:4002'
+      ENV['BROWSERLESS_IO_API_TOKEN'] = ''
+
+      expect(described_class.browserless_env_configured?).to be(true)
+    end
+
+    it 'accepts the legacy local Browserless websocket URL without a token' do
+      ENV['BROWSERLESS_IO_WEBSOCKET_URL'] = 'ws://127.0.0.1:3000'
+      ENV['BROWSERLESS_IO_API_TOKEN'] = ''
+
+      expect(described_class.browserless_env_configured?).to be(true)
+    end
+
+    it 'requires a token for non-local websocket URLs' do
+      ENV['BROWSERLESS_IO_WEBSOCKET_URL'] = 'wss://production.browserless.example/ws'
+      ENV['BROWSERLESS_IO_API_TOKEN'] = ''
+
+      expect(described_class.browserless_env_configured?).to be(false)
+    end
+  end
+end
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# frozen_string_literal: true
+
+module BrowserlessFetchConfigs
+  LOCAL_WS_URLS = %w[
+    ws://127.0.0.1:3000
+    ws://127.0.0.1:4002
+  ].freeze
+
+  CONFIGS = %w[
+    apple.com/newsroom.yml
+    deepmind.google/blog.yml
+    notion.com/blog.yml
+    shopify.com/blog.yml
+    spotify.com/newsroom.yml
+  ].freeze
+
+  module_function
+
+  def include?(file_name)
+    CONFIGS.include?(file_name)
+  end
+
+  def browserless_env_configured?
+    ws_url = ENV['BROWSERLESS_IO_WEBSOCKET_URL'].to_s
+    return false if ws_url.empty?
+    return true if LOCAL_WS_URLS.include?(ws_url)
+
+    !ENV['BROWSERLESS_IO_API_TOKEN'].to_s.empty?
+  end
+end
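
The environment check above can be read as a small decision table. The sketch below restates the same three rules as a standalone function that takes a plain hash instead of `ENV`, so the cases can be tried without touching the real environment. The function name `browserless_configured?`, the hash-based signature, and the sample values (including `example-token`) are assumptions made for illustration, not part of the commit.

```ruby
# Same allow-list as LOCAL_WS_URLS in the module above.
LOCAL_WS_URLS = %w[ws://127.0.0.1:3000 ws://127.0.0.1:4002].freeze

# Hypothetical, hash-based restatement of the predicate for illustration.
def browserless_configured?(env)
  ws_url = env['BROWSERLESS_IO_WEBSOCKET_URL'].to_s
  return false if ws_url.empty?                 # no URL: never configured
  return true if LOCAL_WS_URLS.include?(ws_url) # exact local match: no token needed

  !env['BROWSERLESS_IO_API_TOKEN'].to_s.empty?  # anything else needs a token
end

cases = {
  'URL unset' => {},
  'allow-listed local URL, no token' => { 'BROWSERLESS_IO_WEBSOCKET_URL' => 'ws://127.0.0.1:4002' },
  'localhost alias, no token' => { 'BROWSERLESS_IO_WEBSOCKET_URL' => 'ws://localhost:4002' },
  'remote URL with token' => {
    'BROWSERLESS_IO_WEBSOCKET_URL' => 'wss://production.browserless.example/ws',
    'BROWSERLESS_IO_API_TOKEN' => 'example-token'
  }
}

cases.each { |label, env| puts "#{label}: #{browserless_configured?(env)}" }
```

Because the allow-list is compared by exact string, a spelling such as `ws://localhost:4002` is treated like any custom endpoint and needs a token.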
