Add OECD.org scraper by Copilot · Pull Request #32 · okffi/Openmag

Copilot · 2026-03-09T11:12:57Z

Adds scrapers/oecd.org.js to scrape news cards from https://www.oecd.org/.

Key details

listSelector: div.card---theme (triple-dash class, exact from Webscraper JSON config)
parse: extracts title + absolute link from first <a>; returns null if either is missing; resolves relative URLs against https://www.oecd.org
Image: lazy-load fallback chain (src → data-src → data-lazy-src → data-original), discarding data: base64 placeholders
Date: parseOecdDate handles DD/MM/YYYY (uses Date.UTC to avoid timezone day-shift) and named-month strings like "9 March 2026", falls back to now
content: empty string — no description selector in JSON config
creator: 'OECD'
No domain property (not used by the engine)

Original prompt

Task

Create a new scraper file scrapers/oecd.org.js for https://www.oecd.org/.

robots.txt clearance

The OECD robots.txt (https://www.oecd.org/robots.txt) only disallows /content/dam/oecd/ and /adobe/dynamicmedia/deliver/ for generic crawlers. The homepage https://www.oecd.org/ is allowed. ✅

Webscraper JSON config (source of truth)
{
  "_id": "oecd-main-insights",
  "startUrl": ["https://www.oecd.org/"],
  "selectors": [
    {
      "elementLimit": 0,
      "id": "elem",
      "multiple": true,
      "parentSelectors": ["_root"],
      "scroll": false,
      "selector": "div.card---theme",
      "type": "SelectorElement"
    },
    {
      "id": "title",
      "multiple": false,
      "multipleType": "singleColumn",
      "parentSelectors": ["elem"],
      "regex": "",
      "selector": "a",
      "type": "SelectorText",
      "version": 2
    },
    {
      "id": "link",
      "linkType": "linkFromHref",
      "multiple": false,
      "parentSelectors": ["elem"],
      "selector": "a",
      "type": "SelectorLink",
      "version": 2
    },
    {
      "id": "image",
      "multiple": false,
      "multipleType": "singleColumn",
      "parentSelectors": ["elem"],
      "selector": "img",
      "type": "SelectorImage",
      "version": 2
    },
    {
      "id": "date",
      "multiple": false,
      "multipleType": "singleColumn",
      "parentSelectors": ["elem"],
      "regex": "",
      "selector": ".card__date",
      "type": "SelectorText",
      "version": 2
    }
  ]
}
Implementation requirements

Selectors

listSelector: div.card---theme — note the triple dash --- in the class name (this is exact, straight from the JSON)

title: text of the first a inside the card

link: href of the first a inside the card — resolve to absolute URL using https://www.oecd.org as base if relative

image: img inside the card — apply the lazy-load fallback pattern: src → data-src → data-lazy-src → data-original → null; discard data: base64 placeholders

date: .card__date text — OECD typically uses formats like "9 March 2026" or "09/03/2026". Parse robustly:

Try DD/MM/YYYY regex first

Fall back to new Date(str) for named-month formats like "9 March 2026"

Fall back to new Date().toISOString() if unparseable

Return object
return {
    title,
    link,
    enforcedImage: img,
    content,   // empty string "" — no description selector in JSON config
    pubDate,
    creator: 'OECD'
};
Return null if title or link is missing/empty.

File location

scrapers/oecd.org.js

Reference implementation pattern

Follow the same structure as the existing scrapers in the repo, especially scrapers/icann.org.js and scrapers/coe.int.js. The file must use module.exports = { listSelector, parse }.

Do NOT add a domain property — it is not used by the engine for scrapers.

The fetch_news.js engine automatically resolves relative URLs but it is safer to make the link absolute in the scraper using new URL(rawLink, 'https://www.oecd.org').href.

Date parsing helper
function parseOecdDate(str) {
    if (!str) return new Date().toISOString();

    // Try DD/MM/YYYY (European slash format)
    const slashMatch = str.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
    if (slashMatch) {
        const day   = parseInt(slashMatch[1], 10);
        const month = parseInt(slashMatch[2], 10) - 1;
        const year  = parseInt(slashMatch[3], 10);
        return new Date(year, month, day, 12, 0, 0).toISOString();
    }

    // Fallback: JS Date parse ("9 March 2026", "2026-03-09", etc.)
    const d = new Date(str);
    if (!isNaN(d.getTime())) return d.toISOString();

    return new Date().toISOString();
}

This pull request was created from Copilot chat.

🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Co-authored-by: susannaanas <4725416+susannaanas@users.noreply.github.com>

Initial plan

9beb223

Copilot AI assigned Copilot and susannaanas Mar 9, 2026

Copilot started work on behalf of susannaanas March 9, 2026 11:13 View session

Add OECD scraper (scrapers/oecd.org.js)

5393962

Co-authored-by: susannaanas <4725416+susannaanas@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Create new scraper file for oecd.org~~ Add OECD.org scraper Mar 9, 2026

Copilot finished work on behalf of susannaanas March 9, 2026 11:15

Copilot started work on behalf of susannaanas March 9, 2026 11:27 View session

Copilot finished work on behalf of susannaanas March 9, 2026 11:28

susannaanas marked this pull request as ready for review March 9, 2026 13:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OECD.org scraper#32

Add OECD.org scraper#32
Copilot wants to merge 2 commits intomainfrom
copilot/add-scraper-file-oecd-org

Copilot AI commented Mar 9, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Copilot AI commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key details

Task

robots.txt clearance

Webscraper JSON config (source of truth)

Implementation requirements

Selectors

Return object

File location

Reference implementation pattern

Date parsing helper

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Mar 9, 2026 •

edited

Loading