feat: add Xiaohongshu (RedNote) extractor #292

Open
Afterimages wants to merge 1 commit into kepano:main from Afterimages:xiaohongshu-extractor

Conversation

@Afterimages

Extract note content (description, images, video, tags) and comments from xiaohongshu.com note pages via __INITIAL_STATE__ JSON parsing.

Comment extraction uses a three-tier fallback strategy:

  1. Parse from __INITIAL_STATE__ (HTML source / MAIN world bridge)
  2. Fetch from XHS comment API (async fallback)
  3. Extract from rendered DOM elements (works without MAIN world access)
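
As a rough illustration of the fallback order above, the sketch below tries each tier in turn and returns the first non-empty result. It is not the code in this PR: `XhsComment`, `CommentSource`, and `firstNonEmpty` are hypothetical names, and it assumes a tier that throws should be treated the same as a tier that finds nothing.

```typescript
// Hypothetical comment shape; the extractor's real types are not shown here.
interface XhsComment {
  author: string;
  text: string;
}

// One tier of the fallback chain: state parsing, API fetch, or DOM scraping.
type CommentSource = () => Promise<XhsComment[]>;

// Try each source in order and return the first non-empty list.
// A source that throws is skipped, so later tiers still get a chance.
async function firstNonEmpty(sources: CommentSource[]): Promise<XhsComment[]> {
  for (const source of sources) {
    try {
      const comments = await source();
      if (comments.length > 0) {
        return comments;
      }
    } catch {
      // Assumption: a failed tier is equivalent to an empty one.
    }
  }
  return [];
}
```

Each tier would be wrapped as a `CommentSource` and passed in priority order, which keeps the ordering explicit and makes it easy to stub individual tiers in tests.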

Additional features:

  • Normalize JS-only tokens (undefined/NaN/Infinity) in JSON-like state
  • Remove duplicate hashtag mentions from description text
  • Include comments in contentHtml by default; opt out via the excludeCommentsFromContent option
  • Provide comments as a {{comments}} template variable (Markdown)
  • Register extractor for xiaohongshu.com URL patterns
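
The token normalization listed above could look roughly like the following sketch, which rewrites `undefined`, `NaN`, and `Infinity` to `null` so that `JSON.parse` accepts the state. This is an assumption about the approach, not the PR's implementation: `normalizeJsTokens` is a hypothetical name, mapping to `null` is an illustrative choice, and only double-quoted strings are recognized.

```typescript
// Matches a JS-only token at the current position (sticky flag "y"),
// provided it is not the prefix of a longer identifier.
const JS_ONLY_TOKEN = /(?:undefined|NaN|-?Infinity)(?![\w$])/y;

// Replace JS-only tokens outside string literals with null.
function normalizeJsTokens(source: string): string {
  let out = "";
  let i = 0;
  let inString = false;

  while (i < source.length) {
    const ch = source.charAt(i);

    if (inString) {
      // Copy escape sequences verbatim so an escaped quote does not end the string.
      if (ch === "\\" && i + 1 < source.length) {
        out += ch + source.charAt(i + 1);
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      out += ch;
      i += 1;
      continue;
    }

    if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
      continue;
    }

    // Skip matches that are the tail of a longer identifier or number.
    const prevIsIdentChar = i > 0 && /[\w$]/.test(source.charAt(i - 1));
    JS_ONLY_TOKEN.lastIndex = i;
    const match = prevIsIdentChar ? null : JS_ONLY_TOKEN.exec(source);
    if (match) {
      out += "null";
      i += match[0].length;
      continue;
    }

    out += ch;
    i += 1;
  }
  return out;
}
```

Scanning character by character rather than applying a global regex keeps text such as a comment that literally contains the word "undefined" untouched, since replacements only happen outside string literals.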

Relates #273
