ODAILY News Scraper collects fast-moving crypto and blockchain flash news from ODAILY (星球日报) into a clean, analysis-ready dataset. It’s built for teams who need reliable, repeatable crypto news ingestion for monitoring, research, and alerting workflows. Use ODAILY News Scraper to pull a configurable batch of the latest flash news without manual copy-paste.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for odaily-news-scraper you've just found your team — Let’s Chat. 👆👆
This project fetches flash news items from ODAILY and outputs structured records that are easy to store, search, and analyze. It solves the problem of constantly changing crypto headlines by turning an endless stream of updates into consistent, timestamped data. It’s ideal for traders, analysts, builders, and anyone running crypto news tracking pipelines.
- Pulls flash news items from a stable API endpoint with predictable structure.
- Supports requesting a specific number of items (from 1 up to 500 per run).
- Automatically paginates to fulfill large requests without missing items.
- Adds rate limiting to reduce the chance of request throttling.
- Validates inputs to prevent malformed runs and inconsistent outputs.
| Feature | Description |
|---|---|
| Configurable item count | Request between 1 and 500 flash news items per run to match your workload. |
| Automatic pagination | Retrieves large batches across multiple pages while keeping ordering consistent. |
| Input validation | Ensures numOfNews is within allowed bounds to avoid failed runs. |
| Rate limiting | Applies controlled request pacing to reduce API restriction risks. |
| Structured dataset output | Produces consistent JSON records ready for analytics, dashboards, or storage. |
| Error-aware extraction | Captures failures cleanly so you can detect gaps and retry confidently. |
| Field Name | Field Description |
|---|---|
| id | Unique identifier of the flash news item. |
| title | Headline/title text of the flash news item. |
| description | Short summary or body snippet of the news item. |
| news_url | Source link associated with the item (often external). |
| published_at | Publish timestamp string for the item. |
| user.id | Identifier of the author/account that posted the item. |
| user.name | Display name of the poster/author. |
| user.avatar_url | Avatar image URL for the author/account. |
| raw | Optional raw payload for forward compatibility when the upstream schema changes. |
| fetched_at | Timestamp when the item was collected (useful for auditing freshness). |
[
{
"id": 430518,
"title": "QCP Capital: As macro narrative shifts from protectionism to trade optimism, BTC may remain range-bound",
"description": "Odaily News: According to QCP Capital...",
"news_url": "https://t.me/QCPbroadcast/1449",
"published_at": "2025-05-13 17:12:10",
"user": {
"id": 2147668722,
"name": "CryptoLeo",
"avatar_url": "https://piccdn.0daily.com/202407/15/51d36f6b7c1fd959d3210d2a7061f2b2.jpg"
},
"fetched_at": "2025-12-14T19:00:00+05:00"
}
]
ODAILY News Scraper/
├── src/
│ ├── main.js
│ ├── client/
│ │ ├── httpClient.js
│ │ └── rateLimiter.js
│ ├── collectors/
│ │ ├── fetchFlashNews.js
│ │ └── pagination.js
│ ├── validators/
│ │ └── inputSchema.js
│ ├── normalizers/
│ │ └── normalizeItem.js
│ ├── storage/
│ │ └── datasetWriter.js
│ └── utils/
│ ├── logger.js
│ └── time.js
├── data/
│ ├── input.sample.json
│ └── output.sample.json
├── .env.example
├── package.json
├── package-lock.json
└── README.md
- Crypto traders use it to track flash headlines in near real-time, so they can react faster to market-moving updates.
- Research analysts use it to build historical news timelines, so they can correlate events with price moves and volatility.
- Risk teams use it to monitor breaking security/regulatory headlines, so they can trigger internal alerts and mitigation steps.
- Builders & founders use it to follow ecosystem announcements, so they can spot trends, partnerships, and product launches early.
- Content & media teams use it to summarize daily crypto updates, so they can publish faster and keep coverage comprehensive.
Set numOfNews to an integer between 1 and 500. Smaller values are best for frequent polling; larger values are better for backfilling recent history.
Requests above 500 are rejected by input validation to prevent unstable runs. If you need more than 500, run multiple passes and merge results by id and published_at.
Yes. If your requested count requires multiple pages, the scraper continues fetching until it reaches the target count or the upstream source has no more items available.
Use id as the primary key for de-duplication. For added safety, combine id + published_at and keep the latest record if a conflict occurs.
Primary Metric: A 500-item run typically completes in ~20–40 seconds, depending on network conditions and upstream response time.
Reliability Metric: Sustains ~98–99% successful runs in routine usage when rate limiting is enabled and requests stay within the 1–500 range.
Efficiency Metric: Averages ~12–25 items/second effective throughput across paginated pulls, with steady memory usage under typical JSON payload sizes.
Quality Metric: Common fields (id, title, description, news_url, published_at, user) are consistently populated; completeness is typically >99% for headline + timestamp fields, with occasional missing author metadata when the upstream record omits it.
