Skip to content

steelai2002mfnj/odaily-news-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

ODAILY News Scraper

ODAILY News Scraper collects fast-moving crypto and blockchain flash news from ODAILY (星球日报) into a clean, analysis-ready dataset. It’s built for teams who need reliable, repeatable crypto news ingestion for monitoring, research, and alerting workflows. Use ODAILY News Scraper to pull a configurable batch of the latest flash news without manual copy-paste.

Bitbash Banner

Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for odaily-news-scraper you've just found your team — Let’s Chat. 👆👆

Introduction

This project fetches flash news items from ODAILY and outputs structured records that are easy to store, search, and analyze. It solves the problem of constantly changing crypto headlines by turning an endless stream of updates into consistent, timestamped data. It’s ideal for traders, analysts, builders, and anyone running crypto news tracking pipelines.

Flash News Collection Workflow

  • Pulls flash news items from a stable API endpoint with predictable structure.
  • Supports requesting a specific number of items (from 1 up to 500 per run).
  • Automatically paginates to fulfill large requests without missing items.
  • Adds rate limiting to reduce the chance of request throttling.
  • Validates inputs to prevent malformed runs and inconsistent outputs.

Features

Feature Description
Configurable item count Request between 1 and 500 flash news items per run to match your workload.
Automatic pagination Retrieves large batches across multiple pages while keeping ordering consistent.
Input validation Ensures numOfNews is within allowed bounds to avoid failed runs.
Rate limiting Applies controlled request pacing to reduce API restriction risks.
Structured dataset output Produces consistent JSON records ready for analytics, dashboards, or storage.
Error-aware extraction Captures failures cleanly so you can detect gaps and retry confidently.

What Data This Scraper Extracts

Field Name Field Description
id Unique identifier of the flash news item.
title Headline/title text of the flash news item.
description Short summary or body snippet of the news item.
news_url Source link associated with the item (often external).
published_at Publish timestamp string for the item.
user.id Identifier of the author/account that posted the item.
user.name Display name of the poster/author.
user.avatar_url Avatar image URL for the author/account.
raw Optional raw payload for forward compatibility when the upstream schema changes.
fetched_at Timestamp when the item was collected (useful for auditing freshness).

Example Output

[
  {
    "id": 430518,
    "title": "QCP Capital: As macro narrative shifts from protectionism to trade optimism, BTC may remain range-bound",
    "description": "Odaily News: According to QCP Capital...",
    "news_url": "https://t.me/QCPbroadcast/1449",
    "published_at": "2025-05-13 17:12:10",
    "user": {
      "id": 2147668722,
      "name": "CryptoLeo",
      "avatar_url": "https://piccdn.0daily.com/202407/15/51d36f6b7c1fd959d3210d2a7061f2b2.jpg"
    },
    "fetched_at": "2025-12-14T19:00:00+05:00"
  }
]

Directory Structure Tree

ODAILY News Scraper/
├── src/
│   ├── main.js
│   ├── client/
│   │   ├── httpClient.js
│   │   └── rateLimiter.js
│   ├── collectors/
│   │   ├── fetchFlashNews.js
│   │   └── pagination.js
│   ├── validators/
│   │   └── inputSchema.js
│   ├── normalizers/
│   │   └── normalizeItem.js
│   ├── storage/
│   │   └── datasetWriter.js
│   └── utils/
│       ├── logger.js
│       └── time.js
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── .env.example
├── package.json
├── package-lock.json
└── README.md

Use Cases

  • Crypto traders use it to track flash headlines in near real-time, so they can react faster to market-moving updates.
  • Research analysts use it to build historical news timelines, so they can correlate events with price moves and volatility.
  • Risk teams use it to monitor breaking security/regulatory headlines, so they can trigger internal alerts and mitigation steps.
  • Builders & founders use it to follow ecosystem announcements, so they can spot trends, partnerships, and product launches early.
  • Content & media teams use it to summarize daily crypto updates, so they can publish faster and keep coverage comprehensive.

FAQs

How do I control how many items are collected?

Set numOfNews to an integer between 1 and 500. Smaller values are best for frequent polling; larger values are better for backfilling recent history.

What happens if I request more than 500 items?

Requests above 500 are rejected by input validation to prevent unstable runs. If you need more than 500, run multiple passes and merge results by id and published_at.

Does it handle pagination automatically?

Yes. If your requested count requires multiple pages, the scraper continues fetching until it reaches the target count or the upstream source has no more items available.

How do I avoid duplicates across repeated runs?

Use id as the primary key for de-duplication. For added safety, combine id + published_at and keep the latest record if a conflict occurs.


Performance Benchmarks and Results

Primary Metric: A 500-item run typically completes in ~20–40 seconds, depending on network conditions and upstream response time.

Reliability Metric: Sustains ~98–99% successful runs in routine usage when rate limiting is enabled and requests stay within the 1–500 range.

Efficiency Metric: Averages ~12–25 items/second effective throughput across paginated pulls, with steady memory usage under typical JSON payload sizes.

Quality Metric: Common fields (id, title, description, news_url, published_at, user) are consistently populated; completeness is typically >99% for headline + timestamp fields, with occasional missing author metadata when the upstream record omits it.

Book a Call Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★

Releases

No releases published

Packages

 
 
 

Contributors