| name | firstdata | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| description | Find official portals, APIs, and download paths for authoritative primary data sources (governments, international organizations, research institutions, etc.). Use when users need to know "where to find this data from an official source", "which source is more authoritative", or "how to cite primary data". Covers 1000+ global data sources with authority comparison and site navigation guidance. | ||||||||||||||||
| version | 0.0.2 | ||||||||||||||||
| metadata |
|
FirstData is a structured knowledge base of authoritative primary data sources, covering 1000+ sources to help agents locate official origins rather than generating unverified answers.
It does not replace raw data — it acts as an "authoritative data navigator", taking vague user needs as input, recommending the most appropriate primary sources, and providing clear access paths, API information, and download methods so both users and agents can trace back to original evidence.
Coverage:
- International organizations: World Bank, IMF, OECD, WHO, FAO, etc.
- Chinese government agencies: PBC, National Bureau of Statistics, General Administration of Customs, CSRC, etc.
- National official agencies: US, Canada, Japan, UK, Australia, etc.
- Academic & research databases: NBER, Penn World Table, PubMed, etc.
- Corporate disclosure & market platforms: stock exchange disclosure systems, listed company filings, etc.
- Industry-specific databases: energy, finance, health, climate, legal & regulatory, etc.
When to use: When users need to find official data sources, compare source authority, obtain official URLs/APIs/download paths, or build evidence-chain workflows. FirstData is a source locator, not an answer generator — after receiving results, guide users back to original sources for verification rather than treating them as final answers.
1. Source Locator — Returns the top 3–5 most relevant sources with authority level, matching rationale, access URL, API documentation, and download methods.
2. Site Pathfinder — Provides step-by-step navigation from homepage to target data for complex official websites, including alternative paths and API access methods.
3. Evidence-Ready Workflows — Can be embedded into workflows requiring evidence chains: deep research, policy analysis, investment research, compliance auditing, fact-checking, etc.
Each data source includes structured metadata: authority level (government / international / research / market / commercial / other), access URL, API information, download formats, geographic scope, update frequency, access level, etc.
Typical query scenarios when agents call FirstData via MCP:
| User Need | Query Direction | Expected Output |
|---|---|---|
| "Which official source should I cite for China's 2023 NEV export volume?" | China Customs, National Bureau of Statistics | Official source + authority level + data page URL |
| "Where to download IPO prospectus for a Hong Kong-listed company?" | HKEXnews | Official platform + step-by-step navigation |
| "World Bank vs IMF GDP data — which is better for academic citation?" | World Bank WDI, IMF WEO | Source comparison + authority differences + API docs |
| "Need global climate data with API access" | NASA Earthdata, NOAA CDO | Data source + API docs + access methods |
| "Where is the official data for China's M2 money supply?" | People's Bank of China | Official data portal + update frequency + historical coverage |
Full project background and feature documentation: README
This skill connects to the FirstData MCP server (firstdata.deepminer.com.cn), the project's official hosted API endpoint. An API key (FIRSTDATA_API_KEY) is required for authentication.
If you already have FIRSTDATA_API_KEY set, configure the MCP connection:
npx mcporter config add firstdata https://firstdata.deepminer.com.cn/mcp --header 'Authorization=Bearer ${FIRSTDATA_API_KEY}'Or add manually to your MCP config:
{
"mcpServers": {
"firstdata": {
"type": "streamable-http",
"url": "https://firstdata.deepminer.com.cn/mcp",
"headers": {
"Authorization": "Bearer <FIRSTDATA_API_KEY>"
}
}
}
}If you don't have an API key, see firstdata-register.md for the registration process (two API calls to the FirstData server to obtain a JWT token).
Once connected, browse the tool list provided by the firstdata MCP and select the appropriate tool based on your needs.
The FirstData MCP server provides 5 tools. Below is a reference with usage guidelines, limitations, and examples.
- Authentication required: All tools require a valid API key (JWT token) via
Authorization: Bearer <token>header. - Daily call quota: API usage is subject to a per-token daily call quota. Quota varies by API key tier (trial accounts: 30 calls/day). MCP tool calls do not return remaining quota information. To check quota, use the Token verification API (
POST /api/token/verify) which returnsremaining_dailyin the response — this is a separate HTTP call, not available through MCP tool invocation. - Network dependency: All tools make HTTP calls to the FirstData server (
firstdata.deepminer.com.cn). Network latency and server availability affect response times.
Purpose: Unified data source search tool supporting keyword search, structured filtering, pagination, and multiple output modes.
Limitations:
- Maximum 200 results per query (
limitparameter range: 1–200, default: 20). ⚠️ Each keyword is matched as an independent substring — pass each search term as a separate array element. For example, use["中国", "GDP"](173 results) instead of["中国 GDP"](0 results). This is by design to preserve multi-word terms like"New Zealand"or"World Bank".- Keyword matching is substring-based, not semantic search. Keywords are matched against source metadata fields (name, description, tags, content).
- The
domainparameter uses substring matching, not exact enum matching (e.g.,"finance"matches"public-finance","finance","financial-markets"). - No boolean operators (AND/OR/NOT). Multiple keywords in the array are combined with OR logic (results matching any keyword are returned, deduplicated).
- Response time: typically ~1 second.
Purpose: Retrieve full details for specific data sources by their IDs.
Limitations:
- Invalid
source_idvalues do NOT cause an error response (isError: false). Instead, the result array includes{"id": "xxx", "error": "Not found"}for each invalid ID alongside valid results. Callers must check individual items forerrorfields rather than relying solely onisError. - No schema-level limit on the number of
source_idsper request, but performance with large batches (50+) is unverified. As a practical guideline (not a hard limit), consider batching in groups of ~20. - The
fieldsparameter filters returned fields; when omitted, all fields are returned.
Purpose: LLM-powered intelligent search agent for complex, cross-domain, or ambiguous queries that require multi-step reasoning.
Limitations:
- Query length: 2–1,000 characters.
- Maximum results: 1–20 (default: 5).
- Non-idempotent: Same query may return different results across calls (LLM reasoning varies).
- Response time: typically 2–8 seconds (involves LLM inference). May take longer (10–30+ seconds) when the agent triggers
web_searchfor external information. - Internally uses LangChain ReAct agent with
jqfor local data queries plus optionalweb_search. The web search step is not user-controllable. - Use
search_sourceinstead for simple keyword matching or structured filtering — it is faster, deterministic, and cheaper.
Purpose: Generate detailed access instructions for a specific data source using RAG (Retrieval-Augmented Generation).
Limitations:
- Not all data sources have instruction libraries. If a source has no pre-built instructions, results will be empty or irrelevant.
- Invalid
source_idreturns{"error": "数据源 xxx 不存在"}. top_krange: 1–5 (default: 3).- Response time is highly variable: 3–20 seconds, depending on RAG retrieval complexity and server load.
- Retrieval quality depends heavily on the specificity of the
operationparameter. Vague descriptions yield lower-quality matches. Use specific action verbs and entity names (e.g., "查询2024年M2货币供应量数据" rather than "查数据").
Purpose: Submit user feedback to the development team when FirstData has a confirmed issue.
Limitations:
feedback_messagelength: 10–2,000 characters.- Non-idempotent: Duplicate calls create duplicate feedback entries. Do not retry on success.
- Only use when a genuine issue is confirmed (missing source, incorrect data, broken functionality). Do not use as a general comment channel.
Examples:
# Example 1: Broken link
feedback_message="链接失效:数据源 china-pbc 的 data_url 返回 404,无法访问数据页面。检索关键词:中国货币供应量"
# Example 2: Outdated content
feedback_message="数据内容过时:数据源 worldbank-open-data 的 update_frequency 标注为 quarterly,但实际已超过 6 个月未更新"
When adding or modifying MCP tool descriptions, follow these principles (based on MCP tool description quality research):
Core principle: "Write it right before writing it all" — Functionality accuracy (+11.6% impact) matters ~8× more than Conciseness (+1.5%).
6-dimension checklist (check all before submitting):
- Purpose: Is the tool's function clearly stated in the first sentence?
- Guidelines: Are usage scenarios and when-to-use / when-not-to-use rules included?
- Examples: Are typical input/output examples provided?
- Limitations: Are constraints, edge cases, and known limitations documented?
- Parameters: Are all parameters described with types, ranges, and defaults?
- Return Format: Is the response structure documented?
FirstData is an open-source project — join us in building the authoritative data source knowledge base for agents: