A comprehensive RESTful API for fetching Indonesian news content from detik.com using web scraping techniques. This API provides structured access to Indonesian news articles with features like categorization, search, pagination, caching, and rate limiting.
- Real-time News Scraping: Fetches latest news from detik.com
- Category Support: News, Finance, Sports, Technology, Health, and more
- Advanced Search: Search articles by keywords with relevance scoring
- Date Filtering: Filter articles by date range
- Pagination: Efficient pagination for large result sets
- Caching: Redis-like in-memory caching for improved performance
- Rate Limiting: Prevents API abuse with configurable limits
- Indonesian Language Support: Proper encoding for Indonesian characters
- Comprehensive Documentation: OpenAPI/Swagger documentation
- Error Handling: Robust error handling with detailed error responses
- Logging: Comprehensive logging for monitoring and debugging
- Node.js 16+
- npm or yarn
- Internet connection for scraping
- Clone the repository:
```bash
git clone <repository-url>
cd <project-directory>
```
- Install dependencies:
```bash
npm install
```
- Environment configuration: create a `.env` file in the root directory:
```
NODE_ENV=development
PORT=3001
API_BASE_URL=http://localhost:3001
```
- Start the development server:
```bash
npm run api:dev
```
- Start the production server:
```bash
npm run api:start
```

Once the server is running, visit:
- Swagger UI: http://localhost:3001/api/docs
- Health Check: http://localhost:3001/api/health
Health check:
```http
GET /api/health
```
Latest news:
```http
GET /api/news/latest?page=1&limit=20&category=news
```
Parameters:
- `page` (optional): Page number (default: 1)
- `limit` (optional): Articles per page (1-50, default: 20)
- `category` (optional): Filter by category
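The pagination fields in the response (`totalResults`, `page`, `pageSize`, `totalPages`) can be derived from a full result list roughly as follows. This is a minimal sketch, not the server's actual code: the `paginate` helper is hypothetical, and clamping `limit` to 1-50 (rather than rejecting out-of-range values) is an assumption based on the documented range.

```javascript
// Hypothetical helper: slices an article array into one page and computes
// the pagination metadata used in the documented response format.
const toInt = (value, fallback) => {
  const n = Number.parseInt(value, 10); // query params usually arrive as strings
  return Number.isNaN(n) ? fallback : n;
};

function paginate(articles, page = 1, limit = 20) {
  // Clamp to the documented ranges: page >= 1, limit between 1 and 50
  const pageSize = Math.min(Math.max(toInt(limit, 20), 1), 50);
  const currentPage = Math.max(toInt(page, 1), 1);
  const totalResults = articles.length;
  const totalPages = Math.ceil(totalResults / pageSize);
  const start = (currentPage - 1) * pageSize;
  return {
    articles: articles.slice(start, start + pageSize),
    totalResults,
    page: currentPage,
    pageSize,
    totalPages,
  };
}

// 150 articles with limit=20 gives 8 pages; the last page holds 10 articles
const all = Array.from({ length: 150 }, (_, i) => ({ id: `article-${i + 1}` }));
const last = paginate(all, 8, 20);
console.log(last.totalPages, last.articles.length); // 8 10
```

This matches the example response, where 150 results at 20 per page yield `totalPages: 8`.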
News by category:
```http
GET /api/news/category/{category}?page=1&limit=20&dateFrom=2024-01-01&dateTo=2024-12-31
```
Parameters:
- `category` (required): One of: news, finance, sport, otomotif, properti, travel, food, health, wolipop, inet, edu, hot
- `page` (optional): Page number (default: 1)
- `limit` (optional): Articles per page (1-50, default: 20)
- `dateFrom` (optional): Start date (YYYY-MM-DD)
- `dateTo` (optional): End date (YYYY-MM-DD)
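The `dateFrom`/`dateTo` filter can be understood as a comparison against each article's `publishedAt` timestamp. The sketch below assumes both bounds are inclusive and are interpreted as whole UTC days; the server's actual boundary and time-zone handling is not documented here, so treat `filterByDateRange` as a hypothetical illustration.

```javascript
// Hypothetical sketch: keep articles whose publishedAt falls within
// [dateFrom 00:00:00 UTC, dateTo 23:59:59.999 UTC]. Both bounds are optional.
const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

function parseDay(value, endOfDay) {
  if (!DATE_PATTERN.test(value)) {
    throw new RangeError(`Invalid date "${value}", expected YYYY-MM-DD`);
  }
  const time = endOfDay ? 'T23:59:59.999Z' : 'T00:00:00.000Z';
  const ms = Date.parse(value + time);
  if (Number.isNaN(ms)) throw new RangeError(`Invalid date "${value}"`);
  return ms;
}

function filterByDateRange(articles, dateFrom, dateTo) {
  const from = dateFrom ? parseDay(dateFrom, false) : -Infinity;
  const to = dateTo ? parseDay(dateTo, true) : Infinity;
  if (from > to) throw new RangeError('dateFrom must not be after dateTo');
  return articles.filter((article) => {
    const published = Date.parse(article.publishedAt);
    return published >= from && published <= to;
  });
}

const sample = [
  { id: 'a', publishedAt: '2023-12-31T23:00:00.000Z' },
  { id: 'b', publishedAt: '2024-01-15T10:30:00.000Z' },
  { id: 'c', publishedAt: '2025-01-01T00:00:00.000Z' },
];
console.log(filterByDateRange(sample, '2024-01-01', '2024-12-31').map((a) => a.id)); // [ 'b' ]
```

Detik publishes in Indonesian time (UTC+7), so a real implementation may need to decide whether day boundaries follow UTC or local time.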
Search:
```http
GET /api/news/search?q=jakarta&page=1&limit=20&category=news&sortBy=relevance
```
Parameters:
- `q` (required): Search query (minimum 2 characters)
- `page` (optional): Page number (default: 1)
- `limit` (optional): Articles per page (1-50, default: 20)
- `category` (optional): Filter by category
- `sortBy` (optional): Sort by `relevance`, `date`, or `popularity` (default: `relevance`)
- `dateFrom` (optional): Start date (YYYY-MM-DD)
- `dateTo` (optional): End date (YYYY-MM-DD)
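A client can check search parameters against the documented constraints before calling the endpoint. The sketch below builds the request URL with Node's built-in `URL` and `URLSearchParams`, which also percent-encode the query (important for multi-word Indonesian queries). The `buildSearchUrl` name, the default base URL, and trimming `q` before the length check are assumptions for illustration.

```javascript
// Hypothetical client-side helper that enforces the documented search
// constraints and returns a fully encoded request URL.
// Expects numeric `page` and `limit`.
const SORT_OPTIONS = new Set(['relevance', 'date', 'popularity']);

function buildSearchUrl(params, baseUrl = 'http://localhost:3001') {
  const { q, page = 1, limit = 20, category, sortBy = 'relevance', dateFrom, dateTo } = params;
  if (typeof q !== 'string' || q.trim().length < 2) {
    throw new Error('q is required and must be at least 2 characters');
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > 50) {
    throw new Error('limit must be an integer between 1 and 50');
  }
  if (!Number.isInteger(page) || page < 1) {
    throw new Error('page must be a positive integer');
  }
  if (!SORT_OPTIONS.has(sortBy)) {
    throw new Error(`sortBy must be one of: ${[...SORT_OPTIONS].join(', ')}`);
  }
  const url = new URL('/api/news/search', baseUrl);
  url.searchParams.set('q', q.trim());
  url.searchParams.set('page', String(page));
  url.searchParams.set('limit', String(limit));
  url.searchParams.set('sortBy', sortBy);
  // Optional filters are only added when provided
  if (category) url.searchParams.set('category', category);
  if (dateFrom) url.searchParams.set('dateFrom', dateFrom);
  if (dateTo) url.searchParams.set('dateTo', dateTo);
  return url.toString();
}

console.log(buildSearchUrl({ q: 'banjir jakarta', sortBy: 'date' }));
// http://localhost:3001/api/news/search?q=banjir+jakarta&page=1&limit=20&sortBy=date
```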
Single article by ID:
```http
GET /api/news/{id}
```
Success response format:
```json
{
  "success": true,
  "data": {
    "articles": [
      {
        "id": "unique-article-id",
        "title": "Article Title",
        "description": "Article description or excerpt",
        "content": "Full article content",
        "url": "https://detik.com/article-url",
        "imageUrl": "https://detik.com/image-url",
        "publishedAt": "2024-01-15T10:30:00.000Z",
        "author": "Author Name",
        "category": "news",
        "tags": ["tag1", "tag2"],
        "source": {
          "name": "Detik.com",
          "url": "https://detik.com"
        }
      }
    ],
    "totalResults": 150,
    "page": 1,
    "pageSize": 20,
    "totalPages": 8
  },
  "message": "Latest news retrieved successfully",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```
Error response format:
```json
{
  "success": false,
  "error": {
    "message": "Invalid category. Valid categories: news, finance, sport...",
    "code": "INVALID_CATEGORY",
    "details": {
      "url": "/api/news/category/invalid",
      "method": "GET",
      "timestamp": "2024-01-15T10:30:00.000Z"
    }
  },
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```

The API supports the following news categories:
| Category | Description | Detik Section |
|---|---|---|
| `news` | General News | news |
| `finance` | Financial News | finance |
| `sport` | Sports News | sport |
| `otomotif` | Automotive | oto |
| `properti` | Property | properti |
| `travel` | Travel | travel |
| `food` | Food & Culinary | food |
| `health` | Health & Medical | health |
| `wolipop` | Lifestyle & Entertainment | wolipop |
| `inet` | Technology & Internet | inet |
| `edu` | Education | edu |
| `hot` | Trending News | hot |
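The table above amounts to a lookup from API category to Detik section. The sketch below shows one way to validate the `{category}` path parameter and build an error body shaped like the documented error response. `resolveCategory`, its return layout, and the 400 status code are illustrative assumptions, not the server's actual code.

```javascript
// Mapping taken from the category table: API category -> Detik section
const CATEGORY_SECTIONS = {
  news: 'news', finance: 'finance', sport: 'sport', otomotif: 'oto',
  properti: 'properti', travel: 'travel', food: 'food', health: 'health',
  wolipop: 'wolipop', inet: 'inet', edu: 'edu', hot: 'hot',
};

// Hypothetical validator: returns the Detik section, or an error body that
// follows the documented { success: false, error: {...} } format.
function resolveCategory(category, method = 'GET', now = new Date()) {
  // hasOwnProperty avoids matching inherited keys such as "toString"
  if (Object.prototype.hasOwnProperty.call(CATEGORY_SECTIONS, category)) {
    return { ok: true, section: CATEGORY_SECTIONS[category] };
  }
  const timestamp = now.toISOString();
  return {
    ok: false,
    status: 400, // assumed status code for an invalid path parameter
    body: {
      success: false,
      error: {
        message: `Invalid category. Valid categories: ${Object.keys(CATEGORY_SECTIONS).join(', ')}`,
        code: 'INVALID_CATEGORY',
        details: { url: `/api/news/category/${encodeURIComponent(category)}`, method, timestamp },
      },
      timestamp,
    },
  };
}

console.log(resolveCategory('otomotif').section); // oto
console.log(resolveCategory('invalid').body.error.code); // INVALID_CATEGORY
```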
Rate limiting:
- Window: 15 minutes
- Max Requests: 100 per IP
- Headers: `X-RateLimit-*` headers included in responses
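The documented limit (100 requests per IP per 15-minute window) corresponds to a fixed-window counter. The sketch below is a standalone, in-memory illustration with an injectable clock so it can be tested. The specific header names `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` are assumptions, since the section only states that `X-RateLimit-*` headers are included.

```javascript
// Minimal fixed-window rate limiter keyed by client IP (in-memory sketch).
class FixedWindowRateLimiter {
  constructor({ windowMs = 15 * 60 * 1000, max = 100, now = () => Date.now() } = {}) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now;
    this.windows = new Map(); // ip -> { count, resetAt }
  }

  // Records one request and returns whether it is allowed plus header values.
  hit(ip) {
    const current = this.now();
    let entry = this.windows.get(ip);
    if (!entry || current >= entry.resetAt) {
      // First request from this IP, or the previous window expired
      entry = { count: 0, resetAt: current + this.windowMs };
      this.windows.set(ip, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.max,
      headers: {
        'X-RateLimit-Limit': String(this.max),
        'X-RateLimit-Remaining': String(Math.max(this.max - entry.count, 0)),
        'X-RateLimit-Reset': String(Math.ceil(entry.resetAt / 1000)), // epoch seconds
      },
    };
  }
}

let fakeTime = 0;
const limiter = new FixedWindowRateLimiter({ max: 3, windowMs: 1000, now: () => fakeTime });
console.log([1, 2, 3, 4].map(() => limiter.hit('203.0.113.7').allowed)); // [ true, true, true, false ]
fakeTime = 1000; // a new window starts
console.log(limiter.hit('203.0.113.7').allowed); // true
```

Because the counters live in process memory, each server instance would enforce its own limit; a multi-instance deployment would need a shared store.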
Caching:
- Default TTL: 5 minutes for latest news
- Search TTL: 10 minutes for search results
- Article TTL: 30 minutes for individual articles
- Headers: `X-Cache` header indicates `HIT`/`MISS`
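The TTL values above can be modelled with a small in-memory map that stores an expiry time next to each value. This is a sketch only: `TtlCache` and its `getOrLoad` method are hypothetical names, the clock is injectable for testing, and the loader is synchronous here for brevity (a real scraper call would be asynchronous).

```javascript
// TTLs from the list above, in milliseconds
const TTL = {
  latest: 5 * 60 * 1000,   // latest news
  search: 10 * 60 * 1000,  // search results
  article: 30 * 60 * 1000, // individual articles
};

// Minimal in-memory TTL cache that reports HIT/MISS like the X-Cache header.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.store = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.store.delete(key); // expired entries are evicted lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value, ttlMs) {
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  // Returns the cached value or calls loader(); xCache is 'HIT' or 'MISS'.
  getOrLoad(key, ttlMs, loader) {
    const cached = this.get(key);
    if (cached !== undefined) return { value: cached, xCache: 'HIT' };
    const value = loader();
    this.set(key, value, ttlMs);
    return { value, xCache: 'MISS' };
  }
}

let clock = 0;
const cache = new TtlCache(() => clock);
const load = () => ['article-1', 'article-2'];
console.log(cache.getOrLoad('/api/news/latest?page=1', TTL.latest, load).xCache); // MISS
console.log(cache.getOrLoad('/api/news/latest?page=1', TTL.latest, load).xCache); // HIT
clock += TTL.latest; // five minutes later the entry has expired
console.log(cache.getOrLoad('/api/news/latest?page=1', TTL.latest, load).xCache); // MISS
```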
Logging:
- Log Files: `server/logs/app.log` and `server/logs/error.log`
- Log Levels: INFO, WARN, ERROR, DEBUG
- Rotation: Manual log rotation recommended
Run the test suite:
```bash
npm test
```
Run tests in watch mode:
```bash
npm run test:watch
```
The test suite covers:
- API endpoint testing
- Parameter validation
- Error handling
- Rate limiting
- Caching behavior
- Response format validation
Caching strategy:
- In-memory caching with configurable TTL
- Cache keys based on request URL and parameters
- Automatic cache invalidation
- Cache statistics available
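Keys "based on request URL and parameters" work best when equivalent requests map to the same key. One way to do that, sketched below under the assumption that parameter order and empty parameters should not affect the result, is to sort the query parameters before building the key. `cacheKeyFor` is a hypothetical helper, not part of the project.

```javascript
// Hypothetical cache-key builder: requests that differ only in query
// parameter order (or in empty parameters) share one cache entry.
const compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function cacheKeyFor(requestUrl) {
  const url = new URL(requestUrl, 'http://localhost'); // base only used for relative URLs
  const entries = [...url.searchParams.entries()]
    .filter(([, value]) => value !== '')
    .sort(([keyA, valueA], [keyB, valueB]) => compare(keyA, keyB) || compare(valueA, valueB));
  const query = new URLSearchParams(entries).toString();
  return query ? `${url.pathname}?${query}` : url.pathname;
}

console.log(cacheKeyFor('/api/news/search?sortBy=date&q=jakarta&category='));
// /api/news/search?q=jakarta&sortBy=date
```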
Rate limiting strategy:
- IP-based rate limiting
- Configurable windows and limits
- Graceful degradation under load
Error handling:
- Comprehensive error categorization
- Detailed error responses
- Automatic retry mechanisms
- Circuit breaker patterns
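The general circuit-breaker idea, applied to scraping requests, is sketched below: after a number of consecutive failures the breaker opens and rejects calls immediately, then allows trial calls once a cooldown has passed. The `CircuitBreaker` class, its thresholds, and the injectable clock are assumptions for illustration; the project's actual retry and breaker mechanism may differ.

```javascript
// Minimal circuit breaker: CLOSED -> OPEN after `failureThreshold` consecutive
// failures; OPEN -> HALF_OPEN once `cooldownMs` has elapsed; a success closes
// it again, while a failure in HALF_OPEN re-opens it.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  canRequest() {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'HALF_OPEN'; // cooldown over: allow trial requests
    }
    return this.state !== 'OPEN';
  }

  recordSuccess() {
    this.state = 'CLOSED';
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }

  // Wraps an async operation such as fetching a detik.com page.
  async exec(operation) {
    if (!this.canRequest()) throw new Error('Circuit open: upstream temporarily unavailable');
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }
}

let clockMs = 0;
const breaker = new CircuitBreaker({ failureThreshold: 2, cooldownMs: 1000, now: () => clockMs });
breaker.recordFailure();
breaker.recordFailure();
console.log(breaker.canRequest()); // false (OPEN)
clockMs = 1000;
console.log(breaker.canRequest()); // true (HALF_OPEN)
```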
Security:
- Helmet.js: Security headers
- CORS: Configurable cross-origin requests
- Input Validation: Comprehensive parameter validation
- XSS Protection: Input sanitization
- Rate Limiting: DDoS protection
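Input sanitization for a free-text parameter such as `q` could look roughly like the sketch below: normalize Unicode, strip control characters, trim, and cap the length, then HTML-escape the value only when echoing it back in a response. The function names and the 100-character cap are assumptions; this is not a substitute for a maintained sanitization library, and the project may handle this differently.

```javascript
// Hypothetical helpers for sanitizing query input and escaping output.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Escapes the five HTML-significant characters before text is echoed back.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Cleans raw input; non-ASCII letters (e.g. Indonesian names) are preserved.
function sanitizeQuery(raw, maxLength = 100) {
  if (typeof raw !== 'string') return '';
  return raw
    .normalize('NFC')                      // consistent Unicode form
    .replace(/[\u0000-\u001F\u007F]/g, '') // drop control characters
    .trim()
    .slice(0, maxLength);                  // cap length before any escaping
}

console.log(escapeHtml(sanitizeQuery('  <script>alert(1)</script> pemilu  ')));
// &lt;script&gt;alert(1)&lt;/script&gt; pemilu
```

Keeping sanitization and escaping separate means a search for "A & B" still reaches the scraper as typed, while any echo of it in HTML stays inert.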
Development:
```bash
npm run api:dev
```
Production:
```bash
npm run api:start
```
Docker:
```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY server/ ./server/
EXPOSE 3001
CMD ["npm", "run", "api:start"]
```
JavaScript (Node.js with axios):
```javascript
const axios = require('axios');

const BASE_URL = 'http://localhost:3001';

// Get latest news
const getLatestNews = async () => {
  try {
    const response = await axios.get(`${BASE_URL}/api/news/latest`);
    console.log(response.data.data.articles);
  } catch (error) {
    // error.response is undefined for network errors (e.g. server not running)
    console.error('Error:', error.response ? error.response.data : error.message);
  }
};

// Search news; axios URL-encodes values passed via `params`
const searchNews = async (query) => {
  try {
    const response = await axios.get(`${BASE_URL}/api/news/search`, { params: { q: query } });
    console.log(response.data.data.articles);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
};
```
Python (requests):
```python
import requests

BASE_URL = 'http://localhost:3001'

# Get latest news
def get_latest_news():
    try:
        response = requests.get(f'{BASE_URL}/api/news/latest', timeout=10)
        response.raise_for_status()
        data = response.json()
        return data['data']['articles']
    except requests.exceptions.RequestException as e:
        print(f'Error: {e}')
        return []

# Search news; requests URL-encodes values passed via `params`
def search_news(query):
    try:
        response = requests.get(f'{BASE_URL}/api/news/search', params={'q': query}, timeout=10)
        response.raise_for_status()
        data = response.json()
        return data['data']['articles']
    except requests.exceptions.RequestException as e:
        print(f'Error: {e}')
        return []
```
cURL:
```bash
# Get latest news
curl -X GET "http://localhost:3001/api/news/latest?page=1&limit=10"
# Search news
curl -X GET "http://localhost:3001/api/news/search?q=jakarta&sortBy=date"
# Get news by category
curl -X GET "http://localhost:3001/api/news/category/sport?limit=5"
```

Contributing:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
This API is for educational and research purposes. Please respect detik.com's robots.txt and terms of service. Consider implementing appropriate delays between requests and respect rate limits to avoid overwhelming the source website.
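One simple way to add delays between requests to detik.com is to enforce a minimum interval between outgoing fetches. The sketch below uses a hypothetical `RequestThrottle` class with an assumed 1-second default interval; it computes how long each request must wait so that concurrent callers are spaced out in order. It does not read robots.txt, which still needs to be checked separately.

```javascript
// Hypothetical throttle: spaces outgoing scraping requests at least
// `minIntervalMs` apart, queueing later callers behind earlier ones.
class RequestThrottle {
  constructor(minIntervalMs = 1000, now = () => Date.now()) {
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.nextSlot = 0; // earliest timestamp at which the next request may start
  }

  // Reserves the next slot and returns how many ms the caller should wait.
  reserve() {
    const current = this.now();
    const start = Math.max(current, this.nextSlot);
    this.nextSlot = start + this.minIntervalMs;
    return start - current;
  }

  // Waits for a slot, then runs the async fetch function.
  async schedule(fetchFn) {
    const waitMs = this.reserve();
    if (waitMs > 0) await new Promise((resolve) => setTimeout(resolve, waitMs));
    return fetchFn();
  }
}

const throttle = new RequestThrottle(1000, () => 0); // frozen clock for illustration
console.log([throttle.reserve(), throttle.reserve(), throttle.reserve()]); // [ 0, 1000, 2000 ]
```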
For support and questions:
- Create an issue on GitHub
- Check the API documentation at `/api/docs`
- Review the test files for usage examples
- Initial release
- Basic news scraping functionality
- Category support
- Search functionality
- Pagination and caching
- Comprehensive documentation
- Test suite