Commit ca4b5a0

Merge branch 'feature/service-tiers' into develop
2 parents 15566f5 + e1cd8d9 commit ca4b5a0

53 files changed

Lines changed: 3560 additions & 48 deletions

COMMIT_MESSAGE.txt

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+Implement API service tiers and rate limiting with API key authentication
+
+This commit introduces a comprehensive service tier and rate limiting system
+for the BTAA Geospatial API, enabling tiered access control and request
+throttling based on API keys.
+
+Key features:
+- Service tier system with six predefined tiers (btaa_primary, btaa_secondary,
+  btaa_member_primary, btaa_member_affiliated, general_registered, anonymous)
+  each with configurable rate limits (or unlimited for internal tiers)
+- API key management with SHA-256 hashing, validation, and tier association
+- Redis-based rate limiting middleware that enforces per-tier, per-identifier
+  limits using sliding window algorithm
+- Admin endpoints for creating, listing, and revoking API keys, and listing
+  service tiers
+- Database migrations to create api_service_tiers, api_keys, and
+  api_usage_logs tables with default tier seeding
+- Comprehensive test coverage including unit tests, integration tests, and
+  middleware tests
+- Documentation updates including service tiers runbook and README updates
+  explaining authentication methods and rate limiting behavior
+
+API keys can be provided via:
+1. X-API-Key header (highest priority)
+2. api_key query parameter
+3. Cookie named 'api_key'
+
+Anonymous requests are automatically assigned to the 'anonymous' tier with
+10 requests/minute limit. All requests are logged to api_usage_logs for
+analytics.
+
+Rate limiting is configurable via RATE_LIMIT_ENABLED environment variable
+and uses a dedicated Redis database (RATE_LIMIT_REDIS_DB=2) to avoid
+conflicts with caching.
+
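The commit message describes per-tier, per-identifier limits enforced with a sliding window in Redis. As a rough illustration only, the sketch below applies the same idea in process memory. The class name `SlidingWindowLimiter`, the `allow` method, and the injectable `clock` are hypothetical and are not taken from this commit; the real middleware presumably keeps its state in Redis rather than in a Python dict.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple


class SlidingWindowLimiter:
    """In-memory stand-in for a Redis-backed limiter (illustrative only)."""

    def __init__(
        self,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self.clock = clock
        # One queue of request timestamps per (tier_name, identifier) pair.
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(
        self,
        tier_name: str,
        identifier: str,
        requests_per_minute: Optional[int],
    ) -> bool:
        """Return True if the request fits in the window.

        A limit of None is treated as "unlimited" (an assumption about how
        internal tiers might be represented).
        """
        if requests_per_minute is None:
            return True
        now = self.clock()
        hits = self._hits[(tier_name, identifier)]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= requests_per_minute:
            return False
        hits.append(now)
        return True
```

With the anonymous tier's limit of 10 requests/minute, an eleventh call inside the same minute for one identifier returns `False`, while a different identifier in the same tier is unaffected.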

Makefile

Lines changed: 29 additions & 2 deletions
@@ -13,6 +13,13 @@ endif
 # Can be overridden with: COVERAGE_THRESHOLD=25 make test
 COVERAGE_THRESHOLD ?= 50
 
+# Number of parallel workers for pytest-xdist
+# Default: 4 (to avoid hitting PostgreSQL connection limits)
+# Can be overridden with: PARALLEL_WORKERS=8 make test
+# Use 'auto' to use all CPU cores (may hit connection limits with many cores)
+# Set to 0 or empty to disable parallelism
+PARALLEL_WORKERS ?= 4
+
 # Run both linting and formatting checks (without modifying files)
 lint:
 	@echo "Checking code with ruff..."
@@ -42,12 +49,32 @@ test:
 		docker compose exec -T paradedb bash -lc 'PGPASSWORD=$$POSTGRES_PASSWORD psql -U postgres -c "CREATE DATABASE btaa_geospatial_api_test WITH TEMPLATE btaa_geospatial_api OWNER postgres;"'; \
 	fi
 	@echo "Running tests with coverage threshold of $(COVERAGE_THRESHOLD)%..."
-	pytest --cov=app --cov-report=term-missing --cov-report=html --cov-fail-under=$(COVERAGE_THRESHOLD)
+	@if [ -n "$(PARALLEL_WORKERS)" ] && [ "$(PARALLEL_WORKERS)" != "0" ]; then \
+		echo "Running tests in parallel with $(PARALLEL_WORKERS) workers..."; \
+		pytest -n $(PARALLEL_WORKERS) --cov=app --cov-report=term-missing --cov-report=html --cov-fail-under=$(COVERAGE_THRESHOLD); \
+	else \
+		echo "Running tests sequentially..."; \
+		pytest --cov=app --cov-report=term-missing --cov-report=html --cov-fail-under=$(COVERAGE_THRESHOLD); \
+	fi
 
 # Run just the tests without coverage threshold (for debugging)
 test-no-coverage:
 	@echo "Running tests without coverage threshold..."
-	pytest --full-trace
+	@if [ -n "$(PARALLEL_WORKERS)" ] && [ "$(PARALLEL_WORKERS)" != "0" ]; then \
+		echo "Running tests in parallel with $(PARALLEL_WORKERS) workers..."; \
+		pytest -n $(PARALLEL_WORKERS) --full-trace; \
+	else \
+		pytest --full-trace; \
+	fi
+
+# Run tests in parallel without coverage (fastest option for local development)
+test-fast:
+	@echo "Running tests in parallel without coverage (fast mode)..."
+	@if [ -n "$(PARALLEL_WORKERS)" ] && [ "$(PARALLEL_WORKERS)" != "0" ]; then \
+		pytest -n $(PARALLEL_WORKERS); \
+	else \
+		pytest -n 4; \
+	fi
 
 # Force a fresh clone of the test database
 test-fresh-db:

README.md

Lines changed: 122 additions & 2 deletions
@@ -161,6 +161,13 @@ SEARCH_CACHE_TTL=3600 # 1 hour
 SUGGEST_CACHE_TTL=7200 # 2 hours
 LIST_CACHE_TTL=43200 # 12 hours
 CACHE_TTL=43200 # Default TTL (12 hours)
+
+# Rate Limiting settings
+RATE_LIMIT_ENABLED=true # Enable/disable rate limiting
+RATE_LIMIT_REDIS_DB=2 # Redis database number for rate limiting (uses same Redis instance)
+
+# API Usage Analytics Enrichment (User Agent Parsing)
+# Note: Geocoding has been removed due to licensing complexity
 ```
 
 When caching is enabled:
@@ -178,6 +185,119 @@ You can manually clear the cache using:
 GET /api/v1/cache/clear?cache_type=search|resource|suggest|all
 ```
 
+## API Usage Analytics
+
+The API automatically logs all requests to the `api_usage_logs` table for analytics purposes. This includes:
+
+- Request metadata (endpoint, method, status code, response time)
+- API key and tier information
+- IP address and user agent
+- Referrer and UTM parameters
+- Query parameters (stored in JSON properties field)
+
+### Service tiers, API keys, and rate limiting
+
+The public API supports **service tiers** and **API key–based rate limiting**.
+
+- **Service tiers** are defined in the `api_service_tiers` table and seeded by the migrations into tiers such as:
+  - `btaa_primary` / `btaa_secondary` – internal BTAA applications with unlimited access
+  - `btaa_member_primary` / `btaa_member_affiliated` – member applications with higher limits
+  - `general_registered` – registered external users
+  - `anonymous` – unauthenticated access with the lowest limits
+- **API keys** are stored (hashed) in the `api_keys` table and associated with a tier.
+- **Rate limits** are enforced per tier, per identifier (API key hash or IP address) using Redis.
+
+#### How clients authenticate
+
+Clients can authenticate with an API key in one of three ways (in order of precedence):
+
+- `X-API-Key` header:
+
+  ```http
+  X-API-Key: your-api-key-here
+  ```
+
+- `Authorization` header with Bearer token:
+
+  ```http
+  Authorization: Bearer your-api-key-here
+  ```
+
+- `api_key` query parameter:
+
+  ```text
+  GET /api/v1/search?q=roads&api_key=your-api-key-here
+  ```
+
+If no valid API key is provided, the request is treated as **anonymous** and uses the anonymous tier’s rate limit.
+
+#### Admin API for managing keys and tiers
+
+Admin users (protected by HTTP Basic auth with `ADMIN_USERNAME` / `ADMIN_PASSWORD`) can manage keys and inspect tiers:
+
+- `POST /api/v1/admin/api-keys` – create a new API key for a given `tier_name`.
+  - Request body: `{ "tier_name": "anonymous", "name": "optional friendly name" }`
+  - Response includes the **plaintext** `api_key` once, plus `key_id` and `tier_name`.
+- `GET /api/v1/admin/api-keys` – list existing keys and their tiers.
+- `PATCH /api/v1/admin/api-keys/{key_id}` – update `tier_name`, `is_active`, or `name`.
+- `DELETE /api/v1/admin/api-keys/{key_id}` – revoke (deactivate) a key.
+- `GET /api/v1/admin/api-tiers` – list all tiers, limits, and descriptions.
+
+The admin endpoints are intended for trusted operators only; do **not** expose them directly to the public internet without appropriate protections (e.g., network restrictions, stronger auth).
+
+#### Rate limiting behavior
+
+Rate limiting is enforced by middleware in front of all non-admin API routes:
+
+- Configuration is controlled via environment variables:
+
+  ```text
+  RATE_LIMIT_ENABLED=true # Enable/disable rate limiting middleware
+  RATE_LIMIT_REDIS_DB=2 # Redis database used for rate limiting
+  REDIS_HOST=redis # Redis host
+  REDIS_PORT=6379 # Redis port
+  REDIS_PASSWORD=optional_password
+  ```
+
+- For each request, the middleware:
+  - Resolves the caller’s **tier** from the API key (if provided) or falls back to the `anonymous` tier.
+  - Uses Redis to track the number of requests per minute per `(tier_name, identifier)`, where `identifier` is the API key hash or client IP (via `X-Forwarded-For` or socket address).
+  - Enforces the tier’s `requests_per_minute` limit.
+
+When rate limiting is enabled, responses include:
+
+- `X-RateLimit-Limit` – the allowed number of requests per minute for the current tier (or `unlimited`).
+- `X-RateLimit-Remaining` – remaining requests in the current window (or `unlimited`).
+- `X-RateLimit-Reset` – UNIX timestamp when the window resets.
+
+If a client exceeds its rate limit:
+
+- The API returns **HTTP 429 Too Many Requests** with a JSON body describing the error.
+- The response includes `Retry-After` and `X-RateLimit-*` headers indicating when to retry.
+
+### Enrichment with User Agent Parsing
+
+API usage logs are automatically enriched in the background with:
+
+- **User agent parsing**: Browser, operating system, and device type
+
+This enrichment happens asynchronously via Celery tasks to avoid blocking API requests.
+
+**Note**: IP geocoding (country, region, city, latitude, longitude) has been removed due to licensing complexity with geocoding databases.
+
+#### Backfilling Enrichment Data
+
+To enrich existing API usage logs that were created before enrichment was enabled, you can use the batch enrichment task:
+
+```python
+from app.tasks.api_usage_enrichment import enrich_api_usage_logs_batch
+
+# Enrich 100 logs at a time
+enrich_api_usage_logs_batch.delay(batch_size=100)
+```
+
+This can be run repeatedly until all logs are enriched.
+
 ## AI Summarization
 
 The API uses OpenAI's ChatGPT API to generate summaries and identify geographic named entities of historical maps and geographic datasets. To use this feature:
@@ -279,8 +399,8 @@ Data from Who's On First. [License](https://whosonfirst.org/docs/licenses/)
 - [X] Search - basic faceting
 - [X] Performance - Redis caching
 - [X] Search - facet include/exclude
-- [ ] Search - facet alpha and numerical pagination, and search within facets
-- [ ] Search - advanced/fielded search
+- [X] Search - facet alpha and numerical pagination, and search within facets
+- [X] Search - advanced/fielded search
 - [X] Search - spatial search
 - [X] Search Results - thumbnail images (needs improvements)
 - [X] Search Results - bookmarked resources
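The README text in this diff lists three ways to supply a key, in order of precedence: the `X-API-Key` header, an `Authorization: Bearer` header, and the `api_key` query parameter. (The commit message lists a cookie named `api_key` instead of the Bearer header; the sketch follows the README.) The snippet below is only a minimal illustration of that lookup order and of the identifier fallback to the client IP. The function names `extract_api_key` and `client_identifier`, and the plain-mapping inputs, are assumptions for the example and do not come from the middleware in this commit.

```python
from typing import Mapping, Optional


def extract_api_key(
    headers: Mapping[str, str], query: Mapping[str, str]
) -> Optional[str]:
    """Return the first API key found, following the README's precedence."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    key = lowered.get("x-api-key", "").strip()
    if key:
        return key

    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()

    key = query.get("api_key", "").strip()
    return key or None


def client_identifier(
    api_key_hash: Optional[str], headers: Mapping[str, str], socket_ip: str
) -> str:
    """Pick the rate-limit identifier: key hash if present, else client IP."""
    if api_key_hash:
        return api_key_hash
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    # Assumption: the first X-Forwarded-For entry is the original client.
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or socket_ip
```

Trusting `X-Forwarded-For` is only reasonable behind a proxy that sets it; otherwise a caller could vary the header to dodge the anonymous limit.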

app/api/v1/endpoint_modules/admin.py

Lines changed: 112 additions & 0 deletions
@@ -3,6 +3,7 @@
 
 from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Query
 from fastapi.security import HTTPBasic
+from pydantic import BaseModel
 
 from app.api.v1.auth import verify_credentials
 from app.api.v1.utils import create_response, sanitize_for_json
@@ -16,6 +17,7 @@
     ResourceProcessingError,
     ResourceProcessingService,
 )
+from app.services.api_key_service import APIKeyService
 
 logger = logging.getLogger(__name__)
 
@@ -34,6 +36,21 @@ def get_admin_service() -> AdminService:
 # Module-level singleton for dependency injection
 _admin_service_dependency = Depends(get_admin_service)
 
+# API Key Service instance (handles its own async engine and session)
+api_key_service = APIKeyService()
+
+
+# Pydantic models for request/response
+class CreateAPIKeyRequest(BaseModel):
+    tier_name: str
+    name: Optional[str] = None
+
+
+class UpdateAPIKeyRequest(BaseModel):
+    tier_name: Optional[str] = None
+    is_active: Optional[bool] = None
+    name: Optional[str] = None
+
 
 @router.post("/cache/clear")
 async def clear_cache(
@@ -136,3 +153,98 @@ async def identify_geo_entities(
             f"for resource {id}: {str(e)}"
         )
         raise HTTPException(status_code=500, detail=str(e)) from e
+
+
+# API Key Management Endpoints
+
+
+@router.post("/api-keys")
+async def create_api_key(
+    request: CreateAPIKeyRequest,
+):
+    """Create a new API key."""
+    try:
+        result = await api_key_service.create_api_key(
+            tier_name=request.tier_name,
+            name=request.name,
+        )
+
+        if result is None:
+            raise HTTPException(
+                status_code=400,
+                detail=f"Failed to create API key. Tier '{request.tier_name}' may not exist.",
+            )
+
+        return create_response(result)
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error creating API key: {str(e)}")
+        raise HTTPException(status_code=500, detail=str(e)) from e
+
+
+@router.get("/api-keys")
+async def list_api_keys():
+    """List all API keys."""
+    try:
+        keys = await api_key_service.list_api_keys()
+        return create_response({"keys": keys})
+    except Exception as e:
+        logger.error(f"Error listing API keys: {str(e)}")
+        raise HTTPException(status_code=500, detail=str(e)) from e
+
+
+@router.patch("/api-keys/{key_id}")
+async def update_api_key(
+    key_id: int,
+    request: UpdateAPIKeyRequest,
+):
+    """Update an API key."""
+    try:
+        updated = await api_key_service.update_api_key_by_id(
+            key_id=key_id,
+            tier_name=request.tier_name,
+            is_active=request.is_active,
+            name=request.name,
+        )
+
+        if not updated:
+            # Could be missing key, missing tier, or no fields to update
+            raise HTTPException(status_code=400, detail="Failed to update API key")
+
+        return create_response({"message": "API key updated successfully"})
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error updating API key: {str(e)}")
+        raise HTTPException(status_code=500, detail=str(e)) from e
+
+
+@router.delete("/api-keys/{key_id}")
+async def revoke_api_key(key_id: int):
+    """Revoke (deactivate) an API key."""
+    try:
+        # Use service method that handles its own async session (NullPool) to
+        # avoid cross-event-loop issues with the shared database connection.
+        success = await api_key_service.revoke_api_key_by_id(key_id)
+
+        if not success:
+            raise HTTPException(status_code=500, detail="Failed to revoke API key")
+
+        return create_response({"message": "API key revoked successfully"})
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error revoking API key: {str(e)}")
+        raise HTTPException(status_code=500, detail=str(e)) from e
+
+
+@router.get("/api-tiers")
+async def list_api_tiers():
+    """List all service tiers."""
+    try:
+        tiers = await api_key_service.list_tiers()
+        return create_response({"tiers": tiers})
+    except Exception as e:
+        logger.error(f"Error listing API tiers: {str(e)}")
+        raise HTTPException(status_code=500, detail=str(e)) from e
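`APIKeyService` itself is not shown in this part of the diff, but the commit message says keys are hashed with SHA-256 and the README says the plaintext key is returned only once, at creation. The following is a minimal sketch of how that pattern could look using only the standard library. The helper names `generate_api_key`, `hash_api_key`, and `key_matches`, and the use of `secrets.token_urlsafe(32)` for the key format, are assumptions for illustration, not the service's actual implementation.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_api_key(plaintext: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the key."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def generate_api_key() -> Tuple[str, str]:
    """Create a random key; return (plaintext, digest).

    Only the digest would be persisted; the plaintext is shown to the
    caller once and cannot be recovered afterwards.
    """
    plaintext = secrets.token_urlsafe(32)  # hypothetical key format
    return plaintext, hash_api_key(plaintext)


def key_matches(presented_key: str, stored_digest: str) -> bool:
    """Compare digests in constant time to avoid timing side channels."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_digest)
```

Because the digest is deterministic, a lookup by hash is enough to resolve a presented key to its row and tier, and the same digest can double as the rate-limit identifier mentioned in the README.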

app/api/v1/endpoint_modules/resources/viewer.py

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
 from typing import Optional
 
-from fastapi import Query, Request
+from fastapi import HTTPException, Query, Request
 from fastapi.responses import JSONResponse
 from sqlalchemy.sql import select
 
@@ -27,7 +27,8 @@ async def get_resource_viewer_data(
     row = result.fetchone()
 
     if not row:
-        return JSONResponse(content={"error": "Resource not found"}, status_code=404)
+        # Align with tests: return 404 with {"detail": "Resource not found"}
+        raise HTTPException(status_code=404, detail="Resource not found")
 
     resource_dict = sanitize_for_json(dict(row._mapping))
 
