This guide shows how to configure cit.is with residential proxy support for location-aware web archiving.
When enabled, cit.is will automatically select residential proxies near the requester's location, providing more geographically accurate web archives. This is particularly useful for:
- Capturing region-specific content
- Bypassing geo-blocking
- More accurate archiving of location-sensitive websites
- Compliance with data locality requirements
Add these settings to your .env file:
# Enable residential proxy functionality
RESIDENTIAL_PROXY_ENABLED=True
# Choose your proxy provider: brightdata, smartproxy, or custom
RESIDENTIAL_PROXY_PROVIDER=brightdata
# GeoIP database for location detection (required)
GEOLITE_DB_PATH=/path/to/GeoLite2-City.mmdb

Sign up at Bright Data and add:
BRIGHTDATA_USERNAME=your_brightdata_username
BRIGHTDATA_PASSWORD=your_brightdata_password
BRIGHTDATA_ENDPOINT=brd.superproxy.io
BRIGHTDATA_PORT=22225

Set a fallback proxy for when location-specific proxies fail:
# Format: http://username:password@proxy-server:port
FALLBACK_PROXY_URL=http://user:pass@fallback-proxy.example.com:8080

# How to choose which proxy to use
# Options: closest, country_match, random
PROXY_SELECTION_STRATEGY=closest
# Maximum distance in kilometers for "closest" strategy
PROXY_MAX_DISTANCE_KM=500

Download the MaxMind GeoLite2 database:
# Register at https://dev.maxmind.com/geoip/accounts/current/license-key
# Then download:
wget -O GeoLite2-City.tar.gz "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City&license_key=YOUR_LICENSE_KEY&suffix=tar.gz"
tar -xzf GeoLite2-City.tar.gz
mv GeoLite2-City_*/GeoLite2-City.mmdb /opt/geoip/

Update your .env:
GEOLITE_DB_PATH=/opt/geoip/GeoLite2-City.mmdb

Test your proxy setup:
python manage.py shell

from core.proxy_manager import ProxyManager
# Initialize proxy manager
pm = ProxyManager()
# Test with a known IP (Google's DNS)
proxy = pm.get_optimal_proxy('8.8.8.8')
if proxy:
    print(f"Proxy: {proxy.server} in {proxy.country_code}")
    print(f"Provider: {proxy.provider}")
    # Test if proxy is working
    success = pm.test_proxy(proxy)
    print(f"Test result: {'✓ Working' if success else '✗ Failed'}")
else:
    print("No proxy configured or available")

- Request Arrives: User creates archive via API/web interface
- Location Detection: System determines requester's location from IP using GeoIP
- Proxy Selection: Chooses optimal residential proxy near requester
- Archive Creation: SingleFile uses proxy to archive the URL
- Metadata Storage: Proxy information saved with archive
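
The proxy selection step with the "closest" strategy can be pictured as a nearest-neighbour search bounded by PROXY_MAX_DISTANCE_KM. The following is a minimal sketch under stated assumptions, not the actual ProxyManager internals: the names ProxyLocation, haversine_km, and pick_closest are hypothetical, and the requester's coordinates are assumed to have already been resolved from the GeoIP database.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProxyLocation:
    """Hypothetical record describing where a proxy exits."""
    server: str
    country_code: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def pick_closest(
    requester_lat: float,
    requester_lon: float,
    candidates: Sequence[ProxyLocation],
    max_distance_km: float = 500.0,  # mirrors the PROXY_MAX_DISTANCE_KM example value
) -> Optional[ProxyLocation]:
    """Return the nearest candidate within max_distance_km, or None."""
    best, best_distance = None, max_distance_km
    for proxy in candidates:
        distance = haversine_km(requester_lat, requester_lon, proxy.lat, proxy.lon)
        if distance <= best_distance:
            best, best_distance = proxy, distance
    return best
```

Returning None when nothing lies within the limit leaves the caller free to apply the fallback proxy or a direct connection.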
Each archive stores proxy metadata in two places:
- JSON File: {archive_path}/proxy_metadata.json
  {
    "proxy_server": "brd.superproxy.io:22225",
    "proxy_country": "US",
    "proxy_city": "New York",
    "proxy_lat": 40.7128,
    "proxy_lon": -74.0060,
    "proxy_provider": "brightdata",
    "proxy_ip": "192.168.1.100",
    "proxy_configured": true
  }
- Database Fields: On the Shortcode model
  - proxy_ip: IP address used for archiving
  - proxy_country: Country code of proxy
  - proxy_provider: Provider name
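
For scripts that inspect archives outside Django, the JSON file described above can be handled with the standard library alone. This is a minimal sketch: the helper names write_proxy_metadata and read_proxy_metadata are hypothetical and are not part of cit.is; only the file name proxy_metadata.json comes from this guide.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

METADATA_FILENAME = "proxy_metadata.json"  # file name as documented above


def write_proxy_metadata(archive_path: Path, metadata: Dict[str, Any]) -> Path:
    """Write the metadata dict next to the archive and return the file path."""
    target = Path(archive_path) / METADATA_FILENAME
    target.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target


def read_proxy_metadata(archive_path: Path) -> Optional[Dict[str, Any]]:
    """Return the stored metadata, or None if the archive has no metadata file."""
    target = Path(archive_path) / METADATA_FILENAME
    if not target.exists():
        return None
    return json.loads(target.read_text(encoding="utf-8"))
```

A missing file is reported as None rather than an error, since archives created without a proxy may not have one.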
The Django admin shows proxy information:
- List View: Proxy country and provider
- Detail View: Full proxy metadata
- Filtering: By proxy provider and country
The system is designed to always work, even if proxy configuration fails:
- Proxy Disabled: Archives directly without proxy
- Proxy Config Invalid: Falls back to direct connection
- Proxy Unreachable: Uses fallback proxy or direct
- GeoIP Unavailable: Uses fallback proxy or direct
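
The fallback order listed above can be expressed as a single decision function. The sketch below is illustrative only: choose_route, ProxyConfigError, and the two callables are hypothetical names, and LookupError stands in for whatever error a failed GeoIP lookup raises in practice.

```python
from typing import Callable, Optional


class ProxyConfigError(Exception):
    """Hypothetical error signalling an invalid proxy configuration."""


def choose_route(
    proxy_enabled: bool,
    locate_proxy: Callable[[], Optional[str]],
    fallback_proxy_url: Optional[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the proxy URL to use, or None for a direct connection."""
    if not proxy_enabled:
        return None  # Proxy Disabled: archive directly
    try:
        located = locate_proxy()
    except ProxyConfigError:
        return None  # Proxy Config Invalid: direct connection
    except LookupError:
        located = None  # GeoIP Unavailable: try the fallback next
    if located and is_reachable(located):
        return located
    # Proxy Unreachable or no location match: fallback proxy, then direct
    if fallback_proxy_url and is_reachable(fallback_proxy_url):
        return fallback_proxy_url
    return None
```

Because every branch ends in either a URL or None, the archive step always has a route to use.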
# Core Settings
SECRET_KEY=your-secret-key
DEBUG=False
ALLOWED_HOSTS=yourdomain.com
DATABASE_URL=postgresql://user:pass@localhost/citis
REDIS_URL=redis://localhost:6379/0
# Server
SERVER_BASE_URL=https://yourdomain.com
MASTER_API_KEY=your-master-api-key
# Archive
ARCHIVE_MODE=singlefile
SINGLEFILE_EXECUTABLE_PATH=/usr/local/bin/single-file
SINGLEFILE_DATA_PATH=./archives
# Proxy Configuration
RESIDENTIAL_PROXY_ENABLED=True
RESIDENTIAL_PROXY_PROVIDER=brightdata
BRIGHTDATA_USERNAME=your_username
BRIGHTDATA_PASSWORD=your_password
GEOLITE_DB_PATH=/opt/geoip/GeoLite2-City.mmdb
FALLBACK_PROXY_URL=http://user:pass@backup-proxy.com:8080

- Proxy Selection: Cached for 5 minutes per IP
- GeoIP Lookups: Minimal overhead (~1ms)
- Proxy Testing: Only when explicitly requested
- Failover: Automatic with <1s delay
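
To illustrate the five-minute per-IP caching of proxy selections, here is a minimal time-to-live cache. It is a sketch under assumptions: TTLCache is a hypothetical class, and cit.is may well rely on Django's cache framework or Redis instead.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, as stated above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, T]] = {}

    def get(self, key: str) -> Optional[T]:
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop stale entries lazily
            return None
        return value

    def set(self, key: str, value: T) -> None:
        """Store value under key with a fresh expiry time."""
        self._entries[key] = (self._clock() + self._ttl, value)
```

Keying the cache by requester IP means repeated archive requests from the same address skip the GeoIP lookup and selection until the entry expires.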
- Proxy credentials never logged
- IP addresses stored according to privacy settings
- Proxy metadata can be disabled if not needed
- All connections use HTTPS where possible
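
One way to keep proxy credentials out of logs is to mask the user-info part of a proxy URL before it is written anywhere. The helper below is a hypothetical sketch using only the standard library; it does not describe how cit.is itself redacts credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_proxy_url(proxy_url: str) -> str:
    """Return proxy_url with any username and password replaced by ***."""
    parts = urlsplit(proxy_url)
    if parts.username is None and parts.password is None:
        return proxy_url  # nothing to hide
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"***:***@{host}", parts.path, parts.query, parts.fragment)
    )
```

Passing FALLBACK_PROXY_URL through such a helper before logging keeps the host and port visible for debugging while hiding the secret parts.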

- "Proxy not configured"
  - Check RESIDENTIAL_PROXY_ENABLED=True
  - Verify provider credentials
- "GeoIP lookup failed"
  - Ensure GeoLite2 database exists at GEOLITE_DB_PATH
  - Check file permissions
- "Proxy test failed"
  - Verify proxy credentials
  - Check network connectivity
  - Try fallback proxy

Enable detailed proxy logging:
# In settings.py or .env
LOG_LEVEL=DEBUG

Check logs for proxy-related messages:
tail -f logs/citis.log | grep -i proxy