# HAR-Ad-Latency-Analyzer

A Streamlit-based observability utility that parses browser HAR files and surfaces ad-tech network bottlenecks through latency metrics and an interactive waterfall timeline.
## Table of Contents

- Features
- Tech Stack / Architecture
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Support the Project
## Features

- Parses `.har` and `.json` exports generated by browser DevTools.
- Detects ad-related traffic using a curated keyword list (e.g., `doubleclick`, `prebid`, `pubmatic`, `adnxs`, `rubicon`, `openx`, and others).
- Isolates ad calls from all page traffic so teams can focus on monetization-path latency.
- Builds an interactive Plotly timeline waterfall to inspect request overlap, sequence, and duration.
- Calculates operational latency KPIs:
- Total ad request count
- Aggregated ad load time
- Average request latency
- Slowest request duration
- Highlights the slowest domains by average latency.
- Summarizes top domains by request volume.
- Produces a searchable tabular request log with timing, status, and URL visibility.
- Includes guided in-app HAR export instructions for faster onboarding.
> [!IMPORTANT]
> This tool is intentionally filtered for ad-tech traffic and is best used for ad stack performance debugging, not full-site synthetic monitoring.
> [!NOTE]
> Detection relies on substring matching against known ad-tech vendor keywords. Custom SSP/ad server domains may require extending the keyword list in `utils.py`.
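As a minimal sketch of how that extension could look, the snippet below mirrors the documented default keyword list and appends a custom vendor fragment. The helper `is_ad_call` and the `myssp.example` keyword are illustrative, not part of the repository's actual code:

```python
# Default detection list as documented in this README; "myssp.example"
# is a hypothetical custom SSP added for illustration.
ad_keywords = [
    "doubleclick", "googlesyndication", "adnxs", "rubicon",
    "pubmatic", "criteo", "openx", "appnexus", "prebid",
    "amazon-adsystem", "smartadserver",
]
ad_keywords.append("myssp.example")  # custom vendor domain fragment

def is_ad_call(url: str) -> bool:
    """Substring match against the keyword list, mirroring the app's approach."""
    return any(keyword in url.lower() for keyword in ad_keywords)

print(is_ad_call("https://ads.myssp.example/bid?slot=top"))  # True
print(is_ad_call("https://cdn.example.com/app.js"))          # False
```

Because matching is plain substring containment, short keywords can over-match; prefer distinctive fragments (full hostnames where possible) when extending the list.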
## Tech Stack / Architecture

- Python: Application runtime and data transformation logic.
- Streamlit: Single-page web UI for upload, metric display, and chart rendering.
- Pandas: Timestamp parsing and tabular request manipulation.
- Plotly Express: Interactive waterfall/timeline visualization.
```text
HAR-Ad-Latency-Analyzer/
├── app.py             # Streamlit UI, metrics, charts, and table rendering
├── utils.py           # HAR parsing and ad-call extraction pipeline
├── requirements.txt   # Python dependencies
├── README.md          # Project documentation
└── LICENSE            # Apache License 2.0
```
- UI-first diagnostic workflow: Streamlit allows rapid upload-analyze-iterate cycles for AdOps and engineers.
- Client-provided data model: HAR upload avoids injecting agents/scripts into production pages.
- Keyword-driven filtering: Lightweight and transparent detection logic prioritizes explainability over opaque heuristics.
- Timeline-centric analysis: Gantt-style view reveals concurrency, blocking windows, and outlier calls quickly.
- Minimal dependency footprint: Only essential analytics and visualization libraries are included.
```mermaid
flowchart LR
    A[Upload HAR/JSON in Streamlit] --> B[load_har_file]
    B --> C{Valid HAR schema?}
    C -- No --> D[Show invalid file error]
    C -- Yes --> E[extract_ad_calls]
    E --> F[Filter URLs by ad keyword list]
    F --> G[Parse Start/End timestamps and metadata]
    G --> H[Build pandas DataFrame]
    H --> I[Render KPIs + Waterfall + Domain charts + Log table]
```
> [!TIP]
> For best signal quality, export the HAR with a full page refresh and cache disabled in DevTools.
## Getting Started

- Python 3.9+ (recommended for recent Streamlit/Pandas compatibility)
- pip for dependency installation
- A browser-generated HAR file (`.har` or `.json`) from Chrome/Firefox DevTools
```bash
git clone https://github.com/<your-org>/HAR-Ad-Latency-Analyzer.git
cd HAR-Ad-Latency-Analyzer
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
streamlit run app.py
```

Open the local URL printed by Streamlit (typically http://localhost:8501) and upload your HAR file.
## Testing

This repository does not currently ship a formal automated test suite; validation is primarily manual through the UI. The following smoke checks confirm that the modules compile, dependencies resolve, and the app starts:

```bash
python -m py_compile app.py utils.py
python -m pip check
streamlit run app.py
```

Use the following conventions for future CI hardening:
```bash
# unit tests (when tests/ exists)
pytest -q

# integration smoke checks (example)
pytest -q tests/integration

# linting/formatting (example)
ruff check .
black --check .
```

> [!WARNING]
> Because HAR inputs vary by browser and site architecture, include fixture HAR files in future tests to avoid regressions in field names and timestamp handling.
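As a sketch of what such a fixture-based test could look like, the snippet below builds a minimal in-memory HAR matching the schema this README documents (`log.entries[]` with `request.url`, `startedDateTime`, `time`, `response.status`). In a real suite the fixture would live under `tests/` as a file; the values here are purely illustrative:

```python
import json

# Minimal in-memory HAR fixture carrying only the fields the parser
# relies on. Field values are illustrative, not captured from a real page.
FIXTURE_HAR = {
    "log": {
        "entries": [
            {
                "startedDateTime": "2024-05-01T12:00:00.000Z",
                "time": 142.5,
                "request": {"url": "https://securepubads.g.doubleclick.net/gampad/ads"},
                "response": {"status": 200, "content": {"size": 1024}},
            }
        ]
    }
}

def test_fixture_has_required_fields():
    entries = FIXTURE_HAR["log"]["entries"]
    assert entries, "fixture must contain at least one entry"
    for entry in entries:
        assert entry["request"]["url"]
        assert "startedDateTime" in entry and "time" in entry
        assert entry["response"]["status"] == 200

def test_fixture_is_json_serializable():
    # round-trip guards against fixtures drifting from valid JSON
    assert json.loads(json.dumps(FIXTURE_HAR)) == FIXTURE_HAR
```

Pinning fixtures like this makes field-name or timestamp-format regressions visible the moment the parsing code changes.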
## Deployment

- Run behind a reverse proxy (Nginx/Caddy) when exposing beyond localhost.
- Restrict access, because HAR files may contain sensitive request metadata.
- Pin dependency versions in `requirements.txt` for reproducible environments.
- Configure process supervision (e.g., `systemd` or a container orchestrator).
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

```bash
docker build -t har-ad-latency-analyzer .
docker run --rm -p 8501:8501 har-ad-latency-analyzer
```

- Add a pipeline stage for syntax checks and linting.
- Add artifact-based test fixtures for representative HAR samples.
- Optionally publish a versioned container image for reproducible deployments.
## Usage

1. Open the target website in Chrome/Firefox.
2. Open DevTools and navigate to the Network tab.
3. Enable "Preserve log" and disable cache (recommended).
4. Refresh the page.
5. Export the recorded requests as a HAR file.
Run the app and upload the HAR via the sidebar:

```bash
streamlit run app.py
```

Then inspect:
- KPI row: request volume and latency aggregates
- Waterfall chart: per-domain request overlap and duration
- Slowest domains chart: mean latency by vendor endpoint
- Request-count chart: top ad domains by volume
- Detailed table: raw call-level diagnostics
```python
from utils import load_har_file, extract_ad_calls

# read HAR payload
with open("sample.har", "r", encoding="utf-8") as f:
    har_data = load_har_file(f)

# extract ad-tech calls into a DataFrame
ad_df = extract_ad_calls(har_data)

# inspect top latency offenders
print(ad_df.groupby("Domain")["Duration_ms"].mean().sort_values(ascending=False).head(10))
```

- High `Duration_ms` with low request count suggests isolated vendor slowness.
- High request count with moderate latency may indicate cumulative ad stack drag.
- Overlapping long bars in the waterfall can indicate concurrent contention.
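These patterns can be triaged directly from the extracted DataFrame. The sketch below uses fabricated sample data; the `Domain` and `Duration_ms` column names follow the example earlier in this README:

```python
import pandas as pd

# Fabricated sample: one slow-but-rare vendor, one chatty-but-moderate vendor.
ad_df = pd.DataFrame({
    "Domain": ["slow-ssp.example", "busy-ssp.example",
               "busy-ssp.example", "busy-ssp.example"],
    "Duration_ms": [1800.0, 220.0, 260.0, 240.0],
})

# Per-domain request count and mean latency in one pass.
summary = ad_df.groupby("Domain")["Duration_ms"].agg(requests="count", mean_ms="mean")
print(summary)
```

Here `slow-ssp.example` (1 request, 1800 ms mean) matches the "isolated vendor slowness" pattern, while `busy-ssp.example` (3 requests, 240 ms mean) matches "cumulative ad stack drag."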
> [!CAUTION]
> HAR timing is client-observed and can be affected by local CPU, browser extensions, throttling settings, and network conditions.
## Configuration

The application currently has a minimal configuration model and no `.env` requirement.
- Accepted file types: `.har`, `.json`
- Expected schema root: `log.entries[]`
- Required request fields: `request.url`, `startedDateTime`, `time`, `response.status`, `response.content.size` (optional, fallback: `0`)
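A minimal sketch of checking an uploaded payload against these requirements is shown below. This is illustrative only, not the app's actual validation code, and the `validate_har` helper is hypothetical:

```python
# Illustrative schema check mirroring the documented requirements:
# root log.entries[], each entry carrying request.url, startedDateTime,
# time, and response.status (response.content.size is optional).
def validate_har(har: dict) -> list:
    problems = []
    entries = har.get("log", {}).get("entries")
    if not isinstance(entries, list):
        return ["missing log.entries[] root"]
    for i, entry in enumerate(entries):
        if not entry.get("request", {}).get("url"):
            problems.append(f"entry {i}: missing request.url")
        for field in ("startedDateTime", "time"):
            if field not in entry:
                problems.append(f"entry {i}: missing {field}")
        if "status" not in entry.get("response", {}):
            problems.append(f"entry {i}: missing response.status")
    return problems

good = {"log": {"entries": [{
    "request": {"url": "https://example.com"},
    "startedDateTime": "2024-05-01T12:00:00Z",
    "time": 88.0,
    "response": {"status": 200, "content": {"size": 0}},
}]}}
print(validate_har(good))         # []
print(validate_har({"log": {}}))  # ['missing log.entries[] root']
```

An early check like this lets the UI show a precise "invalid file" message instead of failing mid-parse.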
Ad request matching is defined in `utils.py` via the `ad_keywords` list:

```python
ad_keywords = [
    "doubleclick", "googlesyndication", "adnxs", "rubicon",
    "pubmatic", "criteo", "openx", "appnexus", "prebid",
    "amazon-adsystem", "smartadserver"
]
```

Streamlit supports CLI-level configuration (examples):

```bash
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

No custom environment variables are required by the project today. You may still use standard Streamlit variables if needed:

- `STREAMLIT_SERVER_PORT`
- `STREAMLIT_SERVER_ADDRESS`
## License

This project is licensed under the Apache License 2.0. See `LICENSE` for the full legal text.
## Support the Project

If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.