# HAR-Ad-Latency-Analyzer

A Streamlit-based observability utility that parses browser HAR files and surfaces ad-tech network bottlenecks through latency metrics and an interactive waterfall timeline.
## Table of Contents

- Features
- Tech Stack / Architecture
- Getting Started
- Testing
- Deployment
- Usage
- Configuration
- License
- Support the Project
## Features

- Parses `.har` and `.json` exports generated by browser DevTools.
- Detects ad-related traffic using a curated keyword list (e.g., `doubleclick`, `prebid`, `pubmatic`, `adnxs`, `rubicon`, `openx`, and others).
- Isolates ad calls from all page traffic so teams can focus on monetization-path latency.
- Builds an interactive Plotly timeline waterfall to inspect request overlap, sequence, and duration.
- Calculates operational latency KPIs:
- Total ad request count
- Aggregated ad load time
- Average request latency
- Slowest request duration
- Highlights the slowest domains by average latency.
- Summarizes top domains by request volume.
- Produces a searchable tabular request log with timing, status, and URL visibility.
- Includes guided in-app HAR export instructions for faster onboarding.
> [!IMPORTANT]
> This tool is intentionally filtered for ad-tech traffic and is best used for ad stack performance debugging, not full-site synthetic monitoring.
> [!NOTE]
> Detection relies on substring matching against known ad-tech vendor keywords. Custom SSP/ad server domains may require extending the keyword list in `utils.py`.
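As a minimal sketch of how that extension could look, the snippet below mirrors the documented default keyword list and appends a custom vendor fragment. The helper `is_ad_call` and the `myssp.example` keyword are illustrative, not part of the repository's actual code:

```python
# Default detection list as documented in this README; "myssp.example"
# is a hypothetical custom SSP added for illustration.
ad_keywords = [
    "doubleclick", "googlesyndication", "adnxs", "rubicon",
    "pubmatic", "criteo", "openx", "appnexus", "prebid",
    "amazon-adsystem", "smartadserver",
]
ad_keywords.append("myssp.example")  # custom vendor domain fragment

def is_ad_call(url: str) -> bool:
    """Substring match against the keyword list, mirroring the app's approach."""
    return any(keyword in url.lower() for keyword in ad_keywords)

print(is_ad_call("https://ads.myssp.example/bid?slot=top"))  # True
print(is_ad_call("https://cdn.example.com/app.js"))          # False
```

Because matching is plain substring containment, short keywords can over-match; prefer distinctive fragments (full hostnames where possible) when extending the list.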
## Tech Stack / Architecture

- Python: Application runtime and data transformation logic.
- Streamlit: Single-page web UI for upload, metric display, and chart rendering.
- Pandas: Timestamp parsing and tabular request manipulation.
- Plotly Express: Interactive waterfall/timeline visualization.
```text
HAR-Ad-Latency-Analyzer/
├── app.py             # Streamlit UI, metrics, charts, and table rendering
├── utils.py           # HAR parsing and ad-call extraction pipeline
├── requirements.txt   # Python dependencies
├── README.md          # Project documentation
└── LICENSE            # Apache License 2.0
```
- UI-first diagnostic workflow: Streamlit allows rapid upload-analyze-iterate cycles for AdOps and engineers.
- Client-provided data model: HAR upload avoids injecting agents/scripts into production pages.
- Keyword-driven filtering: Lightweight and transparent detection logic prioritizes explainability over opaque heuristics.
- Timeline-centric analysis: Gantt-style view reveals concurrency, blocking windows, and outlier calls quickly.
- Minimal dependency footprint: Only essential analytics and visualization libraries are included.
```mermaid
flowchart LR
    A[Upload HAR/JSON in Streamlit] --> B[load_har_file]
    B --> C{Valid HAR schema?}
    C -- No --> D[Show invalid file error]
    C -- Yes --> E[extract_ad_calls]
    E --> F[Filter URLs by ad keyword list]
    F --> G[Parse Start/End timestamps and metadata]
    G --> H[Build pandas DataFrame]
    H --> I[Render KPIs + Waterfall + Domain charts + Log table]
```
> [!TIP]
> For best signal quality, export the HAR with a full page refresh and cache disabled in DevTools.
## Getting Started

- Python 3.9+ (recommended for recent Streamlit/Pandas compatibility)
- pip for dependency installation
- A browser-generated HAR file (`.har` or `.json`) from Chrome/Firefox DevTools
```bash
git clone https://github.com/<your-org>/HAR-Ad-Latency-Analyzer.git
cd HAR-Ad-Latency-Analyzer
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
streamlit run app.py
```

Open the local URL printed by Streamlit (typically http://localhost:8501) and upload your HAR file.
## Testing

This repository does not currently ship a formal automated test suite; validation is primarily manual through the UI. The following smoke checks confirm that the modules compile, dependencies resolve, and the app starts:

```bash
python -m py_compile app.py utils.py
python -m pip check
streamlit run app.py
```

Use the following conventions for future CI hardening:
```bash
# unit tests (when tests/ exists)
pytest -q

# integration smoke checks (example)
pytest -q tests/integration

# linting/formatting (example)
ruff check .
black --check .
```

> [!WARNING]
> Because HAR inputs vary by browser and site architecture, include fixture HAR files in future tests to avoid regressions in field names and timestamp handling.
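As a sketch of what such a fixture-based test could look like, the snippet below builds a minimal in-memory HAR matching the schema this README documents (`log.entries[]` with `request.url`, `startedDateTime`, `time`, `response.status`). In a real suite the fixture would live under `tests/` as a file; the values here are purely illustrative:

```python
import json

# Minimal in-memory HAR fixture carrying only the fields the parser
# relies on. Field values are illustrative, not captured from a real page.
FIXTURE_HAR = {
    "log": {
        "entries": [
            {
                "startedDateTime": "2024-05-01T12:00:00.000Z",
                "time": 142.5,
                "request": {"url": "https://securepubads.g.doubleclick.net/gampad/ads"},
                "response": {"status": 200, "content": {"size": 1024}},
            }
        ]
    }
}

def test_fixture_has_required_fields():
    entries = FIXTURE_HAR["log"]["entries"]
    assert entries, "fixture must contain at least one entry"
    for entry in entries:
        assert entry["request"]["url"]
        assert "startedDateTime" in entry and "time" in entry
        assert entry["response"]["status"] == 200

def test_fixture_is_json_serializable():
    # round-trip guards against fixtures drifting from valid JSON
    assert json.loads(json.dumps(FIXTURE_HAR)) == FIXTURE_HAR
```

Pinning fixtures like this makes field-name or timestamp-format regressions visible the moment the parsing code changes.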
## Deployment

- Run behind a reverse proxy (Nginx/Caddy) when exposing beyond localhost.
- Restrict access, because HAR files may contain sensitive request metadata.
- Pin dependency versions in `requirements.txt` for reproducible environments.
- Configure process supervision (e.g., `systemd` or a container orchestrator).
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

```bash
docker build -t har-ad-latency-analyzer .
docker run --rm -p 8501:8501 har-ad-latency-analyzer
```

- Add a pipeline stage for syntax checks and linting.
- Add artifact-based test fixtures for representative HAR samples.
- Optionally publish a versioned container image for reproducible deployments.
## Usage

1. Open the target website in Chrome/Firefox.
2. Open DevTools and navigate to the Network tab.
3. Enable "Preserve log" and disable cache (recommended).
4. Refresh the page.
5. Export the recorded requests as a HAR file.
Run the app and upload the HAR via the sidebar:

```bash
streamlit run app.py
```

Then inspect:
- KPI row: request volume and latency aggregates
- Waterfall chart: per-domain request overlap and duration
- Slowest domains chart: mean latency by vendor endpoint
- Request-count chart: top ad domains by volume
- Detailed table: raw call-level diagnostics
```python
from utils import load_har_file, extract_ad_calls

# read HAR payload
with open("sample.har", "r", encoding="utf-8") as f:
    har_data = load_har_file(f)

# extract ad-tech calls into a DataFrame
ad_df = extract_ad_calls(har_data)

# inspect top latency offenders
print(ad_df.groupby("Domain")["Duration_ms"].mean().sort_values(ascending=False).head(10))
```

- High `Duration_ms` with low request count suggests isolated vendor slowness.
- High request count with moderate latency may indicate cumulative ad stack drag.
- Overlapping long bars in the waterfall can indicate concurrent contention.
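These patterns can be triaged directly from the extracted DataFrame. The sketch below uses fabricated sample data; the `Domain` and `Duration_ms` column names follow the example earlier in this README:

```python
import pandas as pd

# Fabricated sample: one slow-but-rare vendor, one chatty-but-moderate vendor.
ad_df = pd.DataFrame({
    "Domain": ["slow-ssp.example", "busy-ssp.example",
               "busy-ssp.example", "busy-ssp.example"],
    "Duration_ms": [1800.0, 220.0, 260.0, 240.0],
})

# Per-domain request count and mean latency in one pass.
summary = ad_df.groupby("Domain")["Duration_ms"].agg(requests="count", mean_ms="mean")
print(summary)
```

Here `slow-ssp.example` (1 request, 1800 ms mean) matches the "isolated vendor slowness" pattern, while `busy-ssp.example` (3 requests, 240 ms mean) matches "cumulative ad stack drag."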
> [!CAUTION]
> HAR timing is client-observed and can be affected by local CPU, browser extensions, throttling settings, and network conditions.
## Configuration

The application currently has a minimal configuration model and no `.env` requirement.
- Accepted file types: `.har`, `.json`
- Expected schema root: `log.entries[]`
- Required request fields: `request.url`, `startedDateTime`, `time`, `response.status`, `response.content.size` (optional, fallback: `0`)
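A minimal sketch of checking an uploaded payload against these requirements is shown below. This is illustrative only, not the app's actual validation code, and the `validate_har` helper is hypothetical:

```python
# Illustrative schema check mirroring the documented requirements:
# root log.entries[], each entry carrying request.url, startedDateTime,
# time, and response.status (response.content.size is optional).
def validate_har(har: dict) -> list:
    problems = []
    entries = har.get("log", {}).get("entries")
    if not isinstance(entries, list):
        return ["missing log.entries[] root"]
    for i, entry in enumerate(entries):
        if not entry.get("request", {}).get("url"):
            problems.append(f"entry {i}: missing request.url")
        for field in ("startedDateTime", "time"):
            if field not in entry:
                problems.append(f"entry {i}: missing {field}")
        if "status" not in entry.get("response", {}):
            problems.append(f"entry {i}: missing response.status")
    return problems

good = {"log": {"entries": [{
    "request": {"url": "https://example.com"},
    "startedDateTime": "2024-05-01T12:00:00Z",
    "time": 88.0,
    "response": {"status": 200, "content": {"size": 0}},
}]}}
print(validate_har(good))         # []
print(validate_har({"log": {}}))  # ['missing log.entries[] root']
```

An early check like this lets the UI show a precise "invalid file" message instead of failing mid-parse.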
Ad request matching is defined in `utils.py` via the `ad_keywords` list:

```python
ad_keywords = [
    "doubleclick", "googlesyndication", "adnxs", "rubicon",
    "pubmatic", "criteo", "openx", "appnexus", "prebid",
    "amazon-adsystem", "smartadserver"
]
```

Streamlit supports CLI-level configuration (examples):

```bash
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

No custom environment variables are required by the project today. You may still use standard Streamlit variables if needed:

- `STREAMLIT_SERVER_PORT`
- `STREAMLIT_SERVER_ADDRESS`
## License

This project is licensed under the Apache License 2.0. See `LICENSE` for the full legal text.
## Support the Project

If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.