This repository was archived by the owner on Mar 16, 2026. It is now read-only.

HAR Ad Latency Analyzer

A Streamlit-based observability utility that parses browser HAR files and surfaces ad-tech network bottlenecks through latency metrics and an interactive waterfall timeline.

Python Streamlit Pandas Plotly License: Apache-2.0

Table of Contents

  • Features
  • Tech Stack & Architecture
  • Processing Pipeline
  • Getting Started
  • Testing
  • Deployment
  • Usage
  • Configuration
  • License

Features

  • Parses .har and .json exports generated by browser DevTools.
  • Detects ad-related traffic using a curated keyword list (e.g., doubleclick, prebid, pubmatic, adnxs, rubicon, openx, and others).
  • Isolates ad calls from all page traffic so teams can focus on monetization-path latency.
  • Builds an interactive Plotly timeline waterfall to inspect request overlap, sequence, and duration.
  • Calculates operational latency KPIs:
    • Total ad request count
    • Aggregated ad load time
    • Average request latency
    • Slowest request duration
  • Highlights the slowest domains by average latency.
  • Summarizes top domains by request volume.
  • Produces a searchable tabular request log with timing, status, and URL visibility.
  • Includes guided in-app HAR export instructions for faster onboarding.
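The four KPIs above reduce to simple aggregates over per-request durations. A minimal sketch (the function name and dict keys are illustrative, not the app's actual API):

```python
def compute_kpis(durations_ms):
    """Aggregate per-request durations (milliseconds) into the four KPIs."""
    if not durations_ms:
        return {"count": 0, "total_ms": 0.0, "avg_ms": 0.0, "max_ms": 0.0}
    total = sum(durations_ms)
    return {
        "count": len(durations_ms),           # total ad request count
        "total_ms": total,                    # aggregated ad load time
        "avg_ms": total / len(durations_ms),  # average request latency
        "max_ms": max(durations_ms),          # slowest request duration
    }
```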

Important

This tool is intentionally filtered for ad-tech traffic and is best used for ad stack performance debugging, not full-site synthetic monitoring.

Note

Detection relies on substring matching against known ad-tech vendor keywords. Custom SSP/ad server domains may require extending the keyword list in utils.py.

Tech Stack & Architecture

Core Technologies

  • Python: Application runtime and data transformation logic.
  • Streamlit: Single-page web UI for upload, metric display, and chart rendering.
  • Pandas: Timestamp parsing and tabular request manipulation.
  • Plotly Express: Interactive waterfall/timeline visualization.

Project Structure

HAR-Ad-Latency-Analyzer/
├── app.py            # Streamlit UI, metrics, charts, and table rendering
├── utils.py          # HAR parsing and ad-call extraction pipeline
├── requirements.txt  # Python dependencies
├── README.md         # Project documentation
└── LICENSE           # Apache License 2.0

Key Design Decisions

  • UI-first diagnostic workflow: Streamlit allows rapid upload-analyze-iterate cycles for AdOps and engineers.
  • Client-provided data model: HAR upload avoids injecting agents/scripts into production pages.
  • Keyword-driven filtering: Lightweight and transparent detection logic prioritizes explainability over opaque heuristics.
  • Timeline-centric analysis: Gantt-style view reveals concurrency, blocking windows, and outlier calls quickly.
  • Minimal dependency footprint: Only essential analytics and visualization libraries are included.

Processing Pipeline

flowchart LR
    A[Upload HAR/JSON in Streamlit] --> B[load_har_file]
    B --> C{Valid HAR schema?}
    C -- No --> D[Show invalid file error]
    C -- Yes --> E[extract_ad_calls]
    E --> F[Filter URLs by ad keyword list]
    F --> G[Parse Start/End timestamps and metadata]
    G --> H[Build pandas DataFrame]
    H --> I[Render KPIs + Waterfall + Domain charts + Log table]
Tip

For best signal quality, export HAR with a full page refresh and cache disabled in DevTools.

Getting Started

Prerequisites

  • Python 3.9+ (recommended for recent Streamlit/Pandas compatibility)
  • pip for dependency installation
  • A browser-generated HAR file (.har or .json) from Chrome/Firefox DevTools

Installation

git clone https://github.com/<your-org>/HAR-Ad-Latency-Analyzer.git
cd HAR-Ad-Latency-Analyzer
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

Run Locally

streamlit run app.py

Open the local URL printed by Streamlit (typically http://localhost:8501) and upload your HAR file.

Testing

This repository currently does not ship a formal automated test suite. Validation is primarily functional/manual through the UI.

Recommended Checks

python -m py_compile app.py utils.py
python -m pip check
streamlit run app.py

If You Add Tests

Use the following conventions for future CI hardening:

# unit tests (when tests/ exists)
pytest -q

# integration smoke checks (example)
pytest -q tests/integration

# linting/formatting (example)
ruff check .
black --check .

Warning

Because HAR inputs vary by browser and site architecture, include fixture HAR files in future tests to avoid regressions in field names and timestamp handling.
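A fixture-based test could look like the following. Everything here is hypothetical, including the file name; the fixture keeps only the fields this tool reads (request.url, startedDateTime, time, response.status, response.content.size):

```python
# tests/test_har_fixture.py — hypothetical regression test built on a
# minimal inline fixture; real tests should also include full browser
# exports checked in as files.
MINIMAL_HAR = {
    "log": {
        "entries": [{
            "request": {"url": "https://ib.adnxs.com/ut/v3/prebid"},
            "startedDateTime": "2024-01-01T00:00:00.000Z",
            "time": 87.3,
            "response": {"status": 200, "content": {"size": 512}},
        }]
    }
}

def test_fixture_matches_expected_schema():
    entries = MINIMAL_HAR["log"]["entries"]
    assert entries, "fixture must contain at least one entry"
    for e in entries:
        assert "url" in e["request"]
        assert "startedDateTime" in e and "time" in e
        assert "status" in e["response"]
```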

Deployment

Production Guidelines

  • Run behind a reverse proxy (Nginx/Caddy) when exposing beyond localhost.
  • Restrict access because HAR files may contain sensitive request metadata.
  • Pin dependency versions in requirements.txt for reproducible environments.
  • Configure process supervision (e.g., systemd, container orchestrator).

Containerization Example (Optional)

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
Build and run:

docker build -t har-ad-latency-analyzer .
docker run --rm -p 8501:8501 har-ad-latency-analyzer

CI/CD Integration Notes

  • Add a pipeline stage for syntax checks and linting.
  • Add artifact-based test fixtures for representative HAR samples.
  • Optionally publish a versioned container image for reproducible deployments.

Usage

1. Export a HAR File

  1. Open target website in Chrome/Firefox.
  2. Open DevTools and navigate to Network.
  3. Enable Preserve log and disable cache (recommended).
  4. Refresh the page.
  5. Export requests as HAR.

2. Analyze in the App

streamlit run app.py

Upload the HAR via the sidebar, then inspect:

  • KPI row: request volume and latency aggregates
  • Waterfall chart: per-domain request overlap and duration
  • Slowest domains chart: mean latency by vendor endpoint
  • Request-count chart: top ad domains by volume
  • Detailed table: raw call-level diagnostics

3. Programmatic Extraction Example

from utils import load_har_file, extract_ad_calls

# read HAR payload
with open("sample.har", "r", encoding="utf-8") as f:
    har_data = load_har_file(f)

# extract ad-tech calls into a DataFrame
ad_df = extract_ad_calls(har_data)

# inspect top latency offenders
print(ad_df.groupby("Domain")["Duration_ms"].mean().sort_values(ascending=False).head(10))

4. Interpreting Latency Signals

  • High Duration_ms with low request count suggests isolated vendor slowness.
  • High request count with moderate latency may indicate cumulative ad stack drag.
  • Overlapping long bars in the waterfall can indicate concurrent contention.
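The overlap signal can be quantified with a rough concurrency check, assuming you can pull (start_ms, end_ms) pairs out of the parsed request table; the function name is illustrative:

```python
def overlapping_pairs(intervals):
    """Count request pairs whose time windows overlap — a crude proxy for
    the concurrent contention visible as stacked bars in the waterfall."""
    ivs = sorted(intervals)  # sort by start time
    count = 0
    for i, (_, end_i) in enumerate(ivs):
        for start_j, _ in ivs[i + 1:]:
            if start_j >= end_i:
                break  # sorted by start, so no later interval overlaps this one
            count += 1
    return count
```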

Caution

HAR timing is client-observed and can be affected by local CPU, browser extensions, throttling settings, and network conditions.

Configuration

The application currently has a minimal configuration model and no .env requirement.

Input Configuration

  • Accepted file types: .har, .json
  • Expected schema root: log.entries[]
  • Required request fields:
    • request.url
    • startedDateTime
    • time
    • response.status
    • response.content.size (optional fallback: 0)
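For reference, the smallest payload containing every field listed above looks like this (values are examples only):

```python
import json

minimal_har = {
    "log": {
        "entries": [{
            "startedDateTime": "2024-01-01T12:00:00.000Z",
            "time": 142.7,  # total request duration, ms
            "request": {"url": "https://securepubads.g.doubleclick.net/gampad/ads"},
            "response": {"status": 200, "content": {"size": 1024}},
        }]
    }
}

# Serializes cleanly, so it can be saved as sample.har for smoke testing.
har_text = json.dumps(minimal_har, indent=2)
```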

Detection Configuration

Ad request matching is defined in utils.py via the ad_keywords list.

ad_keywords = [
    "doubleclick", "googlesyndication", "adnxs", "rubicon",
    "pubmatic", "criteo", "openx", "appnexus", "prebid",
    "amazon-adsystem", "smartadserver"
]
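Extending detection is a matter of appending to the list. A sketch of the substring check described above; the list mirrors the one in utils.py, and "ads.mycompany" is a hypothetical custom addition:

```python
ad_keywords = [
    "doubleclick", "googlesyndication", "adnxs", "rubicon",
    "pubmatic", "criteo", "openx", "appnexus", "prebid",
    "amazon-adsystem", "smartadserver",
    "ads.mycompany",  # hypothetical in-house ad server
]

def is_ad_url(url, keywords=ad_keywords):
    """Case-insensitive substring match against the keyword list."""
    u = url.lower()
    return any(k in u for k in keywords)
```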

Runtime Flags

Streamlit supports CLI-level configuration (examples):

streamlit run app.py --server.port 8501 --server.address 0.0.0.0

Environment Variables (Optional)

No custom environment variables are required by the project today. You may still use standard Streamlit variables if needed:

  • STREAMLIT_SERVER_PORT
  • STREAMLIT_SERVER_ADDRESS

License

This project is licensed under the Apache License 2.0. See LICENSE for the full legal text.

Support the Project

Patreon Ko-fi Boosty YouTube Telegram

If you find this tool useful, consider leaving a star on GitHub or supporting the author directly.
