Digital Balance Index (DBI) Dashboard

A portfolio-grade analytics project that turns raw “hours spent” into composition-first behavioral insights.

Most screen-time analyses stop at totals (“people spend 8 hours/day”).
This project focuses on how that time is distributed across:

Social
Work/Study
Entertainment

It ships:

a reproducible scoring pipeline (python -m src.pipeline)
clean exported datasets + figures
a Streamlit dashboard for interactive exploration across age_group, primary_device, and internet_type

Dataset (source + thanks)

This project uses the Kaggle dataset:

Daily Internet Usage Statistics by Age Group by jayjoshi37
Dataset URL: https://www.kaggle.com/datasets/jayjoshi37/daily-internet-usage-statistics-by-age-group

Thanks to the dataset author and Kaggle for making the data available.

Why this project exists

The problem with “total hours”

Two users can both have 10 hours/day of screen time, but with completely different patterns:

One: mostly Work/Study (structured use)
Another: mostly Social/Entertainment (dominant categories)

Totals alone can’t describe behavior.
Composition metrics can.

The core idea

For each record, we compute:

Shares of total screen time for each category
DBI (Digital Balance Index): how evenly those shares are distributed
Dominance: how strongly one category “owns” the day
Practical tiers and a flag: High-load & Skewed

What’s inside (pipeline + dashboard)

Pipeline outputs

After running the pipeline you get:

outputs/scored_rows.csv
Original data + engineered features:
- shares (p_social, p_work, p_entertainment)
- dbi, dominance, dominant_category
- tiers (dbi_tier, load_tier)
- risk flag (flag_highload_skewed)
outputs/segment_summary.csv
Aggregated statistics by:
- age_group × primary_device × internet_type
outputs/daily_summary.csv
Daily aggregates:
- mean DBI / mean total / sample size (n) per day
outputs/metric_cards.json
KPI cards + thresholds and validation checks
reports/figures/
All figures used in this README

Dashboard

The Streamlit dashboard allows:

filtering by age group, device, internet type, DBI tier
exploring DBI distributions and segment comparisons
safe interpretation notes (“decision safety”)

Metrics (clear + reproducible)

Let:

total = total_screen_time
social = social_media_hours
work = work_or_study_hours
ent = entertainment_hours

1) Shares (composition)

p_social = social / total
p_work = work / total
p_entertainment = ent / total

These are proportions in [0, 1]. When total > 0 and the three categories add up to total, they sum to ~1.
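A minimal sketch of the share calculation for a single record. The function name compute_shares and the choice to return NaN when total is not positive are assumptions made for illustration, not necessarily what src/scoring.py does.

```python
import math

def compute_shares(total, social, work, ent):
    """Return (p_social, p_work, p_entertainment) for one record.

    Assumption: records with total <= 0 get NaN shares instead of raising.
    """
    if total <= 0:
        return (math.nan, math.nan, math.nan)
    return (social / total, work / total, ent / total)

# Toy record: 10 hours split 2 / 5 / 3.
p_social, p_work, p_entertainment = compute_shares(10.0, 2.0, 5.0, 3.0)
print(p_social, p_work, p_entertainment)
```

If total differs from social + work + ent, the shares will not sum to 1, which is why the identity check in the validation step matters.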

2) DBI (Digital Balance Index) 0 to 1

DBI uses normalized Shannon entropy over the three shares:

H = - Σ (p_i * ln(p_i)) for i in {social, work, entertainment}, with 0 * ln(0) taken as 0
DBI = H / ln(3), where ln(3) is the maximum entropy of three categories

Interpretation:

DBI ≈ 1.00 → time is distributed evenly (balanced composition)
DBI ≈ 0.00 → time is concentrated in one category (skewed composition)

Important note:

DBI does not say “good” or “bad”
DBI says “balanced” vs “dominated”
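The DBI formula above can be sketched in a few lines. The function name dbi is hypothetical, and zero shares are skipped on the assumption that 0 * ln(0) is treated as 0.

```python
import math

def dbi(shares):
    """Normalized Shannon entropy of a share vector, in [0, 1]."""
    # Zero shares are skipped: p * ln(p) tends to 0 as p tends to 0.
    h = sum(-p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(shares))

even = dbi([1 / 3, 1 / 3, 1 / 3])   # evenly split day
single = dbi([1.0, 0.0, 0.0])       # one category owns the day
mixed = dbi([0.2, 0.5, 0.3])        # roughly 0.937

print(round(even, 3), round(single, 3), round(mixed, 3))
```

Dividing by ln(3) is what keeps the score in [0, 1] regardless of the logarithm base, as long as the same base is used in both places.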

3) Dominance 0.33 to 1.00

dominance = max(p_social, p_work, p_entertainment)

Interpretation:

dominance close to 1.0 → one category dominates
dominance close to 0.33 → near-even split
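A small sketch of dominance together with dominant_category. The function name and the category labels are illustrative assumptions; ties go to whichever category comes first in the dictionary.

```python
def dominance_and_category(p_social, p_work, p_entertainment):
    """Return (dominance, dominant_category) for one record."""
    shares = {
        "social": p_social,
        "work": p_work,
        "entertainment": p_entertainment,
    }
    # max() over the keys, ranked by share; the first key wins on ties.
    dominant_category = max(shares, key=shares.get)
    return shares[dominant_category], dominant_category

value, category = dominance_and_category(0.2, 0.5, 0.3)
print(value, category)
```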

4) Practical tiers (used in outputs + dashboard)

DBI tiers:

Balanced: DBI ≥ 0.80
Mixed: 0.60 ≤ DBI < 0.80
Skewed: DBI < 0.60

Load tiers (by quantiles of total screen time):

Low / Medium / High based on P33 and P66

Flag:

flag_highload_skewed = 1 if load_tier == High AND dbi_tier == Skewed

This is intentionally phrased as “attention-worthy,” not “harmful.”
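The tiers and the flag can be sketched as follows. Several details here are assumptions rather than the pipeline's confirmed behavior: values between 0.79 and 0.80 are treated as Mixed, P33 and P66 use linear interpolation, and a total exactly on a cut point falls into the lower load tier. All function names are hypothetical.

```python
def dbi_tier(dbi):
    """Map a DBI value to Balanced / Mixed / Skewed."""
    if dbi >= 0.80:
        return "Balanced"
    if dbi >= 0.60:
        return "Mixed"
    return "Skewed"

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def load_tier(total, p33, p66):
    """Map total screen time to Low / Medium / High."""
    # Assumption: values exactly on a cut point go to the lower tier.
    if total <= p33:
        return "Low"
    if total <= p66:
        return "Medium"
    return "High"

def flag_highload_skewed(load, tier):
    """1 if the record is both High load and Skewed, else 0."""
    return int(load == "High" and tier == "Skewed")

# Toy totals (hours), used only to derive the two cut points.
totals = [2, 4, 6, 8, 10, 12]
p33, p66 = percentile(totals, 33), percentile(totals, 66)
print(flag_highload_skewed(load_tier(12, p33, p66), dbi_tier(0.45)))
```

Because the load tiers depend on quantiles, the same number of hours can land in a different tier when the dataset changes.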

Quickstart

1) Install

python -m venv .venv
# Windows:
#   .venv\Scripts\activate
# macOS/Linux:
#   source .venv/bin/activate
pip install -r requirements.txt

2) Run the pipeline

python -m src.pipeline --input data/raw/daily_internet_usage_by_age_group.csv

You should see a Done message and the output folders printed.

3) Launch the dashboard

streamlit run app/app.py

Project structure

digital-balance-index-dashboard/
  data/
    raw/
    processed/
  outputs/
  reports/
    figures/
  src/
    io.py
    validate.py
    scoring.py
    aggregates.py
    reporting.py
    pipeline.py
  app/
    app.py
  README.md
  requirements.txt

Figures (with interpretation)

Below are the key plots generated by the pipeline.

1) Composition by age group (mean shares)

What this chart shows

Each bar is an age group. The bar stacks to 1.0 (100% of total screen time), split into:

Social
Work/Study
Entertainment

This answers:

Do age groups differ more by total time, or by how time is distributed?

How to read it correctly

Look for relative differences in shares (e.g., Work/Study share slightly higher in one group).
A similar composition across groups suggests that differences may lie more in total screen time than in usage mix.

Common pitfalls

Don’t interpret this as “who uses more.” This plot is composition, not absolute hours.
A group can have the same composition but different total hours.

2) DBI distribution

What this chart shows

A histogram of DBI values across all records.

Why it matters

This is your “macro fingerprint”:

If DBI is mostly high → usage is generally mixed/balanced in composition.
If DBI has a heavy low tail → many users have dominated usage (one category owns most of the day).

Practical interpretation

A distribution concentrated at high DBI values suggests many “mixed days,” not necessarily “low usage.”
A low-DBI tail is where you’d investigate dominant categories:
- dominated by Social?
- dominated by Entertainment?
- or dominated by Work/Study?

3) Total screen time vs DBI (composition vs load)

What this chart shows

Each dot is a record:

x-axis: total screen time (hours)
y-axis: DBI (0–1)

The key insight

This separates two different questions:

How much? (load)
How distributed? (composition)

The useful “quadrants”

Even without drawing lines, you can think in quadrants:

High total + High DBI → Heavy usage, but spread across categories (mixed day)
High total + Low DBI → Heavy usage, dominated by one category (attention-worthy)
Low total + High DBI → Light usage, balanced composition
Low total + Low DBI → Light usage, but concentrated (short + focused)

Decision safety note

This plot is descriptive: it suggests where to look, not what to diagnose.

4) Daily trends: mean DBI with sample size

What this chart shows

Solid line: daily mean DBI
Dashed line: sample size (n) that day

Why sample size is plotted

Because daily means are only meaningful if daily n is stable.

If n swings sharply, the mean can swing even if behavior didn’t change.
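A sketch of a daily aggregate that keeps n next to the mean, so thin days are easy to spot. The date key, the low_n field, and the min_n threshold of 30 are assumptions for illustration; the rows below are toy values, not taken from the dataset.

```python
from collections import defaultdict

def daily_summary(rows, min_n=30):
    """Return per-day mean DBI with sample size and a low-n marker."""
    by_day = defaultdict(list)
    for row in rows:
        by_day[row["date"]].append(row["dbi"])

    out = []
    for day in sorted(by_day):
        values = by_day[day]
        out.append({
            "date": day,
            "n": len(values),
            "mean_dbi": sum(values) / len(values),
            # Hypothetical marker: means from few records are noisy.
            "low_n": len(values) < min_n,
        })
    return out

rows = [
    {"date": "2024-01-01", "dbi": 0.90},
    {"date": "2024-01-01", "dbi": 0.80},
    {"date": "2024-01-02", "dbi": 0.40},
]
for line in daily_summary(rows, min_n=2):
    print(line)
```

In the toy rows, the second day's mean rests on a single record, which is exactly the situation where a dip in the line says little about behavior.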

What you can and cannot conclude

Reasonable:

“DBI is fairly stable across dates.”
“Some days show deviations, but sample size should be checked.”

Not reasonable:

“Behavior changed over time for the same people.” This dataset is not a per-person time series; it’s a collection of records by date.

5) Mean DBI by primary device

What this chart shows

Average DBI per primary device category.

What it’s useful for

Quick comparison: are some devices associated with more balanced composition?

How to interpret cautiously

Small differences can be real or just sample noise.
Device does not “cause” DBI differences; it’s an association.

A strong follow-up is to check:

DBI distribution by device (not just means)
segment breakdown: age group × device

6) Mean DBI by internet type

What this chart shows

Average DBI for WiFi vs Mobile Data.

What it suggests (carefully)

If values are similar:

connectivity context may not be a major driver of composition balance

If values differ:

could reflect behavior contexts (e.g., mobile data used on the go)

Again: this is association, not causation.

7) Mean DBI heatmap: age group × primary device

What this chart shows

Mean DBI for each (age_group, primary_device) combination.

Why it’s powerful

This reveals interaction patterns that “mean by device” hides:

a device might look balanced overall,
but show different balance depending on age group.

How to use it

Spot extremes (highest and lowest cells)
Use segment_summary.csv to confirm:
- n per segment
- composition means
- high-load & skewed rate
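The checks listed above can be sketched as a small aggregation over scored rows. The output field names (n, mean_dbi, mean_p_*, highload_skewed_rate) are assumptions and may not match the actual columns in segment_summary.csv; the rows are toy values.

```python
from collections import defaultdict

SHARE_COLUMNS = ("p_social", "p_work", "p_entertainment")

def segment_summary(rows):
    """Aggregate scored rows by age_group x primary_device x internet_type."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["age_group"], row["primary_device"], row["internet_type"])
        groups[key].append(row)

    summary = []
    for (age, device, net), members in sorted(groups.items()):
        n = len(members)
        record = {
            "age_group": age,
            "primary_device": device,
            "internet_type": net,
            "n": n,
            "mean_dbi": sum(m["dbi"] for m in members) / n,
            "highload_skewed_rate": sum(m["flag_highload_skewed"] for m in members) / n,
        }
        # Composition means, one per share column.
        for col in SHARE_COLUMNS:
            record["mean_" + col] = sum(m[col] for m in members) / n
        summary.append(record)
    return summary

def toy_row(age, device, net, shares, dbi, flag):
    """Build one illustrative scored row."""
    row = {"age_group": age, "primary_device": device, "internet_type": net,
           "dbi": dbi, "flag_highload_skewed": flag}
    row.update(zip(SHARE_COLUMNS, shares))
    return row

rows = [
    toy_row("A", "Mobile", "WiFi", (0.2, 0.5, 0.3), 0.94, 0),
    toy_row("A", "Mobile", "WiFi", (0.8, 0.1, 0.1), 0.58, 1),
    toy_row("B", "Laptop", "WiFi", (0.3, 0.4, 0.3), 0.99, 0),
]
for record in segment_summary(rows):
    print(record)
```

Reading n alongside each mean makes it clear which heatmap cells rest on enough records to be worth interpreting.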

Decision Safety (how to present this responsibly)

This project is designed for safe analytics:

DBI is not a mental health score.
DBI is not a productivity score.
DBI is not a diagnosis.

Recommended language:

“composition pattern”
“dominant category”
“balanced vs skewed distribution”
“attention-worthy segment”

Avoid language like:

“addicted”
“harmful”
“unhealthy”
“caused by WiFi/device”

Reproducibility & validation

The pipeline includes:

schema validation (required columns)
identity validation: total ≈ social + work + entertainment
consistent export paths and figures
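A sketch of the schema and identity checks for one record. The required column set is assembled from names used elsewhere in this README, and the 0.05-hour tolerance is a hypothetical value; the real rules in src/validate.py may differ.

```python
REQUIRED_COLUMNS = {
    "age_group", "primary_device", "internet_type",
    "total_screen_time", "social_media_hours",
    "work_or_study_hours", "entertainment_hours",
}

def validate_row(row, tol=0.05):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    missing = REQUIRED_COLUMNS - set(row)
    if missing:
        # Without all columns the identity check cannot run.
        return ["missing columns: " + ", ".join(sorted(missing))]

    parts = (row["social_media_hours"]
             + row["work_or_study_hours"]
             + row["entertainment_hours"])
    # Identity check: total should match the sum of its parts within tol.
    if abs(row["total_screen_time"] - parts) > tol:
        problems.append("total differs from social + work + entertainment")
    return problems

good = {
    "age_group": "A", "primary_device": "Mobile", "internet_type": "WiFi",
    "total_screen_time": 10.0, "social_media_hours": 2.0,
    "work_or_study_hours": 5.0, "entertainment_hours": 3.0,
}
bad = dict(good, total_screen_time=12.0)
print(validate_row(good), validate_row(bad))
```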

Roadmap (optional upgrades)

If you want to level this up further:

Add bootstrap confidence intervals for segment means (DBI/total)
Add “What-if composition simulator” (move minutes between categories → new DBI)
Add dominance + category-specific deep dives
Add segment stability checks (min n threshold)
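As a starting point for the what-if simulator idea, here is a minimal sketch that moves minutes from one category to another and recomputes DBI. Nothing here exists in the project yet; the function names and the category keys are hypothetical.

```python
import math

def dbi_from_hours(hours):
    """DBI for a dict of category -> hours (assumes a positive total)."""
    total = sum(hours.values())
    shares = [h / total for h in hours.values()]
    entropy = sum(-p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(len(shares))

def what_if(hours, src, dst, minutes):
    """Return a copy of hours with minutes moved from src to dst."""
    # Never move more time than the source category actually has.
    moved = min(minutes / 60, hours[src])
    updated = dict(hours)
    updated[src] -= moved
    updated[dst] += moved
    return updated

before = {"social": 6.0, "work": 1.0, "entertainment": 1.0}
after = what_if(before, "social", "work", 120)
print(round(dbi_from_hours(before), 3), round(dbi_from_hours(after), 3))
```

Total hours stay fixed, so any change in the score comes purely from composition, which is the distinction this project is built around.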
