gov2-ro/monitorulpreturilor

Monitorul Prețurilor

monitorulpreturilor.gov2.ro – an alternative interface for monitorulpreturilor.info

Fetches and stores food and fuel price data from Monitorul Prețurilor, a project of the Consiliul Concurenței (the Romanian Competition Council).

The Monitorul Prețurilor food-price project aims to give consumers the ability to compare the price of the basket of products they intend to purchase.


Pipeline overview

fetch_reference.py          # once / weekly — retail reference data
  └─ fetch_prices.py        # daily — retail prices (per-store, population-ordered)

fetch_gas_reference.py      # once / weekly — gas reference data
  └─ fetch_gas_prices.py    # daily — gas prices

discover_stores.py          # one-shot — store discovery from locality centroids
  └─ update_store_populations.py   # after discovery — enrich stores with population

Analysis (standalone, read-only):

analyse_prices.py           # price variability: intra-network + cross-network CSVs
analyse_products.py         # brand/word frequency + category anomaly CSVs

Site generation (run after daily fetch):

generate_site.py            # static GitHub Pages site (9 HTML pages + CSVs)
generate_map.py             # standalone Leaflet store map → dashboard/stores_map.html
export_analytics.py         # export DB views to docs/data/*.csv

CI helpers:

build_ci_subset.py          # build store + product ID subsets for GitHub Actions

Workflow

First-time setup

# 1. Retail reference data (networks, UATs, categories, products)
python fetch_reference.py

# 2. Discover all stores across Romania
python discover_stores.py

# 3. Enrich stores with surrounding population (used for fetch ordering)
python update_store_populations.py

# 4. Gas reference data (networks, fuel types)
python fetch_gas_reference.py

Daily run

python fetch_prices.py       # retail — resumes from checkpoint if interrupted
python fetch_gas_prices.py   # gas    — resumes from checkpoint if interrupted

Weekly refresh (reference data may change)

python fetch_reference.py
python fetch_gas_reference.py
python discover_stores.py    # picks up newly opened stores
python update_store_populations.py

Analysis (run any time after prices are loaded)

python analyse_prices.py     # → data/price_intra_network.csv, data/price_cross_network.csv
python analyse_products.py   # → data/brands.csv, data/product_words.csv, data/category_anomalies.csv

Data files in data/

File                              Created by                 Purpose
prices.db                         db.py (via any script)     Main SQLite database; all tables
prices.db-wal                     SQLite (WAL mode)          Write-ahead log; normal while a script is running or was interrupted; checkpointed automatically on clean close
prices.db-shm                     SQLite (WAL mode)          Shared-memory index for the WAL; accompanies prices.db-wal
prices_checkpoint.json            fetch_prices.py            Retail fetch progress; deleted/reset on a fresh run
gas_checkpoint.json               fetch_gas_prices.py        Gas fetch progress; deleted/reset on a fresh run
discover_stores_checkpoint.json   discover_stores.py         Store discovery progress; deleted/reset on a fresh run
discover_gas_checkpoint.json      discover_gas_stations.py   Gas station discovery progress; deleted/reset on a fresh run
retail_checkpoint.json            fetch_prices_by_uat.py     Legacy UAT-based fetch progress (legacy script)
price_intra_network.csv           analyse_prices.py          Intra-network price variance
price_cross_network.csv           analyse_prices.py          Cross-network price comparison
brands.csv                        analyse_products.py        Normalized brand list
product_words.csv                 analyse_products.py        Word/bigram frequency in product names
category_anomalies.csv            analyse_products.py        Products assigned to unexpected categories
ci_stores.txt                     build_ci_subset.py         Store IDs for CI test runs (one per line)
ci_products.txt                   build_ci_subset.py         Product IDs for CI test runs (one per line)

WAL note: prices.db-wal and prices.db-shm are safe to leave in place between runs — SQLite manages them. Only delete them if you're sure no script is writing (e.g., after a crash and before restoring from backup).
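
For reference, enabling WAL takes a single pragma at connection time; a minimal sketch (db.py's actual connection helper may differ):

```python
import sqlite3

def connect(path="data/prices.db"):
    """Open the database in WAL mode; SQLite creates the -wal/-shm
    companion files on the first write and checkpoints/removes them
    on a clean close."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # persisted in the DB header
    conn.execute("PRAGMA synchronous=NORMAL")  # common pairing with WAL
    return conn
```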


Scripts

Retail

fetch_reference.py

Fetches slow-changing reference data: retail networks, UATs, product categories, and products. Run once, or weekly to pick up new products.

python fetch_reference.py                  # full run → data/prices.db
python fetch_reference.py --limit 5        # first 5 categories only (for testing)
python fetch_reference.py path/to/db.db    # custom DB path

fetch_prices.py

Fetches current prices for all stores × product combinations, ordered by store population (busiest cities first). Saves progress to data/prices_checkpoint.json so interrupted runs resume automatically. Requires reference data — run fetch_reference.py first.

Checkpoint behaviour:

  • Interrupted run → resumes from last saved position on next run
  • Completed run, same day → exits immediately (no redundant API calls)
  • Completed run, new day → starts a fresh run automatically
  • --fresh → ignores any checkpoint and starts clean
  • --resume → continues a completed same-day run; skips already-processed store×batch keys, fetches only new stores
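
The checkpoint behaviour above boils down to a small decision function; a sketch with hypothetical field names (the real prices_checkpoint.json layout may differ):

```python
import datetime
import json
import pathlib

def load_position(path="data/prices_checkpoint.json", fresh=False):
    """Return the index to resume from, 0 for a fresh or new-day run,
    or None if today's run already completed (field names hypothetical)."""
    p = pathlib.Path(path)
    today = datetime.date.today().isoformat()
    if fresh or not p.exists():
        return 0
    ckpt = json.loads(p.read_text())
    if ckpt.get("date") != today:
        return 0                    # new day: start a fresh run
    if ckpt.get("completed"):
        return None                 # same day, already done: exit
    return ckpt.get("position", 0)  # interrupted: resume where we stopped
```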

Product ordering (--products-order):

  • db (default) — products in DB insertion order
  • stale — never-fetched products first, then sorted by oldest fetched_at ascending. Use with --max-runtime to cap daily overhead while always filling coverage gaps first. On interrupted runs the product order is saved in the checkpoint so resume is stable within the same day; the next day's run re-derives staleness from the DB.

python fetch_prices.py                                           # full run → data/prices.db
python fetch_prices.py --resume                                  # fetch only new stores added since today's run
python fetch_prices.py --products-order stale --max-runtime 3600 # prioritise never-fetched / stale products, stop after 1h
python fetch_prices.py --limit-stores 3 --limit-products 90      # quick smoke test
python fetch_prices.py --fresh                                   # ignore checkpoint, start clean
python fetch_prices.py path/to/db.db                             # custom DB path
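
The stale ordering can be derived in a single query; a sketch against the schema documented below (the exact query inside fetch_prices.py may differ):

```python
import sqlite3

def stale_ordered_product_ids(conn):
    """Never-fetched products first (no fetched_at at all), then by
    oldest latest-fetch ascending, derived from the prices table."""
    return [row[0] for row in conn.execute("""
        SELECT pr.id
        FROM products pr
        LEFT JOIN (SELECT product_id, MAX(fetched_at) AS fetched_at
                   FROM prices GROUP BY product_id) f ON f.product_id = pr.id
        ORDER BY f.fetched_at IS NOT NULL, f.fetched_at ASC
    """)]
```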

discover_stores.py

Discovers retail stores by probing the API from Romanian locality centroids (3 180 localities from the official population CSV). Deduplicates probe points within 4 km to avoid redundant calls. Writes stores to DB; does not write prices. Resumes from checkpoint on restart.

Run once (or after a long gap) to populate the stores table before running fetch_prices.py.

python discover_stores.py                    # all localities ≥ 2 500 pop
python discover_stores.py --min-pop 5000     # larger threshold
python discover_stores.py --limit 50         # first 50 probe points (quick test)
python discover_stores.py --dry-run          # parse + log, no DB writes
python discover_stores.py --fresh            # ignore checkpoint, restart
python discover_stores.py --debug            # verbose logging
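
The 4 km dedup can be done with a greedy great-circle filter; an illustrative sketch, not necessarily the script's exact implementation:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in km."""
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def dedupe_probes(points, min_km=4.0):
    """Greedily keep only points at least min_km from every kept point."""
    kept = []
    for lat, lon in points:
        if all(haversine_km(lat, lon, klat, klon) >= min_km
               for klat, klon in kept):
            kept.append((lat, lon))
    return kept
```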

update_store_populations.py

Computes a surrounding_population estimate for every store: sums populations of all localities whose centroid falls within --radius km (default 10 km). Used by fetch_prices.py to prioritise high-traffic stores.

Run once after discover_stores.py, or re-run whenever the stores table grows.

python update_store_populations.py           # default 10 km radius
python update_store_populations.py --radius 5
python update_store_populations.py --debug
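
The estimate itself is a straightforward radius sum; a sketch (the locality data shape here is an assumption):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in km."""
    r = 6371.0
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def surrounding_population(store_lat, store_lon, localities, radius_km=10.0):
    """Sum the population of every locality whose centroid lies within
    radius_km of the store. `localities` is [(lat, lon, population), ...]."""
    return sum(pop for lat, lon, pop in localities
               if haversine_km(store_lat, store_lon, lat, lon) <= radius_km)
```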

discover_stores_by_uat.py (legacy)

Earlier store-discovery approach using UAT centroids from the DB (only ~20 UATs loaded by default). Superseded by discover_stores.py which covers all 3 180 Romanian localities. Kept for reference.

python discover_stores_by_uat.py [--pop-threshold 10000] [--debug] [--dry-run]

fetch_prices_by_uat.py (legacy)

Earlier price-fetching approach: iterates UAT × product batches using the centroid endpoint. Superseded by fetch_prices.py which fetches per-store with population ordering. Kept for reference.

python fetch_prices_by_uat.py
python fetch_prices_by_uat.py --limit-uats 3 --limit-products 90
python fetch_prices_by_uat.py --fresh

Gas

fetch_gas_reference.py

Fetches gas networks and fuel product types (6 fuel types: benzină/motorină standard & premium, GPL, electric). Run once or weekly.

python fetch_gas_reference.py              # full run → data/prices.db
python fetch_gas_reference.py path/to/db  # custom DB path

fetch_gas_prices.py

Fetches current fuel prices for all UATs (one request per fuel type per UAT). Saves progress to data/gas_checkpoint.json. Same checkpoint behaviour as the retail pipeline.

python fetch_gas_prices.py                         # full run → data/prices.db
python fetch_gas_prices.py --limit-uats 3          # quick smoke test
python fetch_gas_prices.py --fresh                 # ignore checkpoint, start clean
python fetch_gas_prices.py path/to/db              # custom DB path

discover_gas_stations.py

Discovers gas stations by probing GetGasItemsByLatLon from populated locality centroids (≥ 2 500 pop by default). Uses Motorină standard (fuel type 21) to find all stations within 5 km. Resumes from data/discover_gas_checkpoint.json.

Run once (or periodically) before fetch_gas_prices.py to populate the gas_stations table.

python discover_gas_stations.py                  # all localities ≥ 2 500 pop
python discover_gas_stations.py --min-pop 5000   # larger threshold
python discover_gas_stations.py --limit 50       # first 50 probe points (quick test)
python discover_gas_stations.py --dry-run        # no API calls or DB writes
python discover_gas_stations.py --fresh          # ignore checkpoint
python discover_gas_stations.py --debug          # verbose logging

Analysis

analyse_prices.py

Analyses retail price variability. Produces two CSVs in data/:

Output                   Contents
price_intra_network.csv  Per (product, network): min/max/avg/CV/store count — flags chains with non-uniform pricing
price_cross_network.csv  Per product: cheapest/most-expensive network, spread, ratio, price per network

Unit normalization collapses kg/Kg/1kg etc. to a common bucket. SELGROS (B2B wholesale) is excluded from cross-network rankings by default.

python analyse_prices.py
python analyse_prices.py --min-stores 3        # require at least 3 stores per group
python analyse_prices.py --include-selgros     # include SELGROS in rankings
python analyse_prices.py --debug
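
The unit normalization might look roughly like this; the alias table is illustrative, not the script's actual bucket list:

```python
import re

# Map common raw unit spellings to a canonical bucket (illustrative aliases).
_UNIT_ALIASES = {
    "kg": "kg", "1kg": "kg", "kilogram": "kg",
    "l": "l", "1l": "l", "litru": "l",
    "buc": "buc", "bucata": "buc",
}

def normalize_unit(raw):
    """Lowercase, collapse internal whitespace ('1 kg' -> '1kg'), then map
    through the alias table; unknown units pass through unchanged."""
    key = re.sub(r"\s+", "", raw.strip().lower())
    return _UNIT_ALIASES.get(key, key)
```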

analyse_products.py

Analyses product names and brands. Produces three CSVs in data/:

Output                  Contents
brands.csv              Normalized brands with raw variants, counts, parent categories
product_words.csv       Most common words and bigrams in product names (diacritic-insensitive)
category_anomalies.csv  Products whose brand dominates one category but the product is assigned elsewhere

python analyse_products.py
python analyse_products.py --db path/to/prices.db --top 200
python analyse_products.py --anomaly-threshold 0.85
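
Diacritic-insensitive counting can be done with Unicode normalization; a sketch of the idea (not necessarily how analyse_products.py implements it):

```python
import re
import unicodedata
from collections import Counter

def fold_diacritics(text):
    """Strip combining marks so 'brânză' and 'branza' count as one word."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def word_counts(names):
    """Count folded, lowercased words across a list of product names."""
    counter = Counter()
    for name in names:
        counter.update(re.findall(r"[a-z]+", fold_diacritics(name)))
    return counter
```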

Site generation

generate_site.py

Generates the full static GitHub Pages site from the database — 9 HTML pages (dashboard, price index, fuel leaderboard, pipeline health, store map, trends, compare, analytics, gas map) plus per-product CSVs in docs/data/products/.

python generate_site.py                      # → docs/
python generate_site.py --out path/to/out   # custom output directory
python generate_site.py --db path/to/db     # custom DB path

generate_map.py

Generates a self-contained Leaflet interactive map of retail stores with network colour coding and marker clustering. Standalone alternative to the map page built by generate_site.py.

python generate_map.py                             # → dashboard/stores_map.html
python generate_map.py --out docs/stores_map.html
python generate_map.py --db path/to/prices.db

export_analytics.py

Exports 8 analytical SQL views to CSV files in docs/data/ for use by the static site and external tools.

Output CSV                Contents
price_variability.csv     Per-product price variability across networks
cross_network_spread.csv  Cross-network price spread per product
popular_products.csv      Top 200 most-recorded products
private_labels.csv        Top 100 private-label candidates
stores_per_network.csv    Store count per retail network
price_freshness.csv       Price freshness for the last 30 days
products_no_prices.csv    Products with no price records
run_history.csv           Last 30 pipeline run records

python export_analytics.py                   # → docs/data/
python export_analytics.py --out path/to/   # custom output directory
python export_analytics.py path/to/db       # custom DB path

CI / Testing helpers

build_ci_subset.py

Builds compact store and product ID lists for GitHub Actions CI, selecting representative stores by network rank × population geography and products by coverage rank. Outputs two text files used to limit CI fetch scope.

python build_ci_subset.py                          # defaults: top-10 per net, top-50 products
python build_ci_subset.py --top-per-net 5 --top-overall 30
python build_ci_subset.py --debug                  # print selection details
python build_ci_subset.py path/to/db               # custom DB path

Outputs: data/ci_stores.txt, data/ci_products.txt (one ID per line).


Shared modules

File    Role
db.py   init_db(path) creates all tables; upsert/insert helpers for retail and gas
api.py  fetch_xml(url) with retry/backoff; all XML parsers; centroid_from_wkt(wkt)

Not run directly.


Database

All data is stored in data/prices.db (SQLite).

Retail tables

retail_networks (id TEXT PK, name, logo_url)
uats            (id INT PK, name, route_id, wkt, center_lat, center_lon)
categories      (id INT PK, name, parent_id, logo_url, source TEXT)
products        (id INT PK, name, categ_id)
stores          (id INT PK, name, addr, lat, lon, uat_id, network_id, zipcode,
                 surrounding_population)
prices          (id AUTOINCREMENT PK, product_id, store_id, price, price_date, promo,
                 brand, unit, retail_categ_id, retail_categ_name,
                 fetched_at, last_checked_at,
                 UNIQUE(product_id, store_id, price_date))

Gas tables

gas_networks  (id TEXT PK, name, logo_url)
gas_products  (id INTEGER PK, name, logo_url)   -- 6 fuel types
gas_stations  (id INTEGER PK, name, addr, lat, lon, uat_id, network_id, zipcode, update_date)
gas_prices    (id AUTOINCREMENT PK, product_id, station_id, price, price_date,
               fetched_at, last_checked_at,
               UNIQUE(product_id, station_id, price_date))

Reference tables use INSERT OR REPLACE; price tables use INSERT OR IGNORE.
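
The effect of INSERT OR IGNORE against the UNIQUE(product_id, store_id, price_date) constraint, shown on a toy copy of the prices table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INTEGER, store_id INTEGER, price REAL, price_date TEXT,
    UNIQUE(product_id, store_id, price_date))""")

sql = ("INSERT OR IGNORE INTO prices(product_id, store_id, price, price_date) "
       "VALUES (?,?,?,?)")
# First observation for this (product, store, date) lands.
conn.execute(sql, (1, 10, 9.99, "2024-01-01"))
# Re-fetching the same key is a no-op: the original row is kept untouched.
conn.execute(sql, (1, 10, 8.49, "2024-01-01"))
count, price = conn.execute("SELECT COUNT(*), MIN(price) FROM prices").fetchone()
```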

Quick queries

# latest retail prices
sqlite3 data/prices.db "SELECT s.name, p.price, p.price_date FROM prices p JOIN stores s ON p.store_id=s.id ORDER BY p.price_date DESC LIMIT 20;"

# latest gas prices
sqlite3 data/prices.db "SELECT n.name, pr.name, gp.price FROM gas_prices gp JOIN gas_stations s ON gp.station_id=s.id JOIN gas_networks n ON s.network_id=n.id JOIN gas_products pr ON gp.product_id=pr.id LIMIT 20;"

# store count per network
sqlite3 data/prices.db "SELECT n.name, COUNT(*) FROM stores s JOIN retail_networks n ON s.network_id=n.id GROUP BY n.name ORDER BY 2 DESC;"

See docs/queries.md for more.


API

Both APIs return XML with no authentication required. They share the XML namespace: http://schemas.datacontract.org/2004/07/pmonsvc.Models.Protos
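
With that namespace, responses parse cleanly with the standard library; a sketch (the Name tag and element layout here are illustrative; check the sample responses for the real field names):

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.datacontract.org/2004/07/pmonsvc.Models.Protos"

def extract_texts(xml_text, tag="Name"):
    """Collect the text of every namespace-qualified <tag> element."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{NS}}}{tag}")]
```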

Retail — base https://monitorulpreturilor.info/pmonsvc/Retail

Endpoint                                            Description
GET /GetRetailNetworks                              All retail chains (Kaufland, Lidl, etc.)
GET /GetUATByName                                   Top UATs; add ?uatname= to search
GET /GetProductCategoriesNetwork                    Product category tree
GET /GetProductCategoriesNetworkOUG                 OUG (emergency ordinance) categories
GET /GetCatalogProductsByNameNetwork?prodname=      Search products by name
GET /GetCatalogProductsByNameNetwork?CSVcategids=   Products by category ID(s)
GET /GetCatalogProductsById?csvcatprodids=          Products by ID(s)
GET /GetStoresForProductsByLatLon?lat=&lon=&buffer=&csvprodids=&OrderBy=price
                                                    Stores + prices near a coordinate (max buffer ~5 000 m, max 50 stores)
GET /GetStoresForProductsByUat?uatId={}&csvprodids=[]&csvnetworkids=[]&OrderBy=price
                                                    Stores + prices for a UAT

Sample responses: docs/reference/sampleResponses/

Gas — base https://monitorulpreturilor.info/pmonsvc/Gas

Endpoint                                            Description
GET /GetGasNetworks                                 All fuel networks (Petrom, OMV, MOL, Rompetrol, etc.)
GET /GetGasProductsFromCatalog                      Fuel types (benzină/motorină standard & premium, GPL, electric)
GET /GetGasServicesFromCatalog                      Station services catalog (shop, ATM, car wash, etc.)
GET /GetUATByName                                   Same UAT search as retail
GET /GetGasItemsByUat?UatId=&CSVGasCatalogProductIds=&OrderBy=dist
                                                    Stations + prices for a UAT (one product ID per request)
GET /GetGasItemsByRoute?startRoutePointId=&endRoutePointId=&CSVGasCatalogProductIds=&OrderBy=dist
                                                    Stations + prices along a route

Sample responses: docs/carburanti/reference/

Note: GetGasItemsByUat returns HTTP 500 (not empty) for UATs with no stations — handle gracefully.
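
One way to handle that, sketched with urllib (api.py's fetch_xml presumably does something similar alongside its retry/backoff):

```python
import urllib.error
import urllib.request

def fetch_or_empty(url, opener=urllib.request.urlopen):
    """GET a GetGasItemsByUat URL; the API answers HTTP 500 (rather than
    an empty list) for UATs with no stations, so map 500 to empty bytes."""
    try:
        with opener(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        if e.code == 500:
            return b""  # UAT has no gas stations
        raise
```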


Roadmap

  • Figure out API (retail + gas)
  • Create fetching scripts (retail + gas)
  • Store to DB
  • Resume interrupted runs (checkpoint files)
  • Track last-checked time per price (last_checked_at)
  • Store discovery from full locality list (not just top UATs)
  • Population-weighted store ordering in fetch_prices.py
  • Price variability analysis (analyse_prices.py)
  • Brand/product word analysis (analyse_products.py)
  • Build the fuel (carburanți) pipeline
  • Automated daily fetching (cron / scheduler)
  • UI — monitor price variations over time
  • Check price differences per UAT — maybe skip low-value UATs?
  • Make list of relevant products and fetch those more often
  • Cross-reference with https://ro.openfoodfacts.org/ / suntfrugal
  • Deduplicate products with different names but same item
  • Remove dedicated brands — map to existing known brands
  • Cross-check store locations against official sites or OSM?
  • Alternative or self-hosted API?

See also: backlog

Open questions

  • Does the same network have different prices at different stores? — document scope and frequency
