feat/direct get scraper by Infraviored · Pull Request #1 · Infraviored/DealMapper

Infraviored · 2026-06-18T17:13:52Z

feat: migrate to reusable UI components and lightweight translation system
style: fix linter errors and typecast compiler constraints
feat(scraper): add direct GET scraping and discovery demonstration scripts
feat: implement fast requests-based HTTP scraper, preserve legacy Selenium fallback, and add configurable politeness delays

Summary by CodeRabbit

Release Notes

New Features
- Added English/German internationalization across the app.
- Added cookie-based login/logout, “current user” endpoint, and protected API access.
- Scheduling now supports configurable delays between pages and listings.
Improvements
- Refreshed key screens with consistent shared UI components (Card, Button, Input) and localized labels.
- Enhanced scraping using a faster direct-HTTP discovery/enrichment flow, with Selenium kept as a fallback.
Bug Fixes
- Improved highlight handling for non-legacy listings and standardized UTC timestamps.

…ystem

…ripts

…enium fallback, and add configurable politeness delays

coderabbitai · 2026-06-18T17:14:08Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f853200e-052a-48ec-ab01-a85d7416499b

📥 Commits

Reviewing files that changed from the base of the PR and between 25adb7c and e1295a1.

📒 Files selected for processing (2)

backend/server.js
scraper/scraper.py

🚧 Files skipped from review as they are similar to previous changes (2)

backend/server.js
scraper/scraper.py

📝 Walkthrough

Walkthrough

Introduces a frontend i18n system (translations dictionary, useTranslation hook, shared Button/Card/Input primitives) wired across all major views and components, adds JWT/bcrypt authentication to backend API routes, adds delayBetweenPages/delayBetweenListings to the /api/schedule API and SettingsView. Refactors scraper.py to use requests+BeautifulSoup for discovery and harvesting with Selenium as a fallback, extracts the legacy Selenium path into a new scraper_selenium.py module, updates timestamps to UTC across agent_worker.py, and adds three standalone demo scripts.

Changes

Backend authentication and schedule configuration

Layer / File(s)	Summary
Auth: JWT/bcrypt initialization and user table `backend/package.json`, `backend/server.js`	Imports `jsonwebtoken` and `bcrypt`, defines `JWT_SECRET` from environment with hardcoded fallback and production-time validation, creates `users` table on startup, and seeds a default admin user with bcrypt hash when the table is empty.
Auth: authenticateToken middleware and endpoints `backend/server.js`	Introduces `authenticateToken` middleware extracting and verifying JWT from `token` cookie, fetching user from database, and returning `401` for invalid tokens. Adds `POST /api/auth/login` (validates email/password, verifies bcrypt hash, issues 7-day JWT, sets `httpOnly` cookie), `POST /api/auth/logout` (clears cookie), `GET /api/auth/me` (returns current user). Installs global `/api` middleware enforcing auth on all `/api/*` routes except the auth endpoints.
Schedule API: delay field persistence `backend/server.js`	`GET /api/schedule` default config gains `delayBetweenPages` and `delayBetweenListings` (default `0.25`). `POST /api/schedule` reads both from request body, parses as floats with `0.25` fallback for missing/negative/NaN, coerces `fullFetchOnStartup` to boolean, and persists all fields into `schedule_config.json`.

Frontend i18n system and UI component adoption

Layer / File(s)	Summary
Shared UI primitives: Button, Card, Input `frontend/src/components/ui/Button.tsx`, `frontend/src/components/ui/Card.tsx`, `frontend/src/components/ui/Input.tsx`	New `Button` supports `variant`/`size`/`loading` props with inline spinner SVG; `Card` renders a styled div with conditional interactive hover classes; `Input` uses `forwardRef` and applies error-state border styling.
i18n infrastructure: translations and useTranslation hook `frontend/src/i18n/translations.ts`, `frontend/src/hooks/useTranslation.ts`	Defines full en/de translation dictionaries across eight namespaces (common, landing, dashboard, wizard, settings, tuner, listing, auth) with `{{placeholder}}` interpolation. `useTranslation` provides module-level `globalLang` synced via listeners set and `localStorage`, dot-path `t()` resolver with English fallback, `toggleLanguage`, and `setLang`.
App.tsx: i18n wiring and all-view translation `frontend/src/App.tsx`	Wires `useTranslation` in `App()`, implements `checkAuth()` (`/api/auth/me`), `handleLoginSubmit()` (`/api/auth/login`), and `handleLogout()` (`/api/auth/logout`) with translated error messages. Displays translated loading screen and login form during auth flow. Refactors header session widget to use `Button` with translated strings. Replaces all hardcoded labels in landing/dashboard/edit/create-campaign views with `t()` calls, wraps form areas in `Card`, localizes dropdown options.
SettingsView: translation, delay fields, and UI component adoption `frontend/src/components/SettingsView.tsx`	Adds `useTranslation` and translates all UI text (back button, title, section headings, toggle labels, help text, errors). Extends `fetchConfig` to populate `delayBetweenPages`/`delayBetweenListings` from GET response (with `0.25` defaults). Updates `handleSave` to send and read back delay fields from `/api/schedule`. Refactors inputs to `Input` component with translated labels. Migrates Save/Cancel to `Button` with `loading={saving}`.
CriteriaTuner and ListingDetailCard localization `frontend/src/components/CriteriaTuner.tsx`, `frontend/src/components/ListingDetailCard.tsx`, `frontend/src/utils/listingTransformer.ts`	Adds `useTranslation` to each component; translates headings, labels, badges, status text; migrates boolean toggle and AI evaluation controls to `Button`; wraps `ListingDetailCard` in `Card`; localizes date/time formatting using `lang` for locale-aware display; derives per-dimension labels from translation keys; tightens `highlights` type cast from `any` to `Listing['highlights']`.
GuidelinesWizard localization `frontend/src/components/GuidelinesWizard.tsx`	Adds `useTranslation` and replaces all hardcoded strings across three wizard steps with translation keys; migrates copy-prompt, navigation, and save actions to shared `Button` components while preserving step logic and status rendering.

Scraper requests-based refactor and legacy Selenium preservation

Layer / File(s)	Summary
scraper.py: requests infrastructure and detail parsing `scraper/scraper.py`	Adds `requests`, `BeautifulSoup`, and `HEADERS` constant. Introduces `parse_listing_details_requests` to fetch detail pages and extract title/description from JSON-LD ImageObject schema (with HTML fallback), key/value details, and gallery image URLs. Routes `scrape_listings` to `scrape_listings_requests` (primary) or `scrape_listings_selenium` (fallback) based on `INTERACTIVE_LOGIN`. Implements `scrape_listings_requests` with paginated discovery, duplicate-ID detection, incremental JSON save, and per-page delays. Updates `preview_url_listings_count` to use `requests`+BeautifulSoup.
scraper.py: requests-based harvesting and update workflows `scraper/scraper.py`	Replaces per-listing Selenium harvest in `harvest_descriptions` with DB-only requests workflow: queries listings missing detailed info, fetches via `parse_listing_details_requests`, updates SQLite fields (`detailed_description`, JSON `details`/`images`, `full_info_obtained`, UTC `last_description_changed_at`) with per-listing error handling. Reworks `update_all_descriptions_session` to use requests parsing and conditional DB writes: re-fetches detail pages, compares normalized description to stored value, updates only when changed, sets timestamps.
scraper.py: Selenium helper functions and fallback discovery `scraper/scraper.py`	`save_logged_in_email` uses UTC ISO timestamps. `check_login_status` ensures navigation to projects page before reading login indicator. `manual_login` replaces quick-check branch with timed watcher loop (up to 300s). `scrape_listings_selenium` implements interactive-login fallback with persistent Chrome profile, automation-disabling flags, runs `manual_login`, early-return for validation. Inner discovery mirrors requests path: /seite: pagination URL rewriting, waits for results container, BeautifulSoup listing parsing, dedup checks, max-listing limits, incremental output_file save, per-page delays.
scraper_selenium.py: full legacy Selenium scraper module `scraper/scraper_selenium.py`	New module preserving complete Selenium-based scraper: `update_progress` (JSON status), `harvest_missing_descriptions` (DB-driven per-listing detail extraction), cookie/session persistence (pickle-based), `manual_login` with interactive watcher, `get_detailed_description`, `scrape_page` (search result parsing with dedup), `scrape_listings` (full orchestration with persistent profile and anti-detection), `preview_url_listings_count`, and session wrappers.
Demo scripts: requests-based scraping examples `scripts/demo_direct_get.py`, `scripts/demo_discover_listings.py`, `scripts/demo_full_pipeline.py`	Three standalone scripts demonstrating requests-based scraping without Selenium: single-listing detail parsing (schema JSON-LD with HTML fallback, key/value details, gallery images, price); paginated search discovery (pagination detection, listing extraction, dedup, preview); and end-to-end pipeline (chained discovery and detail harvesting with formatted summary).
agent_worker.py: UTC timestamp standardization `scraper/agent_worker.py`	Updates logging and database writes to use UTC-based ISO timestamps instead of local-time format for log file naming, RUN headers, and SQLite updates to `llm_processed_time` and `last_ai_evaluated_at`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐇 Hop, hop! The rabbit translates the land,
With t('landing.title') keys so grand.
Requests replace Selenium's slow crawl,
Button, Card, and Input stand tall.
Auth guards the API with JWT's might—
Two tongues, two workflows, both polished bright! 🌐

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title 'feat/direct get scraper' is partially related to the changeset but does not capture the full scope of changes. While the direct GET scraper is a significant feature, the PR includes substantial frontend i18n/UI component work, backend auth improvements, and multi-platform scraper refactoring that are not reflected in the title.	Consider a more comprehensive title such as 'feat: direct GET scraper with auth, i18n, and UI components' or 'feat: HTTP-based scraper, authentication system, and internationalization layer' to better represent the multi-faceted nature of the changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/direct-get-scraper

_{Comment @coderabbitai help to get the list of available commands.}

gemini-code-assist

Code Review

This pull request introduces a lightweight internationalization (i18n) system with English and German translations, refactors several frontend components to use standardized UI elements (Button, Card, Input), and optimizes the scraper backend to support a fast requests-based pipeline alongside the legacy Selenium implementation. The code review feedback highlights several key areas for improvement: 1. Input Validation: Ensure configured scraper delays are validated as non-negative numbers in the backend to prevent runtime crashes. 2. Internationalization & Redundancy: Replace redundant ternary operators and hardcoded dimension labels in the frontend with the newly introduced translation keys. 3. Code Safety: Use replaceAll instead of dynamic RegExp in the translation hook to avoid injection vulnerabilities. 4. Defensive Programming: Guard against potential IndexError and TypeError exceptions in the scraper when parsing malformed URLs or empty JSON-LD scripts. 5. Timezone Safety: Consistently use timezone-aware UTC datetimes instead of naive local datetimes to prevent synchronization mismatches.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-18T17:15:46Z

+      delayBetweenPages: parseFloat(delayBetweenPages !== undefined ? delayBetweenPages : 0.25),
+      delayBetweenListings: parseFloat(delayBetweenListings !== undefined ? delayBetweenListings : 0.25)


The parsed values for delayBetweenPages and delayBetweenListings are not validated to ensure they are non-negative numbers. If a user configures a negative value, Python's time.sleep() in the scraper will throw a ValueError and crash the background process. Additionally, if parseFloat returns NaN, it will be serialized as null in the JSON file, which might cause issues when read.

Validate that the parsed values are valid, non-negative numbers, falling back to 0.25 if invalid.

gemini-code-assist · 2026-06-18T17:15:46Z

                <span className="text-slate-750">|</span>
-                <h2 className="text-base font-bold text-slate-200">{campaigns.find(c => c.id === currentCampaignId)?.name} Dashboard</h2>
+                <h2 className="text-base font-bold text-slate-200">
+                  {campaigns.find(c => c.id === currentCampaignId)?.name} {lang === 'en' ? 'Dashboard' : 'Dashboard'}


The ternary operator lang === 'en' ? 'Dashboard' : 'Dashboard' is redundant and hardcoded. Since you are migrating to a lightweight translation system, you should use the translation key listing.dashboardTitle from your translations file.

gemini-code-assist · 2026-06-18T17:15:46Z

+                        trustworthiness: lang === 'en' ? 'Trustworthiness' : 'Vertrauenswürdigkeit',
+                        transparency: lang === 'en' ? 'Transparency' : 'Transparenz',
+                        conditionConfidence: lang === 'en' ? 'Condition Confidence' : 'Zustandssicherheit',
+                        documentationQuality: lang === 'en' ? 'Documentation Quality' : 'Dokumentationsqualität',
+                        hiddenRiskSuspicion: lang === 'en' ? 'Hidden Risk Suspicion' : 'Versteckter Risikoverdacht',
+                        marketAboveAverageSignal: lang === 'en' ? 'Market Above Average Signal' : 'Überdurchschnittliches Marktsignal'


These soft dimension labels are hardcoded with inline language checks. To align with the lightweight translation system introduced in this PR, these strings should be moved to the translation dictionary (translations.ts) and accessed via the t function.

gemini-code-assist · 2026-06-18T17:15:46Z

+    let result = current;
+    if (replacements) {
+      Object.entries(replacements).forEach(([k, v]) => {
+        result = result.replace(new RegExp(`{{${k}}}`, 'g'), String(v));


Using new RegExp with dynamic keys can be unsafe if the keys contain special regex characters. Since you are performing a simple global string replacement, using replaceAll is safer, cleaner, and more performant.

gemini-code-assist · 2026-06-18T17:15:46Z

+                    parts = base_url.split("/")
+                    domain_part = "/".join(parts[:3])
+                    path_part = parts[3]


If base_url has fewer than 4 parts when split by / (e.g., if an invalid URL or just the domain is entered), accessing parts[3] will raise an IndexError and crash the scraper. Guard against this by checking the length of parts first.

gemini-code-assist · 2026-06-18T17:15:46Z

+        for script in soup.find_all("script", type="application/ld+json"):
+            try:
+                data = json.loads(script.string)


If a <script> tag of type application/ld+json is empty or has no text content, script.string will be None. Passing None to json.loads() raises a TypeError. While this is caught by the try...except block, it is better to defensively check that script.string is not empty before parsing.

gemini-code-assist · 2026-06-18T17:15:46Z

+                    parsed["detailed_description"],
+                    json.dumps(parsed["details"]),
+                    json.dumps(parsed["images"]),
+                    datetime.datetime.now().isoformat(),


Using naive local datetimes (datetime.datetime.now()) can lead to timezone mismatch bugs when compared with backend-generated timestamps (e.g., if the backend runs in UTC and the scraper runs in local time). Use timezone-aware UTC datetimes consistently across the system.

coderabbitai

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scraper/scraper.py (1)

331-425: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure SQLite connections are closed on all error paths.

Both functions close conn only on the happy path. Any exception before the trailing conn.close() can leave the connection open and eventually cause lock contention in subsequent runs.

Suggested fix

 def harvest_descriptions(campaign_id=None):
@@
-    try:
-        conn = sqlite3.connect(db_path)
+    conn = None
+    try:
+        conn = sqlite3.connect(db_path)
         conn.row_factory = sqlite3.Row
@@
-        conn.close()
     except Exception as e:
         logger.error(f"Error in requests-based harvest_descriptions: {str(e)}")
+    finally:
+        if conn is not None:
+            conn.close()
@@
 def update_all_descriptions_session(campaign_id=None):
@@
-    try:
-        conn = sqlite3.connect(db_path)
+    conn = None
+    try:
+        conn = sqlite3.connect(db_path)
         conn.row_factory = sqlite3.Row
@@
-        conn.close()
     except Exception as e:
         logger.error(
             f"Error in requests-based update_all_descriptions_session: {str(e)}"
         )
+    finally:
+        if conn is not None:
+            conn.close()

Also applies to: 440-532

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper/scraper.py` around lines 331 - 425, The SQLite connection opened with
sqlite3.connect(db_path) is only closed in the try block's happy path, leaving
it open if any exception occurs before reaching conn.close(). Wrap the entire
try block's operations in a try-finally structure where the finally block
ensures conn.close() is always executed, or alternatively use a with statement
for the database connection to automatically manage its lifecycle. This same
pattern needs to be applied to the other similar function mentioned in the
comment (also applies to lines 440-532).

🧹 Nitpick comments (2)

scraper/scraper_selenium.py (1)

219-226: 💤 Low value

Pickle deserialization from local file has limited attack surface.

Static analysis flags pickle.load as unsafe (S301). In this context, the cookie file is application-controlled and stored locally, so exploitation requires prior file system access. The current try/except pattern around add_cookie is good defensive practice. Consider adding a comment documenting this risk/tradeoff for future maintainers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper/scraper_selenium.py` around lines 219 - 226, Add a comment above the
pickle.load() call in the cookie loading block to document the security
consideration and justification. The comment should explain that while
pickle.load is flagged as unsafe by static analysis tools (S301), this usage is
acceptable because the cookie file is application-controlled and stored locally,
which limits the attack surface and requires prior file system access for
exploitation. This will help future maintainers understand the deliberate
tradeoff being made here.
Source: Linters/SAST tools

frontend/src/components/ListingDetailCard.tsx (1)

320-327: ⚡ Quick win

Centralize soft-dimension labels in translations instead of inline language branching.

This duplicates i18n logic in the component and makes new-language rollout harder. Use translation keys for these labels like the rest of the card.

♻️ Suggested refactor

-const labels: Record<string, string> = {
-  trustworthiness: lang === 'en' ? 'Trustworthiness' : 'Vertrauenswürdigkeit',
-  transparency: lang === 'en' ? 'Transparency' : 'Transparenz',
-  conditionConfidence: lang === 'en' ? 'Condition Confidence' : 'Zustandssicherheit',
-  documentationQuality: lang === 'en' ? 'Documentation Quality' : 'Dokumentationsqualität',
-  hiddenRiskSuspicion: lang === 'en' ? 'Hidden Risk Suspicion' : 'Versteckter Risikoverdacht',
-  marketAboveAverageSignal: lang === 'en' ? 'Market Above Average Signal' : 'Überdurchschnittliches Marktsignal'
-};
-const label = labels[key] || key;
+const label = t(`listing.softDimensionsLabels.${key}`);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/ListingDetailCard.tsx` around lines 320 - 327, Move
the soft-dimension labels (trustworthiness, transparency, conditionConfidence,
documentationQuality, hiddenRiskSuspicion, marketAboveAverageSignal) from the
inline Record definition with ternary language branching into your
translation/i18n system. Replace the hardcoded labels object with translation
key references using the same i18n pattern already used elsewhere in the
ListingDetailCard component, eliminating the redundant lang === 'en' checks and
making it easier to support new languages without modifying the component code.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/server.js`:
- Around line 645-647: The delayBetweenPages and delayBetweenListings properties
are using parseFloat with only undefined checks, which allows invalid values
like NaN or negative numbers to be persisted. Add validation after each
parseFloat call to ensure the parsed number is a valid positive number; if the
value is NaN or negative, use the default value (0.25) instead. Apply this
validation logic consistently to both delay properties to prevent invalid delays
from corrupting the config.

In `@frontend/src/App.tsx`:
- Around line 1012-1016: The select dropdown in the status filter has a mismatch
between the option value and label. The option with value="New" uses the
translation key dashboard.statusEvaluated which suggests an "evaluated only"
filter, but the underlying predicate filters for status === 'New'. Change the
label for the option with value="New" to use a translation key that accurately
describes filtering for new items (e.g., dashboard.statusNew or similar) instead
of dashboard.statusEvaluated to align the UI label with the actual filtering
behavior.
- Around line 948-950: The h2 element in the dashboard header contains a no-op
ternary operator that always renders "Dashboard" regardless of the language,
which bypasses the i18n system. Replace the entire ternary expression `lang ===
'en' ? 'Dashboard' : 'Dashboard'` with a proper i18n translation function call
(such as t('dashboard') or equivalent based on your application's translation
system) to ensure the dashboard title is properly localized.

In `@frontend/src/components/ListingDetailCard.tsx`:
- Line 395: The neutral status in the ternary operator evaluating
evalItem.status is hardcoded as the English string 'NEUTRAL' instead of being
translated like the satisfied and violated statuses. Replace the hardcoded
'NEUTRAL' string with t('listing.neutral').toUpperCase() to apply the same
translation and uppercase formatting pattern used for the other two status cases
in the ternary expression.

In `@frontend/src/components/SettingsView.tsx`:
- Line 300: The button text on line 300 in SettingsView.tsx is using the
translation key `settings.scraperRules`, which is meant for a section title, not
a button action. Replace this translation key with an appropriate save-action
translation key (such as `common.save` or `actions.save`) that properly
describes the submit action for this button, ensuring the UI correctly labels
the primary CTA.
- Line 80: The setTimeout call in SettingsView that clears the success message
does not cancel previously scheduled timers, causing multiple overlapping
timeouts to potentially fire and clear the message prematurely, plus leaving
dangling timers on unmount. Store the timeout ID from each setTimeout in a ref
variable, and before scheduling a new timeout in the setSuccess handler, first
call clearTimeout on any existing timeout ID stored in that ref to ensure only
one active timer exists at any time.

In `@frontend/src/components/ui/Button.tsx`:
- Around line 9-17: The Button component is missing an explicit type attribute,
which causes HTML to default to type="submit" inside forms, leading to
unintended form submissions. Add a type prop with a default value of 'button' to
the Button component's destructured parameters (alongside variant, size,
loading, disabled, className, children, and props), then pass this type prop to
the underlying button element to ensure the button behaves as expected in all
contexts.

In `@frontend/src/components/ui/Input.tsx`:
- Around line 8-18: The Input component in its forwardRef callback needs to
expose the error state semantically for accessibility. Add the aria-invalid
attribute to the input element, setting it to the boolean value of the error
prop. Position the aria-invalid attribute before the {...props} spread operator
so that consumers can override the accessibility attribute if needed through the
props parameter.

In `@frontend/src/hooks/useTranslation.ts`:
- Around line 8-11: The globalLang initialization casts the localStorage value
directly to Language type without validating it matches the supported locales.
Add validation after retrieving the value from localStorage using
localStorage.getItem('ui-lang') to check if it is either 'en' or 'de'. If the
retrieved value is not one of the supported locales, fall back to a default
language (such as 'en'). This ensures only valid language values are assigned to
globalLang.

In `@frontend/src/utils/listingTransformer.ts`:
- Line 149: The type assertion on line 149 using `as Listing['highlights']` does
not validate the runtime shape of the highlights data from the LLM payload.
Create a validation function that checks each highlight object in the
extracted_facts.highlights array to ensure the sentiment, type, and confidence
fields contain only the expected literal values (sentiment: 'positive' |
'negative' | 'neutral', type: 'maintenance' | 'warning' | 'feature', confidence:
'high' | 'med' | 'low'). Apply this validation function to filter or validate
the highlights before the type assertion in the highlights line, ensuring only
valid highlights are passed to the Listing type and preventing silent failures
in downstream UI code like ListingDetailCard.tsx that checks conditions such as
h.sentiment === 'negative'.

In `@scraper/scraper_selenium.py`:
- Around line 721-724: The cookie loading loop that iterates over pickled
cookies and calls driver.add_cookie(cookie) lacks error handling for individual
cookies. Wrap each driver.add_cookie(cookie) call within a try/except block
similar to the pattern used in the load_cookies() helper function. This ensures
that if a single cookie fails to load, the loop continues processing remaining
cookies instead of aborting the entire session.
- Around line 386-393: The exception handler in the except block attempts to use
`current_url` which may not be defined if an exception occurs before the
assignment at line 356, causing a NameError. Additionally, the silent except
handler at lines 391-392 hides navigation failures. Initialize `current_url`
before the try block to ensure it's always defined, and replace the silent
`except: pass` clause with a proper logger.error call to capture and log
navigation failures instead of silently swallowing them.
- Around line 65-201: The database connection created with sqlite3.connect at
the beginning of the harvest_missing_descriptions function is not guaranteed to
be closed if an exception occurs before reaching the conn.close() call at the
end. Replace the current try/except structure with a context manager (using the
with statement) for the sqlite3.connect call to ensure the connection is
automatically closed and resources are properly cleaned up even if an exception
occurs during the listing harvesting loop or database operations.

In `@scraper/scraper.py`:
- Around line 445-461: The current update logic for listings only triggers a
database update when the description text changes, causing specs and images
changes to be ignored if the description remains the same. Modify the
conditional logic that determines whether to update a listing row (in the
section starting around line 478) to check not only if the description has
changed, but also if there are any changes to the specs or images. This ensures
that all changes to listing details are persisted to the database, even when the
description text itself remains unchanged.

In `@scripts/demo_direct_get.py`:
- Around line 123-125: The fallback CSS class selector in the soup.select_one()
call for price_elem contains a typo in the class name format. The selector uses
ad-item-main--middle--price-shipping--price but should use
aditem-main--middle--price-shipping--price (without the hyphen between ad and
item) to match the correct class naming convention used elsewhere in the code.
Update the first selector string in the soup.select_one() call to correct this
class name so that price extraction works properly when the primary id selector
is not available.

In `@scripts/demo_full_pipeline.py`:
- Around line 88-90: The price extraction in the code directly chains
select_one() and get_text() methods, which will raise an exception and skip the
entire listing if the price element is missing. Add a null-safety check after
the select_one() call to verify the element exists before calling get_text() on
it, allowing listings with missing prices to be handled gracefully rather than
being dropped.

---

Outside diff comments:
In `@scraper/scraper.py`:
- Around line 331-425: The SQLite connection opened with
sqlite3.connect(db_path) is only closed in the try block's happy path, leaving
it open if any exception occurs before reaching conn.close(). Wrap the entire
try block's operations in a try-finally structure where the finally block
ensures conn.close() is always executed, or alternatively use a with statement
for the database connection to automatically manage its lifecycle. This same
pattern needs to be applied to the other similar function mentioned in the
comment (also applies to lines 440-532).

---

Nitpick comments:
In `@frontend/src/components/ListingDetailCard.tsx`:
- Around line 320-327: Move the soft-dimension labels (trustworthiness,
transparency, conditionConfidence, documentationQuality, hiddenRiskSuspicion,
marketAboveAverageSignal) from the inline Record definition with ternary
language branching into your translation/i18n system. Replace the hardcoded
labels object with translation key references using the same i18n pattern
already used elsewhere in the ListingDetailCard component, eliminating the
redundant lang === 'en' checks and making it easier to support new languages
without modifying the component code.

In `@scraper/scraper_selenium.py`:
- Around line 219-226: Add a comment above the pickle.load() call in the cookie
loading block to document the security consideration and justification. The
comment should explain that while pickle.load is flagged as unsafe by static
analysis tools (S301), this usage is acceptable because the cookie file is
application-controlled and stored locally, which limits the attack surface and
requires prior file system access for exploitation. This will help future
maintainers understand the deliberate tradeoff being made here.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 87d1f053-bd6e-42e9-9c4f-3b7220aa747d

📥 Commits

Reviewing files that changed from the base of the PR and between e92cd69 and df86c79.

📒 Files selected for processing (17)

backend/server.js
frontend/src/App.tsx
frontend/src/components/CriteriaTuner.tsx
frontend/src/components/GuidelinesWizard.tsx
frontend/src/components/ListingDetailCard.tsx
frontend/src/components/SettingsView.tsx
frontend/src/components/ui/Button.tsx
frontend/src/components/ui/Card.tsx
frontend/src/components/ui/Input.tsx
frontend/src/hooks/useTranslation.ts
frontend/src/i18n/translations.ts
frontend/src/utils/listingTransformer.ts
scraper/scraper.py
scraper/scraper_selenium.py
scripts/demo_direct_get.py
scripts/demo_discover_listings.py
scripts/demo_full_pipeline.py

coderabbitai · 2026-06-18T17:30:22Z

+                  <option value="All">{t('dashboard.statusAll')}</option>
+                  <option value="High Niceness">{t('dashboard.statusMatches')} (70+)</option>
+                  <option value="Evaluate with AI">{t('dashboard.statusPending')}</option>
+                  <option value="New">{t('dashboard.statusEvaluated')}</option>
                </select>


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix status filter label/value mismatch in the dropdown.

The option with value="New" is labeled as AI Evaluated Only, but the predicate for New filters l.status === 'New'. This makes the UI misleading and removes a true evaluated-only filter path.

Minimal alignment fix (label matches current behavior)

<option value="All">{t('dashboard.statusAll')}</option> <option value="High Niceness">{t('dashboard.statusMatches')} (70+)</option> <option value="Evaluate with AI">{t('dashboard.statusPending')}</option> - <option value="New">{t('dashboard.statusEvaluated')}</option> + <option value="New">{t('landing.new')}</option>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/App.tsx` around lines 1012 - 1016, The select dropdown in the status filter has a mismatch between the option value and label. The option with value="New" uses the translation key dashboard.statusEvaluated which suggests an "evaluated only" filter, but the underlying predicate filters for status === 'New'. Change the label for the option with value="New" to use a translation key that accurately describes filtering for new items (e.g., dashboard.statusNew or similar) instead of dashboard.statusEvaluated to align the UI label with the actual filtering behavior.

coderabbitai · 2026-06-18T17:30:22Z

                      </div>
                      <span className={`text-[10px] font-bold px-2 py-0.5 rounded-md ${evalItem.status === 'satisfied' ? 'bg-emerald-500/10 text-emerald-400' : evalItem.status === 'violated' ? 'bg-rose-500/10 text-rose-400' : 'bg-slate-800 text-slate-400'}`}>
-                        {evalItem.status === 'satisfied' ? 'SATISFIED' : evalItem.status === 'violated' ? 'VIOLATED' : 'NEUTRAL'}
+                        {evalItem.status === 'satisfied' ? t('listing.satisfied').toUpperCase() : evalItem.status === 'violated' ? t('listing.violated').toUpperCase() : 'NEUTRAL'}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Localize the neutral checklist status.

The satisfied/violated statuses are translated, but the neutral branch is still hardcoded as English text.

🌐 Suggested fix

-{evalItem.status === 'satisfied' ? t('listing.satisfied').toUpperCase() : evalItem.status === 'violated' ? t('listing.violated').toUpperCase() : 'NEUTRAL'} +{evalItem.status === 'satisfied' + ? t('listing.satisfied').toUpperCase() + : evalItem.status === 'violated' + ? t('listing.violated').toUpperCase() + : t('listing.neutral').toUpperCase()}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

{evalItem.status === 'satisfied' ? t('listing.satisfied').toUpperCase() : evalItem.status === 'violated' ? t('listing.violated').toUpperCase() : 'NEUTRAL'}

{evalItem.status === 'satisfied'

? t('listing.satisfied').toUpperCase()

: evalItem.status === 'violated'

? t('listing.violated').toUpperCase()

: t('listing.neutral').toUpperCase()}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/components/ListingDetailCard.tsx` at line 395, The neutral status in the ternary operator evaluating evalItem.status is hardcoded as the English string 'NEUTRAL' instead of being translated like the satisfied and violated statuses. Replace the hardcoded 'NEUTRAL' string with t('listing.neutral').toUpperCase() to apply the same translation and uppercase formatting pattern used for the other two status cases in the ternary expression.

coderabbitai · 2026-06-18T17:30:22Z

+        setDelayBetweenPages(data.config.delayBetweenPages ?? 0.25);
+        setDelayBetweenListings(data.config.delayBetweenListings ?? 0.25);
+        setSuccess(t('settings.saveSuccess'));
        setTimeout(() => setSuccess(''), 4000);


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear existing success timers before scheduling a new one.

Line 80 starts a new timeout every save without cancelling prior ones. Rapid saves can clear the newest message too early, and a pending timer may fire after unmount.

Suggested fix

-import { useState, useEffect } from 'react'; +import { useState, useEffect, useRef } from 'react'; @@ export default function SettingsView({ onBack }: SettingsViewProps) { + const successTimerRef = useRef<number | null>(null); @@ - setSuccess(t('settings.saveSuccess')); - setTimeout(() => setSuccess(''), 4000); + setSuccess(t('settings.saveSuccess')); + if (successTimerRef.current) { + window.clearTimeout(successTimerRef.current); + } + successTimerRef.current = window.setTimeout(() => setSuccess(''), 4000); @@ + useEffect(() => { + return () => { + if (successTimerRef.current) { + window.clearTimeout(successTimerRef.current); + } + }; + }, []);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/components/SettingsView.tsx` at line 80, The setTimeout call in SettingsView that clears the success message does not cancel previously scheduled timers, causing multiple overlapping timeouts to potentially fire and clear the message prematurely, plus leaving dangling timers on unmount. Store the timeout ID from each setTimeout in a ref variable, and before scheduling a new timeout in the setSuccess handler, first call clearTimeout on any existing timeout ID stored in that ref to ensure only one active timer exists at any time.

coderabbitai · 2026-06-18T17:30:23Z

+    except Exception as e:
+        logger.error(f"Error getting detailed description: {str(e)}")
+        # Try to go back to the search results
+        try:
+            driver.get(current_url)
+        except Exception:
+            pass
+        return ""


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential NameError if driver.current_url fails, and silent exception swallows errors.

If the exception originates at line 356 (before current_url is assigned), line 390 raises NameError. Additionally, the silent except: pass at lines 391-392 hides navigation failures.

🛠️ Suggested fix

+ current_url = None try: # Store current URL to return to later current_url = driver.current_url ... except Exception as e: logger.error(f"Error getting detailed description: {str(e)}") # Try to go back to the search results - try: - driver.get(current_url) - except Exception: - pass + if current_url: + try: + driver.get(current_url) + except Exception as nav_err: + logger.warning(f"Failed to navigate back: {nav_err}") return ""

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

except Exception as e:

logger.error(f"Error getting detailed description: {str(e)}")

# Try to go back to the search results

try:

driver.get(current_url)

except Exception:

pass

return ""

current_url = None

try:

# Store current URL to return to later

current_url = driver.current_url

...

except Exception as e:

logger.error(f"Error getting detailed description: {str(e)}")

# Try to go back to the search results

if current_url:

try:

driver.get(current_url)

except Exception as nav_err:

logger.warning(f"Failed to navigate back: {nav_err}")

return ""

🧰 Tools

🪛 Ruff (0.15.17)

[warning] 386-386: Do not catch blind exception: Exception

(BLE001)

[warning] 387-387: Use explicit conversion flag

Replace with conversion flag

(RUF010)

[error] 391-392: try-except-pass detected, consider logging the exception

(S110)

[warning] 391-391: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scraper/scraper_selenium.py` around lines 386 - 393, The exception handler in the except block attempts to use `current_url` which may not be defined if an exception occurs before the assignment at line 356, causing a NameError. Additionally, the silent except handler at lines 391-392 hides navigation failures. Initialize `current_url` before the try block to ensure it's always defined, and replace the silent `except: pass` clause with a proper logger.error call to capture and log navigation failures instead of silently swallowing them.

Source: Linters/SAST tools

coderabbitai · 2026-06-18T17:30:23Z

+            with open(cookies_path, "rb") as f:
+                cookies = pickle.load(f)
+                for cookie in cookies:
+                    driver.add_cookie(cookie)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Cookie loading lacks per-cookie error handling, unlike load_cookies().

The load_cookies() helper (lines 221-226) wraps each add_cookie() in try/except to skip problematic cookies. Here, a single failing cookie will abort the entire session. Apply the same defensive pattern.

🛠️ Suggested fix

with open(cookies_path, "rb") as f: cookies = pickle.load(f) for cookie in cookies: - driver.add_cookie(cookie) + try: + driver.add_cookie(cookie) + except Exception as e: + logger.warning(f"Error loading cookie: {e}") logger.info("Loaded persistent session cookies for description harvesting")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

with open(cookies_path, "rb") as f:

cookies = pickle.load(f)

for cookie in cookies:

driver.add_cookie(cookie)

with open(cookies_path, "rb") as f:

cookies = pickle.load(f)

for cookie in cookies:

try:

driver.add_cookie(cookie)

except Exception as e:

logger.warning(f"Error loading cookie: {e}")

🧰 Tools

🪛 OpenGrep (1.22.0)

[ERROR] 722-722: pickle.load/loads deserializes arbitrary Python objects and can execute arbitrary code. Use a safe format like JSON instead.

(coderabbit.deserialization.python-pickle)

🪛 Ruff (0.15.17)

[error] 722-722: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scraper/scraper_selenium.py` around lines 721 - 724, The cookie loading loop that iterates over pickled cookies and calls driver.add_cookie(cookie) lacks error handling for individual cookies. Wrap each driver.add_cookie(cookie) call within a try/except block similar to the pattern used in the load_cookies() helper function. This ensures that if a single cookie fails to load, the loop continues processing remaining cookies instead of aborting the entire session.

coderabbitai · 2026-06-18T17:30:23Z

+        if campaign_id is not None:
+            cursor.execute(
+                """
+                SELECT l.id, l.url, l.title, l.detailed_description FROM listings l
+                JOIN searches s ON l.search_id = s.id
+                WHERE s.enabled = 1 AND s.campaign_id = ?
+            """,
+                (campaign_id,),
+            )
+        else:
+            cursor.execute(
+                """
+                SELECT l.id, l.url, l.title, l.detailed_description FROM listings l
+                JOIN searches s ON l.search_id = s.id
+                WHERE s.enabled = 1
+            """
+            )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist details/images updates even when description text is unchanged.

Current logic updates the row only when description changes. If listing specs or images change while description remains the same, DB data stays stale even though fresh values were parsed.

Suggested fix

- SELECT l.id, l.url, l.title, l.detailed_description FROM listings l + SELECT l.id, l.url, l.title, l.detailed_description, l.details, l.images FROM listings l @@ - SELECT l.id, l.url, l.title, l.detailed_description FROM listings l + SELECT l.id, l.url, l.title, l.detailed_description, l.details, l.images FROM listings l @@ old_description = r["detailed_description"] or "" + old_details = r["details"] or "{}" + old_images = r["images"] or "[]" @@ detailed_description = parsed["detailed_description"] or "" + details_json = json.dumps(parsed["details"], ensure_ascii=False, sort_keys=True) + images_json = json.dumps(parsed["images"], ensure_ascii=False) - if detailed_description.strip() != old_description.strip(): + if ( + detailed_description.strip() != old_description.strip() + or details_json != old_details + or images_json != old_images + ): @@ - json.dumps(parsed["details"]), - json.dumps(parsed["images"]), + details_json, + images_json,

Also applies to: 478-522

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scraper/scraper.py` around lines 445 - 461, The current update logic for listings only triggers a database update when the description text changes, causing specs and images changes to be ignored if the description remains the same. Modify the conditional logic that determines whether to update a listing row (in the section starting around line 478) to check not only if the description has changed, but also if there are any changes to the specs or images. This ensures that all changes to listing details are persisted to the database, even when the description text itself remains unchanged.

coderabbitai · 2026-06-18T17:30:23Z

+        price_elem = soup.select_one(
+            'h2.ad-item-main--middle--price-shipping--price, [itemprop="price"]'
+        )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix fallback price selector typo.

The fallback selector currently uses ad-item..., which does not match the class used elsewhere (aditem...). This can miss price extraction when #viewad-price is absent.

Suggested fix

- price_elem = soup.select_one( - 'h2.ad-item-main--middle--price-shipping--price, [itemprop="price"]' - ) + price_elem = soup.select_one( + 'p.aditem-main--middle--price-shipping--price, [itemprop="price"]' + )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/demo_direct_get.py` around lines 123 - 125, The fallback CSS class selector in the soup.select_one() call for price_elem contains a typo in the class name format. The selector uses ad-item-main--middle--price-shipping--price but should use aditem-main--middle--price-shipping--price (without the hyphen between ad and item) to match the correct class naming convention used elsewhere in the code. Update the first selector string in the soup.select_one() call to correct this class name so that price extraction works properly when the primary id selector is not available.

coderabbitai · 2026-06-18T17:30:23Z

+            price = item.select_one(
+                "p.aditem-main--middle--price-shipping--price"
+            ).get_text(strip=True)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid skipping valid listings when the price node is missing.

This direct chained call can raise and drop the whole listing record if the price element is absent. Make extraction null-safe.

Suggested fix

- price = item.select_one( - "p.aditem-main--middle--price-shipping--price" - ).get_text(strip=True) + price_elem = item.select_one( + "p.aditem-main--middle--price-shipping--price" + ) + price = price_elem.get_text(strip=True) if price_elem else "N/A"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

price = item.select_one(

"p.aditem-main--middle--price-shipping--price"

).get_text(strip=True)

price_elem = item.select_one(

"p.aditem-main--middle--price-shipping--price"

)

price = price_elem.get_text(strip=True) if price_elem else "N/A"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/demo_full_pipeline.py` around lines 88 - 90, The price extraction in the code directly chains select_one() and get_text() methods, which will raise an exception and skip the entire listing if the price element is missing. Add a null-safety check after the select_one() call to verify the element exists before calling get_text() on it, allowing listings with missing prices to be handled gracefully rather than being dropped.

coderabbitai

🧹 Nitpick comments (2)

scraper/agent_worker.py (1)

301-302: ⚡ Quick win

Use one captured UTC timestamp per row update for both datetime columns.

llm_processed_time and last_ai_evaluated_at are written as a pair, but they currently use two separate now() calls, which can drift by microseconds. Capture once and reuse so both fields stay exactly aligned.

Suggested patch

@@
-            cursor.execute(
+            processed_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
+            cursor.execute(
@@
-                (
-                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
-                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
+                (
+                    processed_at,
+                    processed_at,
                     json.dumps(facts_envelope),
                     listing_id,
                 ),
             )
@@
-            cursor.execute(
+            processed_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
+            cursor.execute(
@@
-                (
-                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
-                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
+                (
+                    processed_at,
+                    processed_at,
                     full_info,
                     json.dumps(facts_envelope),
                     score,

Also applies to: 571-572

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper/agent_worker.py` around lines 301 - 302, The code at lines 301-302
makes two separate calls to
datetime.datetime.now(datetime.timezone.utc).isoformat() for the
llm_processed_time and last_ai_evaluated_at columns, which can cause microsecond
drift between the two timestamps. Capture the UTC timestamp once into a variable
before the update statement and reuse that same variable for both datetime
columns instead of calling .now() twice. Apply the same fix at lines 571-572
where this same pattern occurs.

scraper/scraper.py (1)

270-274: ⚡ Quick win

Excessive file I/O: writing to disk on every single listing.

This writes the entire JSON file after each listing is scraped, causing O(n²) total bytes written and high I/O overhead. For crash recovery, batching once per page would provide similar resilience with much better performance.

♻️ Suggested refactor: batch write per page

Move the file write outside the item loop to write once per page:

                     existing_ids.add(listing_id)
                     all_scraped_listings.append(listing)
                     scraped_count += 1
 
-                    if output_file:
-                        with open(output_file, "w", encoding="utf-8") as f:
-                            json.dump(
-                                existing_listings, f, ensure_ascii=False, indent=4
-                            )
-
                 except Exception as e:
                     logger.error(f"Error scraping single listing: {str(e)}")
                     continue
 
             logger.info(
                 f"Scraped {scraped_count} discovered listings from {current_url}"
             )
+
+            # Save after each page for crash recovery
+            if output_file and scraped_count > 0:
+                with open(output_file, "w", encoding="utf-8") as f:
+                    json.dump(existing_listings, f, ensure_ascii=False, indent=4)
 
         except Exception as e:
             logger.error(f"Error scraping page {current_url}: {str(e)}")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper/scraper.py` around lines 270 - 274, The file write operation using
json.dump on existing_listings is currently inside the item loop, causing the
entire JSON file to be written after each listing is processed, resulting in
excessive I/O overhead. Move the entire block that opens output_file and calls
json.dump outside the innermost loop that processes individual listings so that
the file is written once per page rather than once per listing. This will
maintain crash recovery capabilities while significantly improving performance
by reducing write operations.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@scraper/agent_worker.py`:
- Around line 301-302: The code at lines 301-302 makes two separate calls to
datetime.datetime.now(datetime.timezone.utc).isoformat() for the
llm_processed_time and last_ai_evaluated_at columns, which can cause microsecond
drift between the two timestamps. Capture the UTC timestamp once into a variable
before the update statement and reuse that same variable for both datetime
columns instead of calling .now() twice. Apply the same fix at lines 571-572
where this same pattern occurs.

In `@scraper/scraper.py`:
- Around line 270-274: The file write operation using json.dump on
existing_listings is currently inside the item loop, causing the entire JSON
file to be written after each listing is processed, resulting in excessive I/O
overhead. Move the entire block that opens output_file and calls json.dump
outside the innermost loop that processes individual listings so that the file
is written once per page rather than once per listing. This will maintain crash
recovery capabilities while significantly improving performance by reducing
write operations.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7700d350-353b-49b2-8783-78b0d5bff72e

📥 Commits

Reviewing files that changed from the base of the PR and between df86c79 and f12871c.

📒 Files selected for processing (8)

backend/server.js
frontend/src/App.tsx
frontend/src/components/ListingDetailCard.tsx
frontend/src/hooks/useTranslation.ts
frontend/src/i18n/translations.ts
scraper/agent_worker.py
scraper/scraper.py
scraper/scraper_selenium.py

✅ Files skipped from review due to trivial changes (1)

frontend/src/i18n/translations.ts

🚧 Files skipped from review as they are similar to previous changes (5)

frontend/src/hooks/useTranslation.ts
backend/server.js
scraper/scraper_selenium.py
frontend/src/components/ListingDetailCard.tsx
frontend/src/App.tsx

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

backend/server.js (1)
51-53: 💤 Low value

Consider generating a random initial password.

The hardcoded default password 'password' is trivially guessable. For a dev/demo setup, consider generating a random password on first run and printing it once, or at minimum document prominently that this must be changed before any non-local use.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/server.js` around lines 51 - 53, The hardcoded defaultPassword
variable is set to the weak default value 'password', which poses a security
risk. Replace this with a solution that either generates a random initial
password on first run and logs it to the console for the administrator to
capture, or add prominent documentation (in comments and README) stating that
this hardcoded password must be changed immediately before any non-local
deployment. If implementing random generation, ensure the password is only
generated once during initial setup and persisted securely.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/package.json`:
- Around line 13-15: The jsonwebtoken package at version 9.0.3 has transitive
dependencies on lodash with known security vulnerabilities. Update jsonwebtoken
to a newer version that resolves these vulnerabilities (consider upgrading to a
more recent stable version), and if necessary, explicitly update lodash to a
patched version in the dependencies. After making these changes, run npm audit
to verify that the vulnerabilities have been resolved and no new issues are
introduced. Additionally, consider setting up automated transitive dependency
scanning in your CI/CD pipeline to catch such vulnerabilities early in future
development.

In `@backend/server.js`:
- Line 64: The console.log statement that logs the seeded default admin user is
including the plaintext defaultPassword value in the output, which poses a
security risk as logs may be persisted or sent to external systems. Remove the
password variable from the log message. Keep the log statement that confirms the
default admin user was seeded but only include the defaultEmail and a success
message without exposing the actual defaultPassword value.
- Line 21: Remove the hardcoded fallback value from the JWT_SECRET constant
definition and implement a fail-fast approach that throws an error if the
environment variable is not set in production environments, or generate a random
ephemeral secret with a warning in development. Create a jwtSigningKey variable
that will hold the resolved secret value, and replace all uses of JWT_SECRET
with jwtSigningKey throughout the codebase for signing and verifying
authentication tokens.

---

Nitpick comments:
In `@backend/server.js`:
- Around line 51-53: The hardcoded defaultPassword variable is set to the weak
default value 'password', which poses a security risk. Replace this with a
solution that either generates a random initial password on first run and logs
it to the console for the administrator to capture, or add prominent
documentation (in comments and README) stating that this hardcoded password must
be changed immediately before any non-local deployment. If implementing random
generation, ensure the password is only generated once during initial setup and
persisted securely.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 51974059-1c2b-4477-8358-d8109b0c1f85

📥 Commits

Reviewing files that changed from the base of the PR and between f12871c and 5747716.

⛔ Files ignored due to path filters (1)

backend/package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (4)

backend/package.json
backend/server.js
frontend/src/App.tsx
frontend/src/i18n/translations.ts

🚧 Files skipped from review as they are similar to previous changes (1)

frontend/src/i18n/translations.ts

coderabbitai · 2026-06-19T13:54:27Z

CVE-2020-7689

+    "bcrypt": "^6.0.0",
    "express": "^4.21.2",
+    "jsonwebtoken": "^9.0.3",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check for security advisories on bcrypt and jsonwebtoken echo "=== Checking bcrypt advisories ===" gh api graphql -f query=' { securityVulnerabilities(first: 5, ecosystem: NPM, package: "bcrypt") { nodes { advisory { summary severity publishedAt } vulnerableVersionRange firstPatchedVersion { identifier } } } }' echo "" echo "=== Checking jsonwebtoken advisories ===" gh api graphql -f query=' { securityVulnerabilities(first: 5, ecosystem: NPM, package: "jsonwebtoken") { nodes { advisory { summary severity publishedAt } vulnerableVersionRange firstPatchedVersion { identifier } } } }' echo "" echo "=== Latest versions on npm ===" curl -s https://registry.npmjs.org/bcrypt/latest | jq '.version' curl -s https://registry.npmjs.org/jsonwebtoken/latest | jq '.version'

Repository: Infraviored/DealMapper

Length of output: 1996

🌐 Web query:

bcrypt npm 6.0.0 security vulnerabilities advisories

💡 Result:

As of June 19, 2026, there are no known security vulnerabilities associated with bcrypt version 6.0.0 [1][2]. The bcrypt package previously contained a significant security issue known as the "wrap-around" bug (CVE-2020-7689), which caused passwords 255 characters or longer to be truncated, and an additional issue involving improper handling of NUL characters in passwords [3][4][5]. Both of these vulnerabilities were addressed and patched in version 5.0.0 [3][6][4]. Version 6.0.0, released on May 11, 2025, is the latest stable version and remains free of these historical vulnerabilities [1][4]. Users are encouraged to keep their dependencies up to date and use tools like npm audit to monitor for any newly discovered issues [7].

Citations:

1: https://security.snyk.io/package/npm/bcrypt

2: https://secure.software/npm/packages/bcrypt/6.0.0

3: https://security.snyk.io/vuln/SNYK-JS-BCRYPT-572911

4: https://www.npmjs.com/package/bcrypt

5: https://github.com/kelektiv/node.bcrypt.js/

6: https://github.com/kelektiv/node.bcrypt.js/blob/master/CHANGELOG.md

7: https://www.sherlockforensics.com/security/npm/bcrypt.html

🌐 Web query:

jsonwebtoken npm 9.0.3 security vulnerabilities CVE advisories

💡 Result:

As of June 19, 2026, the npm package jsonwebtoken version 9.0.3 does not have direct, uniquely assigned CVEs for vulnerabilities within its own codebase [1][2]. While major security advisories (such as CVE-2022-23529, CVE-2022-23540, CVE-2022-23541, and CVE-2022-23539) affected versions prior to 9.0.0, these have been patched [3][4][5]. However, users should be aware of the following security context for version 9.0.3: 1. Transitive Vulnerabilities: As of April 2026, version 9.0.3 has been reported to be flagged by security scanners due to vulnerabilities in its transitive lodash dependencies (e.g., CVE-2026-4800, CVE-2019-10744, CVE-2021-23337, and CVE-2020-8203) [6]. Development teams are tracking these issues in the project repository (e.g., issue #1023), but these are dependency-level concerns rather than flaws in the jsonwebtoken logic itself [6]. 2. Dependency Updates: Version 9.0.3 (released December 4, 2025) updated the jws dependency to version 4.0.1 [3][7]. 3. Recommended Practice: Because jsonwebtoken is a widely used library, always ensure that your application-level usage follows secure coding practices, such as validating algorithms and sanitizing user input, as recommended in the official documentation [8][9]. For the most accurate and up-to-date status, consult the project's official GitHub repository issues and your own security scanning tools to assess the impact of these transitive dependencies on your specific environment [6].

Citations:

1: https://security.snyk.io/package/npm/jsonwebtoken

2: https://secure.software/npm/packages/jsonwebtoken

3: https://github.com/auth0/node-jsonwebtoken/blob/v9.0.3/CHANGELOG.md

4: https://nvd.nist.gov/vuln/detail/cve-2022-23539

5: GHSA-qwph-4952-7xr6

6: Inquiry about lodash dependency updates auth0/node-jsonwebtoken#1023

7: auth0/node-jsonwebtoken@v9.0.2...v9.0.3

8: GHSA-27h2-hvpr-p74q

9: https://npmx.dev/package/jsonwebtoken/v/9.0.3

🌐 Web query:

bcrypt npm latest version 2026

💡 Result:

As of June 19, 2026, the latest version of the bcrypt package on npm is 6.0.0 [1][2]. It was published on May 11, 2025 [1][3].

Citations:

1: https://www.npmjs.com/package/bcrypt

2: https://www.npmjs.com/package/bcrypt?activeTab=versions

3: https://www.npmjs.com/package/bcrypt?activeTab=code

🌐 Web query:

jsonwebtoken npm latest version 2026

💡 Result:

As of June 19, 2026, the latest version of the jsonwebtoken npm package is 9.0.3 [1][2][3]. This version was published on December 4, 2025 [1][3][4].

Citations:

1: https://www.npmjs.com/package/jsonwebtoken

2: https://npmx.dev/package/jsonwebtoken

3: https://security.snyk.io/package/npm/jsonwebtoken

4: https://packages.ecosyste.ms/registries/npmjs.org/packages/jsonwebtoken

Address transitive vulnerabilities in jsonwebtoken dependencies.

bcrypt 6.0.0 is current and free from known vulnerabilities. However, jsonwebtoken 9.0.3 depends on lodash with known vulnerabilities (CVE-2026-4800, CVE-2019-10744, CVE-2021-23337, CVE-2020-8203). Run npm audit to assess impact on your application and consider updating lodash or implementing transitive dependency auditing in your CI/CD pipeline.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/package.json` around lines 13 - 15, The jsonwebtoken package at version 9.0.3 has transitive dependencies on lodash with known security vulnerabilities. Update jsonwebtoken to a newer version that resolves these vulnerabilities (consider upgrading to a more recent stable version), and if necessary, explicitly update lodash to a patched version in the dependencies. After making these changes, run npm audit to verify that the vulnerabilities have been resolved and no new issues are introduced. Additionally, consider setting up automated transitive dependency scanning in your CI/CD pipeline to catch such vulnerabilities early in future development.

Infraviored · 2026-06-19T14:03:28Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces user authentication using JWT and bcrypt, adds a custom internationalization (i18n) framework supporting English and German, refactors the frontend to use reusable UI components (Button, Card, Input), and optimizes the scraper to use fast, direct HTTP requests instead of Selenium by default. The review feedback highlights critical security vulnerabilities regarding hardcoded production secrets and default credentials, a bug in cookie parsing, and a pagination regex issue. Additionally, valuable performance improvements are suggested to batch disk writes and reuse HTTP connections via requests.Session.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-19T14:05:36Z

+  if (req.headers.cookie) {
+    const cookies = Object.fromEntries(
+      req.headers.cookie.split('; ').map(c => {
+        const parts = c.split('=');
+        return [parts[0], parts.slice(1).join('=')];
+      })
+    );
+    token = cookies.token;
+  }


The current cookie parsing logic splits strictly by '; ' (semicolon followed by a space). However, according to RFC 6265, the space after the semicolon is optional. If a client sends cookies separated only by a semicolon (e.g., token=xyz;other=abc), this parsing logic will fail to split them correctly, resulting in a malformed token and authentication failure.

Using a regular expression like /;\s*/ to split the cookies and trimming the keys ensures robust parsing.

Suggested change

if (req.headers.cookie) {

const cookies = Object.fromEntries(

req.headers.cookie.split('; ').map(c => {

const parts = c.split('=');

return [parts[0], parts.slice(1).join('=')];

})

);

token = cookies.token;

}

if (req.headers.cookie) {

const cookies = Object.fromEntries(

req.headers.cookie.split(/;\s*/).map(c => {

const parts = c.split('=');

return [parts[0].trim(), parts.slice(1).join('=')];

})

);

token = cookies.token;

}

gemini-code-assist · 2026-06-19T14:05:36Z

 const sqlite3 = require('sqlite3').verbose();
+const jwt = require('jsonwebtoken');
+const bcrypt = require('bcrypt');
+const JWT_SECRET = process.env.JWT_SECRET || 'dealmapper_dev_secret_key_12345';


Using a hardcoded fallback for JWT_SECRET in production is a security risk. If the environment variable is missing, the application will run with a known default secret, making it vulnerable to token forgery.

It is highly recommended to enforce that JWT_SECRET is set when running in production, and throw an error or exit if it is missing.

Suggested change

const JWT_SECRET = process.env.JWT_SECRET || 'dealmapper_dev_secret_key_12345';

if (process.env.NODE_ENV === 'production' && !process.env.JWT_SECRET) {

console.error("FATAL: JWT_SECRET environment variable is required in production!");

process.exit(1);

}

const JWT_SECRET = process.env.JWT_SECRET || 'dealmapper_dev_secret_key_12345';

gemini-code-assist · 2026-06-19T14:05:36Z

+        const defaultEmail = 'admin@dealmapper.local';
+        const defaultPassword = 'password';
+        const saltRounds = 10;


Seeding a default admin user with a hardcoded password ('password') is a security risk. It is highly recommended to allow configuring the default admin credentials via environment variables, with fallback values only for development environments.

Suggested change

const defaultEmail = 'admin@dealmapper.local';

const defaultPassword = 'password';

const saltRounds = 10;

const defaultEmail = process.env.DEFAULT_ADMIN_EMAIL || 'admin@dealmapper.local';

const defaultPassword = process.env.DEFAULT_ADMIN_PASSWORD || 'password';

const saltRounds = 10;

gemini-code-assist · 2026-06-19T14:05:36Z

+app.use('/api', (req, res, next) => {
+  if (req.path === '/auth/login' || req.path === '/auth/logout' || req.path === '/auth/me') {
+    return next();
+  }
+  authenticateToken(req, res, next);
+});


The path checks inside the global API auth protection middleware are redundant because the public routes /api/auth/login and /api/auth/logout are defined before this middleware in the Express routing chain. In Express, requests matching those routes are fully handled and terminated before reaching this middleware.

You can simplify this by removing the redundant path checks and applying the authenticateToken middleware directly.

// Global API Auth Protection (applied to all subsequent /api/* routes) app.use('/api', authenticateToken);

gemini-code-assist · 2026-06-19T14:05:37Z

+def parse_listing_details_requests(url):
+    """Retrieve and parse detailed specifications, description, and images using direct requests"""
+    try:
+        response = requests.get(url, headers=HEADERS, timeout=10)


The parse_listing_details_requests function performs a direct requests.get call for every listing. When harvesting descriptions for multiple listings sequentially, this establishes a new TCP connection and SSL handshake for every single request, which is highly inefficient.

By utilizing a requests.Session object, you can enable HTTP Keep-Alive connection reuse, significantly speeding up the harvesting process and reducing overhead on both the client and server.

def parse_listing_details_requests(url, session=None): """Retrieve and parse detailed specifications, description, and images using direct requests""" try: caller = session if session else requests response = caller.get(url, headers=HEADERS, timeout=10)

gemini-code-assist · 2026-06-19T14:05:37Z

+                if "/seite:" in base_url:
+                    current_url = re.sub(r"/seite:\d+/", f"/seite:{page}/", base_url)


The regular expression r"/seite:\d+/" expects a trailing slash. If the base_url ends with /seite:2 without a trailing slash, the regex will fail to match, and the URL will not be paginated correctly.

Using a more robust regex pattern like r"/seite:\d+(/|$)" handles both cases correctly.

Suggested change

if "/seite:" in base_url:

current_url = re.sub(r"/seite:\d+/", f"/seite:{page}/", base_url)

if "/seite:" in base_url:

current_url = re.sub(r"/seite:\d+(/|$)", f"/seite:{page}\\1", base_url)

gemini-code-assist · 2026-06-19T14:05:37Z

+
+                        if output_file:
+                            with open(output_file, "w", encoding="utf-8") as f:
+                                json.dump(
+                                    existing_listings, f, ensure_ascii=False, indent=4
+                                )
+
+                    except Exception as e:
+                        logger.error(f"Error scraping single listing: {str(e)}")
+                        continue
+
+                logger.info(
+                    f"Scraped {scraped_count} discovered listings from {current_url}"


Writing the entire existing_listings list to the JSON file on every single iteration of the loop is extremely inefficient and causes unnecessary disk I/O overhead.

Instead, you should batch the write operation by moving the json.dump call outside of the inner loop, executing it once per page after all listings on that page have been parsed.

except Exception as e: logger.error(f"Error scraping single listing: {str(e)}") continue if output_file: with open(output_file, "w", encoding="utf-8") as f: json.dump( existing_listings, f, ensure_ascii=False, indent=4 ) logger.info( f"Scraped {scraped_count} discovered listings from {current_url}" )

…r efficiency

Infraviored · 2026-06-19T16:43:54Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces user authentication, internationalization (i18n) support, and a fast requests-based scraping pipeline. On the backend, JWT-based authentication and SQLite user seeding have been implemented, while the frontend has been updated with a login portal, reusable UI components, and English/German translations. Additionally, the scraper has been optimized to bypass browser automation using direct HTTP requests, with legacy Selenium paths preserved. The code review feedback highlights critical security vulnerabilities, including a default admin password fallback in production, plain-text logging of seeded credentials, and susceptibility to timing attacks on the login endpoint. A robustness issue was also identified in the JSON-LD parser when handling array-wrapped metadata structures.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-19T16:45:50Z

+        const defaultEmail = process.env.DEFAULT_ADMIN_EMAIL || 'admin@dealmapper.local';
+        const defaultPassword = process.env.DEFAULT_ADMIN_PASSWORD || 'password';


In production, defaulting the admin password to 'password' if DEFAULT_ADMIN_PASSWORD is not set poses a severe security risk. It is highly recommended to require DEFAULT_ADMIN_PASSWORD to be explicitly set in production environments, similar to how JWT_SECRET is handled, and exit the process if it is missing.

const defaultEmail = process.env.DEFAULT_ADMIN_EMAIL || 'admin@dealmapper.local'; let defaultPassword = process.env.DEFAULT_ADMIN_PASSWORD; if (process.env.NODE_ENV === 'production' && !defaultPassword) { console.error("FATAL: DEFAULT_ADMIN_PASSWORD environment variable is required in production to seed the default admin user!"); process.exit(1); } if (!defaultPassword) { defaultPassword = 'password'; }

gemini-code-assist · 2026-06-19T16:45:51Z

+            [defaultEmail, hash, 'admin'], 
+            (err) => {
+              if (err) console.error("Failed to seed default admin user:", err);
+              else console.log(`Seeded default admin user: ${defaultEmail} (password: ${defaultPassword})`);


Logging the seeded admin password in plain text to the console/stdout is a security vulnerability (CWE-532). In production, these logs are often persisted or monitored by third-party systems, exposing sensitive credentials. Remove the password from the log message.

Suggested change

else console.log(`Seeded default admin user: ${defaultEmail} (password: ${defaultPassword})`);

else console.log(`Seeded default admin user: ${defaultEmail}`);

gemini-code-assist · 2026-06-19T16:45:51Z

+    const user = await get("SELECT * FROM users WHERE email = ?", [email.toLowerCase().trim()]);
+    if (!user) {
+      return res.status(401).json({ error: 'Invalid email or password.' });
+    }
+
+    const isMatch = await bcrypt.compare(password, user.password_hash);
+    if (!isMatch) {
+      return res.status(401).json({ error: 'Invalid email or password.' });
+    }


The login route is vulnerable to user enumeration via a timing attack. If the user is not found in the database, the server returns a 401 response immediately. If the user is found, it performs a computationally expensive bcrypt.compare operation. This timing difference allows an attacker to easily determine which email addresses are registered. To mitigate this, always perform a dummy bcrypt comparison when the user is not found.

Suggested change

const user = await get("SELECT * FROM users WHERE email = ?", [email.toLowerCase().trim()]);

if (!user) {

return res.status(401).json({ error: 'Invalid email or password.' });

}

const isMatch = await bcrypt.compare(password, user.password_hash);

if (!isMatch) {

return res.status(401).json({ error: 'Invalid email or password.' });

}

const user = await get("SELECT * FROM users WHERE email = ?", [email.toLowerCase().trim()]);

// Always run bcrypt.compare to prevent timing attacks (user enumeration)

const passwordHash = user ? user.password_hash : "$2b$10$invalidhashplaceholderdummyvalue";

const isMatch = await bcrypt.compare(password, passwordHash);

if (!user || !isMatch) {

return res.status(401).json({ error: 'Invalid email or password.' });

}

gemini-code-assist · 2026-06-19T16:45:51Z

+                data = json.loads(script.string)
+                if isinstance(data, dict) and data.get("@type") == "ImageObject":
+                    if data.get("representativeOfPage") is True:
+                        schema_description = data.get("description")
+                        schema_main_image = data.get("contentUrl")


The JSON-LD parser assumes that the parsed data is always a single dictionary. However, Schema.org metadata can often be wrapped in a JSON array (list) or contain a @graph array of objects. If data is a list, calling data.get will raise an AttributeError and skip the block. Enhance the parser to handle both single objects, lists, and @graph structures to make metadata extraction more robust.

Suggested change

data = json.loads(script.string)

if isinstance(data, dict) and data.get("@type") == "ImageObject":

if data.get("representativeOfPage") is True:

schema_description = data.get("description")

schema_main_image = data.get("contentUrl")

data = json.loads(script.string)

# Normalize to a list of objects to handle both single objects and arrays

items = []

if isinstance(data, list):

items = data

elif isinstance(data, dict):

if "@graph" in data and isinstance(data["@graph"], list):

items = data["@graph"]

else:

items = [data]

for item in items:

if isinstance(item, dict) and item.get("@type") == "ImageObject":

if item.get("representativeOfPage") is True:

schema_description = item.get("description")

schema_main_image = item.get("contentUrl")

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scraper/scraper.py (1)

384-423: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Database UPDATE is outside the for loop - only the last listing gets updated.

The cursor.execute block starting at line 405 is indented at the same level as the for statement, placing it outside the loop. This means:

The loop fetches and parses details for every listing
Only the final iteration's listing_id and parsed values are used in the UPDATE
All other listings' harvested data is silently lost

Proposed fix to move UPDATE inside the loop

                 parsed = parse_listing_details_requests(url, session=session)
                 if parsed is None:
                     continue
 
-            cursor.execute(
-                """
-                UPDATE listings 
-                SET detailed_description = ?, details = ?, images = ?, full_info_obtained = 1,
-                    last_description_changed_at = ?
-                WHERE id = ?
-            """,
-                (
-                    parsed["detailed_description"],
-                    json.dumps(parsed["details"]),
-                    json.dumps(parsed["images"]),
-                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
-                    listing_id,
-                ),
-            )
-            conn.commit()
-            logger.info(
-                f"Successfully harvested detailed description, {len(parsed['details'])} details, and {len(parsed['images'])} images for ID {listing_id}"
-            )
+                cursor.execute(
+                    """
+                    UPDATE listings 
+                    SET detailed_description = ?, details = ?, images = ?, full_info_obtained = 1,
+                        last_description_changed_at = ?
+                    WHERE id = ?
+                """,
+                    (
+                        parsed["detailed_description"],
+                        json.dumps(parsed["details"]),
+                        json.dumps(parsed["images"]),
+                        datetime.datetime.now(datetime.timezone.utc).isoformat(),
+                        listing_id,
+                    ),
+                )
+                conn.commit()
+                logger.info(
+                    f"Successfully harvested detailed description, {len(parsed['details'])} details, and {len(parsed['images'])} images for ID {listing_id}"
+                )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper/scraper.py` around lines 384 - 423, The cursor.execute block that
updates the database is currently indented at the same level as the for loop,
placing it outside the loop. This causes only the last listing to be updated
while all other listings' harvested data is discarded. Move the entire
cursor.execute UPDATE statement (which updates the listings table with
detailed_description, details, images, and full_info_obtained) inside the for
loop by increasing its indentation level so it executes after each listing is
parsed. Also ensure the conn.commit() and logger.info calls that follow the
UPDATE are moved inside the loop as well.

♻️ Duplicate comments (1)

scraper/scraper.py (1)

485-530: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Update logic is outside the for loop - only the last listing is processed.

Lines 507-530 are indented at the with block level, not inside the for loop. This results in:

The loop fetches details for all listings but discards results except the last
Only the final listing's description is compared and potentially updated
All other listings are never checked for changes

Proposed fix to move processing inside the loop

                 parsed = parse_listing_details_requests(url, session=session)
                 if parsed is None:
                     continue
 
-            detailed_description = parsed["detailed_description"] or ""
-
-            if detailed_description.strip() != old_description.strip():
-                logger.info(
-                    f"Description changed for listing {listing_id}! Updating in DB."
-                )
-                cursor.execute(
-                    """
-                    UPDATE listings 
-                    SET detailed_description = ?, details = ?, images = ?, full_info_obtained = 1,
-                        last_description_changed_at = ?
-                    WHERE id = ?
-                """,
-                    (
-                        detailed_description,
-                        json.dumps(parsed["details"]),
-                        json.dumps(parsed["images"]),
-                        datetime.datetime.now(datetime.timezone.utc).isoformat(),
-                        listing_id,
-                    ),
-                )
-                conn.commit()
-            else:
-                logger.info(f"No description changes for listing {listing_id}.")
+                detailed_description = parsed["detailed_description"] or ""
+
+                if detailed_description.strip() != old_description.strip():
+                    logger.info(
+                        f"Description changed for listing {listing_id}! Updating in DB."
+                    )
+                    cursor.execute(
+                        """
+                        UPDATE listings 
+                        SET detailed_description = ?, details = ?, images = ?, full_info_obtained = 1,
+                            last_description_changed_at = ?
+                        WHERE id = ?
+                    """,
+                        (
+                            detailed_description,
+                            json.dumps(parsed["details"]),
+                            json.dumps(parsed["images"]),
+                            datetime.datetime.now(datetime.timezone.utc).isoformat(),
+                            listing_id,
+                        ),
+                    )
+                    conn.commit()
+                else:
+                    logger.info(f"No description changes for listing {listing_id}.")

Note: Once this indentation is fixed, the prior review feedback about persisting details/images changes even when description is unchanged will still apply.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scraper/scraper.py` around lines 485 - 530, The code block starting with the
detailed_description assignment and continuing through the database update and
logging logic is indented at the wrong level (at the with statement level
instead of inside the for loop). This means the loop iterates through all rows
and calls parse_listing_details_requests for each one, but the comparison and
update logic only executes once after the loop completes, using only the last
parsed result. Move all the code from the detailed_description assignment
through the final else block to be indented inside the for loop, at the same
indentation level as the parse_listing_details_requests call and the if parsed
is None check, so that each listing's details are compared and updated before
moving to the next listing.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@scraper/scraper.py`:
- Around line 384-423: The cursor.execute block that updates the database is
currently indented at the same level as the for loop, placing it outside the
loop. This causes only the last listing to be updated while all other listings'
harvested data is discarded. Move the entire cursor.execute UPDATE statement
(which updates the listings table with detailed_description, details, images,
and full_info_obtained) inside the for loop by increasing its indentation level
so it executes after each listing is parsed. Also ensure the conn.commit() and
logger.info calls that follow the UPDATE are moved inside the loop as well.

---

Duplicate comments:
In `@scraper/scraper.py`:
- Around line 485-530: The code block starting with the detailed_description
assignment and continuing through the database update and logging logic is
indented at the wrong level (at the with statement level instead of inside the
for loop). This means the loop iterates through all rows and calls
parse_listing_details_requests for each one, but the comparison and update logic
only executes once after the loop completes, using only the last parsed result.
Move all the code from the detailed_description assignment through the final
else block to be indented inside the for loop, at the same indentation level as
the parse_listing_details_requests call and the if parsed is None check, so that
each listing's details are compared and updated before moving to the next
listing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: edece5e5-363f-4e1a-bfd2-4a9a3c8a07ff

📥 Commits

Reviewing files that changed from the base of the PR and between 5747716 and 25adb7c.

📒 Files selected for processing (2)

backend/server.js
scraper/scraper.py

🚧 Files skipped from review as they are similar to previous changes (1)

backend/server.js

… harden JSON-LD parser

Infraviored added 4 commits May 22, 2026 21:15

feat: migrate to reusable UI components and lightweight translation s…

39346b2

…ystem

style: fix linter errors and typecast compiler constraints

fc68f2b

feat(scraper): add direct GET scraping and discovery demonstration sc…

381bf7e

…ripts

feat: implement fast requests-based HTTP scraper, preserve legacy Sel…

df86c79

…enium fallback, and add configurable politeness delays

gemini-code-assist Bot reviewed Jun 18, 2026

View reviewed changes

fix: address Gemini bot review findings on safety and i18n

f12871c

coderabbitai Bot reviewed Jun 18, 2026

View reviewed changes

feat: implement stateless JWT authentication via HTTP-Only cookies

5747716

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 19, 2026

View reviewed changes

refactor: address Gemini review feedback for auth security and scrape…

25adb7c

…r efficiency

gemini-code-assist Bot reviewed Jun 19, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

security: mitigate login timing attacks, protect seeding secrets, and…

e1295a1

… harden JSON-LD parser

		delayBetweenPages: parseFloat(delayBetweenPages !== undefined ? delayBetweenPages : 0.25),
		delayBetweenListings: parseFloat(delayBetweenListings !== undefined ? delayBetweenListings : 0.25)

-const JWT_SECRET = process.env.JWT_SECRET || 'dealmapper_dev_secret_key_12345';
+if (process.env.NODE_ENV === 'production' && !process.env.JWT_SECRET) {
+  console.error("FATAL: JWT_SECRET environment variable is required in production!");
+  process.exit(1);
+}
+const JWT_SECRET = process.env.JWT_SECRET || 'dealmapper_dev_secret_key_12345';

		if "/seite:" in base_url:
		current_url = re.sub(r"/seite:\d+/", f"/seite:{page}/", base_url)

		const defaultEmail = process.env.DEFAULT_ADMIN_EMAIL \|\| 'admin@dealmapper.local';
		const defaultPassword = process.env.DEFAULT_ADMIN_PASSWORD \|\| 'password';

	else console.log(`Seeded default admin user: ${defaultEmail} (password: ${defaultPassword})`);
	else console.log(`Seeded default admin user: ${defaultEmail}`);

-                data = json.loads(script.string)
-                if isinstance(data, dict) and data.get("@type") == "ImageObject":
-                    if data.get("representativeOfPage") is True:
-                        schema_description = data.get("description")
-                        schema_main_image = data.get("contentUrl")
+                data = json.loads(script.string)
+                # Normalize to a list of objects to handle both single objects and arrays
+                items = []
+                if isinstance(data, list):
+                    items = data
+                elif isinstance(data, dict):
+                    if "@graph" in data and isinstance(data["@graph"], list):
+                        items = data["@graph"]
+                    else:
+                        items = [data]
+                for item in items:
+                    if isinstance(item, dict) and item.get("@type") == "ImageObject":
+                        if item.get("representativeOfPage") is True:
+                            schema_description = item.get("description")
+                            schema_main_image = item.get("contentUrl")

Conversation

Infraviored commented Jun 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (2 warnings)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Infraviored commented Jun 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading