Extract complete hotel detail data from Urlaub.check24.de in a structured format, including ratings, descriptions, images, and price signals. Built for travel analytics workflows where consistent hotel data is needed for comparison, monitoring, and reporting. This CHECK24 hotel details scraper turns individual hotel pages into clean, analysis-ready records.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project collects rich hotel profile information from Urlaub.check24.de hotel detail and offer pages and returns normalized, structured output you can feed into dashboards, research pipelines, or pricing models. It removes the manual work of copying descriptions, ratings, media, and availability signals across many properties. It’s designed for travel businesses, hospitality analysts, and market researchers who need repeatable, high-quality hotel datasets for competitor analysis and trend tracking.
- Accepts a list of hotel detail/offer URLs and processes them with retries for stability.
- Extracts core hotel profile fields (identity, location, category, description) plus rich media and review metrics.
- Captures rating breakdowns and review summaries to support quality scoring and benchmarking.
- Collects distance and scarcity indicators to evaluate location strength and demand pressure.
- Produces consistent JSON records suitable for BI tools, pricing engines, or data warehouses.
| Feature | Description |
|---|---|
| Bulk URL processing | Submit many hotel URLs at once for scaled data collection. |
| Retry and resiliency controls | Configurable per-URL retries to reduce failed runs and partial records. |
| Proxy-ready configuration | Supports proxy configuration for stable access and reduced blocking risk. |
| Rich hotel profile extraction | Pulls identity, description, category, address, and coordinates in one pass. |
| Media collection | Extracts image counts and image URLs for gallery/visual analysis. |
| Ratings and reviews | Captures overall score, review count, recommendation, and review list details. |
| Location & distance signals | Extracts proximity fields (e.g., beach/city center) to power ranking models. |
| Scarcity insights | Captures urgency signals like high demand and recent booking indicators. |
| Amenity/attribute ranking | Extracts ranked attribute lists for feature-based comparisons. |
| Field Name | Field Description |
|---|---|
| id | Unique hotel identifier within the platform. |
| description | Full textual hotel description including amenities and positioning details. |
| hib_url | Canonical hotel profile path/URL used for cross-referencing the full hotel page. |
| master_data | Core structured profile data (name, category, address fields, country/region, map image, etc.). |
| master_data.name | Hotel name as shown on the detail page. |
| master_data.category | Star/category classification value where available. |
| master_data.street | Street address where available. |
| master_data.postal_code | Postal/ZIP code where available. |
| master_data.city | City where the property is located. |
| master_data.region | Region/area label used for browsing and grouping. |
| master_data.country | Country name of the hotel location. |
| master_data.geo_coordinate.latitude | Latitude coordinate for mapping and geospatial analysis. |
| master_data.geo_coordinate.longitude | Longitude coordinate for mapping and geospatial analysis. |
| flags | Special tags/badges associated with the hotel (e.g., labels, restrictions, promotions) when present. |
| media | Media summary including counts and a list of image items. |
| media.total_image_count | Total number of images associated with the property. |
| media.image_count | Number of official/property images. |
| media.guest_image_count | Number of guest/user images when available. |
| media.images[].url | Image URL entry for galleries, QA, or visual pipelines. |
| rating | Rating object with overall and detailed review metrics. |
| rating.average | Overall average rating score. |
| rating.count | Total number of reviews counted for the rating. |
| rating.recommendation | Recommendation percentage/score when provided. |
| rating.llm_aggregated_summary | Summary text describing common positives/negatives from reviews (if present). |
| rating.rating_list[] | Individual review entries including rating, headline, summary, travel dates, and metadata. |
| rating.trust_you_rating_list[] | Category-based rating summaries (e.g., location, cleanliness, food) with averages and counts. |
| distances | Distance and accessibility fields (e.g., beach distance, city center, airport status). |
| distances.city_center | Distance to city center where available. |
| distances.beach.distance | Distance to beach in meters where available. |
| distances.beach.has_direct_beach_access | Whether the property indicates direct beach access. |
| scarcity | Demand and urgency signals such as “highly demanded” and “recently booked” timing. |
| scarcity.is_highly_demanded | Boolean indicator for high demand. |
| scarcity.is_top_location | Boolean indicator for top location labeling. |
| scarcity.recently_booked_hotel | Text indicator of recent booking activity, returned in the site's language (e.g., “21 Minuten”). |
| ranked_attribute_list | Ranked list of amenities/attributes used for comparisons and feature modeling. |
[
  {
    "id": 7305,
    "description": "Das Hotel eignet sich mit seiner Lage im Ferienort Alanya an der Türkischen Riviera gut für einen abwechslungsreichen Badeurlaub...",
    "hib_url": "/hib/7305/hotel?date=2025-07-26&preferredTouroperator=C24&traveltype=hotel&step=4&offerListParams%5Bairport%5D=BER%2CBRE%2C...",
    "master_data": {
      "has_direct_beach_access": true,
      "region_id": 586,
      "category": 4,
      "name": "Grand Okan Hotel",
      "street": "Atatürk Caddesi",
      "postal_code": "07400",
      "city": "Alanya",
      "region": "Side & Alanya",
      "country": "Türkei",
      "map_image": "//ctsassets1.check24.de/size=250c200/.../picture.jpg",
      "geo_coordinate": {
        "latitude": 36.5941086,
        "longitude": 31.8524075
      }
    },
    "flags": {},
    "media": {
      "total_image_count": 297,
      "image_count": 174,
      "guest_image_count": 123,
      "images": [
        { "url": "//ctsassets1.check24.de/size=625c440/.../picture.jpg" },
        { "url": "//ctsassets1.check24.de/size=308c263/.../picture.jpg" }
      ]
    },
    "rating": {
      "average": 8.4,
      "count": 143,
      "recommendation": 83.9,
      "llm_aggregated_summary": "Das Hotel besticht durch seine zentrale Lage nur 50 m vom Kleopatra-Strand entfernt...",
      "rating_list": [
        {
          "id": 3629584,
          "rating": 8,
          "customer_name": "Ricardo Filipe Da S.",
          "age": "26-30",
          "headline": "Preis Leistung völlig angemessen",
          "summary": "Pro: Ingesamt sehr erholend... Kontra: Überteuerte Strandliegen...",
          "date_of_travel": "2025-07-01",
          "created_at": "2025-07-09",
          "traveled_as": "Pair"
        }
      ]
    },
    "distances": {
      "city_center": 14207,
      "beach": { "has_direct_beach_access": true, "distance": 50 },
      "airport": { "is_pending": true }
    },
    "scarcity": {
      "is_highly_demanded": true,
      "is_top_location": true,
      "recently_booked_hotel": "21 Minuten"
    },
    "ranked_attribute_list": [
      "DirectBeachAccess",
      "Pool",
      "Wellness",
      "FreeWLAN"
    ]
  }
]
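The sample record uses protocol-relative image URLs (`//ctsassets1.check24.de/...`) and a path-only `hib_url`. Downstream tools often need absolute URLs, so the following is a minimal sketch of how they might be resolved. The `BASE_URL` value and the helper names `absolute_url` and `image_urls` are assumptions for illustration and are not part of this project.

```python
from urllib.parse import urljoin

# Assumption: relative URLs resolve against HTTPS on the source site.
BASE_URL = "https://urlaub.check24.de/"


def absolute_url(url: str, base: str = BASE_URL) -> str:
    """Resolve protocol-relative ("//host/...") and path-only ("/hib/...") URLs."""
    return urljoin(base, url)


def image_urls(record: dict) -> list[str]:
    """Return absolute image URLs from one record, skipping items without a url."""
    images = (record.get("media") or {}).get("images") or []
    return [absolute_url(item["url"]) for item in images if item.get("url")]
```

A protocol-relative URL keeps its own host and only gains the `https:` scheme, while a path such as `hib_url` is joined onto the base host.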
check24-reisen-details-scraper/
├── src/
│   ├── runner.py
│   ├── main.py
│   ├── clients/
│   │   ├── http_client.py
│   │   └── session_pool.py
│   ├── extractors/
│   │   ├── hotel_details_extractor.py
│   │   ├── ratings_extractor.py
│   │   ├── media_extractor.py
│   │   └── distances_extractor.py
│   ├── parsers/
│   │   ├── json_normalizer.py
│   │   └── url_utils.py
│   ├── pipelines/
│   │   ├── validate_records.py
│   │   └── deduplicate.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   ├── exporters_json.py
│   │   └── exporters_csv.py
│   └── config/
│       ├── settings.example.json
│       └── logging.yaml
├── data/
│   ├── inputs.sample.json
│   └── urls.sample.txt
├── tests/
│   ├── test_url_utils.py
│   ├── test_normalizer.py
│   └── fixtures/
│       └── sample_hotel_response.json
├── scripts/
│   ├── run_local.sh
│   └── smoke_test.py
├── .env.example
├── .gitignore
├── requirements.txt
├── pyproject.toml
└── README.md
- Travel agencies use it to build standardized hotel profiles from many URLs, so they can power comparison pages and improve customer recommendations.
- Competitive intelligence teams use it to monitor ratings, amenities, and demand signals, so they can benchmark competitors and spot market shifts early.
- Pricing analysts use it to collect scarcity and offer-related indicators, so they can evaluate pricing pressure and optimize packages.
- Hospitality consultants use it to audit property positioning and review themes, so they can deliver evidence-based improvement plans.
- Market researchers use it to track destination trends across large hotel sets, so they can analyze seasonality, preferences, and emerging locations.
How do I prepare URLs for best results?
Collect full hotel detail or offer URLs from Urlaub.check24.de that represent the properties you want. Use consistent search parameters (dates, travelers, departure airports) if you want results to be comparable across hotels. Mixing very different query parameters can still work, but it may reduce comparability in downstream analysis.
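One way to check a URL list for consistent search parameters before a run is to group the URLs by the query values that should stay constant. This is a sketch only: the parameter names in `COMPARE_KEYS` are taken from the sample `hib_url` (`date`, `traveltype`), and the helper names are hypothetical. Adjust the keys to whatever your own URLs carry.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: these query parameters should be identical across a comparable batch.
COMPARE_KEYS = ("date", "traveltype")


def search_signature(url: str, keys: tuple = COMPARE_KEYS) -> tuple:
    """Return the values of the compared query parameters as a hashable tuple."""
    query = parse_qs(urlsplit(url).query)
    return tuple(tuple(query.get(key, [])) for key in keys)


def group_by_signature(urls: list[str]) -> dict:
    """Group URLs that share the same search parameters."""
    groups: dict = {}
    for url in urls:
        groups.setdefault(search_signature(url), []).append(url)
    return groups
```

If `group_by_signature` returns more than one group, the batch mixes search contexts, and you may want to run each group separately.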
What happens if a hotel page is missing some fields?
Some hotels don’t expose all fields (e.g., certain distance items, flags, or media counts). The scraper returns partial objects where needed and keeps the record structure stable. In analytics pipelines, treat optional fields as nullable and validate critical fields (like id, master_data.name, and rating.average) before scoring.
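A minimal sketch of that validation step, assuming records are plain dictionaries shaped like the sample output. The helper names `get_path` and `missing_critical_fields` are illustrative and are not the contents of `validate_records.py`.

```python
from typing import Any

# Critical fields named in this FAQ; extend the tuple for your own scoring needs.
CRITICAL_FIELDS = ("id", "master_data.name", "rating.average")


def get_path(record: dict, dotted: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any part is absent."""
    current: Any = record
    for key in dotted.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def missing_critical_fields(record: dict, fields: tuple = CRITICAL_FIELDS) -> list[str]:
    """List the critical fields that are absent or null in a record."""
    return [field for field in fields if get_path(record, field) is None]
```

Records with a non-empty result can be routed to a review queue, while optional paths such as `distances.beach.distance` are simply read with `get_path` and treated as nullable.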
How do retries and stability work?
You can set max_retries_per_url to re-attempt a URL when transient failures occur. This improves successful completion rates on large runs but increases runtime. For large batches, it’s best to keep retries modest and rely on re-running failed URLs as a separate batch.
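The retry behaviour can be pictured with the sketch below. Only the `max_retries_per_url` name comes from this README; the `fetch` callable, the exponential backoff, the `base_delay` parameter, and the choice of retryable exceptions are assumptions, and the project's actual retry logic may differ.

```python
import time
from typing import Any, Callable


def fetch_with_retries(
    url: str,
    fetch: Callable[[str], Any],
    max_retries_per_url: int = 2,
    base_delay: float = 1.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch(url), retrying transient failures with exponential backoff."""
    if max_retries_per_url < 0:
        raise ValueError("max_retries_per_url must be >= 0")
    attempts = max_retries_per_url + 1  # first try plus the configured retries
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller record the failed URL
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

Because the final failure is re-raised, a caller can collect the failed URLs and submit them later as a separate batch, as suggested above.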
Can I export results to CSV for BI tools?
Yes. The project includes exporters that can flatten key nested fields (e.g., master_data.*, rating.average, distances.*) into tabular form for BI tools. For deeply nested arrays like rating.rating_list and media.images, JSON export is recommended, or you can store those arrays separately depending on your warehouse design.
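To illustrate what flattening means here, the sketch below turns nested objects into dotted column names and leaves arrays out of the tabular output. It is an assumption-laden example using only the standard library, not the implementation in `exporters_csv.py`; the function names are hypothetical.

```python
import csv
import io


def flatten(value: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys; lists are skipped (keep them in JSON)."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}{key}"
        if isinstance(item, dict):
            flat.update(flatten(item, f"{name}."))
        elif not isinstance(item, list):
            flat[name] = item
    return flat


def records_to_csv(records: list[dict]) -> str:
    """Render records as CSV text with one column per flattened field."""
    rows = [flatten(record) for record in records]
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields missing from a record become empty cells
    return buffer.getvalue()
```

With this approach `master_data.geo_coordinate.latitude` becomes a single column, while `rating.rating_list` and `media.images` do not appear in the CSV at all.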
Primary Metric: Typical end-to-end extraction averages ~2–5 hotel pages per minute under steady network conditions with conservative request pacing.
Reliability Metric: With retries enabled (e.g., 2 per URL) and consistent proxy routing, runs commonly achieve ~90–97% successful URL completion on medium-sized batches.
Efficiency Metric: Memory usage remains stable because records are streamed and written incrementally; throughput scales roughly linearly with concurrency until network limits or request pacing become the bottleneck.
Quality Metric: Core fields (hotel id, name, category, rating summary, and key media counts) are usually captured with high completeness, while optional sections (some flags, airport distances, or sparse review lists) may vary by property and page type.
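The incremental writing mentioned under the efficiency metric can be sketched as follows. JSON Lines (one record per line) is an assumption chosen for the example, and `write_jsonl` is a hypothetical helper rather than the project's `dataset_writer.py`.

```python
import json
from typing import Iterable, TextIO


def write_jsonl(records: Iterable[dict], stream: TextIO) -> int:
    """Write records one per line as they arrive; return how many were written."""
    count = 0
    for record in records:
        # ensure_ascii=False keeps values such as "Türkei" readable in the output
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count
```

Passing a generator as `records` means only one record is held in memory at a time, which is why memory use stays flat regardless of batch size.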
