ASTR The Label Scraper extracts structured apparel product data from the ASTR The Label online store, enabling reliable access to pricing, product details, and catalog updates. It helps businesses and analysts turn raw storefront pages into clean, usable data for smarter e-commerce decisions.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for astr-the-label-scraper you've just found your team — Let’s Chat. 👆👆
This project collects and structures product information from ASTR The Label’s apparel catalog into developer-friendly datasets. It solves the challenge of manually tracking product changes, prices, and availability across a growing online store. It’s designed for e-commerce teams, analysts, and developers who need consistent product intelligence at scale.
- Extracts detailed product metadata from live catalog pages
- Normalizes pricing and variant information for analysis
- Supports large catalogs with consistent data structures
- Suitable for automation pipelines and analytics workflows
| Feature | Description |
|---|---|
| Product Catalog Extraction | Collects structured data from apparel listing and product pages. |
| Price & Variant Tracking | Captures prices, variants, and option-specific details accurately. |
| Image & Description Parsing | Extracts product images and rich descriptions for downstream use. |
| Scalable Crawling | Handles small to large catalogs efficiently without manual intervention. |
| Field Name | Field Description |
|---|---|
| product_id | Unique identifier for the product. |
| title | Product name as displayed in the store. |
| price | Current product price. |
| currency | Currency associated with the price. |
| variants | Available sizes, colors, or other options. |
| images | Product image URLs. |
| description | Full product description text. |
| availability | Stock or availability status. |
| product_url | Direct link to the product page. |
ASTR The Label Scraper/
├── src/
│ ├── main.py
│ ├── crawler/
│ │ ├── product_list.py
│ │ └── product_detail.py
│ ├── parsers/
│ │ ├── pricing.py
│ │ └── variants.py
│ ├── utils/
│ │ └── helpers.py
│ └── config/
│ └── settings.example.json
├── data/
│ ├── sample_input.json
│ └── sample_output.json
├── requirements.txt
└── README.md
- E-commerce analysts use it to monitor apparel prices, so they can identify trends and competitive gaps.
- Retail teams use it to track product availability, enabling faster merchandising decisions.
- Data engineers use it to feed structured product data into analytics pipelines for reporting.
- Market researchers use it to analyze fashion catalog changes over time for insights.
Is this tool suitable for large apparel catalogs? Yes, it’s designed to scale across hundreds or thousands of products while maintaining consistent output structure.
Can the data be integrated into existing systems? The extracted data is structured and ready for use in databases, dashboards, or reporting tools.
Does it handle product variants like size and color? Yes, variant-specific details are captured and organized for accurate analysis.
How often can it be run? It can be executed as frequently as needed, depending on how often product updates are required.
Primary Metric: Processes an average of 120–180 product pages per minute under standard catalog conditions.
Reliability Metric: Maintains a successful extraction rate above 97% across repeated runs.
Efficiency Metric: Optimized crawling minimizes redundant requests while maintaining full coverage.
Quality Metric: Delivers consistently complete product records with high field accuracy across catalog updates.
