Sound Medicine Academy Blog Scraper

Extract Sound Medicine Academy blog posts into structured HTML, JSON, or plain text for analysis, reporting, and content workflows. This scraper collects blog listings and (optionally) detailed post content, including authors, categories, and publish dates. Use it when you need reliable, repeatable blog content extraction without manual copy-paste.

Created by Bitbash, built to showcase our approach to scraping and automation.
If you are looking for a sound-medicine-academy-blog-scraper, you've just found your team. Let's chat.

Introduction

Sound Medicine Academy Blog Scraper crawls the blog index, gathers post URLs, and then fetches each post’s details based on your configuration. It solves the problem of turning blog pages into clean, structured data that’s ready for dashboards, spreadsheets, search indexing, or content auditing. It’s built for developers, data teams, and automation-minded users who want consistent output formats and flexible filtering.

Built for blog research and content ops

Scrapes the full blog list first, then enriches results with post-level details when enabled
Supports filtering by search term, author, or category to target specific subsets of content
Exports post details in HTML, Plain Text, or JSON for easy downstream processing
Accepts explicit blog URLs when you only need specific posts
Produces structured records designed for reporting and integration pipelines
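The list-then-detail flow described above can be sketched roughly as follows. This is not the actor's source code. runScrape, fetchListing, and fetchDetail are hypothetical names, and the input fields (blogUrls, maxBlogs, scrapeBlogDetails) are taken from the FAQ. Applying maxBlogs to explicit URLs as well is an assumption.

```javascript
// Hypothetical sketch of the list-then-detail flow; not the actor's source.
// fetchListing() and fetchDetail(url) are placeholder callbacks you supply.
async function runScrape(input, { fetchListing, fetchDetail }) {
  const { blogUrls = [], maxBlogs, scrapeBlogDetails = false } = input;

  // Phase 1: discovery. Explicit blogUrls skip the listing crawl entirely.
  let posts = blogUrls.length > 0
    ? blogUrls.map((url) => ({ url }))
    : await fetchListing();

  // Apply maxBlogs only when it is a positive integer (assumed to cap
  // explicit blogUrls as well).
  if (Number.isInteger(maxBlogs) && maxBlogs > 0) {
    posts = posts.slice(0, maxBlogs);
  }

  // Phase 2: optional enrichment, fetching details one post at a time.
  if (!scrapeBlogDetails) return posts;
  const enriched = [];
  for (const post of posts) {
    const details = await fetchDetail(post.url);
    enriched.push({ ...post, ...details }); // detail fields override listing fields
  }
  return enriched;
}
```

Fetching details sequentially keeps the sketch simple. A real crawler would more likely use bounded concurrency.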

Features

Feature	Description
Blog list crawling	Collects blog post listings and counts to establish a complete scrape scope.
Optional detail scraping	When enabled, fetches full post details including content, author info, and metadata.
Multiple export types	Exports post details as HTML, Plain Text, or JSON depending on your workflow needs.
Filtering support	Filter by search query, author, or categories to scrape only what matters.
URL targeting	Provide specific blog URLs to skip discovery and scrape only selected posts.
Metadata enrichment	Captures canonical URL, SEO title/description, timestamps, read time, and media fields.
Structured, dataset-ready output	Produces consistent records suitable for spreadsheets, databases, and APIs.

What Data This Scraper Extracts

Field Name	Field Description
id	Numeric identifier for the blog post record.
title	Post title as shown on the blog page.
summary	Short excerpt/preview text from the listing or page.
content	Full post content (populated when blog details scraping is enabled).
slug	URL-friendly post slug or identifier.
featuredImage	Primary featured image URL for the post.
featuredImageWebm	Optional WebM media URL (if available).
featuredImageMp4	Optional MP4 media URL (if available).
publishedAt	Human-readable publish date string.
publishedAtIso8601	Machine-readable publish timestamp in ISO 8601.
updatedAt	Human-readable last updated date string.
updatedAtIso8601	Machine-readable updated timestamp in ISO 8601.
keyword	Keyword used for filtering/search (when applicable).
seoTitle	SEO/meta title for the page.
seoDescription	SEO/meta description for the page.
categories	Post categories/tags (string list or objects, depending on extraction mode).
author.id	Author identifier (when present).
author.name	Author display name.
author.slug	Author slug/handle.
author.photo	Author photo URL.
author.bio	Author bio text (when present).
readtime	Estimated reading time string (e.g., "7 minute read").
pinned	Pinned flag (when present).
url	Direct URL to the blog post.
canonicalUrl	Canonical URL for SEO and deduplication.
headTitle	Page head title (often matches seoTitle/title).
headDescription	Page head description (often matches seoDescription).
rssTitle	RSS feed title (when present).
rssUrl	RSS feed URL (when present).
og_image	OpenGraph image URL for sharing previews.
noindex	Whether the page indicates it should not be indexed.
h1Title	Primary H1 title extracted from the page.
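The publishedAt/updatedAt fields pair a human-readable date with an ISO 8601 timestamp. If you need to rebuild the readable form yourself, a minimal sketch follows. It assumes the "March 17th, 2025" pattern seen in the sample output. toHumanDate is a hypothetical helper, not part of the actor.

```javascript
// Hypothetical helper: derive "March 17th, 2025" from an ISO 8601 timestamp.
// It reads the calendar date straight from the string, so the post's own
// UTC offset is preserved instead of being shifted into the local time zone.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function ordinal(day) {
  // 11th, 12th, 13th are exceptions to the 1st/2nd/3rd rule.
  if (day % 100 >= 11 && day % 100 <= 13) return `${day}th`;
  switch (day % 10) {
    case 1: return `${day}st`;
    case 2: return `${day}nd`;
    case 3: return `${day}rd`;
    default: return `${day}th`;
  }
}

function toHumanDate(iso8601) {
  const match = /^(\d{4})-(\d{2})-(\d{2})/.exec(iso8601);
  if (!match) return null; // not an ISO 8601 date
  const [, year, month, day] = match;
  const monthName = MONTHS[Number(month) - 1];
  if (!monthName) return null; // month out of range
  return `${monthName} ${ordinal(Number(day))}, ${year}`;
}

console.log(toHumanDate("2025-03-17T08:10:00-05:00")); // March 17th, 2025
```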

Example Output

[
  {
    "id": 14,
    "title": "What are carbon fiber composites and should you use them?",
    "summary": "Everyone loves PLA and PETG! They’re cheap, easy, and a lot of people use them exclusively. Some of you might get wild and try out some ABS every once",
    "content": "What are carbon fiber composites and should you use them?\n...\nTL;DR\nWhat you need to know about carbon fiber composites:\nCost: $50/€45 -$200/€185 per kg\n...",
    "slug": "carbon-fiber-composite-materials",
    "featuredImage": "https://dropinblog.net/34259178/files/featured/carbon-fiber-1-k2wil.png",
    "publishedAt": "March 17th, 2025",
    "publishedAtIso8601": "2025-03-17T08:10:00-05:00",
    "updatedAt": "March 18th, 2025",
    "updatedAtIso8601": "2025-03-18T03:18:21-05:00",
    "seoTitle": "What are carbon fiber composites and should you use them?",
    "seoDescription": "Carbon fiber composites are an amazing but sometimes confusing category of materials. Find out how they work and what they can be used for!",
    "categories": ["Features", "Guides", "Challenge", "Community Spotlight"],
    "author": {
      "id": 68114,
      "name": "Arun Chapman",
      "slug": "arun-chapman",
      "photo": "https://dropinblog.net/34259178/authors/A.Chapman Profile Picture (2).jpg"
    },
    "readtime": "7 minute read",
    "url": "https://www.soundmedicineacademy.com/blog?p=carbon-fiber-composite-materials",
    "canonicalUrl": "https://www.soundmedicineacademy.com/blog?p=carbon-fiber-composite-materials"
  }
]
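Because canonicalUrl is intended for deduplication, a downstream consumer of this output might collapse repeated posts as shown below. This is a minimal sketch. dedupeByCanonicalUrl is a hypothetical helper, and falling back to url when canonicalUrl is missing is an assumption.

```javascript
// Hypothetical post-processing step: keep the first record seen for each
// canonical URL, falling back to `url` when canonicalUrl is missing.
function dedupeByCanonicalUrl(records) {
  const seen = new Set();
  const unique = [];
  for (const record of records) {
    const key = record.canonicalUrl || record.url;
    if (!key) {
      unique.push(record); // nothing to compare on; keep as-is
      continue;
    }
    if (seen.has(key)) continue; // duplicate of an earlier record
    seen.add(key);
    unique.push(record);
  }
  return unique;
}
```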

Directory Structure Tree

Sound Medicine Academy Blog Scraper/
├── src/
│   ├── index.js
│   ├── cli.js
│   ├── runner/
│   │   ├── runActor.js
│   │   └── validateInput.js
│   ├── config/
│   │   ├── defaults.json
│   │   └── input.schema.json
│   ├── crawlers/
│   │   ├── blogListCrawler.js
│   │   └── blogDetailCrawler.js
│   ├── extractors/
│   │   ├── parseBlogList.js
│   │   ├── parseBlogDetail.js
│   │   └── normalizeFields.js
│   ├── filters/
│   │   ├── buildFilter.js
│   │   └── applyFilter.js
│   ├── exporters/
│   │   ├── exportHtml.js
│   │   ├── exportJson.js
│   │   ├── exportText.js
│   │   └── index.js
│   └── utils/
│       ├── http.js
│       ├── urls.js
│       ├── time.js
│       └── logger.js
├── data/
│   ├── input.example.json
│   └── sample.output.json
├── tests/
│   ├── parseBlogList.test.js
│   ├── parseBlogDetail.test.js
│   └── fixtures/
│       ├── list.html
│       └── detail.html
├── .gitignore
├── package.json
├── package-lock.json
├── LICENSE
└── README.md
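The tree includes src/runner/validateInput.js. The sketch below suggests the kind of checks such a module might perform, and it is not its actual contents. The field names come from the FAQ. The exact accepted strings for filterType and blogDetailExportType are assumptions, not the actor's published input schema.

```javascript
// Hypothetical sketch of input checks; not the actor's real validateInput.js.
// Accepted string values below are assumptions based on the FAQ wording.
const FILTER_TYPES = ["search", "author", "categories"];
const EXPORT_TYPES = ["HTML", "Plain Text", "JSON"];

function validateInput(input = {}) {
  const errors = [];
  const {
    maxBlogs, blogUrls, filterBy, filterType, filterValue,
    scrapeBlogDetails, blogDetailExportType,
  } = input;

  if (maxBlogs !== undefined && !(Number.isInteger(maxBlogs) && maxBlogs > 0)) {
    errors.push("maxBlogs must be a positive integer");
  }
  if (blogUrls !== undefined &&
      !(Array.isArray(blogUrls) && blogUrls.every((u) => typeof u === "string"))) {
    errors.push("blogUrls must be an array of URL strings");
  }
  if (filterBy) {
    if (!FILTER_TYPES.includes(filterType)) {
      errors.push(`filterType must be one of: ${FILTER_TYPES.join(", ")}`);
    }
    if (typeof filterValue !== "string" || filterValue.trim() === "") {
      errors.push("filterValue is required when filterBy is enabled");
    }
  }
  if (scrapeBlogDetails && !EXPORT_TYPES.includes(blogDetailExportType)) {
    errors.push(`blogDetailExportType must be one of: ${EXPORT_TYPES.join(", ")}`);
  }
  return errors; // an empty array means the input looks valid
}
```

Returning a list of messages, rather than throwing on the first problem, lets a run report every input mistake at once.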

Use Cases

Content marketers use it to export post metadata and categories, so they can plan campaigns and fill content gaps faster.
SEO specialists use it to capture canonical URLs, titles, and descriptions, so they can audit consistency and detect duplicates.
Data teams use it to collect blog text into JSON, so they can run topic modeling, clustering, or search indexing.
Editors use it to filter by author or category, so they can review specific contributors or sections without manual browsing.
Developers use it to integrate blog extraction into pipelines, so they can automate reporting and monitoring.
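For the content-planning use case, a simple report can count posts per category so that thinly covered topics stand out. This is a minimal sketch. countByCategory is hypothetical, and treating object-form categories as having a `name` property is an assumption.

```javascript
// Hypothetical reporting step: count posts per category across scraped
// records. Categories may be plain strings or objects (see the field table);
// objects are assumed to carry a `name` property.
function countByCategory(records) {
  const counts = new Map();
  for (const record of records) {
    for (const category of record.categories || []) {
      const name = typeof category === "string" ? category : category?.name;
      if (!name) continue; // skip categories we cannot name
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  // Sort ascending by count (ties alphabetically) so gaps come first.
  return [...counts.entries()].sort((a, b) => a[1] - b[1] || a[0].localeCompare(b[0]));
}
```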

FAQs

How do I control how many blog posts are scraped?
Set maxBlogs to a positive integer. If it is omitted, the scraper attempts to collect every post available in the blog listing and then applies any filtering.

Can I scrape only specific blog URLs instead of the full blog listing?
Yes. Provide blogUrls as an array of post URLs. When it is set, discovery is skipped and only those posts are processed, which is useful for incremental updates or targeted audits.
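Post URLs in the sample output follow the pattern https://www.soundmedicineacademy.com/blog?p=<slug>. Assuming every blogUrls entry uses that form, the slug can be read from the `p` query parameter and matched against listing records. slugFromBlogUrl and slugsFromBlogUrls are hypothetical helpers, not the actor's API.

```javascript
// Hypothetical helpers; assume post URLs look like .../blog?p=<slug>.
function slugFromBlogUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl); // throws on relative or malformed input
  } catch {
    return null;
  }
  const slug = url.searchParams.get("p");
  return slug && slug.trim() !== "" ? slug : null;
}

function slugsFromBlogUrls(urls) {
  // Drop unparseable entries and duplicates while preserving input order.
  return [...new Set(urls.map(slugFromBlogUrl).filter(Boolean))];
}
```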

How do the filtering options (search, author, categories) work?
Enable filterBy, set filterType to search, author, or categories, and provide filterValue. Only posts that match the filter, based on listing and/or post metadata, are included.
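One way to turn filterType and filterValue into a predicate over post records is sketched below. buildFilter is a hypothetical name. Case-insensitive matching, and the fields each filter inspects (title/summary for search, author name or slug for author, exact category name for categories), are assumptions rather than the actor's documented behavior.

```javascript
// Hypothetical sketch: turn filter settings into a predicate over posts.
// Which fields each filter inspects is an assumption.
function buildFilter({ filterBy, filterType, filterValue }) {
  if (!filterBy) return () => true; // filtering disabled: keep every post
  const needle = String(filterValue ?? "").trim().toLowerCase();
  const has = (text) => typeof text === "string" && text.toLowerCase().includes(needle);
  // Categories may be strings or objects; objects are assumed to have `name`.
  const categoryName = (c) => (typeof c === "string" ? c : c && c.name);

  switch (filterType) {
    case "search":
      return (post) => has(post.title) || has(post.summary);
    case "author":
      return (post) => Boolean(post.author) &&
        (has(post.author.name) || post.author.slug === needle);
    case "categories":
      return (post) => (post.categories || [])
        .some((c) => (categoryName(c) || "").toLowerCase() === needle);
    default:
      throw new Error(`Unknown filterType: ${filterType}`);
  }
}
```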

Why is my content field empty in the output?
If scrapeBlogDetails is false, only listing-level fields are returned and content stays empty. Enable scrapeBlogDetails and choose a blogDetailExportType (HTML, Plain Text, or JSON) to populate post content.
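A rough illustration of dispatching on blogDetailExportType follows. exportContent and htmlToText are hypothetical helpers. The regex tag stripping is deliberately naive, and a real exporter should use an HTML parser. The JSON shape used here (an object holding an array of paragraph strings) is an assumption, not the actor's documented format.

```javascript
// Hypothetical export dispatch; naive regex HTML-to-text for illustration only.
function htmlToText(html) {
  const text = html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "") // drop non-content blocks
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/(p|h[1-6]|li|div)>/gi, "\n") // block ends become line breaks
    .replace(/<[^>]+>/g, "") // strip remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&"); // decode &amp; last to avoid double-decoding
  // Trim each line and drop blank ones.
  return text.split("\n").map((line) => line.trim()).filter(Boolean).join("\n");
}

function exportContent(html, exportType) {
  switch (exportType) {
    case "HTML":
      return html;
    case "Plain Text":
      return htmlToText(html);
    case "JSON":
      return JSON.stringify({ paragraphs: htmlToText(html).split("\n").filter(Boolean) });
    default:
      throw new Error(`Unsupported blogDetailExportType: ${exportType}`);
  }
}
```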

Performance Benchmarks and Results

Primary Metric: Average extraction speed of 25–45 posts per minute on typical broadband connections when scraping listings plus full post details.
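To turn the quoted throughput range into a runtime estimate, divide the post count by each end of the range. For 1,000 posts that works out to roughly 22 to 40 minutes. estimateRunMinutes is a hypothetical helper, and real speed depends on the connection and the site's response times.

```javascript
// Back-of-the-envelope runtime estimate from a posts-per-minute range.
// Defaults mirror the 25–45 posts/minute figure quoted above.
function estimateRunMinutes(postCount, minRate = 25, maxRate = 45) {
  return {
    fastest: postCount / maxRate, // best case: highest throughput
    slowest: postCount / minRate, // worst case: lowest throughput
  };
}

const { fastest, slowest } = estimateRunMinutes(1000);
console.log(`${fastest.toFixed(1)}–${slowest.toFixed(1)} minutes`); // 22.2–40.0 minutes
```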

Reliability Metric: 97–99% successful post retrieval in repeated runs, with automatic retries handling intermittent network failures.
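Automatic retries of this kind are commonly implemented with exponential backoff. The sketch below shows the general pattern. withRetry, the default attempt count, and the delays are assumptions, not the actor's real settings.

```javascript
// Hypothetical retry wrapper with exponential backoff; settings are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Delay doubles each time: with the default baseDelayMs that is
        // 500 ms, 1 s, 2 s, ... between attempts.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A typical call wraps one page fetch, for example `withRetry(() => fetch(postUrl))`, so a single transient failure does not drop the post.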

Efficiency Metric: Peak memory usage typically stays within 250–400 MB for runs of up to 1,000 posts, because results are streamed and little HTML is kept in memory.

Quality Metric: 95–98% field completeness for core metadata (title, summary, URL, timestamps, author, categories), with content accuracy dependent on page structure consistency and export type.

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
