| Field | Details |
|---|---|
| Type | Data Engineering & API Integration |
| Platform | Salesforce Marketing Cloud (SFMC) |
| Problem | Client shutting down SFMC account — all historical data at risk of permanent loss |
| Solution | Two automated Python pipelines extracting Content Builder assets (REST) and email send tracking data (SOAP) before shutdown |
| REST Throughput | ~5.4 records/second |
| SOAP Throughput | ~146 records/second |
| Output | Structured JSON + Excel ready for migration or archiving |
The client decided to shut down their Salesforce Marketing Cloud account. This created an immediate risk of permanent data loss: years of email content assets and historical send tracking data would become inaccessible once the account was closed.
The challenge was that SFMC does not provide a native bulk export tool for either Content Builder assets or email send tracking data. There was no built-in way to:
- Extract all content assets with structured metadata at scale
- Preserve email content (subject lines, preheaders, HTML) before account closure
- Extract complete email send history (Job IDs, send stats, bounce rates, open rates) via SOAP API
- Produce a clean, structured output ready for archiving or migration to another platform
Manual export was not a viable option given the volume of assets and the time constraint of the account shutdown deadline. An automated extraction pipeline was the only reliable solution.
Built two fully automated Python data pipelines sharing a common architecture:
Pipeline 1 — Content Builder Assets (REST API):
- Authenticates with SFMC using OAuth2 client credentials flow
- Fetches all assets page by page via the POST query endpoint (`/asset/v1/content/assets/query`)
- Transforms raw nested JSON into flat structured rows
- Saves output incrementally to JSON after each page
- Checkpoints progress using a universal checkpoint system, so a run resumes automatically if interrupted
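
The paginated fetch described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `sfmc_client.py`: the function names, the injectable `post` parameter, and the request body shape (`page`/`pageSize` plus a sort on `id`) are assumptions, as is the response shape (`count` and `items`). The loop stops once the number of records read reaches the total the API reports.

```python
import json
import urllib.request
from typing import Callable, Iterator

QUERY_PATH = "/asset/v1/content/assets/query"  # endpoint named in this README


def http_post_json(url: str, body: dict, token: str) -> dict:
    """POST a JSON body with a bearer token and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def fetch_all_assets(
    base_url: str,
    token: str,
    page_size: int = 100,
    start_page: int = 1,
    post: Callable[[str, dict, str], dict] = http_post_json,
) -> Iterator[tuple[int, list[dict]]]:
    """Yield (page_number, items) until the reported total has been read."""
    page = start_page
    # Records on earlier pages count as already read when resuming.
    seen = (start_page - 1) * page_size
    while True:
        body = {
            "page": {"page": page, "pageSize": page_size},
            # Assumed body shape: a stable sort keeps page boundaries consistent.
            "sort": [{"property": "id", "direction": "ASC"}],
        }
        data = post(base_url.rstrip("/") + QUERY_PATH, body, token)
        items = data.get("items", [])
        if not items:
            break
        yield page, items
        seen += len(items)
        if seen >= data.get("count", 0):  # stop once the total is reached
            break
        page += 1
```

Passing `start_page` from a saved checkpoint lets an interrupted run continue from the page it had reached, and injecting `post` makes the loop easy to exercise without network access.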
Pipeline 2 — Email Send Tracking (SOAP API):
- Reuses the OAuth2 token obtained by Pipeline 1
- Fetches all send records in batches via the SOAP `Send` object
- Transforms raw XML response into clean flat dicts
- Saves output incrementally to JSON after each batch
- Checkpoints progress using `RequestID`, so a run resumes automatically if interrupted
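
The batch loop above might look something like the sketch below. It is not the project's `sfmc_soap_client.py`: the function names are hypothetical, and the element names (`RetrieveRequestMsg`, `ContinueRequest`, `OverallStatus`, `RequestID`, `Results`) and the partner API namespace are assumed from the SFMC SOAP API rather than taken from this README. Only the request body is built here; it would still need to be wrapped in a SOAP envelope whose header carries the OAuth2 token.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape

PARTNER_NS = "http://exacttarget.com/wsdl/partnerAPI"  # assumed namespace
NS = {"et": PARTNER_NS}


def build_retrieve_body(properties: list[str], request_id: Optional[str] = None) -> str:
    """Build a RetrieveRequestMsg; with request_id it asks for the next batch."""
    if request_id:
        inner = f"<ContinueRequest>{escape(request_id)}</ContinueRequest>"
    else:
        inner = "<ObjectType>Send</ObjectType>" + "".join(
            f"<Properties>{escape(p)}</Properties>" for p in properties
        )
    return (
        f'<RetrieveRequestMsg xmlns="{PARTNER_NS}">'
        f"<RetrieveRequest>{inner}</RetrieveRequest>"
        "</RetrieveRequestMsg>"
    )


def parse_retrieve_response(xml_text: str) -> tuple[str, str, list[dict]]:
    """Return (overall_status, request_id, rows) from a Retrieve response."""
    root = ET.fromstring(xml_text)
    msg = root.find(".//et:RetrieveResponseMsg", NS)
    if msg is None:
        raise ValueError("no RetrieveResponseMsg in response")
    status = msg.findtext("et:OverallStatus", default="", namespaces=NS)
    request_id = msg.findtext("et:RequestID", default="", namespaces=NS)
    rows = []
    for result in msg.findall("et:Results", NS):
        # Strip the "{namespace}" prefix so keys are plain field names,
        # and keep leaf elements only (nested objects are skipped).
        rows.append({
            child.tag.split("}", 1)[-1]: (child.text or "").strip()
            for child in result
            if len(child) == 0
        })
    return status, request_id, rows
```

A driver would keep calling with the returned `RequestID` while the status is `MoreDataAvailable`, saving that ID to the checkpoint after each batch.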
REST: SFMC REST API → Auth → Paginated Fetch → Transform → Checkpoint → JSON → Excel
SOAP: SFMC SOAP API → Auth → Batch Fetch (ContinueRequest) → Transform → Checkpoint → JSON → Excel
sfmc-data-pipeline/
├── main.py # Entry point — run_fetch_rest_data() + run_fetch_soap_data()
├── clients/
│ ├── __init__.py
│ ├── sfmc_client.py # REST — Content Builder assets (OAuth2 + POST query)
│ └── sfmc_soap_client.py # SOAP — Email send tracking (XML request builder + parser)
├── config/
│ ├── __init__.py
│ ├── settings.py # Loads environment variables
│ └── sfmc_columns.py # Defines fields to extract for both pipelines
├── state/
│ ├── __init__.py
│ └── checkpoint.py # Universal checkpoint — supports any pipeline via filename + dict
├── transform/
│ ├── __init__.py
│ ├── extract.py # REST — transforms raw JSON items into flat rows
│ ├── soap_extract.py # SOAP — cleans and flattens XML response rows
│ └── flatten.py # Helper to flatten nested JSON fields
├── utils/
│ ├── __init__.py
│ └── logger.py # Centralized logging setup
├── output/ # Auto-generated — not committed to version control
├── .env # Environment variables (not committed)
├── requirements.txt
└── README.md
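
The "filename + dict" checkpoint pattern noted for `state/checkpoint.py` could be sketched as below. The function names and the atomic-write detail are assumptions for illustration, not the project's actual code.

```python
import json
import os
from pathlib import Path


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so a crash mid-write cannot corrupt the file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp, target)  # atomic rename on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or an empty dict when starting fresh."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}
```

Because the state is an arbitrary dict, the same two functions can hold `{"page": 3, "last_id": 1042}` for the REST pipeline or `{"request_id": "..."}` for the SOAP pipeline, each under its own filename.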
| Skill | Usage |
|---|---|
| Python | Core pipeline development |
| REST API Integration | SFMC Content Builder API (POST query endpoint) |
| SOAP API Integration | SFMC Email Send Tracking via XML Send object |
| OAuth2 Authentication | SFMC client credentials token flow — reused across both pipelines |
| Pagination Handling | REST: smart stop via total count — SOAP: ContinueRequest with RequestID |
| XML Parsing | xml.etree.ElementTree with namespace handling |
| JSON Transformation | Nested JSON flattening with dot notation paths |
| Universal Checkpoint | Filename + dict pattern — supports REST (page + last_id) and SOAP (request_id) |
| Logging | Structured per-batch logging with timestamps and throughput metrics |
| Environment Config | .env based secrets management |
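
Flattening nested JSON into dot-notation paths, as listed in the table above, can be illustrated with a short recursive helper. This is a sketch and not necessarily how `transform/flatten.py` is written; the function names and the sample field paths are illustrative.

```python
def flatten(obj: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level keyed by dot-notation paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value  # leaves, lists, and empty dicts stay as-is
    return flat


def pick_columns(item: dict, columns: list[str]) -> dict:
    """Keep only the configured columns; missing paths become None."""
    flat = flatten(item)
    return {col: flat.get(col) for col in columns}


# Illustrative asset record (field names are examples only).
asset = {
    "id": 1,
    "name": "Welcome",
    "assetType": {"id": 208, "name": "htmlemail"},
    "views": {"subjectline": {"content": "Hi"}},
}
row = pick_columns(asset, ["id", "assetType.name", "views.subjectline.content"])
```

Keeping the column list separate from the flattening step mirrors the split between `config/sfmc_columns.py` and the transform modules.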
- ✅ Zero data loss — all assets and send history extracted before account shutdown
- ✅ REST pipeline: ~5.4 records/second — Content Builder assets
- ✅ SOAP pipeline: ~146 records/second — Email send tracking data
- ✅ Fault-tolerant — both pipelines resume from exact crash point automatically
- ✅ Structured output — flat JSON ready for Excel, database, or platform migration
- ✅ Subject lines, preheaders and HTML captured for all email assets
- ✅ Full send history preserved — Job IDs, send stats, bounces, opens, clicks
- ✅ Fully automated — no manual effort required
Throughput estimate: for REST, divide the total record count by 5.4 to get the approximate run time in seconds; for SOAP, divide by 146.
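
As a worked example of that estimate, the small helper below applies the two measured rates to a record count. The helper name and the 10,000-record figure are hypothetical, chosen only to show the arithmetic.

```python
REST_RATE = 5.4    # records/second, as measured for the REST pipeline
SOAP_RATE = 146.0  # records/second, as measured for the SOAP pipeline


def estimate_seconds(record_count: int, records_per_second: float) -> float:
    """Approximate run time in seconds for a given record count."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return record_count / records_per_second


rest_seconds = estimate_seconds(10_000, REST_RATE)
soap_seconds = estimate_seconds(10_000, SOAP_RATE)
```

For a hypothetical 10,000 records this gives about 1,852 seconds (roughly 31 minutes) over REST and about 68 seconds over SOAP.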
This pipeline was built for a one-time data migration before account shutdown. However, the architecture is intentionally reusable: swapping the credentials in `.env` is all that's needed to run it against any SFMC account.
Planned extensions for future client work:
- Target Platform Loader: Push extracted JSON directly into BigQuery, HubSpot, or Salesforce CRM without rewriting the extraction layer
- Email-Only Filter: Filter assets by `assetType` to isolate email records only
- Multi-Account Support: Pass account credentials dynamically to run across multiple SFMC business units
- Incremental Sync: Adapt checkpoint system to support scheduled incremental pulls instead of one-time full extraction
Install dependencies:

    pip install -r requirements.txt

Create a `.env` file in the project root:

    CLIENT_ID=your_sfmc_client_id
    CLIENT_SECRET=your_sfmc_client_secret
    SUBDOMAIN=your_sfmc_subdomain
    PAGE_SIZE=100

Run both pipelines:

    python main.py

To resume after an interruption, just re-run the same command; the checkpoint handles the rest automatically:

    python main.py

- `.env` file is never committed to version control
- `output/` folder is excluded from version control because it contains client data
- SFMC tokens auto-refresh on expiry
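
Token auto-refresh on expiry can be sketched as a small cache that requests a new token shortly before the old one lapses. The class name, the injectable `fetch` and `clock` parameters, and the 60-second safety margin are assumptions; the `access_token` and `expires_in` keys follow the usual OAuth2 token response and are assumed to match what the SFMC token endpoint returns.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache an OAuth2 token and refetch it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
        margin: float = 60.0,
    ) -> None:
        self._fetch = fetch    # performs the client-credentials request
        self._clock = clock    # injectable for testing
        self._margin = margin  # refresh this many seconds early
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, refreshing it when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            self._expires_at = (
                self._clock() + float(payload["expires_in"]) - self._margin
            )
        return self._token
```

Both pipelines could share one `TokenCache` instance, which fits the token reuse described earlier: each request calls `get()` and only occasionally triggers a real authentication call.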