This project analyzes Amazon's e-commerce ecosystem using network science techniques to understand consumer behavior, product relationships, and marketplace dynamics. The study leverages Amazon's product metadata from Stanford's SNAP project to construct and analyze multiple network types that reveal how products connect through customer interactions.
- Source: Amazon Product Co-Purchasing Network Metadata (Stanford SNAP Project) https://snap.stanford.edu/data/amazon-meta.html
- Collection Period: Summer 2006
- Size: 548,552 products with ~7.8 million customer reviews
- Categories: Books (71.7%), Music CDs (18.8%), Videos (4.8%), DVDs (3.6%)
- Data Types: Product metadata, co-purchase relationships, category hierarchies, customer reviews
- Direct co-purchase network
  - Structure: Product-to-product connections via Amazon's "similar" field
  - Purpose: Captures Amazon's algorithmic co-purchase relationships
- Customer-product bipartite network
  - Structure: Bipartite graph connecting customers to reviewed products
  - Purpose: Foundation for authentic customer-product interaction analysis
- Co-review network
  - Structure: Products connected by shared reviewers
  - Purpose: Customer-driven product similarity independent of Amazon's algorithms
- Customer similarity network
  - Structure: Customers connected by common product reviews
  - Purpose: Customer segmentation and behavioral analysis
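
The four network types described above can be sketched with networkx as follows. This is a minimal illustration, not the notebook's actual code: the toy `reviews` and `similar` records, their IDs, and the variable names are hypothetical, and using shared-neighbor counts as edge weights is an assumption.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy data: (customer_id, product_id) review pairs.
reviews = [
    ("c1", "p1"), ("c1", "p2"),
    ("c2", "p2"), ("c2", "p3"),
    ("c3", "p1"), ("c3", "p2"),
]
# Hypothetical toy data: product -> products listed in its "similar" field.
similar = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": []}

# Direct co-purchase network: one edge per "similar" link.
co_purchase = nx.Graph()
co_purchase.add_nodes_from(similar)
co_purchase.add_edges_from((p, q) for p, qs in similar.items() for q in qs)

# Customer-product bipartite network (IDs are assumed not to collide).
customers = {c for c, _ in reviews}
products = {p for _, p in reviews}
B = nx.Graph()
B.add_nodes_from(customers, bipartite=0)
B.add_nodes_from(products, bipartite=1)
B.add_edges_from(reviews)

# Co-review network: products linked when they share at least one reviewer.
# The edge weight counts the shared reviewers.
co_review = bipartite.weighted_projected_graph(B, products)

# Customer similarity network: customers linked by commonly reviewed products.
customer_sim = bipartite.weighted_projected_graph(B, customers)

print(co_review.edges(data="weight"))
print(customer_sim.edges(data="weight"))
```

Both projections come from the same bipartite graph, so they reflect reviewer behavior only and do not depend on the "similar" field.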
- Customer-driven networks show exceptional connectivity and cohesion (more than 92% of nodes in the largest component)
- Direct co-purchase network exhibits extreme fragmentation (5,390 components, 0.007% largest component)
- Co-review networks demonstrate highest density (0.5550) and clustering (0.8892)
- Co-review and customer similarity networks exhibit strong small-world characteristics
- Average path lengths: Co-review (1.38), Customer similarity (2.25)
- Superior navigation efficiency compared to algorithmic approaches
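
The connectivity and small-world figures above (component counts, largest-component share, density, clustering, average path length) could be computed along these lines. This is a hedged sketch: the `summarize` helper and the toy graph are hypothetical, and measuring path length on the largest connected component only is an assumption about how disconnected graphs are handled.

```python
import networkx as nx

def summarize(G):
    """Return basic connectivity and small-world metrics for an undirected graph."""
    largest = max(nx.connected_components(G), key=len)
    H = G.subgraph(largest)
    return {
        "components": nx.number_connected_components(G),
        "largest_component_share": len(largest) / G.number_of_nodes(),
        "density": nx.density(G),
        "avg_clustering": nx.average_clustering(G),
        # Path length is only defined on a connected graph, so it is
        # measured on the largest component (an assumption of this sketch).
        "avg_path_length": nx.average_shortest_path_length(H),
    }

# Toy graph: a triangle with one pendant node, plus one isolated node.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
G.add_node("e")

stats = summarize(G)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

High clustering combined with a short average path length is the usual informal signature of a small-world network. Note that `average_shortest_path_length` is expensive on large graphs.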
- Customer-driven networks form 39-74 meaningful communities (sizes 203-264)
- Direct co-purchase network shows extreme fragmentation (5,390 tiny communities)
- Large cliques identified: Customer similarity (max 568), Co-review (max 337)
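
As an illustration of the community and clique measurements above, the sketch below runs Louvain community detection, maximal-clique enumeration, and k-core decomposition on a toy graph. The graph and variable names are hypothetical. `louvain_communities` requires networkx 2.8 or later, and whether the notebook uses this implementation or another Louvain package is not stated in this README.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy graph: two 4-node cliques joined by a single bridge edge.
G = nx.Graph()
G.add_edges_from(nx.complete_graph(["a1", "a2", "a3", "a4"]).edges())
G.add_edges_from(nx.complete_graph(["b1", "b2", "b3", "b4"]).edges())
G.add_edge("a1", "b1")

# Louvain community detection; the seed makes the partition repeatable.
communities = louvain_communities(G, seed=42)
community_sizes = sorted(len(c) for c in communities)

# Size of the largest maximal clique. Enumerating cliques can be slow
# on dense graphs.
max_clique_size = max(len(clique) for clique in nx.find_cliques(G))

# k-core decomposition: the core number of each node.
core_numbers = nx.core_number(G)
max_core = max(core_numbers.values())

print("community sizes:", community_sizes)
print("largest clique:", max_clique_size)
print("maximum core number:", max_core)
```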
- Stratified sampling: Percentile-based across product performance levels
- Subgraph construction: 4,000-node strategic sampling using BFS from anchor nodes
- Reproducibility: Fixed random seeds (42) for all procedures
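
The BFS-based subgraph construction can be sketched as below. This is only an assumption about the procedure: the `bfs_sample` function, its parameters, and the toy adjacency data are hypothetical, and the anchor nodes are taken as already chosen (for example, by the percentile-based stratified sampling).

```python
import random
from collections import deque

def bfs_sample(adjacency, anchors, max_nodes=4000, seed=42):
    """Collect up to max_nodes nodes by breadth-first search from the anchors.

    `adjacency` maps a node to an iterable of its neighbors; a dict of
    lists works, and so does a networkx Graph.
    """
    rng = random.Random(seed)  # fixed seed for reproducible neighbor order
    seen = set()
    queue = deque()
    for anchor in anchors:
        if anchor in adjacency and anchor not in seen:
            seen.add(anchor)
            queue.append(anchor)

    sampled = []
    while queue and len(sampled) < max_nodes:
        node = queue.popleft()
        sampled.append(node)
        # Nodes that appear only as neighbors are treated as having no
        # outgoing links of their own.
        neighbors = list(adjacency[node]) if node in adjacency else []
        rng.shuffle(neighbors)
        for neighbor in neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sampled

# Hypothetical toy adjacency data.
adjacency = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p4"],
    "p3": ["p1"],
    "p4": ["p2", "p5"],
    "p5": ["p4"],
    "p6": [],
}

small_sample = bfs_sample(adjacency, ["p1"], max_nodes=3)
full_sample = bfs_sample(adjacency, ["p1"], max_nodes=10)
print(small_sample)
print(full_sample)
```

With a networkx graph `G`, the sampled subgraph would then be `G.subgraph(sampled)`.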
- Python Libraries: pandas, networkx, scikit-learn
- Metrics: Centrality, clustering, community detection (Louvain), clique analysis, k-core decomposition
- Validation: Multiple network types for cross-validation of structural patterns
- report.tex: Complete LaTeX research report
- SNA_project.ipynb: Jupyter notebook with analysis code
- SNA_project.pdf: Compiled PDF report
- README.md: This file
- Recommendation Systems: Customer-driven projections provide superior structures for product recommendations compared to algorithmic approaches
- Marketing Strategy: Natural community detection reveals authentic customer segments and product clusters
- Navigation Efficiency: Small-world properties enable rapid information flow and product discovery
- Long-tail Support: Distributed connectivity patterns support diverse product visibility beyond bestsellers
- Dataset from 2006 may not reflect current marketplace dynamics
- Static analysis: temporal evolution not captured
- Amazon-only data: external marketplace influences not considered
- Structural focus: product attributes like pricing/quality not integrated
pandas
networkx
matplotlib
seaborn
scikit-learn
numpy

- Load the Jupyter notebook SNA_project.ipynb
- Ensure required Python libraries are installed
- Run cells sequentially to reproduce the analysis
- Modify sampling parameters or network types as needed
If you use this work, please cite:
Odarbashi, N., & Shahri, R. (2025). Network Analysis of Amazon's Product Co-Purchasing Ecosystem.
Social Network Analysis Project.
This project is for academic purposes. Dataset provided by Stanford SNAP Project.