# SVD Algorithm Benchmark Suite

This directory contains a comprehensive benchmark for comparing the four SVD algorithms implemented in ACTIONet.

## Benchmark Script

**File:** `benchmark_svd_algorithms.py`

### Features

The benchmark evaluates all four SVD methods (IRLB, Halko, Feng, PRIMME) with:

1. **Both Input Types:**
   - Sparse matrices (CSR format)
   - Dense matrices (NumPy arrays)

2. **Performance Metrics:**
   - Execution time (mean, std, min, max)
   - Peak memory usage (requires `psutil`)
   - Multiple runs for statistical reliability

3. **Comprehensive Visualization:**
   - Execution time comparison (bar plot with error bars)
   - Memory usage comparison
   - Speedup factors (sparse vs dense)
   - Timing distribution (box plots)

4. **Detailed Output:**
   - Console summary table
   - CSV export for further analysis
   - PNG visualization
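
The measurement pattern behind these metrics can be sketched in a few lines. The snippet below is illustrative, not the script's actual code: it uses SciPy's `svds` as a stand-in workload for the ACTIONet methods and the stdlib `tracemalloc` (a dependency-free alternative to `psutil`, though it only sees Python-level allocations, not native ones).

```python
import time
import tracemalloc

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds


def benchmark(fn, n_runs=3):
    """Time fn over n_runs and record the peak Python-level memory (MB)."""
    times, peaks = [], []
    for _ in range(n_runs):
        tracemalloc.start()
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak / 1e6)
    return {
        "mean": float(np.mean(times)),
        "std": float(np.std(times)),
        "min": min(times),
        "max": max(times),
        "peak_mem_mb": max(peaks),
    }


# Stand-in workload: truncated SVD of a random sparse matrix.
X = sparse_random(2000, 1000, density=0.05, format="csr", random_state=0)
stats = benchmark(lambda: svds(X, k=30), n_runs=3)
print(stats)
```

Repeating the run and reporting mean/std rather than a single timing is what makes the comparison between methods statistically meaningful.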

### Requirements

**Required:**
- numpy
- scipy
- matplotlib
- pandas
- anndata
- scanpy
- actionet

**Optional (for memory tracking):**
- psutil (recommended; install with `pip install psutil`)
- memory_profiler (for detailed profiling; install with `pip install memory-profiler`)

### Usage

#### Basic Usage
```bash
python benchmark_svd_algorithms.py
```

This runs with the default settings:
- 30 components
- 3 runs per test
- Standard memory tracking (if psutil is available)

#### Custom Parameters
```bash
# Specify number of components
python benchmark_svd_algorithms.py --components 50

# Specify number of runs per test
python benchmark_svd_algorithms.py --runs 5

# Enable detailed memory profiling (requires memory_profiler)
python benchmark_svd_algorithms.py --detailed-memory
```

#### Combined Options
```bash
python benchmark_svd_algorithms.py --components 30 --runs 3
```

### Output Files

The benchmark generates three output files in the `tests/` directory:

1. **benchmark_svd_results.csv** - Detailed results in CSV format
2. **benchmark_svd_results.png** - Comprehensive visualization with 4 subplots
3. **benchmark_output.txt** - Full console output (if redirected)
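
The CSV lends itself to quick follow-up analysis with pandas. The exact column names in `benchmark_svd_results.csv` may differ from this sketch, which uses a small inline stand-in for the file:

```python
from io import StringIO

import pandas as pd

# Minimal stand-in for tests/benchmark_svd_results.csv; the real
# column names may differ from the ones assumed here.
csv_text = """method,input_type,mean_time_s,memory_mb
IRLB,sparse,2.365,1304.7
IRLB,dense,1.354,19.5
Halko,sparse,4.560,44.3
Halko,dense,0.896,19.6
"""
df = pd.read_csv(StringIO(csv_text))

# Fastest method per input type, mirroring the console summary.
fastest = df.loc[df.groupby("input_type")["mean_time_s"].idxmin()]
print(fastest[["input_type", "method", "mean_time_s"]])
```

To analyze a real run, replace `StringIO(csv_text)` with the path to the CSV file.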

### Interpreting Results

#### Execution Time
- Lower is better
- Compare mean times across methods
- Error bars show the standard deviation across runs

#### Memory Usage
- Lower is better
- Shows the peak memory increase during computation
- Note: the first run may show higher memory use due to initial allocations

#### Speedup Factor
- Values > 1.0 mean dense is slower (sparse is faster)
- Values < 1.0 mean sparse is slower (dense is faster)
- Green bars indicate a sparse advantage; red bars indicate a dense advantage
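
Assuming the factor is defined as dense mean time over sparse mean time (consistent with the interpretation above, though the script's exact formula is not shown here), it is a one-line computation:

```python
# Speedup factor as interpreted above: ratio of dense to sparse mean
# times. This definition is an assumption about the script's formula.
def speedup(dense_time_s, sparse_time_s):
    return dense_time_s / sparse_time_s


# Example-table values for IRLB: dense 1.354 s, sparse 2.365 s.
s = speedup(1.354, 2.365)
print(f"{s:.2f}")  # below 1.0, so dense input was faster for IRLB here
```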

#### Best Practices
- Run with `--runs 5` or more for production benchmarks
- Close other applications to reduce memory noise
- Use consistent hardware/environment for comparisons
- Check that data characteristics (size, sparsity) match your use case

### Example Output

```
======================================================================
BENCHMARK SUMMARY
======================================================================

Method  Input Type  Mean Time (s)  Std Time (s)  Min Time (s)  Max Time (s)  Memory (MB)
  IRLB      sparse          2.365         0.008         2.357         2.372       1304.7
  IRLB       dense          1.354         0.072         1.282         1.426         19.5
 Halko      sparse          4.560         0.063         4.496         4.623         44.3
 Halko       dense          0.896         0.001         0.896         0.897         19.6
  Feng      sparse          5.091         0.003         5.087         5.094         10.5
  Feng       dense          0.915         0.000         0.915         0.915         21.9
PRIMME      sparse          3.994         0.027         3.967         4.021          3.1
PRIMME       dense          2.496         0.014         2.481         2.510          0.0

----------------------------------------------------------------------
FASTEST METHODS:
----------------------------------------------------------------------
Sparse input: IRLB - 2.365 s
Dense input: Halko - 0.896 s

----------------------------------------------------------------------
MEMORY EFFICIENCY:
----------------------------------------------------------------------
Sparse input: PRIMME - 3.1 MB
Dense input: PRIMME - 0.0 MB
```

### Key Findings (Example Dataset)

Based on the test data (6790 cells × 14445 genes, 81% sparse):

1. **For Sparse Input:**
   - Fastest: IRLB (~2.4 s)
   - Most memory efficient: PRIMME (~3 MB)

2. **For Dense Input:**
   - Fastest: Halko (~0.9 s)
   - Most memory efficient: PRIMME (~0 MB)

3. **Sparse vs Dense Performance:**
   - Dense input is generally faster for all methods
   - The memory advantage of sparse input depends on the method and sparsity

### Customizing for Your Data

To benchmark with your own data:

1. Modify the `load_and_prepare_data()` function
2. Update the data path to your AnnData file
3. Adjust preprocessing steps as needed
4. Ensure the data is stored in a layer (default: 'logcounts')

### Troubleshooting

**Memory tracking not working:**
- Install psutil: `pip install psutil`
- Memory values may read 0.0 on some runs due to garbage-collection timing

**Benchmark runs too slowly:**
- Reduce the `--runs` parameter
- Reduce the `--components` parameter
- Use a smaller dataset

**Out of memory errors:**
- Reduce the number of components
- Use a smaller dataset
- Close other applications
- Try sparse input only

**Import errors:**
- Ensure actionet is built and installed
- Check that all dependencies are installed
- Verify the Python path includes the `src` directory

## Related Test Files

- `test_svd_methods.py` - Basic validation of all SVD methods
- `test_svd_sparse_vs_dense.py` - Consistency testing between sparse and dense inputs
- `test_svd_comprehensive.py` - Additional comprehensive tests (if present)

## Contributing

To add new benchmark metrics or visualizations:

1. Add metric collection in `benchmark_svd_method()`
2. Update `create_summary_table()` to include the new columns
3. Add the visualization in `visualize_benchmark_results()`
4. Update this README with usage notes
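
As one example of step 1, a quality metric such as relative reconstruction error could be collected alongside the timings. The helper and record layout below are illustrative (they are not the script's actual code), using SciPy's `svds` as the factorization:

```python
import numpy as np
from scipy.sparse.linalg import svds


def reconstruction_error(X, k=10):
    """Relative Frobenius-norm error of the rank-k SVD approximation."""
    U, s, Vt = svds(X, k=k)
    return float(np.linalg.norm(X - (U * s) @ Vt) / np.linalg.norm(X))


# Hypothetical result record as benchmark_svd_method() might build it,
# with the new metric added next to the existing timing fields.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))
record = {"method": "stub", "new_metric": reconstruction_error(X, k=10)}
print(f"{record['new_metric']:.3f}")
```

After collecting the metric, the matching column and plot would be wired into `create_summary_table()` and `visualize_benchmark_results()` respectively.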

## License

Same as the main ACTIONet project.