Commit cd4f1be

Fix sparse/dense SVD
1 parent ade7f3d commit cd4f1be

9 files changed

Lines changed: 1579 additions & 4 deletions

src/actionet/wp_utils.cpp

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@
 #include <limits>
 
 // Convert NumPy array to Armadillo dense matrix
-arma::mat numpy_to_arma_mat(py::array_t<double> arr) {
+arma::mat numpy_to_arma_mat(py::array_t<double, py::array::c_style | py::array::forcecast> arr) {
     py::buffer_info buf = arr.request();
     if (buf.ndim != 2) {
         throw std::runtime_error("Expected 2D array");
@@ -126,7 +126,7 @@ py::object arma_sparse_to_scipy(const arma::sp_mat& sp_mat) {
 }
 
 // Convert NumPy vector to Armadillo vector
-arma::vec numpy_to_arma_vec(py::array_t<double> arr) {
+arma::vec numpy_to_arma_vec(py::array_t<double, py::array::c_style | py::array::forcecast> arr) {
     py::buffer_info buf = arr.request();
     if (buf.ndim != 1) {
         throw std::runtime_error("Expected 1D array");
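
The fix adds `py::array::c_style` to the `py::array_t` arguments. pybind11 already uses `py::array::forcecast` as the default for `py::array_t<T>`, so `c_style` is the flag that changes behavior. With it, pybind11 passes the function a C-contiguous, row-major `float64` buffer and converts or copies the caller's array if needed. The function bodies are not shown here, so the following is an assumption: if a converter reads `buf.ptr` as packed row-major doubles, a Fortran-ordered NumPy array would previously have been read in the wrong order. The Python sketch below illustrates that failure mode. `read_as_row_major` is a hypothetical stand-in for such a converter, and `np.ascontiguousarray(..., dtype=np.float64)` is only a NumPy-side analogy for what the new flags request.

```python
import numpy as np


def read_as_row_major(arr: np.ndarray) -> np.ndarray:
    """Hypothetical converter: reinterpret an array's raw memory as a packed
    row-major float64 matrix without looking at its strides."""
    n_rows, n_cols = arr.shape
    # order="A" emits Fortran order for a Fortran-contiguous array and C order
    # otherwise, mimicking a raw read of a contiguous buffer.
    raw = np.frombuffer(arr.tobytes(order="A"), dtype=np.float64)
    return raw.reshape(n_rows, n_cols)


def normalize(arr) -> np.ndarray:
    """NumPy analogy for py::array_t<double, c_style | forcecast>:
    cast to float64 and return a C-contiguous array, copying only if needed."""
    return np.ascontiguousarray(arr, dtype=np.float64)


x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x_f = np.asfortranarray(x)  # same values, column-major memory layout

print(np.array_equal(read_as_row_major(x_f), x))             # False
print(np.array_equal(read_as_row_major(normalize(x_f)), x))  # True
print(normalize(x.astype(np.float32)).dtype)                 # float64
```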

src/actionet/wp_utils.h

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@
 namespace py = pybind11;
 
 // Convert NumPy array to Armadillo dense matrix
-arma::mat numpy_to_arma_mat(py::array_t<double> arr);
+arma::mat numpy_to_arma_mat(py::array_t<double, py::array::c_style | py::array::forcecast> arr);
 
 // Convert SciPy sparse matrix to Armadillo sparse matrix
 arma::sp_mat scipy_to_arma_sparse(py::object scipy_sparse);
@@ -26,7 +26,7 @@ py::array_t<double> arma_mat_to_numpy(const arma::mat& mat);
 py::object arma_sparse_to_scipy(const arma::sp_mat& sp_mat);
 
 // Convert NumPy vector to Armadillo vector
-arma::vec numpy_to_arma_vec(py::array_t<double> arr);
+arma::vec numpy_to_arma_vec(py::array_t<double, py::array::c_style | py::array::forcecast> arr);
 
 // Convert Armadillo vector to NumPy array
 py::array_t<double> arma_vec_to_numpy(const arma::vec& vec);

tests/BENCHMARK_README.md

Lines changed: 202 additions & 0 deletions

# SVD Algorithm Benchmark Suite

This directory contains a comprehensive benchmark test for comparing the 4 SVD algorithms implemented in ACTIONet.

## Benchmark Script

**File:** `benchmark_svd_algorithms.py`

### Features

The benchmark evaluates all 4 SVD methods (IRLB, Halko, Feng, PRIMME) with:

1. **Both Input Types:**
   - Sparse matrices (CSR format)
   - Dense matrices (NumPy arrays)

2. **Performance Metrics:**
   - Execution time (mean, std, min, max)
   - Peak memory usage (requires `psutil`)
   - Multiple runs for statistical reliability

3. **Comprehensive Visualization:**
   - Execution time comparison (bar plot with error bars)
   - Memory usage comparison
   - Speedup factors (sparse vs dense)
   - Timing distribution (box plots)

4. **Detailed Output:**
   - Console summary table
   - CSV export for further analysis
   - PNG visualization
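
The timing metrics listed above (mean, std, min, and max over several runs) can be collected with a loop like the minimal sketch below. `time_runs` is a hypothetical helper, not the script's `benchmark_svd_method()`, and the lambda stands in for an SVD call. The README does not say whether the script reports sample or population standard deviation; this sketch uses the sample version. Memory tracking is omitted.

```python
import statistics
import time
from typing import Callable, Dict


def time_runs(fn: Callable[[], object], n_runs: int = 3) -> Dict[str, float]:
    """Call fn() n_runs times and summarize wall-clock times in seconds."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(times),
        # Sample standard deviation; undefined for a single run, so report 0.0.
        "std": statistics.stdev(times) if n_runs > 1 else 0.0,
        "min": min(times),
        "max": max(times),
    }


# The lambda is a placeholder for a call such as an SVD on the benchmark matrix.
summary = time_runs(lambda: sum(i * i for i in range(100_000)), n_runs=3)
print({name: round(value, 4) for name, value in summary.items()})
```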

### Requirements

**Required:**
- numpy
- scipy
- matplotlib
- pandas
- anndata
- scanpy
- actionet

**Optional (for memory tracking):**
- psutil (recommended, install with `pip install psutil`)
- memory_profiler (for detailed profiling, install with `pip install memory-profiler`)

### Usage

#### Basic Usage
```bash
python benchmark_svd_algorithms.py
```

This runs with default settings:
- 30 components
- 3 runs per test
- Standard memory tracking (if psutil is available)

#### Custom Parameters
```bash
# Specify number of components
python benchmark_svd_algorithms.py --components 50

# Specify number of runs per test
python benchmark_svd_algorithms.py --runs 5

# Enable detailed memory profiling (requires memory_profiler)
python benchmark_svd_algorithms.py --detailed-memory
```

#### Combined Options
```bash
python benchmark_svd_algorithms.py --components 30 --runs 3
```
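
The options and defaults documented in this Usage section map directly onto `argparse`. The sketch below shows one way to declare them, assuming only the option names and defaults shown above. It is not taken from `benchmark_svd_algorithms.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented benchmark options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Benchmark ACTIONet SVD algorithms")
    parser.add_argument("--components", type=int, default=30,
                        help="number of singular components (default: 30)")
    parser.add_argument("--runs", type=int, default=3,
                        help="runs per method and input type (default: 3)")
    parser.add_argument("--detailed-memory", action="store_true",
                        help="enable memory_profiler-based profiling")
    return parser


args = build_parser().parse_args(["--components", "50", "--runs", "5"])
print(args.components, args.runs, args.detailed_memory)  # 50 5 False
```

`argparse` exposes `--detailed-memory` as the attribute `detailed_memory`.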

### Output Files

The benchmark writes two output files to the `tests/` directory. A third appears only if you capture the console output:

1. **benchmark_svd_results.csv** - Detailed results in CSV format
2. **benchmark_svd_results.png** - Comprehensive visualization with 4 subplots
3. **benchmark_output.txt** - Full console output (only if you redirect stdout to this file)

### Interpreting Results

#### Execution Time
- Lower is better
- Compare mean times across methods
- Error bars show the standard deviation across runs

#### Memory Usage
- Lower is better
- Shows the peak memory increase during computation
- Note: the first run may show higher memory due to initial allocations

#### Speedup Factor
- Ratio of dense time to sparse time
- Values > 1.0 mean dense is slower (sparse is faster)
- Values < 1.0 mean sparse is slower (dense is faster)
- Green bars indicate a sparse advantage; red bars indicate a dense advantage
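
This sketch assumes the factor is computed from mean times as dense time divided by sparse time. That ratio is consistent with the thresholds above, but the script's exact formula is not shown. It applies the assumed definition to the mean times from the example output in this README. `speedup` is a hypothetical helper.

```python
# Mean execution times in seconds, copied from the example output in this README.
mean_times = {
    "IRLB":   {"sparse": 2.365, "dense": 1.354},
    "Halko":  {"sparse": 4.560, "dense": 0.896},
    "Feng":   {"sparse": 5.091, "dense": 0.915},
    "PRIMME": {"sparse": 3.994, "dense": 2.496},
}


def speedup(t_sparse: float, t_dense: float) -> float:
    """Assumed definition: dense time / sparse time (> 1.0 means sparse is faster)."""
    return t_dense / t_sparse


for method, t in mean_times.items():
    factor = speedup(t["sparse"], t["dense"])
    if factor > 1.0:
        verdict = "sparse faster"
    elif factor < 1.0:
        verdict = "dense faster"
    else:
        verdict = "tie"
    print(f"{method:<7} {factor:.2f} ({verdict})")
```

With these numbers all four ratios are below 1.0, ranging from roughly 0.18 for Feng to 0.62 for PRIMME. In that run, dense input was faster for every method.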

#### Best Practices
- Run with `--runs 5` or more for production benchmarks
- Close other applications to reduce memory noise
- Use consistent hardware/environment for comparisons
- Check that data characteristics (size, sparsity) match your use case

### Example Output

```
======================================================================
BENCHMARK SUMMARY
======================================================================

Method  Input Type  Mean Time (s)  Std Time (s)  Min Time (s)  Max Time (s)  Memory (MB)
IRLB    sparse              2.365         0.008         2.357         2.372       1304.7
IRLB    dense               1.354         0.072         1.282         1.426         19.5
Halko   sparse              4.560         0.063         4.496         4.623         44.3
Halko   dense               0.896         0.001         0.896         0.897         19.6
Feng    sparse              5.091         0.003         5.087         5.094         10.5
Feng    dense               0.915         0.000         0.915         0.915         21.9
PRIMME  sparse              3.994         0.027         3.967         4.021          3.1
PRIMME  dense               2.496         0.014         2.481         2.510          0.0

----------------------------------------------------------------------
FASTEST METHODS:
----------------------------------------------------------------------
Sparse input: IRLB - 2.365 s
Dense input: Halko - 0.896 s

----------------------------------------------------------------------
MEMORY EFFICIENCY:
----------------------------------------------------------------------
Sparse input: PRIMME - 3.1 MB
Dense input: PRIMME - 0.0 MB
```
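
The FASTEST METHODS and MEMORY EFFICIENCY blocks are per-input-type minima over the summary table. The plain-Python sketch below performs that selection on the numbers above. `best_per_input` is a hypothetical helper, not part of the benchmark script.

```python
TIME, MEMORY = 2, 3  # column indices into each result row

# (method, input type, mean time in s, peak memory increase in MB),
# copied from the example output above.
results = [
    ("IRLB", "sparse", 2.365, 1304.7), ("IRLB", "dense", 1.354, 19.5),
    ("Halko", "sparse", 4.560, 44.3), ("Halko", "dense", 0.896, 19.6),
    ("Feng", "sparse", 5.091, 10.5), ("Feng", "dense", 0.915, 21.9),
    ("PRIMME", "sparse", 3.994, 3.1), ("PRIMME", "dense", 2.496, 0.0),
]


def best_per_input(rows, column):
    """For each input type, return (method, value) for the row with the
    smallest value in `column`."""
    best = {}
    for row in rows:
        input_type = row[1]
        if input_type not in best or row[column] < best[input_type][column]:
            best[input_type] = row
    return {kind: (row[0], row[column]) for kind, row in best.items()}


print(best_per_input(results, TIME))    # {'sparse': ('IRLB', 2.365), 'dense': ('Halko', 0.896)}
print(best_per_input(results, MEMORY))  # {'sparse': ('PRIMME', 3.1), 'dense': ('PRIMME', 0.0)}
```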

### Key Findings (Example Dataset)

Based on test data (6790 cells × 14445 genes, 81% sparse):

1. **For Sparse Input:**
   - Fastest: IRLB (~2.4 s)
   - Most memory efficient: PRIMME (~3 MB)

2. **For Dense Input:**
   - Fastest: Halko (~0.9 s)
   - Most memory efficient: PRIMME (~0 MB; near-zero readings can be a measurement artifact, see Troubleshooting)

3. **Sparse vs Dense Performance:**
   - Dense input was faster than sparse input for all four methods on this dataset
   - Sparse input reduced peak memory only for Feng (10.5 MB vs 21.9 MB); whether sparse input saves memory depends on the method and the sparsity
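
A back-of-the-envelope calculation shows how sparsity affects input storage. It assumes float64 values, 32-bit indices (SciPy's usual choice when they fit), and no other overhead. Under those assumptions the example dataset needs about 785 MB as a dense matrix versus about 224 MB in CSR. This concerns storing the input. The memory column in the example output instead measures peak increases during computation, which is a different quantity. `dense_bytes` and `csr_bytes` are hypothetical helpers.

```python
def dense_bytes(n_rows: int, n_cols: int, value_size: int = 8) -> int:
    """Bytes needed for a dense matrix (float64 by default)."""
    return n_rows * n_cols * value_size


def csr_bytes(n_rows: int, nnz: int, value_size: int = 8, index_size: int = 4) -> int:
    """Bytes for CSR storage: one value and one column index per nonzero,
    plus a row-pointer array of length n_rows + 1."""
    return nnz * (value_size + index_size) + (n_rows + 1) * index_size


n_rows, n_cols = 6790, 14445           # cells x genes in the example dataset
nnz = round(n_rows * n_cols * 0.19)    # "81% sparse" -> about 19% nonzeros
dense, csr = dense_bytes(n_rows, n_cols), csr_bytes(n_rows, nnz)
print(f"dense: {dense / 1e6:.1f} MB, CSR: {csr / 1e6:.1f} MB, ratio: {csr / dense:.3f}")
```

With 8-byte values and 4-byte indices, CSR is smaller than dense only while the nonzero fraction stays below about 8/12 ≈ 0.67, ignoring the row pointers.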

### Customizing for Your Data

To benchmark with your own data:

1. Modify the `load_and_prepare_data()` function
2. Update the data path to your AnnData file
3. Adjust preprocessing steps as needed
4. Ensure data is stored in a layer (default: 'logcounts')
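
The benchmark compares a CSR matrix against a dense NumPy array, so custom data must be available in both forms. The minimal sketch below prepares both from a layer matrix. `prepare_inputs` is hypothetical, and the AnnData access mentioned in its docstring (`adata.layers["logcounts"]`) is an assumption about how your data is stored. The dense output is C-contiguous float64, which matches what the updated C++ converters request.

```python
import numpy as np
import scipy.sparse as sp


def prepare_inputs(X):
    """Return (CSR float64 matrix, C-contiguous float64 ndarray) for X.

    X may be a SciPy sparse matrix or a NumPy array, for example the object
    stored in adata.layers["logcounts"] (an assumption about your data).
    """
    if sp.issparse(X):
        X_sparse = sp.csr_matrix(X).astype(np.float64)
        X_dense = np.ascontiguousarray(X_sparse.toarray())
    else:
        X_dense = np.ascontiguousarray(X, dtype=np.float64)
        X_sparse = sp.csr_matrix(X_dense)
    return X_sparse, X_dense


# Small random stand-in for a cells x genes matrix with ~19% nonzeros.
X = sp.random(100, 50, density=0.19, format="csr", random_state=0)
X_sparse, X_dense = prepare_inputs(X)
print(X_sparse.format, X_dense.dtype, X_dense.flags["C_CONTIGUOUS"])
```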

### Troubleshooting

**Memory tracking not working:**
- Install psutil: `pip install psutil`
- Memory values may be 0.0 on some runs due to GC timing

**Benchmark runs too slowly:**
- Reduce `--runs` parameter
- Reduce `--components` parameter
- Use a smaller dataset

**Out of memory errors:**
- Reduce number of components
- Use a smaller dataset
- Close other applications
- Try sparse input only

**Import errors:**
- Ensure actionet is built and installed
- Check that all dependencies are installed
- Verify Python path includes the src directory

## Related Test Files

- `test_svd_methods.py` - Basic validation of all SVD methods
- `test_svd_sparse_vs_dense.py` - Consistency testing between sparse and dense inputs
- `test_svd_comprehensive.py` - Additional comprehensive tests (if present)

## Contributing

To add new benchmark metrics or visualizations:

1. Add metric collection in `benchmark_svd_method()`
2. Update `create_summary_table()` to include new columns
3. Add visualization in `visualize_benchmark_results()`
4. Update this README with usage notes

## License

Same as the main ACTIONet project.