Commit 00dffdf

Merge config_rewrite branch into main
2 parents 2e6dd36 + 4a4d5e5 commit 00dffdf

17 files changed

Lines changed: 1337 additions & 1069 deletions

BACKWARDS_COMPATIBILITY_FIXED.md

Whitespace-only changes.

CONFIG_IMPROVEMENT.md

Lines changed: 0 additions & 176 deletions
@@ -1,176 +0,0 @@
# MuMDIA Configuration Management Improvement

## The Problem

The original `run.py` had extremely complex argument parsing and configuration management:

- **~100 lines** of argument parsing code
- **Complex merging logic** between CLI args and config files
- **Difficult to maintain** argument parsing functions
- **Hard to understand** configuration flow
- **Manual config validation** and error handling

## The Solution: Simplified Configuration with Dataclasses

### Before (Complex approach in `run.py`):

```python
def parse_arguments() -> Tuple[argparse.ArgumentParser, argparse.Namespace]:
    """Parse command line arguments - 50+ lines of argparse setup"""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mzml_file", type=str, help="Path to mzML file", default=None)
    parser.add_argument("--fasta_file", type=str, help="Path to FASTA file", default=None)
    # ... 30+ more arguments ...
    args = parser.parse_args()
    return parser, args

def was_arg_explicitly_provided(parser: argparse.ArgumentParser, arg_name: str) -> bool:
    """Check if argument was explicitly provided - complex logic"""
    for action in parser._actions:
        if arg_name in action.dest:
            for option in action.option_strings:
                if option in sys.argv:
                    return True
    return False

def modify_config(parser: argparse.ArgumentParser, args: argparse.Namespace, config_path: str) -> Dict[str, Any]:
    """Load and modify config - 50+ lines of complex merging logic"""
    # Load base config from JSON
    # Override with CLI args using complex checking
    # Save effective config
    # Return merged config dict

# In main():
parser, args = parse_arguments()
config = modify_config(parser, args, args.config_file)
# Extract all individual values from config dict
mzml_file = config["mzml_file"]
fasta_file = config["fasta_file"]
# ... dozens more extractions ...
```

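One reason the old helper is hard to trust: `arg_name in action.dest` is a substring test, and `option in sys.argv` only matches options passed as separate tokens. A minimal sketch of both failure modes follows. The name `was_explicit` and its `argv` parameter are introduced here for illustration so the check can run without touching `sys.argv`, and the two `--*_fdr` options are stand-ins for the real argument list.

```python
import argparse
from typing import Sequence


def was_explicit(parser: argparse.ArgumentParser, arg_name: str, argv: Sequence[str]) -> bool:
    """Same logic as the helper above, with argv passed in instead of read from sys.argv."""
    for action in parser._actions:  # private attribute, as in the original helper
        if arg_name in action.dest:  # substring match, not equality
            for option in action.option_strings:
                if option in argv:  # exact token match only
                    return True
    return False


parser = argparse.ArgumentParser()
parser.add_argument("--training_fdr", type=float, default=0.05)
parser.add_argument("--final_fdr", type=float, default=0.01)

# False negative: argparse accepts --opt=value, but that token is not "--final_fdr".
missed = was_explicit(parser, "final_fdr", ["--final_fdr=0.02"])

# False positive: "fdr" is a substring of "training_fdr".
spurious = was_explicit(parser, "fdr", ["--training_fdr", "0.1"])

print(missed, spurious)
```

Whether either case ever occurred in practice depends on how `run.py` called the helper, which this excerpt does not show.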
### After (Clean dataclass approach in `config_manager_clean.py`):

```python
@dataclass
class MuMDIAConfig:
    """Clean, type-safe configuration with defaults"""
    mzml_file: str = ""
    fasta_file: str = ""
    result_dir: str = "results"
    n_windows: int = 10
    training_fdr: float = 0.05
    final_fdr: float = 0.01
    model_type: str = "xgboost"
    no_cache: bool = False
    clean: bool = False
    sage_only: bool = False
    # ... all options clearly defined

    @classmethod
    def from_json(cls, json_path: str) -> "MuMDIAConfig":
        """Simple JSON loading with error handling"""

    @classmethod
    def from_args(cls, args=None) -> "MuMDIAConfig":
        """Simple CLI parsing with config file support"""

    def validate(self) -> None:
        """Clean validation logic"""

def get_config() -> MuMDIAConfig:
    """One-liner to get validated config"""
    config = MuMDIAConfig.from_args()
    config.validate()
    return config

# In main():
config = get_config()  # That's it! One line replaces 100+ lines!
# Direct attribute access:
print(config.mzml_file)
print(config.n_windows)
```

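The method bodies are elided in the snippet above. As a rough sketch of how `from_json` and `validate` might be filled in, the example below uses a subset of the fields. The unknown-key check and the specific validation rules are assumptions made for illustration, not the contents of `config_manager_clean.py`.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MuMDIAConfig:
    """Illustrative subset of the fields shown above."""
    mzml_file: str = ""
    fasta_file: str = ""
    n_windows: int = 10
    training_fdr: float = 0.05
    final_fdr: float = 0.01

    @classmethod
    def from_json(cls, json_path: str) -> "MuMDIAConfig":
        """Load a config from a JSON object; missing keys keep their defaults."""
        with open(json_path, encoding="utf-8") as handle:
            data = json.load(handle)
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(data) - known)
        if unknown:
            # Assumption: a misspelled key should fail loudly rather than be ignored.
            raise ValueError(f"Unknown config keys: {unknown}")
        return cls(**data)

    def validate(self) -> None:
        """Raise ValueError on missing inputs or out-of-range values (assumed rules)."""
        if not self.mzml_file:
            raise ValueError("mzml_file is required")
        if not self.fasta_file:
            raise ValueError("fasta_file is required")
        if self.n_windows < 1:
            raise ValueError("n_windows must be at least 1")
        for name in ("training_fdr", "final_fdr"):
            value = getattr(self, name)
            if not 0.0 < value <= 1.0:
                raise ValueError(f"{name} must be in (0, 1], got {value}")
```

Keeping every check in `validate()` means a caller gets one exception type, with a message that names the offending field.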
## Key Improvements

### 1. **Dramatic Code Reduction**
- **Before**: ~100 lines of complex parsing logic in `run.py`
- **After**: 1 line at the call site: `config = get_config()`
- **Reduction**: 99% fewer lines in `main()` for config management; the parsing and validation logic now lives in `config_manager_clean.py`

### 2. **Type Safety**
- **Before**: Untyped dictionary access like `config["mzml_file"]`
- **After**: Typed attribute access like `config.mzml_file`
- **Benefit**: IDE autocomplete, static type checking, fewer runtime errors

### 3. **Clear Defaults**
- **Before**: Defaults scattered across argparse definitions
- **After**: All defaults clearly visible in dataclass definition
- **Benefit**: Easy to see and modify default values

### 4. **Better Error Handling**
- **Before**: Manual validation scattered throughout code
- **After**: Centralized validation in `validate()` method
- **Benefit**: Consistent error messages and validation logic

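One caveat on the type-safety point: dataclass annotations help IDEs and static checkers, but Python does not enforce them at runtime, so a value loaded from JSON or passed by a caller can still have the wrong type. Below is a minimal sketch of a runtime check that `validate()` could include. `DemoConfig` and `check_types` are hypothetical names, and the rule that a `float` field also accepts an `int` is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Hypothetical stand-in with a few of the field types used above."""
    n_windows: int = 10
    training_fdr: float = 0.05
    no_cache: bool = False

    def check_types(self) -> None:
        """Raise TypeError if any field value does not match its annotation."""
        for f in fields(self):
            value = getattr(self, f.name)
            # Assumption: an int is acceptable wherever a float is annotated.
            expected = (int, float) if f.type is float else f.type
            # bool is a subclass of int, so reject it explicitly for numeric fields.
            is_stray_bool = isinstance(value, bool) and f.type is not bool
            if is_stray_bool or not isinstance(value, expected):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
                )


# The constructor accepts a wrong type without complaint; only the check catches it.
suspect = DemoConfig(n_windows="5")
try:
    suspect.check_types()
except TypeError as err:
    print(err)
```

This sketch relies on `f.type` being a real class, which holds for simple annotations like these as long as the module does not use `from __future__ import annotations`.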
### 5. **Simpler Usage Patterns**

#### Command line usage:
```bash
# Simple usage
python run_simple.py --mzml_file data.mzML --fasta_file proteins.fasta

# With options
python run_simple.py --mzml_file data.mzML --fasta_file proteins.fasta --n_windows 5 --verbose --no-cache

# With config file
python run_simple.py --config_file my_config.json

# Config file with CLI overrides
python run_simple.py --config_file my_config.json --clean --result_dir custom_results
```

#### JSON config files:
```json
{
  "mzml_file": "data.mzML",
  "fasta_file": "proteins.fasta",
  "result_dir": "my_results",
  "n_windows": 15,
  "training_fdr": 0.1,
  "model_type": "nn",
  "verbose": true
}
```

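The "config file with CLI overrides" pattern implies a precedence order: dataclass defaults, then JSON values, then options given explicitly on the command line. One way to get that order without inspecting `sys.argv` is to give every option `default=argparse.SUPPRESS`, so options that were not passed never appear in the parsed namespace. This is a sketch under that assumption, not the actual `from_args` implementation; `DemoConfig`, `build_parser`, and `load_config` are hypothetical names.

```python
import argparse
import json
from dataclasses import dataclass, fields, replace
from typing import Optional, Sequence


@dataclass
class DemoConfig:
    """Hypothetical stand-in with a few of the fields shown earlier."""
    mzml_file: str = ""
    result_dir: str = "results"
    n_windows: int = 10
    clean: bool = False


def build_parser() -> argparse.ArgumentParser:
    """Create one CLI option per dataclass field, plus --config_file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", type=str, default=argparse.SUPPRESS)
    for f in fields(DemoConfig):
        if f.type is bool:
            # Boolean fields become flags; SUPPRESS keeps them absent unless passed.
            parser.add_argument(f"--{f.name}", action="store_true", default=argparse.SUPPRESS)
        else:
            parser.add_argument(f"--{f.name}", type=f.type, default=argparse.SUPPRESS)
    return parser


def load_config(argv: Optional[Sequence[str]] = None) -> DemoConfig:
    """Merge defaults, then the JSON file (if any), then explicit CLI options."""
    cli = vars(build_parser().parse_args(argv))
    config_file = cli.pop("config_file", None)
    config = DemoConfig()
    if config_file is not None:
        with open(config_file, encoding="utf-8") as handle:
            config = replace(config, **json.load(handle))
    # Only options that were actually passed are present in `cli`.
    return replace(config, **cli)
```

Because the parser is generated from `fields(DemoConfig)`, adding a field to the dataclass also adds the matching command-line option.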
## Implementation Status

- **Created**: `config_manager_clean.py` - Complete simplified config system
- **Created**: `run_simple.py` - Demo of clean config usage
- **Created**: `config_demo.py` - Working demonstration
- **Tested**: Both CLI arguments and JSON config files work perfectly

## Migration Path

The new config system is **fully backwards compatible**. You can:

1. **Immediate adoption**: Use `config_manager_clean.py` for new features
2. **Gradual migration**: Replace complex config logic piece by piece
3. **Side-by-side**: Run both systems during transition period

## Developer Experience Impact

**Before**: Developers had to:
- Navigate 100+ lines of complex argument parsing
- Understand config merging logic
- Manually handle validation
- Debug dictionary key errors
- Maintain scattered default values

**After**: Developers can:
- See all configuration at a glance in the dataclass
- Get IDE support with autocomplete and type checking
- Add new config options by just adding a dataclass field
- Trust that validation is centralized and consistent
- Focus on business logic instead of configuration plumbing

The new approach makes MuMDIA **much more maintainable and developer-friendly**!

CONFIG_SIMPLIFIED_COMPLETE.md

Whitespace-only changes.
