Commit e66a5f9

Merge pull request #5 from gofflab/claude/add-genome-reference-tool-t3h7n
Move config and indices to persistent user data directory
2 parents abca663 + 46b0385 commit e66a5f9

15 files changed

Lines changed: 469 additions & 47 deletions

README.md

Lines changed: 8 additions & 3 deletions
@@ -38,9 +38,14 @@ conda activate HCRProbeDesign
 pip install -e .
 ```
 
+## Data directory
+HCRProbeDesign stores configuration and Bowtie2 indices in a persistent user data directory
+at `~/.hcrprobedesign/` so they survive package upgrades. Override the location with the
+`HCRPROBEDESIGN_DATA_DIR` environment variable.
+
 ## Adding a new reference genome
-HCRProbeDesign keeps Bowtie2 index paths in `HCRconfig.yaml`. The `buildGenomeIndex` utility builds
-an index and registers it automatically.
+HCRProbeDesign keeps Bowtie2 index paths in `~/.hcrprobedesign/HCRconfig.yaml`. The `buildGenomeIndex`
+utility builds an index and registers it automatically.
 
 ```bash
 buildGenomeIndex --species zebrafish --fasta /path/to/genome.fa --threads 8
@@ -49,7 +54,7 @@ buildGenomeIndex --species zebrafish --fasta /path/to/genome.fa --threads 8
 Notes:
 - `--fasta` can be repeated and can point to a directory; all `.fa`, `.fasta`, or `.fna` files inside
   will be used.
-- By default, indices are written under the package `indices/` directory and the config is updated.
+- By default, indices are written under `~/.hcrprobedesign/indices/` and the config is updated.
 - Use `--indices-dir` to write indices elsewhere and `--config` to update a specific config file.
 - Use `--force` to overwrite an existing index or config entry.
 

VERSIONINFO.md

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+## v0.3.1 - 04.08.2026
++ Moved config and indices to persistent user data directory (`~/.hcrprobedesign/`)
++ Reference genomes and configuration now survive `pip install -U` upgrades
++ Automatic migration of data from old package-relative locations
++ Support `HCRPROBEDESIGN_DATA_DIR` environment variable to override data location
 ## v0.3.0 - 04.08.2026
 + Added `listReferences` CLI tool to display installed reference genomes and default parameters
 + Added `HCRProbeDesign.listReferences` module with programmatic access to reference info

docs/configuration.md

Lines changed: 22 additions & 4 deletions
@@ -1,7 +1,24 @@
 # Configuration
 
-HCRProbeDesign stores reference genome index paths in `HCRconfig.yaml`.
-The file is located in the package directory and is read at runtime.
+HCRProbeDesign stores reference genome index paths and default parameters in
+`HCRconfig.yaml`, located in the user data directory (`~/.hcrprobedesign/` by default).
+
+## Data directory
+
+All user data (configuration and Bowtie2 indices) lives in `~/.hcrprobedesign/`.
+This directory is created automatically on first use and **persists across package
+upgrades**, so you won't lose your indices when running `pip install -U hcrprobedesign`.
+
+Override the location by setting the `HCRPROBEDESIGN_DATA_DIR` environment variable:
+```bash
+export HCRPROBEDESIGN_DATA_DIR=/path/to/custom/dir
+```
+
+### Migration from older versions
+
+If you are upgrading from v0.3.0 or earlier (where data was stored inside the
+package directory), HCRProbeDesign will automatically detect and migrate your
+existing indices and species registrations on first run.
 
 ## Example
 ```yaml
@@ -13,14 +30,15 @@ species:
 ```
 
 ## Viewing the current configuration
-Run `listReferences` to display all registered species and default parameters:
+Run `listReferences` to display all registered species, default parameters, and the
+data directory location:
 
 ```bash
 listReferences
 ```
 
 ## Notes
-- Paths can be absolute or package-relative.
+- Paths can be absolute or relative to the data directory.
 - `buildGenomeIndex` updates this file automatically.
 - `fetchMouseIndex` also registers the mouse index for you.
 - Use `--config` to write to a different config file when building indices.
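
The notes above say a configured path may be absolute or relative to the data directory, and that the data directory can be overridden through `HCRPROBEDESIGN_DATA_DIR`. The following is a minimal standalone sketch of how those two rules could combine. The helper names `data_dir` and `resolve_index_path` are hypothetical and are not part of the package; the `environ` parameter exists only to make the sketch easy to exercise.

```python
import os

# Default location documented in this commit.
DEFAULT_DATA_DIR = os.path.join(os.path.expanduser("~"), ".hcrprobedesign")


def data_dir(environ=None):
    """Return the data directory, honouring HCRPROBEDESIGN_DATA_DIR if set."""
    env = os.environ if environ is None else environ
    return env.get("HCRPROBEDESIGN_DATA_DIR", DEFAULT_DATA_DIR)


def resolve_index_path(bowtie2_index, environ=None):
    """Resolve a config entry (hypothetical helper, not the package API).

    Absolute paths are returned unchanged; relative paths are joined onto
    the data directory.
    """
    if os.path.isabs(bowtie2_index):
        return bowtie2_index
    return os.path.join(data_dir(environ), bowtie2_index)


if __name__ == "__main__":
    print(resolve_index_path("indices/mm10/mm10"))
```

Passing an explicit mapping such as `{"HCRPROBEDESIGN_DATA_DIR": "/data/hcr"}` shows the effect of the override without touching the real environment.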

docs/reference-genome.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ buildGenomeIndex --species zebrafish --fasta /path/to/genome.fa --threads 8
 Notes:
 - `--fasta` is repeatable and can point to a directory; all `.fa`, `.fasta`, or `.fna` files
   (including `.gz`) will be included.
-- Indices are written under the package `indices/` directory by default.
+- Indices are written under `~/.hcrprobedesign/indices/` by default.
 - Use `--indices-dir` to write indices elsewhere and `--config` to update a specific config file.
 - Use `--force` to overwrite an existing index or config entry.
 
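
The `--fasta` note above describes expanding a directory into its FASTA files. As an illustration only, the sketch below shows one way such an expansion could be written. It is not taken from `buildGenomeIndex`; the names `is_fasta` and `collect_fasta` are hypothetical, and the case-insensitive matching and non-recursive directory scan are assumptions.

```python
import os

FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def is_fasta(name):
    """Return True for names ending in .fa/.fasta/.fna, optionally gzipped."""
    lower = name.lower()  # assumption: match suffixes case-insensitively
    if lower.endswith(".gz"):
        lower = lower[:-3]
    return lower.endswith(FASTA_SUFFIXES)


def collect_fasta(paths):
    """Expand a list of files and directories into a flat list of FASTA files.

    Directories are scanned one level deep (assumption); explicit file
    paths are passed through unchanged.
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                full = os.path.join(path, name)
                if os.path.isfile(full) and is_fasta(name):
                    found.append(full)
        else:
            found.append(path)
    return found
```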

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [metadata]
 # replace with your username:
 name = hcrprobedesign
-version = 0.3.0
+version = 0.3.1
 author = Loyal A. Goff
 author_email = loyalgoff@jhmi.edu
 description = Probe Design tool for Hybridization Chain Reaction

src/HCRProbeDesign/__init__.py

Lines changed: 9 additions & 2 deletions
@@ -3,6 +3,7 @@
 from . import probeDesign
 from . import thermo
 from . import sequencelib
+from ._datadir import get_data_dir, get_config_path, get_indices_dir, ensure_data_dir
 
 import os
 try:
@@ -13,8 +14,14 @@
 _ROOT = os.path.abspath(os.path.dirname(__file__))
 
 def index_path():
-    """Return the default package-relative path for Bowtie2 indices."""
-    return os.path.join(_ROOT, 'indices')
+    """Return the path to the user's Bowtie2 indices directory.
+
+    Returns the user data directory (``~/.hcrprobedesign/indices/``) which
+    persists across package upgrades. The data directory is created
+    automatically on first access.
+    """
+    ensure_data_dir()
+    return get_indices_dir()
 
 def _resolve_version():
     if importlib_metadata is not None:

src/HCRProbeDesign/_datadir.py

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
+"""Manage the user data directory for HCRProbeDesign.
+
+Configuration and Bowtie2 indices are stored in a persistent user directory
+(``~/.hcrprobedesign/`` by default) so they survive package upgrades.
+
+Override the location by setting the ``HCRPROBEDESIGN_DATA_DIR`` environment
+variable.
+"""
+
+import glob
+import os
+import shutil
+import sys
+
+import yaml
+
+_PACKAGE_DIRECTORY = os.path.dirname(os.path.abspath(__file__))
+_DEFAULT_DATA_DIR = os.path.join(os.path.expanduser("~"), ".hcrprobedesign")
+_SEED_CONFIG = os.path.join(_PACKAGE_DIRECTORY, "HCRconfig.yaml")
+
+
+def get_data_dir():
+    """
+    Return the path to the user data directory.
+
+    Uses ``HCRPROBEDESIGN_DATA_DIR`` if set, otherwise ``~/.hcrprobedesign``.
+
+    :return: Absolute path to the data directory.
+    """
+    return os.environ.get("HCRPROBEDESIGN_DATA_DIR", _DEFAULT_DATA_DIR)
+
+
+def get_config_path():
+    """
+    Return the path to the user's ``HCRconfig.yaml``.
+
+    :return: Absolute path to the config file.
+    """
+    return os.path.join(get_data_dir(), "HCRconfig.yaml")
+
+
+def get_indices_dir():
+    """
+    Return the path to the user's indices directory.
+
+    :return: Absolute path to the indices directory.
+    """
+    return os.path.join(get_data_dir(), "indices")
+
+
+def _load_yaml(path):
+    """Load a YAML file, returning an empty dict if missing."""
+    if not os.path.exists(path):
+        return {}
+    with open(path, "r") as fh:
+        return yaml.safe_load(fh) or {}
+
+
+def _old_package_config_path():
+    """Return the path to the old package-relative config file."""
+    return os.path.join(_PACKAGE_DIRECTORY, "HCRconfig.yaml")
+
+
+def _old_package_indices_dir():
+    """Return the path to the old package-relative indices directory."""
+    return os.path.join(_PACKAGE_DIRECTORY, "indices")
+
+
+def _has_old_data():
+    """
+    Check whether there is user data in the old package-relative locations.
+
+    :return: Tuple of (has_species_entries, list_of_index_dirs).
+    """
+    old_config = _load_yaml(_old_package_config_path())
+    species = old_config.get("species", {}) or {}
+    has_species = len(species) > 0
+
+    old_indices = _old_package_indices_dir()
+    index_dirs = []
+    if os.path.isdir(old_indices):
+        for entry in os.listdir(old_indices):
+            entry_path = os.path.join(old_indices, entry)
+            if os.path.isdir(entry_path):
+                bt2_files = glob.glob(os.path.join(entry_path, "*.bt2*"))
+                if bt2_files:
+                    index_dirs.append(entry)
+
+    return has_species, index_dirs
+
+
+def _migrate_old_data():
+    """
+    Migrate config entries and index files from the package directory to the
+    user data directory.
+
+    Prints progress messages to stderr.
+    """
+    data_dir = get_data_dir()
+    new_config_path = get_config_path()
+    new_indices_dir = get_indices_dir()
+
+    old_config_path = _old_package_config_path()
+    old_indices_dir = _old_package_indices_dir()
+
+    old_config = _load_yaml(old_config_path)
+    new_config = _load_yaml(new_config_path)
+
+    old_species = old_config.get("species", {}) or {}
+    new_species = new_config.setdefault("species", {})
+
+    migrated_any = False
+
+    # Migrate index directories
+    if os.path.isdir(old_indices_dir):
+        for entry in os.listdir(old_indices_dir):
+            old_entry_path = os.path.join(old_indices_dir, entry)
+            if not os.path.isdir(old_entry_path):
+                continue
+            bt2_files = glob.glob(os.path.join(old_entry_path, "*.bt2*"))
+            if not bt2_files:
+                continue
+
+            new_entry_path = os.path.join(new_indices_dir, entry)
+            if os.path.exists(new_entry_path):
+                print(
+                    f" Skipping index '{entry}' (already exists in {new_indices_dir})",
+                    file=sys.stderr,
+                )
+                continue
+
+            print(
+                f" Moving index '{entry}' -> {new_entry_path}",
+                file=sys.stderr,
+            )
+            shutil.copytree(old_entry_path, new_entry_path)
+            migrated_any = True
+
+    # Migrate species config entries
+    for name, entry in old_species.items():
+        if name in new_species:
+            continue
+        old_index = entry.get("bowtie2_index", "")
+        # Rewrite relative paths to point to the new indices dir
+        if not os.path.isabs(old_index):
+            # e.g. "indices/mm10/mm10" -> new absolute path
+            basename = old_index
+            if basename.startswith("indices/") or basename.startswith("indices\\"):
+                basename = basename[len("indices/"):]
+            new_index = os.path.join(new_indices_dir, basename)
+        else:
+            new_index = old_index
+        new_species[name] = {"bowtie2_index": new_index}
+        migrated_any = True
+        print(
+            f" Migrated species '{name}' -> {new_index}",
+            file=sys.stderr,
+        )
+
+    # Preserve default_params from old config if not present in new
+    if "default_params" not in new_config and "default_params" in old_config:
+        new_config["default_params"] = old_config["default_params"]
+        migrated_any = True
+
+    if migrated_any:
+        with open(new_config_path, "w") as fh:
+            yaml.safe_dump(new_config, fh, sort_keys=False)
+
+    return migrated_any
+
+
+def ensure_data_dir():
+    """
+    Ensure the user data directory exists and is seeded.
+
+    On first run, creates ``~/.hcrprobedesign/`` with a seeded
+    ``HCRconfig.yaml`` and ``indices/`` directory. If data exists in the
+    old package-relative location, it is migrated automatically.
+    """
+    data_dir = get_data_dir()
+    config_path = get_config_path()
+    indices_dir = get_indices_dir()
+
+    if os.path.isdir(data_dir) and os.path.exists(config_path):
+        return  # Already initialized
+
+    first_init = not os.path.isdir(data_dir)
+    os.makedirs(indices_dir, exist_ok=True)
+
+    # Seed config from package default if no config exists yet
+    if not os.path.exists(config_path):
+        if os.path.exists(_SEED_CONFIG):
+            shutil.copy2(_SEED_CONFIG, config_path)
+        else:
+            # Minimal fallback
+            with open(config_path, "w") as fh:
+                yaml.safe_dump({"species": {}, "default_params": {}}, fh)
+
+    if first_init:
+        print(
+            f"Initialized HCRProbeDesign data directory: {data_dir}",
+            file=sys.stderr,
+        )
+
+    # Check for data in old package location and migrate
+    has_species, index_dirs = _has_old_data()
+    if has_species or index_dirs:
+        print(
+            "Found reference data in old package directory. Migrating...",
+            file=sys.stderr,
+        )
+        _migrate_old_data()
+        print("Migration complete.", file=sys.stderr)
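
The species-migration loop in `_migrate_old_data` rewrites package-relative index paths so they point into the new indices directory. The sketch below isolates that rule in a standalone function so it can be tried without installing the package. `rewrite_index_path` is a hypothetical name introduced here for illustration; the logic is meant to mirror the diff above, not replace it.

```python
import os


def rewrite_index_path(old_index, new_indices_dir):
    """Map an old config entry onto the new indices directory.

    Absolute paths are kept as-is. Relative paths lose a leading
    "indices/" (or "indices\\") component and are joined onto
    new_indices_dir, as in the migration code from this commit.
    """
    if os.path.isabs(old_index):
        return old_index
    relative = old_index
    for prefix in ("indices/", "indices\\"):
        if relative.startswith(prefix):
            relative = relative[len(prefix):]
            break
    return os.path.join(new_indices_dir, relative)


if __name__ == "__main__":
    target = os.path.join(os.path.expanduser("~"), ".hcrprobedesign", "indices")
    for old in ("indices/mm10/mm10", "hg38/hg38", "/opt/bowtie2/danRer11/danRer11"):
        print(f"{old} -> {rewrite_index_path(old, target)}")
```

Because the index files are copied with `shutil.copytree` rather than moved, the original package-relative files remain in place after migration; only the config entries are pointed at the new location.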
