# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Virtual environments
venv/
ENV/
env/
.venv
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store
# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
.mypy_cache/
.dmypy.json
dmypy.json
# Output directories
output/
dashboard/
# OAK cache
.oak_cache/
references_cache/
# UMAP embedding cache (large pickle files - DO NOT commit)
.umap_cache/
# Jupyter
.ipynb_checkpoints/
*.ipynb
# Logs
*.log
# Generated documentation
docs/_build/
docs/schema/
# Temporary files
*.tmp
*.bak
.cache/
# Data layers (see docs/DATA_LAYERS.md)
# Three-tier system: data/raw/ → data/raw_yaml/ → data/normalized_yaml/
# Layer 1: Raw data files (large, can be regenerated or fetched)
data/raw/**/*.json
data/raw/**/*.tsv
data/raw/**/*.csv
data/raw/**/*.tar.gz
data/raw/**/*.zip
data/raw/**/*.html
data/raw/**/*.sql
data/raw/**/.DS_Store
# Keep provenance documentation
!data/raw/**/README.md
# Keep MicrobeMediaParam chemical mappings (essential for CHEBI grounding)
!data/raw/microbe-media-param/compound_mappings_strict_final.tsv
!data/raw/microbe-media-param/compound_mappings_strict_final_hydrate.tsv
# NBRC scraped HTML cache
data/raw/nbrc/scraped/
# Layer 2: Raw YAML (unnormalized, regenerable from data/raw/)
data/raw_yaml/**/*.yaml
data/raw_yaml/**/.DS_Store
# Layer 3: Normalized YAML is VERSION CONTROLLED (curated recipes)
# Keep all data/normalized_yaml/ in git
!data/normalized_yaml/**/*.yaml
data/normalized_yaml/**/.DS_Store
# Processed data (regenerable)
data/processed/
dismech/
data/embeddings/*.tsv.gz
# KG-Microbe data files (too large for GitHub)
data/kgm/merged-kg_edges.tsv
data/kgm/merged-kg_nodes.tsv