- Remove sys.path manipulation from predictor.py (use package imports directly)
- Remove unused OrderedDict import in create_label_mappings.py
- Update preprocessing defaults to 224x224 and imagenet normalization
- Update preprocess_image_batch and resize_image defaults to match
- Fix normalize_rgb docstring to accurately describe both normalization modes
- Update model_registry input_size to 224x224 and add norm_method field
- Make TreeClassifier norm_method configurable (default: imagenet)
- Fix predictor to use self.norm_method instead of hardcoded '0_1'
- Update from_checkpoint to use (224, 224) and imagenet defaults
- Add rgb_norm_method param to RGBClassifier; normalize() now reflects it
- Validate numeric_to_label_dict in upload_to_huggingface.py
- Fix docs: species_filter is inclusion filter, not exclusion
- Fix docs: add --csv_path to inspect_labels.py example commands
- Fix warning message: clarify species_filter is an inclusion filter
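
The two normalization modes named in the bullets above (`'0_1'` and `'imagenet'`) can be illustrated with a minimal sketch. `normalize_pixel` is a hypothetical helper written for this note, not the repository's `normalize_rgb`; it assumes 8-bit RGB input and that the `imagenet` mode scales to [0, 1] before standardizing with the commonly published ImageNet channel statistics.

```python
# Commonly published ImageNet channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(pixel, method="imagenet"):
    """Normalize one 8-bit RGB pixel (hypothetical helper, for illustration only)."""
    # Both modes start by scaling 0..255 values into [0, 1].
    scaled = [channel / 255.0 for channel in pixel]
    if method == "0_1":
        return tuple(scaled)
    if method == "imagenet":
        # Standardize each channel: (value - mean) / std.
        return tuple(
            (value - mean) / std
            for value, mean, std in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)
        )
    raise ValueError(f"unknown normalization method: {method!r}")
```

Because the two modes produce values on different scales, a model trained with one and evaluated with the other will see shifted inputs, which is presumably why the predictor now reads `self.norm_method` instead of a hardcoded value.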
 # Clean genus-level training (include only true genera, omit edge cases)
+# species_filter keeps only rows where species code is in the list
+all_codes = [...]  # get from inspect_labels.py output
+clean_codes = [c for c in all_codes if c not in ["PINACE"]]  # drop Pinaceae
 datamodule = NeonCrownDataModule(
     csv_path="data/metadata/combined_dataset.csv",
     hdf5_path="data/combined_dataset.h5",
     modalities=["rgb"],
     taxonomic_level="genus",
-    species_filter=["PINACE"],  # Exclude Pinaceae family
+    species_filter=clean_codes,  # include all except Pinaceae
     batch_size=64,
 )
 # Now training on 59 true genera only
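
Since `species_filter` is an inclusion filter, excluding a code means passing every other code. A minimal sketch of that step follows. `build_inclusion_list` is a hypothetical helper and the species codes are placeholders; the real list would come from the `inspect_labels.py` output.

```python
def build_inclusion_list(all_codes, excluded_codes):
    """Return all_codes minus excluded_codes, keeping order and dropping duplicates."""
    excluded = set(excluded_codes)
    seen = set()
    kept = []
    for code in all_codes:
        if code not in excluded and code not in seen:
            seen.add(code)
            kept.append(code)
    return kept


# Placeholder codes for illustration only (assumed, not taken from the dataset).
all_codes = ["ACER", "PINUS", "PINACE", "QUERCUS"]
clean_codes = build_inclusion_list(all_codes, ["PINACE"])
```

The resulting `clean_codes` list is what would be passed as `species_filter=clean_codes`.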
@@ -262,7 +267,7 @@ These represent unidentified species within that family.
 See docs/taxonomic_levels.md for more information.
```

-**These are informational** - training will proceed normally. Filter if desired using `species_filter`.
+**These are informational** - training will proceed normally. To exclude them, build an inclusion list with all other codes and pass it to `species_filter` (which keeps only species in the list).

 ## FAQ

@@ -277,7 +282,7 @@ See docs/taxonomic_levels.md for more information.
 **Q: What about Pinaceae?**
 - It's a family name, not genus, but only 26 samples (0.05%)
 - Keep it (recommended): Represents "unidentified conifer" class
-- Filter it: Use `species_filter=["PINACE"]` if you need taxonomic purity
+- Exclude it: Build an inclusion list of all codes except `"PINACE"` and pass to `species_filter`