Relocating TSFM-Specific CSV Files to CouchDB #406
…to CouchDB
Issue IBM#297: Remove tsfm specific csv from scenarios, relocate the files to couchdb
Changes:
- Add tsfm collection configuration to collections.json with CSV format and Timestamp primary key
- Add tsfm CSV files to src/couchdb/scenarios_data/shared/tsfm/:
  - chiller9_annotated_small_test.csv
  - chiller9_finetuning_small.csv
  - chiller9_tsad.csv
- Update all scenario manifests (default, scenario_1, scenario_2) to include tsfm data sources
- Update .allowed_datafiles to include the new tsfm CSV files
- Update documentation to reference CouchDB collection 'tsfm' instead of local CSV paths:
  - case_study_industrial_asset_management.md: Update forecasting and anomaly detection examples
  - ground_truth_design_guideline.md: Update ground truth patterns to use CouchDB paths
  - data.md: add tsfm directory to documentation
This enables TSFM utterances to load data from CouchDB instead of local files.
Signed-off-by: Faizan Khan <faizanakhan2003@gmail.com>
Resolved merge conflicts in:
- src/couchdb/.allowed_datafiles: Combined local TSFM CSV files with upstream asset_profile_sample.json
- src/couchdb/scenarios_data/default/manifest.json: Merged local tsfm data sources with upstream asset field
- src/couchdb/collections.json: Added upstream asset collection while keeping local tsfm collection
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
DhavalRepo18
left a comment
We are considering the dataset as a part of IoT (as it's just data), so the core file should go inside IoT.
Please also check other pending PR for the same problem.
Move TSFM dataset files into IoT directory as requested by reviewer feedback. This consolidates time series data under the IoT collection since it's sensor/telemetry data rather than a separate collection.
Changes:
- Moved chiller9_annotated_small_test.csv, chiller9_finetuning_small.csv, and chiller9_tsad.csv from shared/tsfm/ to shared/iot/
- Removed tsfm collection from collections.json
- Updated all manifest.json files to reference iot collection instead of tsfm
- Updated .allowed_datafiles with new IoT paths
- Updated documentation references from tsfm to iot collection
- Updated data.md directory structure to reflect IoT contains both JSON and CSV
Signed-off-by: Faizan Khan <faizanakhan2003@gmail.com>
Sync with latest upstream changes from IBM/AssetOpsBench
I've moved the TSFM dataset files from shared/tsfm/ to shared/iot/ as requested, since the data
Changes made:
• Moved chiller9_annotated_small_test.csv, chiller9_finetuning_small.csv, and chiller9_tsad.csv
Regarding other pending PRs: I checked PR #343 and found it has the same issue - the reviewer there
Description
This PR addresses issue #297 by relocating TSFM-specific CSV files from local storage to CouchDB. Previously, TSFM-related utterances read local CSV files; they now load data
from the CouchDB tsfm collection instead. This change standardizes data access patterns across
the benchmark suite and ensures all data is managed through the CouchDB infrastructure.
Fix Details
CouchDB Infrastructure:
• Added tsfm collection configuration to collections.json with CSV format and Timestamp as
primary key
• Configured proper ID prefix (tsfm) for document identification
• Set up collection with no additional indexes (optimized for time-series data access)
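As a rough illustration, the collection entry described above might look like the following. This is only a sketch: the actual schema of collections.json is not shown in this PR, so every key name here (format, primary_key, id_prefix, indexes) is an assumption. Only the values (CSV format, Timestamp primary key, tsfm prefix, no additional indexes) come from the description.

```python
import json

# Hypothetical shape of the tsfm entry. Key names are assumptions;
# the values are taken from the PR description.
tsfm_collection = {
    "tsfm": {
        "format": "csv",             # source files are CSV
        "primary_key": "Timestamp",  # column used to build document ids
        "id_prefix": "tsfm",         # prefix for generated _id values
        "indexes": [],               # no additional indexes
    }
}

print(json.dumps(tsfm_collection, indent=2))
```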
Data Files:
• Added 3 TSFM CSV files to src/couchdb/scenarios_data/shared/tsfm/:
• chiller9_annotated_small_test.csv (192 records for forecasting inference)
• chiller9_finetuning_small.csv (192 records for model fine-tuning)
• chiller9_tsad.csv (192 records for time-series anomaly detection)
• All files contain timestamped sensor data for Chiller 9 Condenser Water Flow
• Data spans from 2020-04-27 to 2020-04-28 with 15-minute intervals
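The record count is consistent with the stated range: two full days at 15-minute intervals give 2 × 24 × 4 = 192 timestamps. A quick check, assuming the series starts at 2020-04-27 00:00:00 and has no gaps (both assumptions, not confirmed by the files themselves):

```python
from datetime import datetime, timedelta

start = datetime(2020, 4, 27, 0, 0, 0)  # assumed first timestamp
step = timedelta(minutes=15)            # stated sampling interval
records = 192                           # stated record count per file

timestamps = [start + i * step for i in range(records)]

print(timestamps[0].isoformat())   # 2020-04-27T00:00:00
print(timestamps[-1].isoformat())  # 2020-04-28T23:45:00
```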
Configuration Updates:
• Updated all scenario manifests (default, scenario_1, scenario_2) to include tsfm data sources
• Updated .allowed_datafiles to include the new tsfm CSV files for security checks
• Ensured proper path resolution through the shared data directory structure
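To illustrate the kind of check an allowlist such as .allowed_datafiles enables, here is a minimal sketch. The file's real format and the loader's actual check are not shown in this PR; the sketch assumes one relative path per entry, and is_allowed is a hypothetical helper.

```python
import posixpath

# Hypothetical allowlist contents; assumes entries are paths relative
# to the scenarios_data directory.
ALLOWED = {
    "shared/tsfm/chiller9_annotated_small_test.csv",
    "shared/tsfm/chiller9_finetuning_small.csv",
    "shared/tsfm/chiller9_tsad.csv",
}

def is_allowed(relative_path, allowed=ALLOWED):
    """Return True only if the normalized path is explicitly listed."""
    # normpath collapses "." and ".." segments so that traversal
    # attempts cannot masquerade as an allowed path.
    normalized = posixpath.normpath(relative_path)
    return normalized in allowed

print(is_allowed("shared/tsfm/chiller9_tsad.csv"))    # True
print(is_allowed("shared/tsfm/../../etc/passwd"))     # False
```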
Documentation Updates:
• case_study_industrial_asset_management.md: Updated forecasting and anomaly detection examples
to reference CouchDB collection 'tsfm'
• ground_truth_design_guideline.md: Updated ground truth patterns to use CouchDB paths and
corrected agent actions (changed from jsonreader to csvreader, updated file paths)
• data.md: Added tsfm directory to the shared data directory structure documentation
Impact on Benchmarking
This change is a data infrastructure improvement that does not affect benchmark scoring or
evaluation metrics. It standardizes how TSFM-related utterances access data, making the system
more maintainable and consistent with other data sources.
Related Issues
• Fixes: #297
Verification Steps
• CSV Parsing Validation: Tested parsing of all 3 CSV files using the loader's CSV parser
with tsfm collection config. Successfully parsed 192 documents from each file with proper
field mapping.
• Manifest Loading: Tested collection loading from the default manifest. Successfully loaded
576 total documents (192 from each of the 3 CSV files) through the manifest-based loader.
• Document Normalization: Verified document normalization properly adds dataset: 'tsfm' field
and generates correct _id fields using the Timestamp primary key (e.g.,
tsfm:2020-04-27T00:00:00).
• Path Resolution: Confirmed that relative paths in manifests (shared/tsfm/*.csv) properly
resolve against the scenario directory and shared data structure.
• Schema Validation: Verified that CSV files have correct structure with Timestamp and
Chiller_9_Condenser_Water_Flow columns matching the expected schema.
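The schema and normalization checks above can be sketched roughly as follows. This is not the loader's actual code: normalize_row and load_documents are hypothetical helpers, the sample rows are made up, and the timestamp format is assumed from the example _id in the description.

```python
import csv
import io

EXPECTED_COLUMNS = ["Timestamp", "Chiller_9_Condenser_Water_Flow"]

# Made-up sample rows for illustration; the real files hold 192 records each.
SAMPLE_CSV = """Timestamp,Chiller_9_Condenser_Water_Flow
2020-04-27T00:00:00,1.0
2020-04-27T00:15:00,2.0
"""

def normalize_row(row, dataset="tsfm", primary_key="Timestamp"):
    """Copy a CSV row and add the _id and dataset fields."""
    doc = dict(row)
    doc["_id"] = f"{dataset}:{row[primary_key]}"
    doc["dataset"] = dataset
    return doc

def load_documents(text):
    """Parse CSV text, check the header, and normalize every row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [normalize_row(row) for row in reader]

docs = load_documents(SAMPLE_CSV)
print(docs[0]["_id"])      # tsfm:2020-04-27T00:00:00
print(docs[0]["dataset"])  # tsfm
```

Under this scheme, rows from different files that share a timestamp would map to the same _id. How the actual loader handles that case is not described in this PR.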
Checklist