Add missing dependencies with standalone search and GPT-NeoX integration options #3
…sues Co-authored-by: aflah02 <72096386+aflah02@users.noreply.github.com>
@copilot Can you address the reviews? Also add README docs for the different install options.
@copilot
All requested changes implemented in commit 06792cb:
• Removed torch dependency - no longer included, to avoid version conflicts with the NeoX environment
The dependency structure now provides:
@copilot I just realized that transformers also comes from NeoX, so can you remove it too, like torch? If we install it again we might end up with weird version conflicts.
Transformers dependency removed in commit a552ce6 to avoid version conflicts with NeoX environment. Changes made:
The final core dependencies are now minimal: numpy, pandas, tqdm only. All other dependencies (torch, transformers, tokengrams, streamlit) are either optional or provided by the GPT-NeoX environment.
@copilot There seems to be a fundamental misunderstanding about the need for NeoX. NeoX is also needed for the UI. The only thing that can work without NeoX is search. Everything else needs NeoX installed.
Documentation corrected in commit 5b14cb9 to accurately reflect GPT-NeoX requirements. Key corrections made:
The documentation now correctly reflects that search functionality is the only standalone feature, while UI, dataset operations, and advanced functionality all require the GPT-NeoX environment.
Updated installation instructions and clarified dependencies for TokenSmith.
Problem
The pyproject.toml file was missing essential dependencies that are used throughout the TokenSmith codebase. This caused import failures when users tried to install and use the package.
Additionally, the codebase had hard imports of Megatron, tokengrams, and transformers, so no TokenSmith functionality could be used without installing every dependency. This contradicted the README's guidance about modular installation options.
Solution
Added Minimal Dependency Structure
Updated pyproject.toml with a minimal dependency structure that avoids version conflicts with the GPT-NeoX environment:
Core dependencies (always installed): numpy, pandas, tqdm
Optional dependency groups:
• [search] - tokengrams for search functionality (works standalone)
• [ui] - streamlit, altair for web interface (requires GPT-NeoX)
• [docs] - mkdocs tools for documentation
• [all] - all optional dependencies combined (requires GPT-NeoX)
Provided by GPT-NeoX environment (when needed): torch, transformers
Made All Optional Imports Conditional
Fixed hard dependencies by implementing conditional imports with proper error handling:
• Megatron imports in tokensmith/utils.py
• Tokengrams imports in tokensmith/search/handler.py
• Transformers imports in multiple handler files
When optional functionality is used without the required dependencies, users get clear error messages directing them to the appropriate install option or GPT-NeoX environment setup.
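The conditional-import pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names optional_import and require, the build_search_index entry point, and the exact error wording are all hypothetical.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, feature, hint):
    """Raise a clear, actionable error when an optional dependency is absent."""
    if module is None:
        raise ImportError(f"{feature} is unavailable: {hint}")
    return module


# Importing at module level never fails, even on a minimal install.
tokengrams = optional_import("tokengrams")


def build_search_index(*args, **kwargs):
    # Hypothetical entry point: the dependency check runs only when
    # the search feature is actually used, not at import time.
    require(tokengrams, "Search", "install the [search] extra")
    raise NotImplementedError("illustrative stub only")
```

Deferring the check to call time means the package itself can still be imported on a minimal install, and the missing dependency is reported only when the feature that needs it is used.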
Clarified GPT-NeoX Requirements
Based on feedback, corrected the documentation to accurately reflect functionality requirements:
Works WITHOUT GPT-NeoX: search functionality
REQUIRES GPT-NeoX environment: UI, dataset operations, and advanced functionality
Avoided Version Conflicts
Updated Python Version
Changed minimum Python requirement from ^3.7 to ^3.8 to align with modern dependency requirements.
Comprehensive Documentation
Added detailed README sections documenting all installation options with clear separation between standalone and GPT-NeoX-dependent functionality.
Testing
All functionality has been validated:
Installation Options
Users can now install TokenSmith with fine-grained control over dependencies:
This provides users with clear options: standalone search functionality or full functionality with GPT-NeoX environment, while avoiding version conflicts and unnecessary dependencies.
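One way to see which optional groups are usable in a given environment is a small availability check like the sketch below. It is not part of this PR. The group-to-module mapping is taken from the optional dependency groups listed above (with mkdocs assumed as the import name for the docs tools), and the function names are hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping from optional dependency group to top-level import names.
EXTRA_MODULES = {
    "search": ["tokengrams"],
    "ui": ["streamlit", "altair"],
    "docs": ["mkdocs"],
}


def missing_modules(modules):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


def report_extras(extras=EXTRA_MODULES):
    """Map each group name to the list of its modules that are missing."""
    return {extra: missing_modules(mods) for extra, mods in extras.items()}


if __name__ == "__main__":
    for extra, missing in report_extras().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"[{extra}] {status}")
```

The check only confirms that the modules can be located. It does not verify that a GPT-NeoX environment is present, which the [ui] and [all] groups additionally require.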