v1.17.0-ce
✨ New Features
- Advanced Scheduling with Volcano
- Integrated Volcano Scheduler to enhance high-performance workload management.
- Added support for Volcano vGPU resource definitions and MIG (Multi-Instance GPU) resources with full utility and test coverage.
- Infrastructure & Compute Visibility
- Introduced Public Cluster Info for a dedicated computation power page.
- Added Cluster Resource Health Check to monitor deployment readiness.
- Enhanced cluster management APIs to support portal integration and real-time node/queue data collection.
- Deployment & Runtime Enhancements
- Added Pending Status for deployments to provide better feedback during resource allocation.
- Introduced `RuntimeFrameworkID` and `DriverVersion` fields to deployment and space displays for better environment tracking.
- Supported updating resource and cluster IDs for existing inference and finetune tasks.
- Agent & Skills Hub
- Introduced Skills Support with multi-sync capabilities and a "User Likes" API.
- Added Agent Access Middleware that enables token usage tracking.
- Security & Authentication
- Added Basic Auth support to the Authenticator.
- Introduced a configurable bypass for sensitive content detection in the AI Gateway.
- Enabled access token support for non-MCP spaces via reverse proxy.
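
For readers unfamiliar with the Basic Auth item above, here is a minimal, generic sketch of how an HTTP Basic `Authorization` header is commonly validated. It is not the CSGHub Authenticator code: the function names `check_basic_auth` and `make_basic_header` and the sample credentials are hypothetical and chosen only for illustration.

```python
import base64
import hmac


def check_basic_auth(header: str, expected_user: str, expected_password: str) -> bool:
    """Return True if an HTTP Basic Authorization header carries the expected credentials.

    Hypothetical helper for illustration; not part of csghub-server.
    """
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers binascii.Error and UnicodeDecodeError, both ValueError subclasses.
        return False
    # Split on the first colon only, so passwords may contain ':'.
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparison avoids leaking match length through timing.
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"), expected_password.encode("utf-8"))
    return user_ok and pass_ok


def make_basic_header(user: str, password: str) -> str:
    """Build the header value a client would send (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

A caller would pass the raw header value, for example `check_basic_auth(make_basic_header("alice", "s3cret"), "alice", "s3cret")`. Basic Auth only encodes credentials, so it should be used over TLS.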
🚀 Enhancements & Bug Fixes
- Storage & Git Performance
- Optimized LFS Sync pointer retrieval and fixed pre-upload timeout issues for repositories with 300k+ files.
- Fixed MinIO multipart upload errors and addressed metadata EOF issues via storage gateway pre-signed URLs.
- Added `ScanFileNumLimit` configuration to prevent overhead during file scanning.
- Reporting & Data Export
- Added Time Range Queries and CSV Export functionality for system reports.
- Updated organizations table with a UUID column and implemented conflict checks.
- System Stability
- Added configurable timeouts for Temporal `GetSystemInfo` calls.
- Added a Scaffold Command to streamline code generation for developers.
- Bug Fixes
- Fixed a data duplication bug in the Models API.
- Resolved AMD GPU inference issues within Kubernetes (K8s) environments.
- Fixed xnet migration status filters for models and datasets index APIs.
- Addressed various error-handling bugs, including non-existent service deletion and MinIO upload failures.
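
As a rough illustration of the Time Range Queries and CSV Export item above, the sketch below filters in-memory records to a half-open time window and writes them as CSV. The function name `export_csv`, the field names (`created_at`, `name`, `value`), and the sample data are assumptions made for this example; the release notes do not describe the actual report schema or API.

```python
import csv
import io
from datetime import datetime

# Hypothetical column layout; the real report fields are not specified in these notes.
FIELDS = ["created_at", "name", "value"]


def export_csv(records, start: datetime, end: datetime) -> str:
    """Return CSV text for records whose 'created_at' falls in [start, end)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Half-open interval: include start, exclude end.
        if start <= record["created_at"] < end:
            writer.writerow({
                "created_at": record["created_at"].isoformat(),
                "name": record["name"],
                "value": record["value"],
            })
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"created_at": datetime(2025, 1, 15), "name": "deployments", "value": 7},
        {"created_at": datetime(2025, 3, 1), "name": "deployments", "value": 9},
    ]
    print(export_csv(sample, datetime(2025, 1, 1), datetime(2025, 2, 1)))
```

A half-open window keeps adjacent ranges (for example, consecutive months) from counting a boundary record twice.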
Full Changelog: OpenCSGs/csghub-server@v1.16.1-ce...v1.17.0-ce