v1.17.0-ce

zhendi released this on 16 Mar 02:37 · commit 2cc5075

✨ New Features

  • Advanced Scheduling with Volcano
    • Integrated the Volcano scheduler to improve management of high-performance workloads.
    • Added support for Volcano vGPU resource definitions and MIG (Multi-Instance GPU) resources, including utility helpers and test coverage.
  • Infrastructure & Compute Visibility
    • Introduced Public Cluster Info, which backs a dedicated computing-power page.
    • Added Cluster Resource Health Check to monitor deployment readiness.
    • Enhanced cluster management APIs to support portal integration and real-time node/queue data collection.
  • Deployment & Runtime Enhancements
    • Added a Pending status for deployments to give clearer feedback while resources are being allocated.
    • Added RuntimeFrameworkID and DriverVersion fields to deployment and space displays for better environment tracking.
    • Added support for updating the resource and cluster IDs of existing inference and fine-tuning tasks.
  • Agent & Skills Hub
    • Introduced Skills Support with multi-sync capabilities and a "User Likes" API.
    • Added Agent Access Middleware that enables token usage tracking.
  • Security & Authentication
    • Added Basic Auth support to the Authenticator.
    • Introduced a configurable bypass for sensitive content detection in the AI Gateway.
    • Enabled access token support for non-MCP spaces via reverse proxy.
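
The notes do not describe how the Authenticator processes Basic Auth. For readers unfamiliar with the scheme, the sketch below shows the standard HTTP Basic credential format (RFC 7617): the client sends `Authorization: Basic <base64(user:password)>` and the server decodes and compares it. The function names and the single configured credential pair are hypothetical and are not csghub-server's API.

```python
import base64
import binascii
import hmac
from typing import Optional, Tuple


def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Decode an `Authorization: Basic ...` header value into (user, password).

    Returns None if the header is empty, uses another scheme, or is malformed.
    """
    if not header:
        return None
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded.strip():
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # RFC 7617: the user-id cannot contain ':', so split on the first colon
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password


def check_basic_auth(header: str, expected_user: str, expected_password: str) -> bool:
    """Hypothetical check against one configured credential pair."""
    creds = parse_basic_auth(header)
    if creds is None:
        return False
    user, password = creds
    # compare_digest on bytes gives a constant-time comparison per field
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    pass_ok = hmac.compare_digest(password.encode("utf-8"), expected_password.encode("utf-8"))
    return user_ok and pass_ok
```

Splitting on the first colon means a password may itself contain colons, which matches the RFC's rule for the user-id.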

🚀 Enhancements & Bug Fixes

  • Storage & Git Performance
    • Optimized LFS Sync pointer retrieval and fixed pre-upload timeout issues for repositories with 300k+ files.
    • Fixed MinIO multipart upload errors and resolved metadata EOF issues by serving pre-signed URLs through the storage gateway.
    • Added a ScanFileNumLimit configuration option to bound the overhead of file scanning.
  • Reporting & Data Export
    • Added Time Range Queries and CSV Export functionality for system reports.
    • Added a UUID column to the organizations table, with conflict checks.
  • System Stability
    • Added configurable timeouts for Temporal GetSystemInfo calls.
    • Added a Scaffold Command to streamline code generation for developers.
  • Bug Fixes
    • Fixed a data duplication bug in the Models API.
    • Resolved AMD GPU inference issues within Kubernetes (K8s) environments.
    • Fixed xnet migration status filters for models and datasets index APIs.
    • Fixed several error-handling bugs, including failures when deleting a non-existent service and MinIO upload failures.
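
The notes do not give the report schema or the export endpoint. As a rough illustration of the two pieces named under Reporting & Data Export, the sketch below filters rows to a half-open time range and serializes them as CSV. The `ReportRow` record and its `created_at`, `name`, and `count` fields are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class ReportRow:
    # Hypothetical report record; the real report columns are not listed in the notes.
    created_at: datetime
    name: str
    count: int


def filter_by_time_range(rows: Iterable[ReportRow], start: datetime, end: datetime) -> List[ReportRow]:
    """Keep rows with start <= created_at < end (half-open, so adjacent ranges never overlap)."""
    if start > end:
        raise ValueError("start must not be after end")
    return [r for r in rows if start <= r.created_at < end]


def rows_to_csv(rows: Iterable[ReportRow]) -> str:
    """Serialize rows to CSV text with a header line; csv.writer quotes fields that contain commas."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["created_at", "name", "count"])
    for r in rows:
        writer.writerow([r.created_at.isoformat(), r.name, r.count])
    return buf.getvalue()
```

A half-open range lets consecutive exports (for example, one per day) cover every row exactly once.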

Full Changelog: OpenCSGs/csghub-server@v1.16.1-ce...v1.17.0-ce