An API-first platform for ingesting, browsing, and analysing billion-scale AI image datasets. Built with ASP.NET Core minimal APIs and a Blazor WebAssembly client.
- API-Driven Lifecycle: Dataset creation, ingestion status, and item retrieval exposed via REST endpoints.
- Virtualized Viewing: Only render what the user sees while prefetching nearby items for buttery scrolling.
- Sliding-Window Infinite Scroll: Browse very large image datasets with a fixed-size in-memory window, loading pages ahead/behind as you scroll while evicting old items to avoid WebAssembly out-of-memory crashes.
- Streaming Ingestion (Roadmap): Designed for chunked uploads and background parsing to avoid memory spikes.
- Shared Contracts: Typed DTOs shared between client and server for end-to-end consistency.
- Modular Extensibility: Pluggable parsers, modalities, and viewers via dependency injection.
- Observability Ready: Hooks for telemetry, structured logging, and health endpoints.
- .NET 8.0 SDK or later
- Modern web browser (Chrome, Firefox, Safari, Edge)
- ~2GB RAM for development
- ~100MB disk space
git clone <your-repo-url>
cd HartsysDatasetEditor
dotnet restore
dotnet build
# Terminal 1 - Minimal API (serves dataset lifecycle routes)
dotnet run --project src/HartsysDatasetEditor.Api
# Terminal 2 - Blazor WebAssembly client
dotnet run --project src/HartsysDatasetEditor.Client
Both projects share contracts via HartsysDatasetEditor.Contracts. The API currently uses in-memory repositories for smoke testing.
Navigate to: https://localhost:5001 (client dev server). Ensure the API is running at https://localhost:7085 (default Kestrel HTTPS port) or update the client's appsettings.Development.json accordingly.
Support for uploading and ingesting datasets is being rebuilt for the API-first architecture. The previous client-only ingestion flow has been removed. Follow the roadmap below to help implement the new streaming ingestion pipeline. For now, smoke-test the API using the built-in in-memory dataset endpoints:
POST /api/datasets // create dataset stub
GET /api/datasets // list datasets
GET /api/datasets/{id} // inspect dataset detail
GET /api/datasets/{id}/items?pageSize=100
HartsysDatasetEditor/
├── src/
│   ├── HartsysDatasetEditor.Api/          # ASP.NET Core minimal APIs for dataset lifecycle + items
│   │   ├── Extensions/                    # Service registration helpers
│   │   ├── Models/                        # Internal persistence models
│   │   └── Services/                      # In-memory repositories, ingestion stubs
│   ├── HartsysDatasetEditor.Client/       # Blazor WASM UI
│   │   ├── Components/                    # Viewer, Dataset, Filter, Common UI pieces
│   │   ├── Services/                      # State management, caching, API clients (roadmap)
│   │   └── wwwroot/                       # Static assets, CSS, JS
│   └── HartsysDatasetEditor.Contracts/    # Shared DTOs (pagination, datasets, filters)
│
├── tests/                                 # Unit tests
└── README.md
The editor follows a strictly API-first workflow so that every client action flows through the HTTP layer before touching storage. High-level components:
- Blazor WebAssembly Client: virtualized viewers, upload wizard, and caching services that call the API via typed HttpClient wrappers. Prefetch and IndexedDB caching are planned per docs/architecture.md.
- ASP.NET Core Minimal API: orchestrates dataset lifecycle, ingestion coordination, and cursor-based item paging. Background hosted services handle ingestion and stub persistence today.
- Backing Services: pluggable storage (blob), database (PostgreSQL/Dynamo), and search index (Elastic/OpenSearch) abstractions so we can swap implementations as we scale.
See the detailed blueprint, data flows, and phased roadmap in docs/architecture.md for deeper dives.
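As a rough illustration of the typed HttpClient wrapper idea mentioned in the client bullet, here is a minimal sketch. The DatasetSummary and PagedItems records, the DatasetApiClient class, and the cursor query parameter are hypothetical stand-ins, not the actual types in HartsysDatasetEditor.Contracts; only the routes /api/datasets and /api/datasets/{id}/items?pageSize= come from this README.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical DTOs; the real shapes live in HartsysDatasetEditor.Contracts.
public sealed record DatasetSummary(string Id, string Name);
public sealed record PagedItems(List<string> Items, string? NextCursor);

// Thin typed wrapper so UI components never build URLs by hand.
public sealed class DatasetApiClient
{
    private readonly HttpClient _http;

    // The HttpClient is expected to have BaseAddress set (for example https://localhost:7085/).
    public DatasetApiClient(HttpClient http) => _http = http;

    // GET /api/datasets (the response shape is assumed to be a JSON array)
    public async Task<IReadOnlyList<DatasetSummary>> ListDatasetsAsync(CancellationToken ct = default)
    {
        var result = await _http.GetFromJsonAsync<List<DatasetSummary>>("api/datasets", ct);
        return result ?? new List<DatasetSummary>();
    }

    // GET /api/datasets/{id}/items?pageSize=N (the cursor parameter name is an assumption)
    public async Task<PagedItems> GetItemsAsync(
        string datasetId, int pageSize, string? cursor = null, CancellationToken ct = default)
    {
        var page = await _http.GetFromJsonAsync<PagedItems>(BuildItemsUrl(datasetId, pageSize, cursor), ct);
        return page ?? new PagedItems(new List<string>(), null);
    }

    // Kept separate so URL construction can be checked without a running API.
    public static string BuildItemsUrl(string datasetId, int pageSize, string? cursor)
    {
        var url = $"api/datasets/{Uri.EscapeDataString(datasetId)}/items?pageSize={pageSize}";
        if (!string.IsNullOrEmpty(cursor))
            url += $"&cursor={Uri.EscapeDataString(cursor)}";
        return url;
    }
}
```

Keeping URL construction in one place means a route change only touches this class, and the relative paths resolve against whatever base address the client is configured with.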
- Uses cursor-based paging from the API to request small, contiguous chunks of items.
- Keeps a fixed-size in-memory window (DatasetState.Items) instead of materializing all N items on the client.
- Slides the window forward and backward as you scroll, evicting old items from memory to avoid WebAssembly out-of-memory crashes.
- Rehydrates earlier or later regions of the dataset from IndexedDB (when enabled) or the API when you scroll back.
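The sliding-window behaviour described in this list can be sketched as a small fixed-capacity buffer. This is an illustrative sketch only: SlidingWindow, AppendPage, PrependPage, and StartIndex are hypothetical names, and the project's DatasetState may be organised quite differently.

```csharp
using System;
using System.Collections.Generic;

// Fixed-capacity window over a much larger, paged sequence (hypothetical helper).
public sealed class SlidingWindow<T>
{
    private readonly LinkedList<T> _items = new();
    private readonly int _capacity;

    // Absolute index, within the full dataset, of the first item currently held.
    public int StartIndex { get; private set; }
    public int Count => _items.Count;
    public IEnumerable<T> Items => _items;

    public SlidingWindow(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Append a page fetched ahead of the window; evict from the front when over capacity.
    public void AppendPage(IReadOnlyList<T> page)
    {
        foreach (var item in page) _items.AddLast(item);
        while (_items.Count > _capacity)
        {
            _items.RemoveFirst();
            StartIndex++;
        }
    }

    // Prepend a page fetched behind the window; evict from the back when over capacity.
    public void PrependPage(IReadOnlyList<T> page)
    {
        if (page.Count > StartIndex)
            throw new InvalidOperationException("Page would extend before the start of the dataset.");
        for (var i = page.Count - 1; i >= 0; i--) _items.AddFirst(page[i]);
        StartIndex -= page.Count;
        while (_items.Count > _capacity) _items.RemoveLast();
    }
}
```

With a capacity of 4, appending pages [0,1,2] and [3,4,5] leaves items 2 to 5 in memory with StartIndex at 2; prepending [0,1] afterwards slides the window back to items 0 to 3. Memory use stays bounded by the capacity regardless of dataset size.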
- Start the API
  dotnet run --project src/HartsysDatasetEditor.Api
  By default this listens on https://localhost:7085. Trust the dev certificate the first time.
- Start the Blazor WASM client
  dotnet run --project src/HartsysDatasetEditor.Client
  The dev server hosts the static client at https://localhost:5001.
- Configure the client-to-API base address
  - The client reads DatasetApi:BaseAddress from wwwroot/appsettings.Development.json. Leave it at the default https://localhost:7085 or update it if the API port changes.
- Browse the app
  - Navigate to https://localhost:5001. The client will call the API for dataset lists/items.
  - Verify CORS is enabled for the WASM origin once the API CORS policy is implemented (see roadmap).
When deploying as an ASP.NET Core hosted app, the API project can serve the WASM assets directly; until then, the two projects run side-by-side as above.
- ASP.NET Core 8.0: Minimal API hosting and background services
- Blazor WebAssembly: Client-side SPA targeting the API
- MudBlazor: Material Design component library
- CsvHelper: Planned streaming ingestion parsing
- IndexedDB / LocalStorage: Client-side caching strategy (roadmap)
- Virtualization: Blazor's built-in <Virtualize> component
- Microsoft.AspNetCore.Components.WebAssembly
- MudBlazor - Material Design UI components
- Blazored.LocalStorage - Browser storage
- CsvHelper - CSV/TSV parsing
- No external dependencies (lightweight by design)
- Client configuration lives in wwwroot/appsettings*.json. Update DatasetApi:BaseAddress whenever the API host changes.
- API configuration is stored in appsettings*.json under the src/HartsysDatasetEditor.Api project. Adjust logging and CORS settings there.
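One way the client could turn the DatasetApi:BaseAddress setting into a usable base URI is sketched below. The DatasetApiOptions class and ResolveBaseAddress method are hypothetical; only the key name and the default https://localhost:7085 come from this README.

```csharp
using System;

// Hypothetical helper; not part of the project.
public static class DatasetApiOptions
{
    // Default taken from this README; used when no override is configured.
    public const string DefaultBaseAddress = "https://localhost:7085";

    // 'configured' stands for the value read from DatasetApi:BaseAddress (may be null or empty).
    public static Uri ResolveBaseAddress(string? configured)
    {
        var candidate = string.IsNullOrWhiteSpace(configured) ? DefaultBaseAddress : configured.Trim();

        if (!Uri.TryCreate(candidate, UriKind.Absolute, out var uri) ||
            (uri.Scheme != Uri.UriSchemeHttps && uri.Scheme != Uri.UriSchemeHttp))
        {
            throw new InvalidOperationException(
                $"DatasetApi:BaseAddress is not a valid http(s) URL: '{candidate}'");
        }

        // A trailing slash keeps relative paths such as "api/datasets" resolving under the base.
        return uri.AbsoluteUri.EndsWith("/", StringComparison.Ordinal)
            ? uri
            : new Uri(uri.AbsoluteUri + "/");
    }
}
```

In a Blazor WebAssembly Program.cs the input would typically come from builder.Configuration["DatasetApi:BaseAddress"]; failing fast on a malformed value tends to be easier to diagnose than a stream of failed requests later.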
- Create a parser implementing IDatasetParser in the ingestion pipeline.
- Register it in DI through a parser registry service.
- Add format to DatasetFormat enum and expose via API capability endpoint.
public class MyFormatParser : IDatasetParser
{
    // Return true when the payload looks like this parser's format.
    public bool CanParse(string data) { /* ... */ }

    // Yield items one at a time so large files are never fully buffered.
    public IAsyncEnumerable<IDatasetItem> ParseAsync(string data) { /* ... */ }
}
- Create a provider implementing IModalityProvider
- Register in ModalityProviderRegistry
- Add modality to Modality enum
- Create viewer component
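The "register it through a registry" step used for parsers (and, similarly, modality providers) might look like the sketch below. IFormatParser, SketchParserRegistry, TsvParser, and CsvParser are simplified, hypothetical stand-ins rather than the project's IDatasetParser or its registry, and the detection rules are deliberately naive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the project's parser abstraction (hypothetical shape).
public interface IFormatParser
{
    string FormatName { get; }
    bool CanParse(string data);
}

// Picks the first registered parser that claims the payload.
public sealed class SketchParserRegistry
{
    private readonly IReadOnlyList<IFormatParser> _parsers;

    // With a DI container, every registered IFormatParser can be injected
    // as IEnumerable<IFormatParser>, so adding a format is one registration.
    public SketchParserRegistry(IEnumerable<IFormatParser> parsers) => _parsers = parsers.ToList();

    // Returns null when no parser recognises the data.
    public IFormatParser? Resolve(string data) => _parsers.FirstOrDefault(p => p.CanParse(data));
}

// Naive example parsers: detection is by delimiter only.
public sealed class TsvParser : IFormatParser
{
    public string FormatName => "tsv";
    public bool CanParse(string data) => data.Contains('\t');
}

public sealed class CsvParser : IFormatParser
{
    public string FormatName => "csv";
    public bool CanParse(string data) => data.Contains(',');
}
```

Registration order acts as priority here: the first parser whose CanParse returns true wins, so more specific formats should be registered before more permissive ones.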
- Virtualized rendering via <Virtualize> keeps browser memory flat while streaming new pages.
- API pagination uses cursor tokens and configurable page sizes to keep server memory bounded.
- Future ingestion jobs will stream upload parsing to avoid buffering entire files.
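To illustrate the cursor-token pagination mentioned in this list, here is a minimal in-memory sketch. The Page record, CursorPager class, the MaxPageSize cap, and the base64-offset cursor encoding are all assumptions made for the example; the actual API may encode cursors differently (for instance as a sort key rather than an offset).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical page envelope: items plus an opaque token for the next request.
public sealed record Page<T>(IReadOnlyList<T> Items, string? NextCursor);

public static class CursorPager
{
    public const int MaxPageSize = 500; // hypothetical server-side cap

    // Opaque cursor: here simply a base64-encoded offset.
    public static string Encode(int offset) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(offset.ToString(CultureInfo.InvariantCulture)));

    // A missing cursor means "start from the beginning".
    public static int Decode(string? cursor)
    {
        if (string.IsNullOrEmpty(cursor)) return 0;
        var text = Encoding.UTF8.GetString(Convert.FromBase64String(cursor));
        return int.Parse(text, CultureInfo.InvariantCulture);
    }

    public static Page<T> GetPage<T>(IReadOnlyList<T> source, string? cursor, int pageSize)
    {
        // Clamp the requested size so one request can never pull the whole dataset.
        var size = Math.Clamp(pageSize, 1, MaxPageSize);
        var offset = Math.Clamp(Decode(cursor), 0, source.Count);
        var items = source.Skip(offset).Take(size).ToList();

        // A null cursor tells the client there is nothing further to fetch.
        var next = offset + items.Count < source.Count ? Encode(offset + items.Count) : null;
        return new Page<T>(items, next);
    }
}
```

A client walks the dataset by passing each NextCursor back until it receives null. Because the token is opaque to callers, the server can later change what it encodes without breaking clients.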
dotnet publish -c Release
Output in: src/HartsysDatasetEditor.Client/bin/Release/net8.0/publish/
- Build for production
- Copy wwwroot contents to gh-pages branch
- Enable GitHub Pages in repo settings
- Create Static Web App in Azure Portal
- Configure build:
  - App location: src/HartsysDatasetEditor.Client
  - Output location: wwwroot
- Deploy via GitHub Actions
- Ensure both API and client are running before testing. API defaults to HTTPS, so trust the development certificate when prompted.
- Use Swagger/OpenAPI (coming soon) or tools like curl/httpie/Postman to verify endpoint availability.
- When modifying contracts, update both server and client references to avoid serialization errors.
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing documentation
- Review the MVP completion status document
The detailed architecture, phased roadmap, and task checklist live in docs/architecture.md. Highlights:
- Infrastructure: ✅ API and shared contracts scaffolded; configure hosted solution + README updates.
- API Skeleton: In progress; dataset CRUD endpoints implemented with in-memory storage, upload endpoint pending.
- Client Refactor: Pending; migrate viewer to API-backed pagination and caching services.
- Ingestion & Persistence: Pending; implement streaming ingestion worker and backing database.
- Advanced Features: Pending; CDN integration, SignalR notifications, plugin architecture.
Current Version: 0.2.0-alpha
Status: API-first migration in progress
Last Updated: 2025
DatasetStudio/
├── Docs/
│   ├── Installation/
│   │   ├── QuickStart.md
│   │   ├── SingleUserSetup.md
│   │   └── MultiUserSetup.md
│   ├── UserGuides/
│   │   ├── ViewingDatasets.md
│   │   ├── CreatingDatasets.md
│   │   └── EditingDatasets.md
│   ├── API/
│   │   └── APIReference.md
│   └── Development/
│       ├── ExtensionDevelopment.md
│       └── Contributing.md
│
├── Core/                                  # Shared domain logic
│   ├── DomainModels/
│   │   ├── Datasets/
│   │   │   ├── Dataset.cs
│   │   │   └── DatasetMetadata.cs
│   │   ├── Items/
│   │   │   ├── DatasetItem.cs
│   │   │   ├── ImageItem.cs
│   │   │   └── Caption.cs
│   │   └── Users/
│   │       ├── User.cs
│   │       └── UserSettings.cs
│   ├── Enumerations/
│   │   ├── DatasetFormat.cs
│   │   ├── Modality.cs
│   │   ├── UserRole.cs
│   │   └── ExtensionType.cs
│   ├── Abstractions/
│   │   ├── Parsers/
│   │   │   └── IDatasetParser.cs
│   │   ├── Storage/
│   │   │   └── IStorageProvider.cs
│   │   ├── Extensions/
│   │   │   ├── IExtension.cs
│   │   │   └── IExtensionRegistry.cs
│   │   └── Repositories/
│   │       └── IDatasetRepository.cs
│   ├── BusinessLogic/
│   │   ├── Parsers/
│   │   │   ├── ParserRegistry.cs
│   │   │   ├── UnsplashTsvParser.cs
│   │   │   └── ParquetParser.cs
│   │   ├── Storage/
│   │   │   ├── LocalStorageProvider.cs
│   │   │   └── S3StorageProvider.cs
│   │   └── Extensions/
│   │       ├── ExtensionRegistry.cs
│   │       └── ExtensionLoader.cs
│   ├── Utilities/
│   │   ├── Logging/
│   │   │   └── Logs.cs
│   │   ├── Helpers/
│   │   │   ├── ImageHelper.cs
│   │   │   └── ParquetHelper.cs
│   │   └── Encryption/
│   │       └── ApiKeyEncryption.cs
│   └── Constants/
│       ├── DatasetFormats.cs
│       └── Modalities.cs
│
├── Contracts/                             # DTOs shared between API & Client
│   ├── Common/
│   │   ├── PagedResponse.cs
│   │   └── FilterRequest.cs
│   ├── Datasets/
│   │   ├── DatasetDto.cs
│   │   └── CreateDatasetRequest.cs
│   ├── Users/
│   │   ├── UserDto.cs
│   │   └── LoginRequest.cs
│   └── Extensions/
│       └── ExtensionInfoDto.cs
│
├── APIBackend/
│   ├── Configuration/
│   │   ├── Program.cs
│   │   ├── appsettings.json
│   │   └── appsettings.Development.json
│   ├── Controllers/
│   │   ├── DatasetsController.cs
│   │   ├── ItemsController.cs
│   │   ├── UsersController.cs
│   │   └── ExtensionsController.cs
│   ├── Services/
│   │   ├── DatasetManagement/
│   │   │   ├── DatasetService.cs
│   │   │   └── IngestionService.cs
│   │   ├── Authentication/
│   │   │   ├── UserService.cs
│   │   │   └── AuthService.cs
│   │   └── Extensions/
│   │       └── ExtensionLoaderService.cs
│   ├── DataAccess/
│   │   ├── PostgreSQL/
│   │   │   ├── Repositories/
│   │   │   │   ├── DatasetRepository.cs
│   │   │   │   └── UserRepository.cs
│   │   │   ├── DbContext.cs
│   │   │   └── Migrations/
│   │   └── Parquet/
│   │       ├── ParquetItemRepository.cs
│   │       └── ParquetWriter.cs
│   ├── Middleware/
│   │   ├── AuthenticationMiddleware.cs
│   │   └── ErrorHandlingMiddleware.cs
│   └── BackgroundWorkers/
│       ├── IngestionWorker.cs
│       └── ThumbnailGenerationWorker.cs
│
├── ClientApp/                             # Blazor WASM Frontend
│   ├── Configuration/
│   │   ├── Program.cs
│   │   ├── App.razor
│   │   └── _Imports.razor
│   │
│   ├── wwwroot/                           # Standard Blazor static files folder
│   │   ├── index.html
│   │   ├── Themes/
│   │   │   ├── LightTheme.css
│   │   │   ├── DarkTheme.css
│   │   │   └── CustomTheme.css
│   │   ├── css/
│   │   │   └── app.css
│   │   └── js/
│   │       ├── Interop.js
│   │       ├── IndexedDB.js
│   │       ├── InfiniteScroll.js
│   │       └── Installer.js
│   │
│   ├── Features/
│   │   ├── Home/
│   │   │   ├── Pages/
│   │   │   │   └── Index.razor
│   │   │   └── Components/
│   │   │       └── WelcomeCard.razor
│   │   │
│   │   ├── Installation/
│   │   │   ├── Pages/
│   │   │   │   └── Install.razor
│   │   │   ├── Components/
│   │   │   │   ├── WelcomeStep.razor
│   │   │   │   ├── DeploymentModeStep.razor
│   │   │   │   ├── AdminAccountStep.razor
│   │   │   │   ├── ExtensionSelectionStep.razor
│   │   │   │   ├── StorageConfigStep.razor
│   │   │   │   └── CompletionStep.razor
│   │   │   └── Services/
│   │   │       └── InstallationService.cs
│   │   │
│   │   ├── Datasets/
│   │   │   ├── Pages/
│   │   │   │   ├── DatasetLibrary.razor
│   │   │   │   └── DatasetViewer.razor
│   │   │   ├── Components/
│   │   │   │   ├── DatasetCard.razor
│   │   │   │   ├── DatasetUploader.razor
│   │   │   │   ├── DatasetStats.razor
│   │   │   │   ├── ImageGrid.razor
│   │   │   │   ├── ImageCard.razor
│   │   │   │   ├── ImageGallery.razor
│   │   │   │   ├── ImageDetail.razor
│   │   │   │   ├── InlineEditor.razor
│   │   │   │   ├── FilterPanel.razor
│   │   │   │   └── AdvancedSearch.razor
│   │   │   └── Services/
│   │   │       └── DatasetCacheService.cs
│   │   │
│   │   ├── Authentication/
│   │   │   ├── Pages/
│   │   │   │   └── Login.razor
│   │   │   └── Components/
│   │   │       ├── LoginForm.razor
│   │   │       └── RegisterForm.razor
│   │   │
│   │   ├── Administration/
│   │   │   ├── Pages/
│   │   │   │   └── Admin.razor
│   │   │   └── Components/
│   │   │       ├── UserManagement.razor
│   │   │       ├── ExtensionManager.razor
│   │   │       ├── SystemSettings.razor
│   │   │       └── Analytics.razor
│   │   │
│   │   └── Settings/
│   │       ├── Pages/
│   │       │   └── Settings.razor
│   │       └── Components/
│   │           ├── AppearanceSettings.razor
│   │           ├── AccountSettings.razor
│   │           └── PrivacySettings.razor
│   │
│   ├── Shared/                            # Components/layouts used across ALL features
│   │   ├── Layout/
│   │   │   ├── MainLayout.razor
│   │   │   ├── NavMenu.razor
│   │   │   └── AdminLayout.razor
│   │   ├── Components/
│   │   │   ├── LoadingSpinner.razor
│   │   │   ├── EmptyState.razor
│   │   │   ├── ErrorBoundary.razor
│   │   │   ├── ConfirmDialog.razor
│   │   │   └── Toast.razor
│   │   └── Services/
│   │       ├── NotificationService.cs
│   │       └── ThemeService.cs
│   │
│   ├── Services/                          # Global app-wide services
│   │   ├── StateManagement/
│   │   │   ├── AppState.cs
│   │   │   ├── UserState.cs
│   │   │   └── ExtensionState.cs
│   │   ├── ApiClients/
│   │   │   ├── DatasetApiClient.cs
│   │   │   ├── UserApiClient.cs
│   │   │   ├── ExtensionApiClient.cs
│   │   │   └── AIApiClient.cs
│   │   ├── Caching/
│   │   │   ├── IndexedDbCache.cs
│   │   │   └── ThumbnailCache.cs
│   │   └── Interop/
│   │       ├── IndexedDbInterop.cs
│   │       └── InstallerInterop.cs
│   │
│   └── ExtensionComponents/               # UI components from loaded extensions
│
├── Extensions/
│   ├── SDK/
│   │   ├── BaseExtension.cs
│   │   ├── ExtensionMetadata.cs
│   │   ├── ExtensionManifest.cs
│   │   └── DevelopmentGuide.md
│   │
│   ├── BuiltIn/
│   │   ├── CoreViewer/
│   │   │   ├── extension.manifest.json
│   │   │   ├── CoreViewerExtension.cs
│   │   │   ├── Components/
│   │   │   ├── Services/
│   │   │   └── Assets/
│   │   │
│   │   ├── Creator/
│   │   │   ├── extension.manifest.json
│   │   │   ├── CreatorExtension.cs
│   │   │   ├── Components/
│   │   │   │   ├── Upload/
│   │   │   │   ├── Import/
│   │   │   │   └── Configuration/
│   │   │   ├── Services/
│   │   │   │   ├── ZipExtractor.cs
│   │   │   │   ├── RarExtractor.cs
│   │   │   │   └── HuggingFaceImporter.cs
│   │   │   └── Assets/
│   │   │
│   │   ├── Editor/
│   │   │   ├── extension.manifest.json
│   │   │   ├── EditorExtension.cs
│   │   │   ├── Components/
│   │   │   │   ├── Inline/
│   │   │   │   ├── Bulk/
│   │   │   │   ├── Captions/
│   │   │   │   └── Metadata/
│   │   │   ├── Services/
│   │   │   │   ├── EditService.cs
│   │   │   │   ├── BulkOperationService.cs
│   │   │   │   └── CaptionService.cs
│   │   │   └── Assets/
│   │   │
│   │   ├── AITools/
│   │   │   ├── extension.manifest.json
│   │   │   ├── AIToolsExtension.cs
│   │   │   ├── Components/
│   │   │   │   ├── Captioning/
│   │   │   │   ├── ModelSelection/
│   │   │   │   ├── Scoring/
│   │   │   │   └── BatchProcessing/
│   │   │   ├── Services/
│   │   │   │   ├── Engines/
│   │   │   │   │   ├── BlipEngine.cs
│   │   │   │   │   ├── ClipEngine.cs
│   │   │   │   │   ├── OpenAIEngine.cs
│   │   │   │   │   ├── AnthropicEngine.cs
│   │   │   │   │   └── LocalLLMEngine.cs
│   │   │   │   ├── ScoringService.cs
│   │   │   │   └── BatchProcessor.cs
│   │   │   ├── Models/
│   │   │   │   ├── Florence2/
│   │   │   │   ├── ONNX/
│   │   │   │   ├── CLIP/
│   │   │   │   └── LocalLLM/
│   │   │   └── Assets/
│   │   │
│   │   └── AdvancedTools/
│   │       ├── extension.manifest.json
│   │       ├── AdvancedToolsExtension.cs
│   │       ├── Components/
│   │       │   ├── Conversion/
│   │       │   ├── Merging/
│   │       │   ├── Deduplication/
│   │       │   └── Analysis/
│   │       ├── Services/
│   │       │   ├── FormatConverter.cs
│   │       │   ├── DatasetMerger.cs
│   │       │   ├── Deduplicator.cs
│   │       │   └── QualityAnalyzer.cs
│   │       └── Assets/
│   │
│   └── UserExtensions/                    # Third-party extensions
│       ├── README.md                      # How to add user extensions
│       └── ExampleExtension/
│           ├── extension.manifest.json
│           ├── ExampleExtension.cs
│           ├── Components/
│           ├── Services/
│           └── Assets/
│
├── Tests/
│   ├── CoreTests/
│   ├── APIBackendTests/
│   ├── ClientAppTests/
│   └── IntegrationTests/
│
├── Scripts/
│   ├── Setup.sh
│   ├── Setup.ps1
│   └── MigrateDatabase.sh
│
├── README.md
├── ARCHITECTURE.md
├── LICENSE
└── .gitignore