PoC of repository ai bootstrap (#7585)

JanKrivanek · Copilot · kotlarmilos · web-flow · commit 9d809f16570f · 2026-03-20T20:22:44.000+01:00
* initiate ai
* Update .github/workflows/inclusive-heat-sensor.yml

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add issue triage workflow for labeling and commenting on new issues (#7592)

  This workflow automates the process of analyzing newly opened issues, applying appropriate labels, and providing comments based on the issue content. It includes definitions for various labels, instructions for triaging, and hints for relevant project areas.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Milos Kotlar <kotlarmilos@gmail.com>
diff --git a/.gitattributes b/.gitattributes
@@ -6,3 +6,5 @@
 # Force bash scripts to always use lf line endings so that if a repo is accessed
 # in Unix via a file share from Windows, the scripts will work.
 *.sh text eol=lf
+
+.github/workflows/*.lock.yml linguist-generated=true merge=ours
diff --git a/.github/aw/actions-lock.json b/.github/aw/actions-lock.json
@@ -5,10 +5,10 @@
       "version": "v8",
       "sha": "ed597411d8f924073f98dfc5c65a23a2325f34cd"
     },
-    "github/gh-aw/actions/setup@v0.45.4": {
+    "github/gh-aw/actions/setup@v0.58.0": {
       "repo": "github/gh-aw/actions/setup",
-      "version": "v0.45.4",
-      "sha": "ac090214a48a1938f7abafe132460b66752261af"
+      "version": "v0.58.0",
+      "sha": "cb7966564184443e601bd6135d5fbb534300070e"
     }
   }
 }
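The lock file pins each action reference to a full commit SHA. Bumping `github/gh-aw/actions/setup` from v0.45.4 to v0.58.0 therefore means changing the key, the `version` field, and the `sha` together, as this hunk does. The Node.js sketch below shows a minimal consistency check. It assumes the entry shape visible in the hunk: `repo`, `version`, and `sha`, keyed by `repo@version`. The enclosing top-level key isn't shown in the diff, so the hypothetical `checkActionPins` helper takes the entries map directly.

```javascript
// Sketch: sanity-check pinned entries shaped like those in actions-lock.json
// ({ repo, version, sha } keyed by "repo@version"). The enclosing top-level key
// is not visible in the diff, so this takes the entries map directly (assumption).
function checkActionPins(entries) {
  const problems = [];
  for (const [key, pin] of Object.entries(entries)) {
    const expectedKey = `${pin.repo}@${pin.version}`;
    if (key !== expectedKey) {
      problems.push(`${key}: key does not match repo@version (${expectedKey})`);
    }
    // A full Git commit SHA-1 is 40 lowercase hex characters.
    if (!/^[0-9a-f]{40}$/.test(pin.sha)) {
      problems.push(`${key}: sha is not a full 40-character commit SHA`);
    }
  }
  return problems;
}

const entries = {
  "github/gh-aw/actions/setup@v0.58.0": {
    repo: "github/gh-aw/actions/setup",
    version: "v0.58.0",
    sha: "cb7966564184443e601bd6135d5fbb534300070e",
  },
};
console.log(checkActionPins(entries)); // prints []
```

A check like this catches a half-applied bump, such as a new key that still carries the old `sha`. It does not verify that the SHA actually belongs to that tag upstream.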
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,188 @@
+---
+description: "Guidance for GitHub Copilot when working on ML.NET (dotnet/machinelearning). Use for any task in this repo: code changes, test writing, PR reviews, issue investigation, build troubleshooting, or documentation."
+---
+
+# ML.NET Development Guide
+
+## Repository Overview
+
+ML.NET is a cross-platform, open-source machine learning framework for .NET. It provides APIs for training, evaluating, and deploying ML models across classification, regression, clustering, ranking, anomaly detection, time series, recommendation, and generative AI (LLaMA, Phi, Mistral via TorchSharp).
+
+### Key Technologies
+
+- .NET SDK 10.0.100 (see `global.json`)
+- Build system: Microsoft Arcade SDK (`eng/common/`)
+- Test framework: xUnit (with `AwesomeAssertions`, `Xunit.Combinatorial`)
+- Native dependencies: MKL, OpenMP, libmf, oneDNN
+- Major dependencies: TorchSharp, ONNX Runtime, TensorFlow, LightGBM, Semantic Kernel
+- Central package management: `Directory.Packages.props`
+
+## Build & Test
+
+### Build
+
+```bash
+# Linux/macOS
+./build.sh
+
+# Windows
+build.cmd
+
+# Build specific project
+dotnet build src/Microsoft.ML.Core/Microsoft.ML.Core.csproj
+```
+
+The repo uses the Arcade SDK. `build.sh`/`build.cmd` wrap `eng/common/build.sh`/`eng/common/build.ps1` with `--restore --build`. On Linux, install the native dependencies first with `eng/common/native/install-dependencies.sh`.
+
+### Test
+
+```bash
+# Run tests for a specific project
+dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj
+
+# Run tests with filter
+dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj --filter "FullyQualifiedName~ClassName.MethodName"
+
+# Run all tests (slow, prefer specific projects)
+dotnet test Microsoft.ML.sln
+```
+
+Test projects multi-target `net8.0;net48;net9.0` on Windows and target only `net8.0` on Linux, macOS, and arm64.
+
+### Format
+
+```bash
+dotnet format Microsoft.ML.sln --no-restore
+```
+
+The repo has `.editorconfig` and `EnforceCodeStyleInBuild=true`.
+
+## Project Structure
+
+```
+src/
+├── Microsoft.ML.Core/              # Core types, contracts, host environment
+├── Microsoft.ML.Data/              # Data pipeline, DataView, schema
+├── Microsoft.ML/                   # MLContext, public API surface
+├── Microsoft.ML.StandardTrainers/  # Built-in trainers (logistic regression, SVM, etc.)
+├── Microsoft.ML.Transforms/        # Data transforms (normalize, featurize, etc.)
+├── Microsoft.ML.AutoML/            # Automated ML pipeline selection
+├── Microsoft.ML.FastTree/          # Tree-based trainers
+├── Microsoft.ML.LightGbm/          # LightGBM integration
+├── Microsoft.ML.Recommender/       # Matrix factorization recommenders
+├── Microsoft.ML.TimeSeries/        # Time series analysis
+├── Microsoft.ML.Tokenizers/        # BPE/WordPiece/SentencePiece tokenizers
+├── Microsoft.ML.GenAI.Core/        # GenAI base types (CausalLM pipeline)
+├── Microsoft.ML.GenAI.LLaMA/       # LLaMA model support
+├── Microsoft.ML.GenAI.Phi/         # Phi model support
+├── Microsoft.ML.GenAI.Mistral/     # Mistral model support
+├── Microsoft.ML.TorchSharp/        # TorchSharp-based trainers
+├── Microsoft.ML.OnnxTransformer/   # ONNX model inference
+├── Microsoft.ML.TensorFlow/        # TensorFlow model inference
+├── Microsoft.ML.Vision/            # Image classification
+├── Microsoft.ML.ImageAnalytics/    # Image transforms
+├── Microsoft.ML.CpuMath/           # SIMD-optimized math operations
+├── Microsoft.Data.Analysis/        # DataFrame API
+├── Native/                         # C/C++ native library sources
+└── Common/                         # Shared internal code
+test/
+├── Microsoft.ML.TestFramework/      # Base test classes and helpers
+├── Microsoft.ML.TestFrameworkCommon/ # Shared test utilities
+├── Microsoft.ML.Tests/              # Main functional tests
+├── Microsoft.ML.Core.Tests/         # Core unit tests
+├── Microsoft.ML.IntegrationTests/   # End-to-end integration tests
+├── Microsoft.ML.Tokenizers.Tests/   # Tokenizer tests
+├── Microsoft.ML.GenAI.*.Tests/      # GenAI component tests
+└── ... (30+ test projects)
+```
+
+## Conventions
+
+### Code Style
+
+Every `.cs` file starts with the 3-line .NET Foundation MIT license header. This is enforced across the codebase and must not be omitted.
+
+Namespaces match assembly name (`Microsoft.ML`, `Microsoft.ML.Data`, `Microsoft.ML.Trainers`). Order usings as `System.*` first, then `Microsoft.*`, then others.
+
+Use `[BestFriend]` attribute for internal members shared across assemblies. The repo has many assemblies that need to share types without making them public; `[BestFriend]` provides controlled cross-assembly visibility for this.
+
+Use `Contracts.Check*` / `Contracts.Except*` for argument and state validation rather than raw `throw` statements. This ensures consistent error messages and lets the ML.NET host environment intercept validation failures.
+
+XML docs with `<summary>` tags are required on all public types and members.
+
+When editing an existing file, match its style even if it differs from general guidelines. Consistency within a file matters more than global uniformity.
+
+Follow [dotnet/runtime coding-style](https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/coding-style.md).
+
+### Test Conventions
+
+Framework: xUnit (`[Fact]`, `[Theory]`, `[InlineData]`).
+
+Inherit from `TestDataPipeBase` (for data pipeline tests) or `BaseTestClass` (for simpler tests). Both provide `ITestOutputHelper`, test data paths, and locale pinning to `en-US`.
+
+```csharp
+public class MyFeatureTests : TestDataPipeBase
+{
+    public MyFeatureTests(ITestOutputHelper output) : base(output) { }
+
+    [Fact]
+    public void MyFeatureBasicTest()
+    {
+        // ...
+    }
+}
+```
+
+Name test classes as `{Feature}Tests`, test methods as PascalCase descriptive names (e.g., `RandomizedPcaTrainerBaselineTest`). Do not use `Test_` prefixes or `_Should_` patterns.
+
+Use `Assert.*` (xUnit) or `AwesomeAssertions` for fluent assertions. Do not use `Assert.That` (NUnit style).
+
+Test data: use `Microsoft.ML.TestDatabases` package or files in `test/data/`, referenced via `GetDataPath("filename")` from the base class. Baseline output comparison uses files in `test/BaselineOutput/`. Update baselines carefully since they are the source of truth for output format stability.
+
+Gotchas: the base class pins locale to `en-US` (don't override). `AllowUnsafeBlocks` is enabled in test projects for native interop testing. XML doc warnings (CS1573, CS1591, CS1712) are suppressed in test code.
+
+### Architecture
+
+`MLContext` is the main entry point, exposing catalogs for each ML task (classification, regression, etc.).
+
+Data flows through `IDataView`, a lazy, columnar, cursor-based data pipeline. This design avoids loading entire datasets into memory, which matters for ML workloads.
+
+Trainers follow the estimator/transformer pattern: calling `Fit()` on an `IEstimator<TTransformer>` trains it and returns an `ITransformer`, whose `Transform()` applies it to data. New trainers go in their own project under `src/`. New test projects mirror source naming: `Microsoft.ML.Foo` maps to `Microsoft.ML.Foo.Tests`.
+
+## Git Workflow
+
+- Default branch: `main`
+- Never commit directly to `main`; always create a feature branch
+- Branch naming: `feature/description`, `fix/description`
+- PRs are squash-merged
+- Reference a filed issue in the PR description
+- Address review feedback in additional commits (don't amend/force-push)
+- Use `git rebase` for conflict resolution, not merge commits
+
+## CI
+
+Primary CI: Azure DevOps Pipelines (`build/vsts-ci.yml`), the official signed build. Builds run on Windows, Linux (Ubuntu 22.04), and macOS, covering both managed (.NET) and native components. Code coverage uses `coverlet.collector`. A custom internal Roslyn analyzer (`Microsoft.ML.InternalCodeAnalyzer`) runs on all test projects.
+
+## AI Infrastructure
+
+### Workflows
+
+GitHub Actions in `.github/workflows/`:
+
+| Workflow | Trigger | Purpose |
+|----------|---------|---------|
+| `copilot-setup-steps.yml` | Manual | Remote Copilot Coding Agent build environment |
+| `find-similar-issues.yml` | Issue opened | AI-powered duplicate detection for new issues |
+| `inclusive-heat-sensor.yml` | Issues and comments | Detect heated language in issues, issue comments, and PR review comments |
+
+### Prompts
+
+Reusable prompt templates in `.github/prompts/`:
+
+| Prompt | Purpose |
+|--------|---------|
+| `release-notes.prompt.md` | Generate classified release notes between commits |
+
+### Issue Triage
+
+For issue triage workflows (automated milestone assignment, priority labeling, investigation), use [GitHub Agentic Workflows](https://github.github.com/gh-aw/). Define triage automation as natural-language workflow files rather than custom scripts.
diff --git a/.github/prompts/release-notes.prompt.md b/.github/prompts/release-notes.prompt.md
@@ -0,0 +1,20 @@
+# ML.NET Release Notes
+
+Generate classified release notes between two commits.
+
+## Categories
+
+1. **Product** — Bug fixes, features, improvements
+2. **Dependencies** — Package/SDK updates
+3. **Testing** — Test changes and infrastructure
+4. **Documentation** — Docs, samples
+5. **Housekeeping** — Build, CI, cleanup
+
+## Process
+
+```bash
+# Get commits between two points
+git log --pretty=format:"%h - %s (%an)" BRANCH1..BRANCH2 > commits.txt
+```
+
+Classify each commit. When uncertain, default to Housekeeping. Group related commits. Flag breaking changes with ⚠️.
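The prompt leaves classification to the model. The pipeline it describes can still be sketched deterministically: parse `git log --pretty=format:"%h - %s (%an)"` output, bucket each commit into the five categories, fall back to Housekeeping, and flag breaking changes. In the Node.js sketch below, the keyword patterns in `RULES` are hypothetical heuristics for illustration and are not part of the prompt. Treating the word "breaking" in a subject as the breaking-change signal is likewise an assumption.

```javascript
// The five categories from the prompt, in output order.
const CATEGORIES = ["Product", "Dependencies", "Testing", "Documentation", "Housekeeping"];

// Hypothetical keyword heuristics, checked in order (first match wins).
const RULES = [
  ["Dependencies", /\b(bump|dependenc(y|ies)|package|sdk)\b/i],
  ["Testing", /\b(tests?|testing|flaky)\b/i],
  ["Documentation", /\b(docs?|documentation|samples?|readme)\b/i],
  ["Product", /\b(fix|add|implement|support|improv)/i],
];

// Parse one line of `git log --pretty=format:"%h - %s (%an)"` output.
// The author is the last parenthesized group, so subjects may contain "(#123)".
function parseLogLine(line) {
  const m = /^([0-9a-f]+) - (.*) \(([^()]*)\)$/.exec(line.trim());
  return m ? { hash: m[1], subject: m[2], author: m[3] } : null;
}

function classify(subject) {
  for (const [category, pattern] of RULES) {
    if (pattern.test(subject)) return category;
  }
  return "Housekeeping"; // the prompt's default when uncertain
}

function buildReleaseNotes(logText) {
  const groups = new Map(CATEGORIES.map((c) => [c, []]));
  for (const line of logText.split("\n")) {
    const commit = parseLogLine(line);
    if (!commit) continue; // skip blank or unrecognized lines
    // Assumption: a subject mentioning "breaking" marks a breaking change.
    const flag = /\bbreaking\b/i.test(commit.subject) ? "⚠️ " : "";
    groups
      .get(classify(commit.subject))
      .push(`- ${flag}${commit.subject} (${commit.hash}, ${commit.author})`);
  }
  return [...groups]
    .filter(([, items]) => items.length > 0)
    .map(([category, items]) => `## ${category}\n${items.join("\n")}`)
    .join("\n\n");
}

// Hypothetical input lines for illustration.
const log = [
  "0000001 - Fix crash in TextLoader (Example Author)",
  "0000002 - Bump TorchSharp version (Example Bot)",
  "0000003 - Breaking: rename public API (Example Author)",
].join("\n");
console.log(buildReleaseNotes(log));
```

Rule order matters. "Add tests for tokenizer" lands in Testing rather than Product because the Testing rule is checked first. A model-based classifier would make the same call from context rather than position.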
diff --git a/.github/workflows/find-similar-issues.yml b/.github/workflows/find-similar-issues.yml
@@ -0,0 +1,92 @@
+name: "Find Similar Issues with AI"
+
+on:
+  issues:
+    types: [opened]
+
+permissions:
+  contents: read
+  issues: write
+  models: read
+
+jobs:
+  find-similar-issues:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'issues'
+    steps:
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+
+      - run: npm init -y && npm install @octokit/rest
+
+      - name: Find and post similar issues
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          ISSUE_NUMBER: ${{ github.event.issue.number }}
+          ISSUE_TITLE: ${{ github.event.issue.title }}
+          ISSUE_BODY: ${{ github.event.issue.body }}
+        run: |
+          node << 'SCRIPT'
+          const { Octokit } = require("@octokit/rest");
+          const fs = require('fs');
+          const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
+          const endpoint = "https://models.inference.ai.azure.com";
+          const model = "gpt-4o-mini";
+          const token = process.env.GITHUB_TOKEN;
+          const issueNum = parseInt(process.env.ISSUE_NUMBER);
+          const title = process.env.ISSUE_TITLE;
+          const body = process.env.ISSUE_BODY || '';
+          const [owner, repo] = process.env.GITHUB_REPOSITORY.split('/');
+
+          function extractWords(text) {
+            const stop = new Set(['the','and','for','with','this','that','from','have','not','are','was','will','can','when','what','how','use','does','issue','error','work']);
+            return [...new Set(text.replace(/```[\s\S]*?```/g,'').replace(/https?:\/\/\S+/g,'').replace(/[^a-z0-9\s]/gi,' ').toLowerCase().split(/\s+/).filter(w=>w.length>3&&!stop.has(w)))];
+          }
+          function jaccard(a,b) { const i=a.filter(w=>b.includes(w)); const u=[...new Set([...a,...b])]; return u.length?i.length/u.length:0; }
+
+          (async()=>{
+            const issues=[];
+            for(let p=1;p<=10;p++){
+              const r=await octokit.issues.listForRepo({owner,repo,state:'all',per_page:100,page:p,sort:'updated',direction:'desc'});
+              if(!r.data.length)break;
+              issues.push(...r.data.filter(i=>i.number!==issueNum&&!i.pull_request));
+            }
+            const words=extractWords(`${title}\n${body}`);
+            const candidates=issues.map(i=>({issue:i,score:jaccard(words,extractWords(`${i.title}\n${i.body||''}`))}))
+              .filter(c=>c.score>0.1).sort((a,b)=>b.score-a.score).slice(0,30);
+
+            const results=[];
+            for(const{issue}of candidates){
+              try{
+                const r=await fetch(`${endpoint}/chat/completions`,{method:"POST",headers:{"Content-Type":"application/json","Authorization":`Bearer ${token}`},
+                  body:JSON.stringify({model,temperature:0.3,max_tokens:150,messages:[
+                    {role:"system",content:'Analyze GitHub issue similarity. Return JSON only: {"score":0.0,"reason":"brief"}'},
+                    {role:"user",content:`Current:\nTitle: ${title}\nBody: ${body}\n\nCompare:\nTitle: ${issue.title}\nBody: ${issue.body||'None'}`}
+                  ]})});
+                const d=await r.json();
+                if(!d.choices?.[0])continue;
+                const parsed=JSON.parse(d.choices[0].message.content.trim().replace(/^```json?\s*/gm,'').replace(/```$/gm,''));
+                if(parsed.score>=0.6) results.push({number:issue.number,title:issue.title,state:issue.state,url:issue.html_url,score:parsed.score,reason:parsed.reason,labels:issue.labels.map(l=>l.name)});
+                await new Promise(r=>setTimeout(r,100));
+              }catch(e){console.error(`#${issue.number}:`,e.message)}
+            }
+            results.sort((a,b)=>b.score-a.score);
+            const top=results.slice(0,5);
+
+            let comment='';
+            if(top.length){
+              comment=`## 🔍 Similar Issues Found\n\n`;
+              top.forEach((s,i)=>{
+                comment+=`<details><summary><strong>${i+1}. <a href="${s.url}">#${s.number}</a>: ${s.title}</strong> (${Math.round(s.score*100)}%)</summary>\n\n`;
+                comment+=`**State:** ${s.state==='open'?'🟢 Open':'🔴 Closed'}  \n**Labels:** ${s.labels.slice(0,5).map(l=>'`'+l+'`').join(', ')||'None'}\n`;
+                if(s.reason) comment+=`**Why:** ${s.reason}\n`;
+                comment+=`</details>\n\n`;
+              });
+              comment+=`---\n*AI-powered similar issue detection*`;
+            } else {
+              comment=`## 🔍 No similar issues found with high confidence.\n\n---\n*AI-powered similar issue detection*`;
+            }
+            await octokit.issues.createComment({owner,repo,issue_number:issueNum,body:comment});
+          })();
+          SCRIPT
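Before spending model calls, the workflow's pre-filter ranks candidate issues by the Jaccard similarity of their keyword sets, |A ∩ B| / |A ∪ B|, and keeps the top 30. The sketch below pulls the inline helpers out into testable Node.js functions. It keeps the same stop words, length filter, `0.1` threshold, and candidate limit, with two differences. It uses `Set` lookups instead of `Array.includes`. It also replaces stripped code blocks and URLs with a space rather than an empty string, so adjacent words are not glued together. `topCandidates` and `parseSimilarityReply` are hypothetical names and are not defined in the workflow.

```javascript
// Stop words copied from the workflow script.
const STOP_WORDS = new Set([
  "the", "and", "for", "with", "this", "that", "from", "have", "not", "are", "was",
  "will", "can", "when", "what", "how", "use", "does", "issue", "error", "work",
]);

// Unique lowercase keywords longer than 3 characters, ignoring fenced code and URLs.
function extractWords(text) {
  const cleaned = text
    .replace(/```[\s\S]*?```/g, " ")
    .replace(/https?:\/\/\S+/g, " ")
    .replace(/[^a-z0-9\s]/gi, " ")
    .toLowerCase();
  return new Set(cleaned.split(/\s+/).filter((w) => w.length > 3 && !STOP_WORDS.has(w)));
}

// Jaccard similarity |A ∩ B| / |A ∪ B| of two keyword sets (0 when both are empty).
function jaccard(a, b) {
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Rank other issues against the current one and keep the best few for model review.
function topCandidates(current, others, minScore = 0.1, limit = 30) {
  const words = extractWords(`${current.title}\n${current.body || ""}`);
  return others
    .map((issue) => ({
      issue,
      score: jaccard(words, extractWords(`${issue.title}\n${issue.body || ""}`)),
    }))
    .filter((c) => c.score > minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Parse the model's {"score", "reason"} reply, tolerating a ```json fence around it.
function parseSimilarityReply(content) {
  const json = content.trim().replace(/^```(?:json)?\s*/i, "").replace(/```\s*$/, "").trim();
  const parsed = JSON.parse(json);
  return typeof parsed.score === "number" ? parsed : null;
}

const current = { title: "Tokenizer crashes on empty input", body: "" };
const others = [
  { number: 1, title: "Tokenizer crashes with empty string", body: "" },
  { number: 2, title: "Docs typo", body: "" },
];
console.log(topCandidates(current, others).map((c) => [c.issue.number, c.score]));
```

Keyword overlap is cheap but blind to synonyms. That is why the workflow uses it only to shortlist candidates and leaves the final `>= 0.6` decision to the model.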
diff --git a/.github/workflows/inclusive-heat-sensor.yml b/.github/workflows/inclusive-heat-sensor.yml
@@ -0,0 +1,21 @@
+name: Inclusive Heat Sensor
+on:
+  issues:
+    types: [opened, reopened]
+  issue_comment:
+    types: [created, edited]
+  pull_request_review_comment:
+    types: [created, edited]
+
+permissions:
+  contents: read
+  issues: write
+  pull-requests: write
+
+jobs:
+  detect-heat:
+    uses: jonathanpeppers/inclusive-heat-sensor/.github/workflows/comments.yml@v0.1.2
+    with:
+      minimizeComment: true
+      offensiveThreshold: 9
+      angerThreshold: 9
diff --git a/.github/workflows/issue-triage.agent.lock.yml b/.github/workflows/issue-triage.agent.lock.yml
diff --git a/.github/workflows/issue-triage.agent.md b/.github/workflows/issue-triage.agent.md
