Commit 9d809f1

JanKrivanek, Copilot, and kotlarmilos authored
PoC of repository ai bootstrap (#7585)
* initiate ai

* Update .github/workflows/inclusive-heat-sensor.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add issue triage workflow for labeling and commenting on new issues (#7592)

This workflow automates the process of analyzing newly opened issues, applying appropriate labels, and providing comments based on the issue content. It includes definitions for various labels, instructions for triaging, and hints for relevant project areas.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Milos Kotlar <kotlarmilos@gmail.com>
1 parent d25ef12 commit 9d809f1

8 files changed

Lines changed: 1557 additions & 3 deletions

.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -6,3 +6,5 @@
 # Force bash scripts to always use lf line endings so that if a repo is accessed
 # in Unix via a file share from Windows, the scripts will work.
 *.sh text eol=lf
+
+.github/workflows/*.lock.yml linguist-generated=true merge=ours

.github/aw/actions-lock.json

Lines changed: 3 additions & 3 deletions
@@ -5,10 +5,10 @@
       "version": "v8",
       "sha": "ed597411d8f924073f98dfc5c65a23a2325f34cd"
     },
-    "github/gh-aw/actions/setup@v0.45.4": {
+    "github/gh-aw/actions/setup@v0.58.0": {
       "repo": "github/gh-aw/actions/setup",
-      "version": "v0.45.4",
-      "sha": "ac090214a48a1938f7abafe132460b66752261af"
+      "version": "v0.58.0",
+      "sha": "cb7966564184443e601bd6135d5fbb534300070e"
     }
   }
 }
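
The hunk above changes the entry key, its `version`, and its `sha` together. As an illustration only, here is a minimal sketch of a consistency check for entries of this shape. `checkLockEntries` is a hypothetical helper, not part of gh-aw, and the rules it enforces (key equals `repo@version`, `sha` is a 40-character hex string) are assumptions inferred from the lines visible in this diff.

```javascript
// Hypothetical helper: validates lock entries shaped like the ones in this diff.
// Assumption: each key is "<repo>@<version>" and "sha" is a full 40-hex commit SHA.
function checkLockEntries(entries) {
  const problems = [];
  for (const [key, entry] of Object.entries(entries)) {
    const expectedKey = `${entry.repo}@${entry.version}`;
    if (key !== expectedKey) {
      problems.push(`${key}: key does not match ${expectedKey}`);
    }
    if (!/^[0-9a-f]{40}$/.test(entry.sha)) {
      problems.push(`${key}: sha is not a 40-character hex string`);
    }
  }
  return problems;
}

// Entry copied from the updated side of the diff.
const updatedEntries = {
  "github/gh-aw/actions/setup@v0.58.0": {
    repo: "github/gh-aw/actions/setup",
    version: "v0.58.0",
    sha: "cb7966564184443e601bd6135d5fbb534300070e",
  },
};

// A half-finished bump (made up for illustration): key updated, version left behind.
const staleEntries = {
  "github/gh-aw/actions/setup@v0.58.0": {
    repo: "github/gh-aw/actions/setup",
    version: "v0.45.4",
    sha: "ac090214a48a1938f7abafe132460b66752261af",
  },
};

console.log(checkLockEntries(updatedEntries));
console.log(checkLockEntries(staleEntries));
```

A check like this would catch a bump where only one of the three fields was edited by hand.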

.github/copilot-instructions.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+---
+description: "Guidance for GitHub Copilot when working on ML.NET (dotnet/machinelearning). Use for any task in this repo: code changes, test writing, PR reviews, issue investigation, build troubleshooting, or documentation."
+---
+
+# ML.NET Development Guide
+
+## Repository Overview
+
+ML.NET is a cross-platform, open-source machine learning framework for .NET. It provides APIs for training, evaluating, and deploying ML models across classification, regression, clustering, ranking, anomaly detection, time series, recommendation, and generative AI (LLaMA, Phi, Mistral via TorchSharp).
+
+### Key Technologies
+
+- .NET SDK 10.0.100 (see `global.json`)
+- Build system: Microsoft Arcade SDK (`eng/common/`)
+- Test framework: xUnit (with `AwesomeAssertions`, `Xunit.Combinatorial`)
+- Native dependencies: MKL, OpenMP, libmf, oneDNN
+- Major dependencies: TorchSharp, ONNX Runtime, TensorFlow, LightGBM, Semantic Kernel
+- Central package management: `Directory.Packages.props`
+
+## Build & Test
+
+### Build
+
+```bash
+# Linux/macOS
+./build.sh
+
+# Windows
+build.cmd
+
+# Build specific project
+dotnet build src/Microsoft.ML.Core/Microsoft.ML.Core.csproj
+```
+
+The repo uses Arcade SDK. `build.sh`/`build.cmd` wraps `eng/common/build.sh`/`eng/common/build.ps1` with `--restore --build`. On Linux, native dependencies require `eng/common/native/install-dependencies.sh`.
+
+### Test
+
+```bash
+# Run tests for a specific project
+dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj
+
+# Run tests with filter
+dotnet test test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj --filter "FullyQualifiedName~ClassName.MethodName"
+
+# Run all tests (slow, prefer specific projects)
+dotnet test Microsoft.ML.sln
+```
+
+Test projects multi-target `net8.0;net48;net9.0` on Windows, `net8.0` only on Linux/macOS/arm64.
+
+### Format
+
+```bash
+dotnet format Microsoft.ML.sln --no-restore
+```
+
+The repo has `.editorconfig` and `EnforceCodeStyleInBuild=true`.
+
+## Project Structure
+
+```
+src/
+├── Microsoft.ML.Core/ # Core types, contracts, host environment
+├── Microsoft.ML.Data/ # Data pipeline, DataView, schema
+├── Microsoft.ML/ # MLContext, public API surface
+├── Microsoft.ML.StandardTrainers/ # Built-in trainers (logistic regression, SVM, etc.)
+├── Microsoft.ML.Transforms/ # Data transforms (normalize, featurize, etc.)
+├── Microsoft.ML.AutoML/ # Automated ML pipeline selection
+├── Microsoft.ML.FastTree/ # Tree-based trainers
+├── Microsoft.ML.LightGbm/ # LightGBM integration
+├── Microsoft.ML.Recommender/ # Matrix factorization recommenders
+├── Microsoft.ML.TimeSeries/ # Time series analysis
+├── Microsoft.ML.Tokenizers/ # BPE/WordPiece/SentencePiece tokenizers
+├── Microsoft.ML.GenAI.Core/ # GenAI base types (CausalLM pipeline)
+├── Microsoft.ML.GenAI.LLaMA/ # LLaMA model support
+├── Microsoft.ML.GenAI.Phi/ # Phi model support
+├── Microsoft.ML.GenAI.Mistral/ # Mistral model support
+├── Microsoft.ML.TorchSharp/ # TorchSharp-based trainers
+├── Microsoft.ML.OnnxTransformer/ # ONNX model inference
+├── Microsoft.ML.TensorFlow/ # TensorFlow model inference
+├── Microsoft.ML.Vision/ # Image classification
+├── Microsoft.ML.ImageAnalytics/ # Image transforms
+├── Microsoft.ML.CpuMath/ # SIMD-optimized math operations
+├── Microsoft.Data.Analysis/ # DataFrame API
+├── Native/ # C/C++ native library sources
+└── Common/ # Shared internal code
+test/
+├── Microsoft.ML.TestFramework/ # Base test classes and helpers
+├── Microsoft.ML.TestFrameworkCommon/ # Shared test utilities
+├── Microsoft.ML.Tests/ # Main functional tests
+├── Microsoft.ML.Core.Tests/ # Core unit tests
+├── Microsoft.ML.IntegrationTests/ # End-to-end integration tests
+├── Microsoft.ML.Tokenizers.Tests/ # Tokenizer tests
+├── Microsoft.ML.GenAI.*.Tests/ # GenAI component tests
+└── ... (30+ test projects)
+```
+
+## Conventions
+
+### Code Style
+
+Every `.cs` file starts with the 3-line .NET Foundation MIT license header. This is enforced across the codebase and must not be omitted.
+
+Namespaces match assembly name (`Microsoft.ML`, `Microsoft.ML.Data`, `Microsoft.ML.Trainers`). Order usings as `System.*` first, then `Microsoft.*`, then others.
+
+Use `[BestFriend]` attribute for internal members shared across assemblies. The repo has many assemblies that need to share types without making them public; `[BestFriend]` provides controlled cross-assembly visibility for this.
+
+Use `Contracts.Check*` / `Contracts.Except*` for argument and state validation rather than raw `throw` statements. This ensures consistent error messages and lets the ML.NET host environment intercept validation failures.
+
+XML docs with `<summary>` tags are required on all public types and members.
+
+When editing an existing file, match its style even if it differs from general guidelines. Consistency within a file matters more than global uniformity.
+
+Follow [dotnet/runtime coding-style](https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/coding-style.md).
+
+### Test Conventions
+
+Framework: xUnit (`[Fact]`, `[Theory]`, `[InlineData]`).
+
+Inherit from `TestDataPipeBase` (for data pipeline tests) or `BaseTestClass` (for simpler tests). Both provide `ITestOutputHelper`, test data paths, and locale pinning to `en-US`.
+
+```csharp
+public class MyFeatureTests : TestDataPipeBase
+{
+    public MyFeatureTests(ITestOutputHelper output) : base(output) { }
+
+    [Fact]
+    public void MyFeatureBasicTest()
+    {
+        // ...
+    }
+}
+```
+
+Name test classes as `{Feature}Tests`, test methods as PascalCase descriptive names (e.g., `RandomizedPcaTrainerBaselineTest`). Do not use `Test_` prefixes or `_Should_` patterns.
+
+Use `Assert.*` (xUnit) or `AwesomeAssertions` for fluent assertions. Do not use `Assert.That` (NUnit style).
+
+Test data: use `Microsoft.ML.TestDatabases` package or files in `test/data/`, referenced via `GetDataPath("filename")` from the base class. Baseline output comparison uses files in `test/BaselineOutput/`. Update baselines carefully since they are the source of truth for output format stability.
+
+Gotchas: the base class pins locale to `en-US` (don't override). `AllowUnsafeBlocks` is enabled in test projects for native interop testing. XML doc warnings (CS1573, CS1591, CS1712) are suppressed in test code.
+
+### Architecture
+
+`MLContext` is the main entry point, exposing catalogs for each ML task (classification, regression, etc.).
+
+Data flows through `IDataView`, a lazy, columnar, cursor-based data pipeline. This design avoids loading entire datasets into memory, which matters for ML workloads.
+
+Trainers implement the `IEstimator<T>` to `ITransformer` pattern: call `Fit()` to train, then `Transform()` to apply. New trainers go in their own project under `src/`. New test projects mirror source naming: `Microsoft.ML.Foo` to `Microsoft.ML.Foo.Tests`.
+
+## Git Workflow
+
+- Default branch: `main`
+- Never commit directly to `main`, always create a feature branch
+- Branch naming: `feature/description`, `fix/description`
+- PRs are squash-merged
+- Reference a filed issue in PR description
+- Address review feedback in additional commits (don't amend/force-push)
+- Use `git rebase` for conflict resolution, not merge commits
+
+## CI
+
+Primary CI: Azure DevOps Pipelines (`build/vsts-ci.yml`), the official signed build. Builds run on Windows, Linux (Ubuntu 22.04), and macOS, covering both managed (.NET) and native components. Code coverage uses `coverlet.collector`. A custom internal Roslyn analyzer (`Microsoft.ML.InternalCodeAnalyzer`) runs on all test projects.
+
+## AI Infrastructure
+
+### Workflows
+
+GitHub Actions in `.github/workflows/`:
+
+| Workflow | Trigger | Purpose |
+|----------|---------|---------|
+| `copilot-setup-steps.yml` | Manual | Remote Copilot Coding Agent build environment |
+| `find-similar-issues.yml` | Issue opened | AI-powered duplicate detection for new issues |
+| `inclusive-heat-sensor.yml` | Comments | Detect heated language in issue/PR comments |
+
+### Prompts
+
+Reusable prompt templates in `.github/prompts/`:
+
+| Prompt | Purpose |
+|--------|---------|
+| `release-notes.prompt.md` | Generate classified release notes between commits |
+
+### Issue Triage
+
+For issue triage workflows (automated milestone assignment, priority labeling, investigation), use [GitHub Agentic Workflows](https://github.github.com/gh-aw/). Define triage automation as natural-language workflow files rather than custom scripts.

.github/prompts/release-notes.prompt.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# ML.NET Release Notes
+
+Generate classified release notes between two commits.
+
+## Categories
+
+1. **Product** — Bug fixes, features, improvements
+2. **Dependencies** — Package/SDK updates
+3. **Testing** — Test changes and infrastructure
+4. **Documentation** — Docs, samples
+5. **Housekeeping** — Build, CI, cleanup
+
+## Process
+
+```bash
+# Get commits between two points
+git log --pretty=format:"%h - %s (%an)" BRANCH1..BRANCH2 > commits.txt
+```
+
+Classify each commit. When uncertain, default to Housekeeping. Group related commits. Flag breaking changes with ⚠️.
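
The grouping step that this prompt describes could be sketched as follows. This is a rough illustration, not something the commit ships: `parseLogLine`, `renderNotes`, and the toy `classify` callback are hypothetical names, and the second sample log line is made up. The category list, the `%h - %s (%an)` line format, the Housekeeping fallback, and the ⚠️ flag come from the prompt itself.

```javascript
// Category order taken from the prompt above.
const CATEGORIES = ["Product", "Dependencies", "Testing", "Documentation", "Housekeeping"];

// Parses one line produced by: git log --pretty=format:"%h - %s (%an)"
function parseLogLine(line) {
  const match = /^([0-9a-f]+) - (.*) \(([^()]*)\)$/.exec(line.trim());
  if (!match) return null;
  return { hash: match[1], subject: match[2], author: match[3] };
}

// Hypothetical renderer: `classify` stands in for whoever does the labeling
// (a person or a model). Unknown or missing categories fall back to Housekeeping.
function renderNotes(commits, classify) {
  const groups = new Map(CATEGORIES.map((name) => [name, []]));
  for (const commit of commits) {
    const { category, breaking = false } = classify(commit) ?? {};
    const bucket = groups.get(category) ?? groups.get("Housekeeping");
    bucket.push(`- ${breaking ? "⚠️ " : ""}${commit.subject} (${commit.hash})`);
  }
  const sections = [];
  for (const [name, items] of groups) {
    if (items.length) sections.push(`## ${name}\n${items.join("\n")}`);
  }
  return sections.join("\n\n");
}

const sampleCommits = [
  "9d809f1 - PoC of repository ai bootstrap (#7585) (JanKrivanek)", // from this commit page
  "abc1234 - Bump example package (someone)",                       // made-up sample line
].map(parseLogLine);

// Toy classifier for the demo only: anything it cannot place returns null.
const toyClassify = (commit) =>
  /^bump\b/i.test(commit.subject) ? { category: "Dependencies" } : null;

const notes = renderNotes(sampleCommits, toyClassify);
console.log(notes);
```

Keeping classification behind a callback separates the deterministic part (parsing and grouping) from the judgment call the prompt delegates to the model.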

.github/workflows/find-similar-issues.yml

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+name: "Find Similar Issues with AI"
+
+on:
+  issues:
+    types: [opened]
+
+permissions:
+  contents: read
+  issues: write
+  models: read
+
+jobs:
+  find-similar-issues:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'issues'
+    steps:
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+
+      - run: npm init -y && npm install @octokit/rest
+
+      - name: Find and post similar issues
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          ISSUE_NUMBER: ${{ github.event.issue.number }}
+          ISSUE_TITLE: ${{ github.event.issue.title }}
+          ISSUE_BODY: ${{ github.event.issue.body }}
+        run: |
+          node << 'SCRIPT'
+          const { Octokit } = require("@octokit/rest");
+          const fs = require('fs');
+          const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
+          const endpoint = "https://models.inference.ai.azure.com";
+          const model = "gpt-4o-mini";
+          const token = process.env.GITHUB_TOKEN;
+          const issueNum = parseInt(process.env.ISSUE_NUMBER);
+          const title = process.env.ISSUE_TITLE;
+          const body = process.env.ISSUE_BODY || '';
+          const [owner, repo] = process.env.GITHUB_REPOSITORY.split('/');
+
+          function extractWords(text) {
+            const stop = new Set(['the','and','for','with','this','that','from','have','not','are','was','will','can','when','what','how','use','does','issue','error','work']);
+            return [...new Set(text.replace(/```[\s\S]*?```/g,'').replace(/https?:\/\/\S+/g,'').replace(/[^a-z0-9\s]/gi,' ').toLowerCase().split(/\s+/).filter(w=>w.length>3&&!stop.has(w)))];
+          }
+          function jaccard(a,b) { const i=a.filter(w=>b.includes(w)); const u=[...new Set([...a,...b])]; return u.length?i.length/u.length:0; }
+
+          (async()=>{
+            const issues=[];
+            for(let p=1;p<=10;p++){
+              const r=await octokit.issues.listForRepo({owner,repo,state:'all',per_page:100,page:p,sort:'updated',direction:'desc'});
+              if(!r.data.length)break;
+              issues.push(...r.data.filter(i=>i.number!==issueNum&&!i.pull_request));
+            }
+            const words=extractWords(`${title}\n${body}`);
+            const candidates=issues.map(i=>({issue:i,score:jaccard(words,extractWords(`${i.title}\n${i.body||''}`))}))
+              .filter(c=>c.score>0.1).sort((a,b)=>b.score-a.score).slice(0,30);
+
+            const results=[];
+            for(const{issue}of candidates){
+              try{
+                const r=await fetch(`${endpoint}/chat/completions`,{method:"POST",headers:{"Content-Type":"application/json","Authorization":`Bearer ${token}`},
+                  body:JSON.stringify({model,temperature:0.3,max_tokens:150,messages:[
+                    {role:"system",content:'Analyze GitHub issue similarity. Return JSON only: {"score":0.0,"reason":"brief"}'},
+                    {role:"user",content:`Current:\nTitle: ${title}\nBody: ${body}\n\nCompare:\nTitle: ${issue.title}\nBody: ${issue.body||'None'}`}
+                  ]})});
+                const d=await r.json();
+                if(!d.choices?.[0])continue;
+                const parsed=JSON.parse(d.choices[0].message.content.trim().replace(/^```json?\s*/gm,'').replace(/```$/gm,''));
+                if(parsed.score>=0.6) results.push({number:issue.number,title:issue.title,state:issue.state,url:issue.html_url,score:parsed.score,reason:parsed.reason,labels:issue.labels.map(l=>l.name)});
+                await new Promise(r=>setTimeout(r,100));
+              }catch(e){console.error(`#${issue.number}:`,e.message)}
+            }
+            results.sort((a,b)=>b.score-a.score);
+            const top=results.slice(0,5);
+
+            let comment='';
+            if(top.length){
+              comment=`## 🔍 Similar Issues Found\n\n`;
+              top.forEach((s,i)=>{
+                comment+=`<details><summary><strong>${i+1}. <a href="${s.url}">#${s.number}</a>: ${s.title}</strong> (${Math.round(s.score*100)}%)</summary>\n\n`;
+                comment+=`**State:** ${s.state==='open'?'🟢 Open':'🔴 Closed'} \n**Labels:** ${s.labels.slice(0,5).map(l=>'`'+l+'`').join(', ')||'None'}\n`;
+                if(s.reason) comment+=`**Why:** ${s.reason}\n`;
+                comment+=`</details>\n\n`;
+              });
+              comment+=`---\n*AI-powered similar issue detection*`;
+            } else {
+              comment=`## 🔍 No similar issues found with high confidence.\n\n---\n*AI-powered similar issue detection*`;
+            }
+            await octokit.issues.createComment({owner,repo,issue_number:issueNum,body:comment});
+          })();
+          SCRIPT
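
Before calling the model, the workflow above narrows the candidate list with a keyword-overlap (Jaccard) score and keeps issues scoring above 0.1. A readable sketch of that prefilter follows. It mirrors the one-liners in the workflow but uses `Set` membership instead of `Array.includes`, which is a substitution made here and not what the committed script does. The three sample issue texts are invented for illustration.

```javascript
// Stop-word list copied from the workflow above.
const STOP_WORDS = new Set(['the','and','for','with','this','that','from','have','not','are','was','will','can','when','what','how','use','does','issue','error','work']);

// Reduces issue text to a set of distinct keywords.
function extractWords(text) {
  const cleaned = text
    .replace(/`{3}[\s\S]*?`{3}/g, "")  // drop fenced code blocks
    .replace(/https?:\/\/\S+/g, "")    // drop URLs
    .replace(/[^a-z0-9\s]/gi, " ")     // turn punctuation into spaces
    .toLowerCase();
  return new Set(cleaned.split(/\s+/).filter((w) => w.length > 3 && !STOP_WORDS.has(w)));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
function jaccard(a, b) {
  let shared = 0;
  for (const word of a) if (b.has(word)) shared++;
  const unionSize = a.size + b.size - shared;
  return unionSize ? shared / unionSize : 0;
}

// Invented sample texts.
const newIssue = extractWords("LightGBM trainer crashes when loading model");
const related = extractWords("Crash loading LightGBM model from stream");
const unrelated = extractWords("Tokenizer documentation typo");

const relatedScore = jaccard(newIssue, related);     // 3 shared of 7 distinct words
const unrelatedScore = jaccard(newIssue, unrelated); // no shared words

console.log(relatedScore > 0.1, unrelatedScore > 0.1);
```

Only candidates that pass this cheap filter go on to the per-issue model call, which keeps the number of model requests bounded (the workflow caps it at 30).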

.github/workflows/inclusive-heat-sensor.yml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+name: Inclusive Heat Sensor
+on:
+  issues:
+    types: [opened, reopened]
+  issue_comment:
+    types: [created, edited]
+  pull_request_review_comment:
+    types: [created, edited]
+
+permissions:
+  contents: read
+  issues: write
+  pull-requests: write
+
+jobs:
+  detect-heat:
+    uses: jonathanpeppers/inclusive-heat-sensor/.github/workflows/comments.yml@v0.1.2
+    with:
+      minimizeComment: true
+      offensiveThreshold: 9
+      angerThreshold: 9
