AI Code Reviewer is a SaaS-style engineering intelligence platform built with Next.js, TypeScript, Supabase, GitHub OAuth, GitHub repository integration, and Groq-powered AI analysis. It started as an AI pull request reviewer and now includes full-codebase repository analysis similar in spirit to CodeRabbit, DeepCode, and SonarQube.
The platform lets a user sign in with GitHub, connect repositories, review pull requests, and run a complete engineering audit of an entire repository even when the repository has no pull requests.
- Users sign in through GitHub OAuth using Supabase Auth.
- The app fetches GitHub repositories using the authenticated provider token.
- Users can connect repositories and store them in Supabase.
- Connected repositories are used for PR diagnostics and full repository intelligence scans.
- Users can open a repository page and view pull requests.
- The app fetches pull request files from GitHub.
- Changed file patches are combined into a review context.
- Groq LLaMA generates a structured AI review covering bugs, security, performance, readability, and best practices.
The repository intelligence system does not depend on pull requests.
Users can click Analyze Repository and the platform will:
- Fetch the repository tree recursively from GitHub.
- Filter generated files, lock files, binaries, and build output.
- Download high-signal source/configuration files.
- Run static analysis rules.
- Parse TypeScript/JavaScript code with the TypeScript compiler API.
- Build an import/dependency graph.
- Detect circular dependencies and coupling hotspots.
- Chunk high-risk files for AI review.
- Generate an AI architecture audit.
- Persist scores, file analyses, reports, graph data, and findings in Supabase.
- Render a repository intelligence dashboard.
- Frontend: Next.js App Router, React, TypeScript
- Styling: Tailwind CSS
- Authentication: Supabase Auth with GitHub OAuth
- Database: Supabase Postgres with Row Level Security
- GitHub Integration: GitHub REST API
- AI: Groq SDK with llama-3.3-70b-versatile
- Charts: Recharts
- Animations: Framer Motion
- Icons: Lucide React
- Static Analysis: Custom rules plus TypeScript compiler API
ai-code-reviewer/
├── database/
│ ├── supabase.sql
│ ├── newSupabase.sql
│ └── repository_intelligence.sql
├── frontend/
│ ├── app/
│ │ ├── api/repository-scans/route.ts
│ │ ├── login/page.tsx
│ │ ├── page.tsx
│ │ ├── repository/[owner]/[repo]/page.tsx
│ │ ├── repository/[owner]/[repo]/analysis/page.tsx
│ │ └── review/[owner]/[repo]/[pr]/page.tsx
│ ├── lib/
│ │ ├── ai.ts
│ │ ├── github.ts
│ │ ├── supabase.ts
│ │ └── repository-analysis/
│ │   ├── ai-audit.ts
│ │   ├── filters.ts
│ │   ├── github-scanner.ts
│ │   ├── graph.ts
│ │   ├── pipeline.ts
│ │   ├── static-analysis.ts
│ │   └── types.ts
│ ├── package.json
│ └── tsconfig.json
└── README.md
app/page.tsx: Main authenticated dashboard. It loads the current Supabase user, syncs the user profile into the public users table, fetches GitHub repositories, and lets users connect repositories.
app/repository/[owner]/[repo]/page.tsx: Repository detail page. It shows entry points to repository intelligence alongside the existing pull request review workflow.
app/review/[owner]/[repo]/[pr]/page.tsx: Pull request review page. It fetches changed PR files, combines their patches, and sends them to the AI review function.
app/repository/[owner]/[repo]/analysis/page.tsx: Full repository intelligence dashboard. It shows the health score, security score, architecture score, technical debt score, risk distribution, complexity heatmap, high-risk files, graph findings, and AI-generated recommendations.
app/api/repository-scans/route.ts: Server-side API for repository scans.
- GET: loads the latest scan for a repository.
- POST: creates and runs a full repository scan.
The route validates the Supabase session, uses the GitHub provider token, fetches files, runs the analysis, stores the results, and returns the completed scan.
lib/repository-analysis/github-scanner.ts: Fetches repository metadata, branch information, recursive Git tree data, and file blobs from GitHub.
lib/repository-analysis/filters.ts: Controls which files are analyzed. It ignores folders and files such as:
- node_modules
- .next
- dist
- build
- coverage
- lock files
- binary assets
- large files
- source maps
- minified files
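As an illustration, a path-and-size filter along these lines could implement the rules above. This is a sketch, not the contents of filters.ts: the function name `shouldAnalyzeFile`, the exact extension lists, and the 200 KB size limit are assumptions.

```typescript
// Hypothetical filter in the spirit of filters.ts; lists and limits are assumptions.
const IGNORED_DIRS = ["node_modules", ".next", "dist", "build", "coverage"];
const LOCK_FILES = ["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb"];
const BINARY_EXTENSIONS = [".png", ".jpg", ".jpeg", ".gif", ".ico", ".pdf", ".zip", ".woff", ".woff2"];
const MAX_FILE_BYTES = 200 * 1024; // assumed size limit

function shouldAnalyzeFile(path: string, sizeBytes: number): boolean {
  const segments = path.split("/");
  const fileName = segments[segments.length - 1].toLowerCase();

  // Skip anything inside an ignored directory, at any depth.
  if (segments.slice(0, -1).some((dir) => IGNORED_DIRS.includes(dir))) return false;
  // Skip lock files, source maps, and minified bundles.
  if (LOCK_FILES.includes(fileName)) return false;
  if (fileName.endsWith(".map") || /\.min\.(js|css)$/.test(fileName)) return false;
  // Skip binary assets.
  if (BINARY_EXTENSIONS.some((ext) => fileName.endsWith(ext))) return false;
  // Skip large files.
  return sizeBytes <= MAX_FILE_BYTES;
}
```

Matching whole path segments rather than substrings keeps a file such as src/build-utils.ts from being skipped just because its name contains "build".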
lib/repository-analysis/static-analysis.ts: Runs rule-based and AST-based analysis.
It detects:
- exposed secret patterns
- unsafe eval or new Function
- unsafe HTML rendering
- possible SQL injection patterns
- API routes without obvious auth checks
- debug logging
- large files
- deeply nested logic
- async effects without cleanup
- repeated collection traversal
- missing pagination
- explicit any
- complex functions
- duplicated code blocks
- missing test/lint/CI/Docker setup
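Many of the rule-based checks above reduce to line-level pattern matching. A minimal sketch, assuming a hypothetical `Rule`/`Finding` shape and illustrative severities; the real static-analysis.ts rule set and its AST checks are not reproduced here:

```typescript
// Hypothetical rule and finding shapes; severities are illustrative.
type Severity = "critical" | "high" | "medium" | "low";

interface Rule {
  id: string;
  category: "security" | "quality";
  severity: Severity;
  pattern: RegExp;
  message: string;
}

interface Finding {
  ruleId: string;
  category: Rule["category"];
  severity: Severity;
  filePath: string;
  line: number;
  message: string;
}

const RULES: Rule[] = [
  { id: "public-secret", category: "security", severity: "high",
    pattern: /NEXT_PUBLIC_\w*(SECRET|API_KEY|TOKEN)/, message: "Secret-like value exposed through a NEXT_PUBLIC_ variable" },
  { id: "dynamic-code", category: "security", severity: "critical",
    pattern: /\beval\s*\(|\bnew\s+Function\s*\(/, message: "Dynamic code execution via eval or new Function" },
  { id: "unsafe-html", category: "security", severity: "high",
    pattern: /dangerouslySetInnerHTML/, message: "Unsafe HTML rendering" },
  { id: "debug-log", category: "quality", severity: "low",
    pattern: /\bconsole\.(log|debug)\s*\(/, message: "Debug logging left in code" },
];

// Check every line against every rule and record 1-based line numbers.
function runRules(filePath: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ ruleId: rule.id, category: rule.category, severity: rule.severity,
          filePath, line: index + 1, message: rule.message });
      }
    }
  });
  return findings;
}
```

The public-secret pattern intentionally does not match NEXT_PUBLIC_SUPABASE_ANON_KEY, which is designed to be public. Line-level regexes are cheap but can misfire on comments and strings, which is the gap the TypeScript AST pass is meant to close.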
lib/repository-analysis/graph.ts: Builds a repository import graph.
It detects:
- internal import edges
- circular dependencies
- over-coupled modules
- folder relationships
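Over-coupling can be approximated by counting, for each file, how many internal modules it imports (fan-out) and how many internal modules import it (fan-in). A sketch, assuming the graph builder already resolves imports to repository paths; the `ImportMap` type and the threshold are hypothetical:

```typescript
// Hypothetical input: each analyzed file mapped to the internal files it imports.
type ImportMap = Record<string, string[]>;

interface CouplingStats {
  file: string;
  fanOut: number; // internal modules this file imports
  fanIn: number; // internal modules that import this file
}

function couplingHotspots(imports: ImportMap, threshold: number): CouplingStats[] {
  // Count distinct importers for every imported file.
  const fanIn = new Map<string, number>();
  for (const targets of Object.values(imports)) {
    for (const target of Array.from(new Set(targets))) {
      fanIn.set(target, (fanIn.get(target) ?? 0) + 1);
    }
  }
  const files = new Set([...Object.keys(imports), ...Array.from(fanIn.keys())]);
  return Array.from(files)
    .map((file) => ({ file, fanOut: new Set(imports[file] ?? []).size, fanIn: fanIn.get(file) ?? 0 }))
    .filter((s) => s.fanIn + s.fanOut >= threshold)
    // Most coupled first.
    .sort((a, b) => b.fanIn + b.fanOut - (a.fanIn + a.fanOut));
}
```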
lib/repository-analysis/ai-audit.ts: Sends high-signal repository chunks to Groq and asks the model to generate a senior-engineer architecture audit. If the Groq key is missing, the system falls back to a deterministic static summary so the scan pipeline can still run.
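The fallback described above can be sketched as a guard around the model call. Here `callModel` is a hypothetical injected function that would wrap the Groq SDK request, and the `AuditReport` shape is an assumption; only the server-side GROQ_API_KEY is read.

```typescript
// Hypothetical report shape; the real ai-audit.ts output is richer.
interface AuditReport {
  source: "ai" | "static-fallback";
  summary: string;
}

// Hypothetical wrapper around the Groq SDK request.
type ModelCall = (apiKey: string, prompt: string) => Promise<string>;

async function runArchitectureAudit(
  prompt: string,
  staticFindingCount: number,
  callModel: ModelCall,
  env: Record<string, string | undefined>,
): Promise<AuditReport> {
  // Read only the server-side key; NEXT_PUBLIC_* values can end up in the browser bundle.
  const apiKey = env.GROQ_API_KEY;
  if (!apiKey) {
    // Deterministic fallback so the scan pipeline still completes without AI.
    return {
      source: "static-fallback",
      summary: `AI audit skipped because GROQ_API_KEY is not set. ${staticFindingCount} static and graph findings were recorded.`,
    };
  }
  return { source: "ai", summary: await callModel(apiKey, prompt) };
}
```

In the route this would be called with process.env; injecting the model call also lets the fallback path be unit tested without network access.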
lib/repository-analysis/pipeline.ts: Coordinates the full analysis flow:
Repository files
↓
Static analysis
↓
Graph analysis
↓
Preliminary scoring
↓
AI architecture audit
↓
Issue merge and deduplication
↓
Final scores and dashboard data
database/repository_intelligence.sql: Adds the full repository intelligence database schema:
- repository_scans
- file_analyses
- architecture_reports
- security_reports
- technical_debt_reports
It also adds indexes and Row Level Security policies so users only access their own scan data.
The base schema defines:
- users: mirrors authenticated Supabase users.
- repositories: stores connected GitHub repositories.
- reviews: stores PR review summaries.
- issues: stores PR review issues.
repository_scans: Stores one full repository analysis run.
Important fields:
- repository owner/name/branch
- status and progress
- commit SHA
- analyzed/skipped file counts
- total lines
- health, security, architecture, scalability, maintainability, engineering quality, and technical debt scores
- risk distribution
- complexity heatmap
- dependency graph JSON
file_analyses: Stores file-level intelligence.
Important fields:
- file path
- language
- purpose
- lines
- complexity
- risk score
- maintainability score
- imports
- exports
- file-specific issues
architecture_reports: Stores AI-generated architecture review data.
Important fields:
- architecture summary
- production readiness
- priority fixes
- refactoring suggestions
- recommendations
security_reports: Stores security-focused findings and recommendations.
technical_debt_reports: Stores technical debt hotspots and refactoring plans.
Create frontend/.env.local:
NEXT_PUBLIC_SUPABASE_URL=your_supabase_project_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
GROQ_API_KEY=your_groq_api_key
The previous PR review code also supports NEXT_PUBLIC_GROQ_API_KEY, but the server-side GROQ_API_KEY is safer because public environment variables can be exposed to the browser.
- Install dependencies:
cd frontend
npm install
- Configure Supabase by running these SQL files in Supabase:
database/newSupabase.sql
database/repository_intelligence.sql
- Configure GitHub OAuth in Supabase Auth.
- Add environment variables in frontend/.env.local.
- Start the app:
npm run dev
- Open http://localhost:3000
Inside frontend:
npm run dev
npm run build
npm run start
npm run lint
User signs in with GitHub
↓
Supabase stores authenticated session
↓
App fetches GitHub repositories
↓
User connects a repository
↓
User opens repository page
↓
User runs PR review or full repository analysis
Analyze Repository button
↓
POST /api/repository-scans
↓
Validate Supabase user session
↓
Use GitHub provider token
↓
Upsert repository record
↓
Create repository_scans row
↓
Fetch repository tree recursively
↓
Filter files
↓
Fetch file contents
↓
Run static analysis
↓
Run TypeScript AST analysis
↓
Build dependency graph
↓
Calculate preliminary scores
↓
Chunk high-risk files
↓
Run AI architecture audit
↓
Merge static, graph, and AI findings
↓
Persist scan, file analyses, architecture report, security report, and technical debt report
↓
Render dashboard
The platform calculates:
- Repository health score
- Architecture score
- Engineering quality score
- Maintainability score
- Scalability score
- Security score
- Technical debt score
Scores are based on:
- severity of findings
- number of security issues
- circular dependency count
- average file risk
- maintainability of files
- technical debt indicators
- DevOps readiness
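One way to turn findings into 0-100 scores is a clamped severity penalty, with the health score as an average of the category scores. The penalty weights and the equal weighting below are assumptions for illustration, not the platform's actual constants:

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Assumed penalty weights: critical and high issues cost the most.
const PENALTY: Record<Severity, number> = { critical: 25, high: 12, medium: 5, low: 1 };

// Keep every score an integer in [0, 100].
const clamp = (value: number) => Math.max(0, Math.min(100, Math.round(value)));

function securityScore(securitySeverities: Severity[]): number {
  const penalty = securitySeverities.reduce((sum, s) => sum + PENALTY[s], 0);
  return clamp(100 - penalty);
}

interface CategoryScores {
  security: number;
  architecture: number;
  scalability: number;
  maintainability: number;
  engineeringQuality: number;
}

// Health as an equal-weight average of the category scores (an assumption).
function healthScore(scores: CategoryScores): number {
  const values = Object.values(scores);
  return clamp(values.reduce((a, b) => a + b, 0) / values.length);
}
```

For example, one critical, one high, and one low security finding give 100 - (25 + 12 + 1) = 62.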
The static analysis engine combines:
- regex-based security and quality rules
- TypeScript compiler API AST traversal
- duplicate code block hashing
- file size and nesting heuristics
- package/config inspection
- API route authentication heuristics
This is not a replacement for a full commercial static analyzer, but it gives the app a production-oriented foundation that can be extended with ESLint, Semgrep, CodeQL, and language-specific analyzers.
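Duplicate code block hashing can be sketched as a sliding window over normalized lines: identical windows hash to the same digest, and any digest seen more than once is a duplicate. The five-line window, the trivial-line filter, and the `findDuplicateBlocks` name are assumptions:

```typescript
import { createHash } from "node:crypto";

// Hypothetical window size; static-analysis.ts may use a different value.
const WINDOW = 5;

interface DuplicateBlock {
  hash: string;
  locations: { filePath: string; startLine: number }[];
}

function findDuplicateBlocks(files: Record<string, string>): DuplicateBlock[] {
  const seen = new Map<string, { filePath: string; startLine: number }[]>();
  for (const [filePath, source] of Object.entries(files)) {
    // Normalize whitespace so indentation changes do not hide duplicates.
    const lines = source.split("\n").map((l) => l.trim().replace(/\s+/g, " "));
    for (let i = 0; i + WINDOW <= lines.length; i++) {
      const block = lines.slice(i, i + WINDOW);
      // Skip windows made mostly of blank lines or lone braces.
      if (block.filter((l) => l.length > 3).length < WINDOW - 1) continue;
      const hash = createHash("sha1").update(block.join("\n")).digest("hex");
      const locations = seen.get(hash) ?? [];
      locations.push({ filePath, startLine: i + 1 });
      seen.set(hash, locations);
    }
  }
  return Array.from(seen.entries())
    .filter(([, locations]) => locations.length > 1)
    .map(([hash, locations]) => ({ hash, locations }));
}
```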
The AI layer does not blindly send the entire repository at once.
It:
- prioritizes high-risk and high-signal files
- chunks files into model-sized contexts
- includes static findings as context
- asks the model for architecture, security, scalability, refactoring, and production-readiness analysis
- merges AI findings with rule-based findings
This hybrid approach is more reliable than using only AI because deterministic rules catch obvious problems and AI provides higher-level architectural reasoning.
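The prioritize, truncate, and chunk steps above can be sketched with a character budget standing in for a token budget. `ScoredFile`, `buildChunks`, and both budgets are hypothetical; a real implementation would more likely measure tokens:

```typescript
// Hypothetical input: a file already scored by static analysis.
interface ScoredFile {
  path: string;
  riskScore: number; // 0-100
  content: string;
}

function buildChunks(files: ScoredFile[], maxCharsPerFile: number, maxCharsPerChunk: number): string[] {
  // Highest-risk files first.
  const ordered = [...files].sort((a, b) => b.riskScore - a.riskScore);
  const chunks: string[] = [];
  let current = "";
  for (const file of ordered) {
    // Truncate each file to an excerpt before packing.
    const excerpt = file.content.slice(0, maxCharsPerFile);
    const section = `// File: ${file.path} (risk ${file.riskScore})\n${excerpt}\n`;
    // Start a new chunk when this section would overflow the current one.
    if (current && current.length + section.length > maxCharsPerChunk) {
      chunks.push(current);
      current = "";
    }
    current += section;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because files are sorted by risk first, the highest-risk files land in the earliest chunks, so they are still covered if later chunks are dropped to save cost.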
The implementation includes:
- scan status tracking
- progress updates
- chunked repository fetching
- file filtering
- file size limits
- high-signal prioritization
- persistent scan records
- JSONB report storage
- retry-safe API structure
- fallback audit when AI key is missing
Future production improvements:
- move scan execution to a queue worker
- use Supabase Edge Functions or a separate worker service
- add Redis or Upstash for job state and caching
- cache GitHub file blobs by SHA
- add webhook-triggered rescans
- add Semgrep CLI/Cloud integration
- add ESLint execution in sandboxed workers
- add vector embeddings for semantic repository search
- add scan diffs and historical trend charts
- Supabase RLS protects user data.
- GitHub provider tokens are read from the authenticated session.
- Full scans run on the server API route.
- Public secrets are flagged by the analyzer.
- NEXT_PUBLIC_* variables are treated carefully because they are browser-exposed.
- Repository scan tables require ownership checks through RLS.
- The scan currently runs inside a Next.js API route. For large repositories, a real background worker or queue is better.
- Static analysis is custom and heuristic-based. Semgrep/ESLint integration can be deepened.
- The AI scan is limited by token budget and file prioritization.
- Binary and very large files are skipped.
- GitHub API rate limits still apply.
- The repository graph currently focuses on TypeScript/JavaScript-style imports.
Q1. What problem does this project solve?
It helps developers and teams review code automatically. It supports both pull request reviews and full repository audits, so users can understand code quality, security, architecture, scalability, and technical debt even without active pull requests.
Q2. How is this different from a normal AI code review app?
Most simple AI code review apps only analyze PR diffs. This platform adds full-codebase intelligence by recursively fetching repository files, running static and graph analysis, generating file-level scores, and producing a complete architecture audit.
Q3. What are the main modules of the system?
The main modules are authentication, GitHub repository integration, PR review, repository scan API, static analysis engine, dependency graph engine, AI audit engine, Supabase persistence, and the dashboard UI.
Q4. What is the end-to-end flow of a repository scan?
The user clicks Analyze Repository, the API validates the session, fetches repository files from GitHub, filters files, runs static and AST analysis, builds the import graph, sends prioritized chunks to AI, merges findings, stores reports in Supabase, and renders the dashboard.
Q5. Why did you use Next.js App Router?
Next.js App Router supports file-based routing, server APIs, dynamic routes, and modern React patterns. It fits this app because the platform needs dashboard pages, dynamic repository routes, and server-side API routes.
Q6. Which routes are dynamic?
/repository/[owner]/[repo], /repository/[owner]/[repo]/analysis, and /review/[owner]/[repo]/[pr] are dynamic routes based on GitHub repository and PR parameters.
Q7. Why is the repository scan implemented as an API route?
The scan needs server-side access to environment variables, Supabase session validation, GitHub API calls, and AI calls. Running it in an API route avoids exposing sensitive logic directly to the browser.
Q8. Why use Recharts?
Recharts provides React-friendly charts for scorecards, risk distribution, and complexity heatmap visualizations.
Q9. Why use Framer Motion?
Framer Motion adds smooth transitions to dashboard elements and improves perceived polish without complicating business logic.
Q10. How does authentication work?
The app uses Supabase Auth with GitHub OAuth. After sign-in, Supabase provides the authenticated user session and GitHub provider token.
Q11. Why store users in a public users table if Supabase already has auth.users?
The public table is useful for app-level foreign keys, profile display, and RLS relationships while auth.users remains managed by Supabase Auth.
Q12. What is RLS and why is it important here?
Row Level Security ensures users can only access their own repositories, scans, reports, and file analyses. This is critical in a multi-tenant SaaS app.
Q13. How do repository scan tables enforce ownership?
The RLS policies check that the scan or report belongs to a repository owned by the authenticated auth.uid().
Q14. What data is stored for a repository scan?
Status, progress, branch, commit SHA, file counts, line counts, scores, graph JSON, risk distribution, complexity heatmap, timestamps, and report links.
Q15. How does the app fetch repositories?
It calls GitHub’s /user/repos endpoint with the GitHub OAuth provider token.
Q16. How does the app fetch pull requests?
It calls GitHub’s /repos/{owner}/{repo}/pulls endpoint.
Q17. How does the full repository scan fetch files?
It fetches repository metadata, resolves the default branch, fetches the recursive Git tree, filters blobs, and fetches selected blob contents by SHA.
Q18. Why use the Git tree API instead of recursively calling contents endpoints?
The Git tree API can return the repository tree recursively in one request, which is more efficient than traversing folder-by-folder with many requests.
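The two GitHub REST endpoints involved are the Git Trees API (with `recursive=1`) and the Git Blobs API, whose `content` field is base64-encoded. A sketch of the URL construction and response handling, typing only the fields used; the helper names are hypothetical and the HTTP call itself is omitted:

```typescript
// Minimal shapes for the Git Trees and Git Blobs responses (only the fields used here).
interface TreeEntry {
  path: string;
  type: "blob" | "tree" | "commit";
  sha: string;
  size?: number;
}

interface TreeResponse {
  tree: TreeEntry[];
  truncated: boolean;
}

interface BlobResponse {
  content: string;
  encoding: string;
}

// One request returns the whole tree when `recursive` is set; treeSha may also be a branch name.
function treeUrl(owner: string, repo: string, treeSha: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/git/trees/${treeSha}?recursive=1`;
}

function blobUrl(owner: string, repo: string, blobSha: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/git/blobs/${blobSha}`;
}

// Keep only files; directories ("tree") and submodules ("commit") are skipped.
function fileEntries(response: TreeResponse): TreeEntry[] {
  if (response.truncated) {
    console.warn("Tree response was truncated; some files are missing.");
  }
  return response.tree.filter((entry) => entry.type === "blob");
}

// Blob contents arrive base64-encoded.
function decodeBlob(blob: BlobResponse): string {
  return blob.encoding === "base64" ? Buffer.from(blob.content, "base64").toString("utf8") : blob.content;
}
```

In the app these URLs would be fetched with the GitHub provider token in the Authorization header. GitHub sets `truncated: true` when a recursive tree exceeds its size limits, in which case the listing is incomplete.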
Q19. How do you handle GitHub API limits?
The current implementation limits analyzed files, skips large/generated files, and prioritizes high-signal files. A production system should also cache blobs by SHA and add retry/backoff handling.
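Retry/backoff is listed as a production addition, not something the current implementation does. A generic sketch, where `withBackoff` is a hypothetical helper and `isRetryable` is a caller-supplied predicate (for GitHub, typically 429 and 5xx responses, plus 403 responses caused by rate limiting):

```typescript
// Hypothetical helper; the current implementation does not include retries.
async function withBackoff<T>(
  task: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      // Exponential backoff (base, 2x, 4x, ...) scaled by a random factor in [0.5, 1).
      const delay = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```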
Q20. What kinds of security issues does the static analyzer detect?
It detects public secret-like environment variables, hardcoded credentials, eval, new Function, unsafe HTML rendering, possible SQL injection patterns, and API routes without obvious auth checks.
Q21. What code quality issues does it detect?
It detects large files, deeply nested logic, debug logging, explicit any, complex functions, duplicate logic blocks, and missing return types on function declarations.
Q22. What performance issues does it detect?
It flags possible async effect cleanup problems, repeated collection traversal, and other patterns that may cause unnecessary work.
Q23. What scalability issues does it detect?
It flags missing pagination, hard-to-maintain large files, tight coupling, and repository architecture concerns.
Q24. What DevOps issues does it detect?
It checks for missing test scripts, lint scripts, GitHub Actions workflows, and Docker/runtime definitions.
Q25. Why use the TypeScript compiler API?
The compiler API lets the app parse code structurally instead of relying only on regex. This helps identify functions, return types, variable declarations, and complexity hotspots more accurately.
Q26. Why not rely only on AI for analysis?
AI can miss deterministic issues and may be inconsistent. Static analysis provides repeatable findings, while AI adds architectural reasoning. The hybrid approach is stronger.
Q27. What is repository graph analysis?
It maps files as nodes and imports as edges, then analyzes relationships between modules and folders.
Q28. How are circular dependencies detected?
The graph builder performs depth-first traversal and detects when a node appears again in the active traversal stack.
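A sketch of that DFS, tracking the active stack explicitly; the `ImportGraph` input type and `findCycles` name are assumptions. It reports one cycle per back edge it encounters, which is enough to flag problem files but does not enumerate every elementary cycle:

```typescript
// Hypothetical input: file path -> internal files it imports.
type ImportGraph = Record<string, string[]>;

function findCycles(graph: ImportGraph): string[][] {
  const visited = new Set<string>();
  const onStack = new Set<string>();
  const path: string[] = [];
  const cycles: string[][] = [];

  const visit = (node: string): void => {
    visited.add(node);
    onStack.add(node);
    path.push(node);
    for (const next of graph[node] ?? []) {
      if (onStack.has(next)) {
        // Back edge: the cycle is the current path from `next` onward, closed by `next`.
        cycles.push([...path.slice(path.indexOf(next)), next]);
      } else if (!visited.has(next)) {
        visit(next);
      }
    }
    path.pop();
    onStack.delete(node);
  };

  for (const node of Object.keys(graph)) {
    if (!visited.has(node)) visit(node);
  }
  return cycles;
}
```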
Q29. What is an over-coupled module?
A module with many imports and/or many dependents. These modules become architecture bottlenecks because many parts of the system rely on them.
Q30. Why are circular dependencies bad?
They make code harder to reason about, increase the risk of runtime initialization bugs, and often indicate poor separation of concerns.
Q31. Which AI model is used?
The app uses Groq with llama-3.3-70b-versatile.
Q32. What does the AI architecture review generate?
It generates an architecture summary, production-readiness analysis, priority fixes, refactoring suggestions, recommendations, and AI-detected issues.
Q33. How does the app avoid sending too much code to AI?
It prioritizes high-risk files, truncates file content excerpts, and chunks files into model-sized contexts.
Q34. What happens if the Groq API key is missing?
The repository scan still runs using static and graph analysis. The AI audit falls back to a deterministic summary.
Q35. How are AI findings combined with static findings?
The pipeline merges findings from static rules, graph analysis, and AI, then deduplicates them by category, severity, title, and file path.
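A sketch of that merge, keyed on the four fields named above. The `Issue` shape, the `mergeIssues` name, and the trim/lowercase normalization of titles are assumptions:

```typescript
// Hypothetical issue shape shared by static, graph, and AI findings.
interface Issue {
  category: string;
  severity: string;
  title: string;
  filePath?: string;
  source: "static" | "graph" | "ai";
}

// Dedup key: category, severity, normalized title, and file path.
function issueKey(issue: Issue): string {
  return [issue.category, issue.severity, issue.title.trim().toLowerCase(), issue.filePath ?? ""].join("|");
}

// The first occurrence of each key wins; Map preserves insertion order.
function mergeIssues(...groups: Issue[][]): Issue[] {
  const byKey = new Map<string, Issue>();
  for (const group of groups) {
    for (const issue of group) {
      const key = issueKey(issue);
      if (!byKey.has(key)) byKey.set(key, issue);
    }
  }
  return Array.from(byKey.values());
}
```

Passing static and graph findings before AI findings means that when the model restates a rule-based issue, the deterministic version is kept.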
Q36. How is the health score calculated?
It combines security, architecture, scalability, maintainability, and engineering quality scores.
Q37. How is the security score calculated?
Security findings apply penalties based on severity. Critical and high security issues reduce the score more heavily.
Q38. How is technical debt calculated?
Technical debt is based on code quality and DevOps findings plus average file risk.
Q39. What is file risk score?
It is a score from 0 to 100 based on issue severity and complexity for an individual file.
Q40. What is maintainability score?
It estimates how easy a file is to maintain based on risk, complexity, and size.
Q41. What endpoints were added?
GET /api/repository-scans loads the latest scan, and POST /api/repository-scans starts a new full repository scan.
Q42. Why does the scan API need both Supabase and GitHub tokens?
The Supabase token validates the platform user, while the GitHub token authorizes access to the user’s repositories.
Q43. How is scan progress tracked?
The API updates repository_scans.progress and repository_scans.current_stage as the scan moves through fetching, static analysis, graph analysis, AI analysis, persisting, and completion.
Q44. How does the API handle scan failures?
It catches errors, updates the scan row to failed, stores the error message, and returns an error response.
Q45. Why store reports instead of generating them every time?
Repository scans are expensive. Persisting reports allows dashboards to load quickly, supports historical trends, and avoids repeated GitHub/AI calls.
Q46. Why store graph and report data as JSONB?
Graph data and issue arrays are nested and flexible. JSONB allows fast iteration during development while still storing structured data in Postgres.
Q47. What would you normalize further in a larger system?
Issue records, graph nodes, graph edges, scan events, and trend metrics could be normalized into separate tables for querying and analytics.
Q48. What are the major security concerns in this project?
Protecting OAuth tokens, enforcing RLS, avoiding public API keys, validating API access, preventing data leaks between users, and safely handling repository source code.
Q49. Why is NEXT_PUBLIC_GROQ_API_KEY risky?
Any variable prefixed with NEXT_PUBLIC_ can be exposed to browser code. AI provider keys should generally be server-only.
Q50. How would you improve secret handling?
Use server-only environment variables, never expose AI keys to the browser, rotate keys, and store long-lived secrets in a managed secret store.
Q51. Why is running scans inside a Next.js API route limited?
Large repositories may exceed serverless timeouts or memory limits. Long-running jobs are better handled by background workers.
Q52. How would you make this production-grade for large repositories?
Move scans to a queue worker, cache GitHub blobs by SHA, process files in batches, stream progress events, add retries/backoff, and store scan events.
Q53. How would you add background jobs?
Use a queue such as BullMQ, Inngest, Trigger.dev, Supabase Edge Functions, or a custom worker service. The API would enqueue a job and return immediately.
Q54. How would you implement real-time progress?
Use Supabase Realtime, Server-Sent Events, WebSockets, or polling against the repository_scans progress fields.
Q55. How would you reduce AI cost?
Cache summaries by file SHA, analyze only changed files after the first scan, prioritize risky files, summarize chunks before final audit, and reuse previous scan context.
Q56. How would you test the GitHub scanner?
Mock GitHub API responses for repository metadata, branches, trees, and blobs. Verify filtering, prioritization, decoding, and error handling.
Q57. How would you test static analysis rules?
Use small code fixtures with known patterns and assert that the expected issues are generated.
Q58. How would you test graph cycle detection?
Pass fake file analyses with imports that form cycles and assert that the graph reports the correct circular dependencies.
Q59. How would you test the scan API?
Mock Supabase and GitHub calls, run the route handler, and verify scan rows, report rows, error states, and response payloads.
Q60. What tests are currently missing?
Unit tests for filters, static rules, graph analysis, scoring, GitHub scanner, AI parsing, and API route persistence should be added.
Q61. How would you support multiple languages?
Add language-specific parsers and analyzers, such as Python AST, Go parser, Java parser, Semgrep rules, and language-aware import graph builders.
Q62. How would you integrate Semgrep?
Run Semgrep in a sandboxed worker against fetched files or a temporary checkout, parse JSON output, and convert findings into the platform’s issue format.
Q63. How would you integrate ESLint?
Run ESLint programmatically or in a worker for JS/TS repositories, parse results, and store them as static findings.
Q64. How would you add historical trends?
Store every scan, compare score changes over time, and chart scan history by date, branch, and commit SHA.
Q65. How would you support GitHub webhooks?
Register webhooks for push and pull request events, verify signatures, enqueue repository or PR scans, and update dashboards automatically.
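GitHub signs each webhook delivery with an HMAC-SHA256 of the raw request body, sent in the X-Hub-Signature-256 header as `sha256=<hex digest>`. A sketch of the verification step using Node's crypto module; the function name is hypothetical, and the secret would come from a server-only environment variable:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies GitHub's X-Hub-Signature-256 header against the raw request body.
function verifyGitHubSignature(rawBody: string, signatureHeader: string | null, secret: string): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const received = Buffer.from(signatureHeader, "utf8");
  const computed = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === computed.length && timingSafeEqual(received, computed);
}
```

The signature must be computed over the exact bytes GitHub sent, so the handler should read the raw body before any JSON parsing.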
Q66. How would you deploy this app?
Deploy the Next.js frontend and API routes to Vercel or another Node-compatible host, use Supabase for the database and auth, and run production scans in a separate worker.
Q67. What environment variables are required in production?
NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_ANON_KEY, and server-side GROQ_API_KEY.
Q68. What CI checks should run before deployment?
Install dependencies, lint, typecheck, unit tests, integration tests, and production build.
Q69. Why use custom static analysis instead of only Semgrep/SonarQube?
Custom analysis keeps the product self-contained and tailored to repository intelligence. Semgrep/SonarQube can be added later for deeper coverage.
Q70. Why prioritize high-signal files instead of scanning everything with AI?
AI context windows and cost are limited. Prioritization gives useful results faster and avoids wasting tokens on generated or low-value files.
Q71. What is the biggest technical risk in this project?
Long-running scans inside API routes. Moving scans to background workers is the most important production scalability improvement.
Q72. What is the biggest security risk?
Improper token handling or weak authorization around repository scan data. Strong RLS and server-only secrets are essential.
Q73. What is the biggest product challenge?
Reducing false positives while still surfacing meaningful architectural and security findings.
Q74. How would you explain this project in one minute?
It is an AI engineering intelligence platform for GitHub repositories. Users sign in with GitHub, connect repositories, review pull requests, and run full-codebase audits. The system combines static analysis, TypeScript AST parsing, dependency graph analysis, and AI reasoning to produce repository health scores, file-level risk analysis, architecture reviews, security findings, technical debt reports, and actionable recommendations.
Q75. What future features would you add?
Background workers, scan history, webhook-triggered scans, Semgrep integration, ESLint execution, embeddings-based repository Q&A, PR comments, GitHub Checks integration, team dashboards, organization-level reporting, and auto-generated refactor plans.
The current implementation has been checked with:
npm run lint
npm run build
Lint passes with existing image optimization warnings for raw <img> usage. The production build passes.