kareemsalama206/CodeAlign

CodeAlign

Broken Code Indentation Repair Tool

CodeAlign repairs common indentation damage in pasted code. It is built for code that was copied from chat, email, PDFs, docs, terminals, or broken snippets where leading whitespace was lost or mixed.

This is not a normal formatter or beautifier. CodeAlign focuses on indentation repair and keeps code tokens intact wherever possible.

Why This Exists

Badly pasted code often loses structure before it ever reaches a formatter. Most formatters expect syntactically valid source, but lost indentation can make code invalid or change its meaning, especially in Python, where indentation defines block structure. CodeAlign provides a practical repair pass with warnings, confidence scoring, changed-line counts, and before/after review.

Key Features

  • Paste broken or badly copied code
  • Automatically detect language and indentation style
  • Repair indentation through a FastAPI backend
  • Review original code, repaired code, and side-by-side changed lines
  • See repair confidence, language/indentation detection confidence, warnings, summary items, line counts, and ambiguous block count
  • Copy repaired code
  • Download repaired code as a file
  • Try built-in examples for Python, JavaScript, HTML, valid JSON, and invalid JSON
  • Handles large code blocks, including 3,000+ lines
  • No login, payment, AI, database, accounts, or cloud storage

Supported Languages

  • Python
  • JavaScript
  • TypeScript
  • Java
  • C
  • C++
  • C#
  • HTML
  • XML
  • JSON

Repair Strategy

CodeAlign uses deterministic heuristics only. It does not use AI, does not execute submitted code, and does not store submitted code.

Language detection looks for concrete syntax signals such as Python block openers, JavaScript/TypeScript declarations, Java and C# class/runtime calls, C/C++ include and namespace usage, HTML/XML tags, and parseable JSON.
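
As a rough illustration of signal-based detection, the sketch below checks a few concrete syntax patterns in order. The pattern table, the coarse language labels, and the function name `guess_language` are assumptions made for this example; they are not CodeAlign's actual rules, and confidence scoring is left out.

```python
import json
import re

# Hypothetical signal table: (label, pattern) pairs checked in order.
# Illustrative assumptions only, not CodeAlign's real detection rules.
SIGNALS = [
    ("python", re.compile(r"^\s*(def|class|if|for|while|try|with)\b.*:\s*$", re.M)),
    ("c/c++", re.compile(r'^\s*#include\s*[<"]', re.M)),
    ("html/xml", re.compile(r"</?[A-Za-z][^>]*>")),
    ("javascript/typescript", re.compile(r"^\s*(const|let|function|interface)\b", re.M)),
]


def guess_language(code: str) -> str:
    """Return a coarse language guess from concrete syntax signals."""
    stripped = code.strip()
    # JSON is only reported when the text actually parses.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass
    for label, pattern in SIGNALS:
        if pattern.search(code):
            return label
    return "unknown"
```

The C/C++ check runs before the tag check on purpose, because an include such as `<stdio.h>` would otherwise look like a markup tag.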

Indentation detection samples existing leading whitespace. Tabs are preserved when most indented lines start with tabs. Space indentation is inferred from common leading-space multiples. When indentation cannot be inferred, CodeAlign defaults conservatively: 4 spaces for Python, 2 spaces for JSON/HTML/JavaScript/TypeScript, and 4 spaces for other brace-style languages.
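
The following is a minimal sketch of how such sampling could work, not the project's implementation. It assumes a simple majority rule for tabs and uses the greatest common divisor of the observed leading-space widths as one possible reading of "common multiples". The names `detect_indent`, `DEFAULT_INDENT`, and `FALLBACK_INDENT` are hypothetical.

```python
from functools import reduce
from math import gcd

# Fallback sizes taken from the paragraph above; the dict layout is assumed.
DEFAULT_INDENT = {"python": 4, "json": 2, "html": 2, "javascript": 2, "typescript": 2}
FALLBACK_INDENT = 4  # other brace-style languages


def detect_indent(code: str, language: str) -> tuple[str, int]:
    """Return (style, size); size is spaces per level, or 1 for tabs."""
    default = ("spaces", DEFAULT_INDENT.get(language, FALLBACK_INDENT))
    prefixes = []
    for line in code.splitlines():
        body = line.lstrip(" \t")
        # Keep only non-blank lines that actually start with whitespace.
        if body and len(body) < len(line):
            prefixes.append(line[: len(line) - len(body)])
    if not prefixes:
        return default
    tab_lines = sum(1 for p in prefixes if p.startswith("\t"))
    if tab_lines * 2 > len(prefixes):
        return ("tabs", 1)
    widths = [len(p) for p in prefixes if "\t" not in p]
    if not widths:
        return default
    size = reduce(gcd, widths)
    # A unit of 1 is almost certainly noise, so fall back to the default.
    return ("spaces", size) if size >= 2 else default
```

For the request `"def hello():\nprint(\"hi\")"` no line has leading whitespace, so this sketch falls back to 4 spaces for Python, which matches the detection reason shown in the example response.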

Detection confidence measures how confident CodeAlign is about the detected language and indentation style. Repair confidence measures how reliable the repaired indentation appears after applying ambiguity checks and repair rules. These values may differ, especially for Python because indentation affects program logic.

Python repair uses language-aware heuristics:

  • Normalizes tabs/spaces
  • Indents after block openers ending in :
  • Dedents before else, elif, except, and finally
  • Preserves comments and blank lines
  • Detects mixed indentation
  • Detects suspicious flat blocks after block openers
  • Reports ambiguous blocks instead of claiming exact reconstruction
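
Two of the rules above (indent after a line ending in `:`, dedent before `else`, `elif`, `except`, and `finally`) can be sketched as follows. This is a simplified illustration for input whose leading whitespace was lost entirely; the function name `repair_python_flat` is hypothetical, and the real engine also handles mixed indentation, suspicious flat blocks, and ambiguity reporting.

```python
DEDENT_KEYWORDS = ("else", "elif", "except", "finally")


def repair_python_flat(code: str, indent: str = "    ") -> str:
    """Best-effort re-indent of Python code that lost all leading whitespace."""
    level = 0
    out = []
    for raw in code.splitlines():
        line = raw.strip()
        if not line:
            out.append("")  # blank lines are preserved
            continue
        keyword = line.split(None, 1)[0].rstrip(":")
        if keyword in DEDENT_KEYWORDS:
            level = max(level - 1, 0)
        out.append(indent * level + line)
        # A non-comment line ending in ":" opens a new block.
        if line.endswith(":") and not line.startswith("#"):
            level += 1
    return "\n".join(out)
```

Note that the sketch never closes a block unless one of the dedent keywords appears. Flat input carries no information about where a block ends, which is the reason ambiguous blocks are reported for manual review rather than guessed.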

Brace-language repair supports JavaScript, TypeScript, Java, C, C++, and C#:

  • Indents after {
  • Dedents before }
  • Handles } else {
  • Handles else, catch, and finally blocks through brace structure
  • Handles switch, case, and default indentation
  • Preserves comments and blank lines
  • Removes trailing whitespace
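
A minimal sketch of the brace-depth idea is shown below. It is an assumption-laden simplification: it counts every `{` and `}` on a line, so braces inside strings or comments would mislead it, and it does not attempt the `switch`/`case` handling listed above. The function name `repair_braces` is hypothetical.

```python
def repair_braces(code: str, indent: str = "    ") -> str:
    """Re-indent brace-delimited code by tracking nesting depth per line."""
    level = 0
    out = []
    for raw in code.splitlines():
        line = raw.strip()  # also drops trailing whitespace
        if not line:
            out.append("")  # blank lines are preserved
            continue
        leading_close = line.startswith("}")
        # A line starting with "}" (including "} else {") is dedented itself.
        if leading_close:
            level = max(level - 1, 0)
        out.append(indent * level + line)
        opens = line.count("{")
        closes = line.count("}") - (1 if leading_close else 0)
        # Remaining braces on the line affect the lines that follow.
        level = max(level + opens - closes, 0)
    return "\n".join(out)
```

Applying the leading `}` before emitting the line, and the remaining braces after, is what makes `} else {` line up with the `if` that opened the block.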

HTML/XML repair:

  • Indents nested tags
  • Dedents closing tags
  • Handles self-closing tags
  • Handles common HTML void elements such as br, img, input, meta, and link
  • Preserves comments and text content
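
The tag rules above can be sketched with a small line-based pass. This is an illustration under stated assumptions, not CodeAlign's engine: the regular expression is a rough tag matcher that would be confused by `>` inside attribute values, the void-element set contains only the five names listed above, and `repair_markup` is a hypothetical name.

```python
import re

# Void elements named in the list above; a real set would be longer.
VOID_ELEMENTS = {"br", "img", "input", "meta", "link"}

# Groups: closing slash, tag name, self-closing slash.
# Comments and declarations ("<!--", "<!DOCTYPE", "<?xml") do not match.
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^>]*?(/?)>")


def repair_markup(code: str, indent: str = "  ") -> str:
    """Re-indent HTML/XML line by line based on opening and closing tags."""
    level = 0
    out = []
    for raw in code.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
            continue
        leading_close = line.startswith("</")
        # A line that begins with a closing tag is dedented itself.
        if leading_close:
            level = max(level - 1, 0)
        out.append(indent * level + line)
        delta = 0
        for closing, name, self_closing in TAG.findall(line):
            if closing:
                delta -= 1
            elif not self_closing and name.lower() not in VOID_ELEMENTS:
                delta += 1
        if leading_close:
            delta += 1  # the leading closing tag was already applied above
        level = max(level + delta, 0)
    return "\n".join(out)
```

Text content and comment lines pass through unchanged apart from their leading whitespace, because only matched tags move the nesting level.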

JSON repair:

  • Parses JSON safely
  • Pretty-indents valid JSON
  • Returns a clear warning for invalid JSON
  • Does not guess missing braces, brackets, commas, or quotes
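
The parse-then-pretty-print behaviour can be approximated with Python's standard `json` module, as in the sketch below. The function name `repair_json`, the return shape, and the warning wording are assumptions for illustration only.

```python
import json


def repair_json(code: str, indent_size: int = 2) -> tuple[str, list[str]]:
    """Return (text, warnings); invalid JSON comes back unchanged with a warning."""
    try:
        parsed = json.loads(code)
    except json.JSONDecodeError as exc:
        # No attempt is made to guess missing braces, commas, or quotes.
        warning = f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        return code, [warning]
    return json.dumps(parsed, indent=indent_size, ensure_ascii=False), []
```

One caveat of this particular approach: a parse and re-serialize round trip can normalize number formatting and collapse duplicate keys, so an engine that needs to keep every token intact would have to re-indent the original text instead.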

Python Ambiguity Warning

Because Python indentation affects program logic, CodeAlign provides best-effort repair with confidence scoring and warnings instead of claiming perfect reconstruction.

When Python structure is ambiguous, the API returns:

Python indentation is semantic. CodeAlign repaired common indentation damage, but ambiguous blocks may require manual review.

Large Input Support

  • The backend enforces a maximum input size of 1 MB
  • Repair engines process line-by-line where possible
  • The frontend shows loading state during repair
  • No submitted code is stored
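
A size check of this kind could look like the sketch below. It assumes the limit is measured in UTF-8 bytes and that 1 MB means 1,048,576 bytes; the README does not specify either, and `validate_size` and `MAX_INPUT_BYTES` are hypothetical names.

```python
# Assumption: 1 MB is interpreted as 1024 * 1024 bytes of UTF-8 text.
MAX_INPUT_BYTES = 1024 * 1024


def validate_size(code: str) -> None:
    """Raise ValueError when the submitted code exceeds the size limit."""
    size = len(code.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(
            f"Input is {size} bytes; the maximum is {MAX_INPUT_BYTES} bytes."
        )
```

Measuring encoded bytes rather than characters keeps the limit meaningful for non-ASCII input. In a web handler, an error like this would typically be translated into an HTTP error response rather than raised to the caller.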

Privacy

CodeAlign processes pasted code for repair and does not store it.

The backend does not execute submitted code and does not write submitted code to a database, file, or cloud service.

Tech Stack

  • Frontend: React, TypeScript, Tailwind CSS, Vite
  • Backend: FastAPI, Python
  • Testing: Pytest
  • Runtime: Docker Compose

API Endpoints

  • GET /health - service health
  • POST /repair - repair pasted code indentation

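Local URLs when running with Docker Compose:

  • Frontend: http://localhost:5177
  • Backend API: http://localhost:8004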

Example repair request:

{
  "code": "def hello():\nprint(\"hi\")"
}

Optional override fields are also accepted when auto-detection should be bypassed:

{
  "code": "def hello():\nprint(\"hi\")",
  "language": "python",
  "indent_style": "spaces",
  "indent_size": 4
}

Example repair response:

{
  "language": "python",
  "detected_language": "python",
  "language_confidence": 98,
  "language_detection_reason": "Found a Python function definition; Found indentation-sensitive colon blocks; Found Python-style print calls.",
  "indent_style": "spaces",
  "indent_size": 4,
  "indentation_confidence": 45,
  "indentation_detection_reason": "No existing indentation found; defaulted to 4 spaces for Python.",
  "original_line_count": 2,
  "repaired_line_count": 2,
  "lines_changed": 1,
  "confidence": 82,
  "ambiguous_blocks": 1,
  "warnings": [
    "Python indentation is semantic. CodeAlign repaired common indentation damage, but ambiguous blocks may require manual review."
  ],
  "summary": [
    "Repaired indentation after block openers",
    "Detected ambiguous Python block structure",
    "Removed trailing whitespace"
  ],
  "repaired_code": "def hello():\n    print(\"hi\")"
}
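
A client could call the endpoint with nothing but the Python standard library, as in the sketch below. The base URL is the Docker Compose host port this README describes (use the Uvicorn port instead when running the backend directly), and the helper names `build_repair_request`, `repair`, and `needs_manual_review` are hypothetical.

```python
import json
import urllib.request

# Assumption: backend reachable on the Docker Compose host port.
API_URL = "http://localhost:8004"


def build_repair_request(code: str, base_url: str = API_URL) -> urllib.request.Request:
    """Build the POST /repair request with the minimal JSON body."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/repair",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def repair(code: str, base_url: str = API_URL) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_repair_request(code, base_url), timeout=10) as resp:
        return json.load(resp)


def needs_manual_review(response: dict) -> bool:
    """Treat any ambiguous block or warning as a reason to review the result."""
    return response.get("ambiguous_blocks", 0) > 0 or bool(response.get("warnings"))
```

For the example response above, `needs_manual_review` would return `True`, because `ambiguous_blocks` is 1 and the `warnings` list is not empty.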

Docker Setup

The current Docker setup is intended for local development and demos. It runs the frontend through the Vite dev server and the backend through Uvicorn.

From the project root:

docker compose up --build

Then open http://localhost:5177 in a browser.

The browser loads the frontend at http://localhost:5177. Frontend API calls default to http://localhost:8004, which is the backend port exposed to the host browser. Do not set the browser-facing API URL to http://backend:8000; Docker service names are available inside the Compose network, but normal browsers cannot resolve them.

VITE_API_URL defaults to http://localhost:8004 for local use. For non-local deployment, rebuild the frontend with the correct backend API URL or add a runtime configuration layer. A production deployment should use a built static frontend setup, such as nginx or another static host, rather than the Vite dev server.

Stop services:

docker compose down

Local Development

Backend:

cd backend
python -m pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

Frontend:

cd frontend
npm install
npm run dev -- --host 0.0.0.0

When running locally outside Docker, the frontend dev server listens on Vite's default port, 5173. Docker Compose maps it to host port 5177.

Testing

Backend:

cd backend
python -m pytest

Frontend build:

cd frontend
npm install
npm run build

Clean Export

Create a source-only zip from the project root:

.\scripts\export.ps1

The export script creates CodeAlign.zip from a temporary clean copy and verifies that generated artifacts are excluded, including node_modules, dist, __pycache__, .pytest_cache, .pyc files, virtual environments, build/coverage output, .env, and nested zip files.

Screenshots

Home / Empty Repair Workspace

[Screenshot: CodeAlign Home]

Repaired Python Example

[Screenshot: CodeAlign Repaired Python]

TypeScript Repair with Auto Detection

[Screenshot: CodeAlign TypeScript Repair]

Future Improvements

  • Optional Monaco or CodeMirror editor integration
  • More nuanced Python AST/token-assisted repair hints
  • More complete HTML mixed-content formatting
  • Exportable repair report
  • Additional languages
  • Client-side diff virtualization for very large files
  • Production frontend Docker image using a static build and nginx
  • More advanced parser-based repair for supported languages
  • VS Code extension version

CV Bullet Point

CodeAlign — Broken Code Indentation Repair Tool. Built a Dockerized developer utility that repairs damaged pasted code indentation across Python, JavaScript/TypeScript, C/C++/C#, Java, HTML/XML, and JSON using language-aware heuristics, confidence scoring, warnings, before/after comparison, copy/download actions, and large-input support.

About

Broken code indentation repair tool with automatic language detection
