Disclaimer: This tool performs an analysis of legal documents. It does not provide legal advice, and its use does not establish an attorney-client relationship or attorney-client privilege. If you require attorney-client privilege or legal counsel, please consult a qualified attorney. The results generated by this tool should not be relied upon as a substitute for professional legal advice.
This tool automates the analysis of legal contracts (PDF, Word, Excel, Images) using Large Language Models (LLMs). It supports Azure OpenAI (GPT-4) and Google Gemini (2.0 Flash), allowing for flexible, high-capacity document review.
- Hybrid Provider Support: Switch between Azure OpenAI and Google Gemini.
- Dynamic Context Window: Automatically adjusts analysis chunk sizes based on the selected model's capacity (targeting the 20-30% context utilization of "Vik's Law").
- Multi-Format Support: Extracts text from PDFs (OCR enabled), Word docs (.docx, .doc), Excel files (.xlsx), and common image formats.
- Excel Reporting: Generates a structured Excel report (ai_results.xlsx) with analysis results for each contract.
- GUI & CLI: Modern, user-friendly graphical interface and robust command-line interface.
- Python 3.10+ (Recommended)
- Tesseract OCR (for scanned PDFs/Images)
  - Windows: Install from UB-Mannheim/tesseract and ensure tesseract.exe is in your system PATH.
- Poppler (for PDF processing)
  - Windows: Download from poppler-windows, extract, and add the bin folder to your system PATH.
Install the required Python packages:
pip install -r requirements.txt
Create a .env file or set these variables in your system:
For Azure OpenAI:
AZURE_OPENAI_API_KEY: (Optional) Set this if using API key authentication. If it is not set, the tool attempts Azure AD authentication (CLI, VS Code, Browser).
For Google Gemini:
- API Key: Set the GOOGLE_API_KEY environment variable.
- OAuth (Recommended): Download client_secret.json from Google Cloud Console.
  - Create a Project > APIs & Services > Credentials.
  - Create "OAuth 2.0 Client ID" (Desktop App).
  - Download the JSON file and save it (e.g., as client_secret.json).
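The credential rules above can be sketched as a small lookup. This is only an illustration and not the code in auth.py: the function name pick_auth_method and the returned labels are hypothetical, and falling back to OAuth for Google when no API key is set is an assumption based on OAuth being the recommended method.

```python
import os
from typing import Mapping


def pick_auth_method(provider: str, env: Mapping[str, str] = os.environ) -> str:
    """Return a label for the auth method implied by the environment.

    Hypothetical helper: the labels are illustrative, not values used by the tool.
    """
    if provider == "azure":
        # Per the README: use the API key if present, otherwise Azure AD.
        return "api_key" if env.get("AZURE_OPENAI_API_KEY") else "azure_ad"
    if provider == "google":
        # Assumption: fall back to OAuth (the recommended method) without a key.
        return "api_key" if env.get("GOOGLE_API_KEY") else "oauth"
    raise ValueError(f"unknown provider: {provider!r}")


if __name__ == "__main__":
    for name in ("azure", "google"):
        print(name, "->", pick_auth_method(name))
```

Passing the environment as a parameter keeps the check easy to exercise without touching real credentials.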
- The tool uses LOI_prompts.yaml (formerly prompts.yaml) to define the analysis questions.
- Format:
  prompts:
    - column_header: "Parties Involved"
      prompt: "Identify the buyer and seller in this contract."
    - column_header: "Effective Date"
      prompt: "What is the effective date of the agreement?"
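A quick way to catch mistakes in the prompts file is to validate its structure before running an analysis. The sketch below assumes the YAML has already been parsed into a Python dict (for example with PyYAML's yaml.safe_load); the function name column_headers and the duplicate-header check are illustrative assumptions, not part of the tool.

```python
def column_headers(config: dict) -> list[str]:
    """Validate a parsed prompts config and return its column headers in order."""
    prompts = config.get("prompts")
    if not isinstance(prompts, list) or not prompts:
        raise ValueError("'prompts' must be a non-empty list")

    headers: list[str] = []
    for index, entry in enumerate(prompts):
        for key in ("column_header", "prompt"):
            value = entry.get(key) if isinstance(entry, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"prompt #{index} is missing '{key}'")
        header = entry["column_header"]
        # Assumption: each header becomes one report column, so it must be unique.
        if header in headers:
            raise ValueError(f"duplicate column_header: {header!r}")
        headers.append(header)
    return headers


# The same structure as the format shown above, already parsed into a dict.
example = {
    "prompts": [
        {"column_header": "Parties Involved",
         "prompt": "Identify the buyer and seller in this contract."},
        {"column_header": "Effective Date",
         "prompt": "What is the effective date of the agreement?"},
    ]
}

if __name__ == "__main__":
    print(column_headers(example))
```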
Simply run the script without arguments to launch the GUI:
python LOI_Analysis.py
- Select Input: Local folder or file.
- Provider: Choose between Google (Gemini 2.0 Flash - Default) and Azure (GPT-4o).
- Auth Method:
- Google: OAuth (Recommended), API Key, or ADC.
- Azure: RBAC or API Key.
Run headlessly for automation:
# Analyze a folder using Google Gemini (Default)
python LOI_Analysis.py --input "C:\Contracts" --output "ai_results.xlsx"
# Analyze using Azure OpenAI
python LOI_Analysis.py --input "C:\Contracts" --provider azure
# Specify a particular model
python LOI_Analysis.py --input "C:\Contracts" --provider google --model gemini-2.0-flash-001
The tool dynamically calculates the safe amount of text to send based on the model:
- GPT-4o: ~128k tokens context -> Process ~30k tokens (~100k chars) per chunk.
- Gemini 2.0 Flash: ~1M tokens context -> Process huge documents in a single pass.
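The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the logic in LOI_Analysis.py: the 25% utilization is an assumed midpoint of the 20-30% target, the 3.3 characters per token is an assumption implied by "~30k tokens (~100k chars)", and the function names and model keys are hypothetical.

```python
# Approximate context sizes quoted in this README.
CONTEXT_TOKENS = {
    "gpt-4o": 128_000,
    "gemini-2.0-flash": 1_000_000,
}

UTILIZATION = 0.25      # assumption: midpoint of the 20-30% target
CHARS_PER_TOKEN = 3.3   # assumption: implied by ~30k tokens being ~100k chars


def chunk_char_budget(model: str) -> int:
    """Return the approximate number of characters to send per chunk."""
    tokens = CONTEXT_TOKENS[model] * UTILIZATION
    return round(tokens * CHARS_PER_TOKEN)


def split_into_chunks(text: str, budget: int) -> list[str]:
    """Split text into consecutive pieces of at most `budget` characters."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [text[i:i + budget] for i in range(0, len(text), budget)]


if __name__ == "__main__":
    for model in CONTEXT_TOKENS:
        print(model, chunk_char_budget(model))
```

Under these assumptions a GPT-4o chunk holds roughly 100k characters, while the Gemini budget is large enough that most contracts fit in one chunk. A real splitter would likely break on paragraph or page boundaries rather than at fixed character offsets.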
- LOI_Analysis.py: Main application logic.
- auth.py: Authentication handlers for Azure and Google.
- LOI_prompts.yaml: Analysis questions configuration.
- requirements.txt: Python dependencies.
- Contracts/: Default input directory.
You can create a standalone executable of the application using PyInstaller.
- Install PyInstaller:
  pip install pyinstaller
- Build the Executable:
  pyinstaller --onefile LOI_Analysis.py
  This command will generate a single executable file in the dist folder.
- Tesseract Not Found: Ensure Tesseract is installed and added to PATH. You may need to restart your terminal/IDE.
- Azure Auth Errors: Try running az login in your terminal if using RBAC.
- Google Auth Errors: Ensure GOOGLE_API_KEY is set correctly.
- PDF Extraction Issues: If OCR fails, check the logs for Poppler/Tesseract errors.
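Before digging through logs, it can help to confirm that the external tools are actually visible on PATH. The sketch below uses only the standard library; it assumes the Poppler check can look for pdftoppm (one of the utilities Poppler ships), and the helper name missing_tools is hypothetical rather than something the tool provides.

```python
import shutil

# Executable name -> human-readable label.
REQUIRED_TOOLS = {
    "tesseract": "Tesseract OCR",
    "pdftoppm": "Poppler",  # assumption: pdftoppm stands in for the Poppler install
}


def missing_tools(which=shutil.which) -> list[str]:
    """Return labels of required tools that cannot be found on PATH."""
    return [label for exe, label in REQUIRED_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Tesseract and Poppler were found on PATH.")
```

If a tool was installed while the terminal was open, restart the terminal before re-running the check so the updated PATH is picked up.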