Official starter repo for the Kaggle competition https://www.kaggle.com/competitions/llm-agentic-legal-information-retrieval/host/launch-checklist
(Tested with Ubuntu-24.04 in WSL)
```bash
# Clone the repository
git clone https://github.com/Omnilex-AI/Omnilex-Agentic-Retrieval-Competition.git
cd Omnilex-Agentic-Retrieval-Competition

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate    # Linux/macOS
# .venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt  # for testing/linting

# Install package in development mode
pip install -e .
```

Download the competition data from Kaggle into the `data/` directory.
Two baseline notebooks are provided:

- Direct Generation (`notebooks/01_direct_generation_baseline.ipynb`): prompts the LLM to directly generate citations; simple but prone to hallucination.
- Agentic Retrieval (`notebooks/02_agentic_retrieval_baseline.ipynb`): uses a ReAct-style agent with search tools; grounded in actual legal documents.
Both notebooks work in VSCode and can be submitted to Kaggle.
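For a feel of the second approach, here is a minimal, runnable sketch of a ReAct-style retrieve-then-cite loop. The search tool, corpus, and citation format are illustrative stand-ins, not the repo's actual API; the real baseline drives an LLM through thought/action/observation steps.

```python
# Minimal sketch of a ReAct-style retrieve-then-cite loop.
# The agent here is a stub so the control flow runs without a model;
# all names and data are illustrative, not the repo's actual API.

def search(query, corpus):
    """Toy keyword search tool: rank documents by word overlap with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d["text"].split())), d) for d in corpus]
    scored.sort(key=lambda pair: -pair[0])
    return [d for score, d in scored if score > 0][:1]

def agent_answer(question, corpus, max_steps=3):
    """Loop: Action (search) -> Observation (hits) -> stop once evidence is found."""
    evidence = []
    for _ in range(max_steps):
        hits = search(question, corpus)   # Action: call the search tool
        evidence.extend(hits)             # Observation: record retrieved docs
        if hits:                          # a real agent would let the LLM decide
            break
    # The answer is grounded in retrieved documents rather than free generation.
    return [d["citation"] for d in evidence]

corpus = [
    {"citation": "Act 1 §2", "text": "data protection obligations for processors"},
    {"citation": "Act 9 §4", "text": "maritime salvage rights at sea"},
]
print(agent_answer("data protection obligations", corpus))
```

The direct-generation baseline skips the search step entirely, which is why it is cheaper but more prone to hallucinated citations.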
Validate your submission before uploading:

```bash
python scripts/validate_submission.py submission.csv
```

See the Kaggle competition page for submission details.
```
├── src/omnilex/       # Core library
│   ├── citations/     # Citation parsing & normalization
│   ├── evaluation/    # Metrics & scoring
│   ├── retrieval/     # BM25 search & tools
│   └── llm/           # LLM loading & prompts
├── notebooks/         # Baseline notebooks
├── utils/             # Data & utility scripts
├── tests/             # Test suite
└── data/              # Data directory
```
- Python >= 3.10
- llama-cpp-python (for local LLM inference)
- rank-bm25 (for keyword search)
- pandas, numpy, scikit-learn
For Kaggle submissions, you may need to (depending on your solution):
- Upload your GGUF model as a Kaggle dataset
- Upload pre-built indices as a Kaggle dataset
- Package the `omnilex` library
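If you upload the packaged library as a Kaggle dataset, one common pattern for making it importable inside a Kaggle notebook is to extend `sys.path` (the dataset path below is hypothetical; substitute your own upload):

```python
# Make code uploaded as a Kaggle dataset importable in a notebook.
# "/kaggle/input/omnilex-package" is a hypothetical dataset path.
import sys

sys.path.append("/kaggle/input/omnilex-package")  # folder containing the package
```

Alternatively, `pip install` the dataset's wheel or source directory in a notebook cell if internet access is disabled at submission time.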
Apache 2.0 - See LICENSE
For public questions about the competition, please use the "Discussion" tab or open an issue on this repository. For private questions, reach out to the host on Kaggle or ari.jordan@omnilex.ai.