Training a small language model from random weights using curriculum-driven multi-agent conversations, reward-weighted imitation, and staged LoRA adaptation
machine-learning pytorch artificial-intelligence multi-agent transformer lora language-model imitation-learning synthetic-data curriculum-learning fine-tuning conversation-ai huggingface dialogue-system agent-framework llm tiny-llm reward-modeling developmental-ai baby-llm
Updated Mar 23, 2026 - Python