
fagenorn/handcrafted-persona-engine


Persona Engine

Persona Engine Dancing mascot

An AI-driven voice, animation, and personality stack for your Live2D character.


At a glance

| What it is | What you need | How long to first pixel |
| --- | --- | --- |
| Voice-driven Live2D character with LLM brain, real-time TTS, and streaming-ready output | Windows x64, NVIDIA GPU with CUDA, ~16 GB free disk | Download → double-click → pick a profile |


Overview

Persona Engine listens through your microphone, thinks with an LLM guided by a personality file, speaks back with real-time TTS (optionally voice-cloned), and drives a Live2D avatar in sync. You can watch the character inside the built-in transparent overlay, or pipe it into OBS over Spout for streaming.

The included Aria model is rigged for the engine's lip-sync and expression pipeline out of the box. You can bring your own model too — see the Live2D Integration Guide.

Important

Persona Engine feels most natural with a fine-tuned LLM trained on the engine's communication format. Standard OpenAI-compatible models (Groq, OpenAI, Ollama, …) work too, but you'll want to put care into personality.txt. A template (personality_example.txt) ships in the repo, and the fine-tuned model is available on the project's Discord.
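To give a sense of what goes into that file, here is a made-up fragment — purely illustrative, not the shipped template; see personality_example.txt in the repo for the real starting point:

```text
# Illustrative fragment only — not the shipped personality_example.txt.
Name: Aria
Role: streaming co-host
Tone: warm, playful, concise
Rules:
- Stay in character at all times.
- Keep replies short and speakable; no markdown or lists.
```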

See it in action

Persona Engine demo video

Click to watch the demo on YouTube.


Getting started

Important

Requires NVIDIA GPU with CUDA (Windows x64). ASR, TTS, and RVC all run on CUDA via ONNX Runtime — CPU/AMD/Intel are not supported.

  1. Download PersonaEngine-<version>-win-x64.zip from Releases.
  2. Extract somewhere with ≥ 16 GB free. Models land in a Resources/ folder next to the exe.
  3. Double-click PersonaEngine.exe and pick an install profile when prompted. Models and the NVIDIA runtime are downloaded, hash-verified, and installed automatically.

Re-run the picker

PersonaEngine.exe --reinstall
Other CLI flags
| Flag | Purpose |
| --- | --- |
| --profile=try\|stream\|build | Skip the picker and use the named profile |
| --repair | Re-download anything that fails hash verification |
| --verify | Re-hash installed assets and report mismatches (no downloads) |
| --offline | Refuse to touch the network; fail fast if assets are missing |
| --non-interactive | Treat any prompt as fatal (pair with --profile=…) |
| --skip-gpu-check | Bypass the GPU capability gate (not recommended) |
Upgrading from a pre-installer build

The asset directory layout changed when the in-app installer landed. Existing Resources/Models/ and Resources/Live2D/Avatars/ trees from older builds are ignored — the installer re-downloads into the new locations on first launch. Free up ~16 GB before starting; delete the old folders once the bootstrapper finishes.

Install profiles

| | Try it out | Stream with it | Build with it |
| --- | --- | --- | --- |
| Best for | First look, small downloads | Everyday streaming | Production, highest quality |
| Listening (Whisper) | Tiny | Small | Large-v3 Turbo |
| Voice (TTS) | Kokoro | Kokoro | Kokoro + Qwen3 expressive |
| Lip-sync | VBridger | VBridger | VBridger + Audio2Face |
| Approx. download | Smallest | Mid | Largest (≈ 16 GB) |

Tip

Picked Build-with-it? You still have to flip the switches. The profile downloads the bigger models, but the UI defaults keep the light ones active until you toggle them:

  • Voice panel → set mode to Expressive (Qwen3)
  • Listening panel → pick the Accurate Whisper template
  • Avatar panel → enable Audio2Face lip-sync

Full walkthrough in INSTALLATION.md.

Screenshots

Dashboard with presence strip
Dashboard — presence strip, LLM probe, quick toggles.
Voice panel
Voice — Clear / Expressive modes, RVC, audition.
Listening panel
Listening — Whisper template chips, VAD tuning.
Avatar & lip-sync panel
Avatar — VBridger / Audio2Face lip-sync, emotions.
Transparent overlay on desktop
Overlay — transparent, always-on-top, drag to reposition.

Features

Mascot with wand

Live2D avatar

Real-time rendering with emotion-driven motions and VBridger lip-sync. Includes the rigged Aria model; custom models supported.

LLM conversation

Any OpenAI-compatible endpoint (local or cloud). Personality driven by personality.txt, with a built-in connection probe.

Voice in (ASR)

Dual-Whisper pipeline gated by Silero VAD: a fast model for barge-in detection, a large model for accurate transcription.
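The dual-model idea can be sketched like this — a toy illustration with stand-in functions, not the engine's actual code: the cheap model runs on every live chunk so playback can be interrupted the moment the user speaks, while the expensive model runs once per completed utterance.

```python
def fast_has_speech(chunk: str) -> bool:
    """Stand-in for the small Whisper model: cheap, runs per chunk."""
    return bool(chunk.strip())

def accurate_transcribe(utterance: str) -> str:
    """Stand-in for the large Whisper model: slow, runs once per turn."""
    return utterance.strip().capitalize()

def handle_turn(chunks: list[str], tts_playing: bool) -> tuple[str, bool]:
    """Buffer speech chunks; flag a barge-in if speech arrives mid-playback."""
    interrupted = False
    buffered = []
    for chunk in chunks:
        if fast_has_speech(chunk):
            if tts_playing:          # user spoke over the avatar
                interrupted = True   # -> stop playback immediately
                tts_playing = False
            buffered.append(chunk)
    return accurate_transcribe(" ".join(buffered)), interrupted

text, barged_in = handle_turn(["  ", "hello", "there"], tts_playing=True)
```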

Voice out (TTS)

Two engines: Kokoro (clear, fast) and Qwen3 (expressive). Optional real-time RVC voice cloning on top.

Lip-sync

VBridger by default, or the higher-fidelity Audio2Face solver for Build-with-it setups.

Built-in overlay

Transparent, always-on-top window that mirrors the avatar. No OBS needed for desktop use.

OBS-ready output

Dedicated Spout streams for avatar, subtitles, and roulette — no window capture required.

Control panel

Dashboard, per-subsystem panels, live metrics (LLM / TTS / audio latency), conversation viewer, theming.

In-app installer

Profile picker, SHA-256 verification, repair and verify modes. Ships CUDA 12.4 + cuDNN 9.1.1 + CUDA 13 redists.

Extras

Subtitle rendering, interactive roulette wheel, experimental screen awareness, keyword + ML profanity filtering.

How it works

A single turn flows through these stages:

  1. Listen — microphone audio, Silero VAD picks out speech.
  2. Understand — fast Whisper watches for barge-in; accurate Whisper transcribes the final utterance.
  3. Contextualize (optional) — Vision module reads text from a chosen window.
  4. Think — transcription + history + context + personality.txt go to the LLM.
  5. Respond — LLM streams text, optionally tagged with emotions like [EMOTION:😊].
  6. Filter (optional) — keyword + ML profanity pass.
  7. Speak — TTS (Kokoro or Qwen3) synthesizes the response; espeak-ng fills phoneme gaps.
  8. Clone (optional) — RVC retargets the voice in real time.
  9. Animate — phonemes drive lip-sync, emotion tags trigger Live2D expressions, idle animations run between turns.
  10. Display — subtitles, avatar, and roulette render to the built-in overlay and/or Spout outputs for OBS; audio plays through the selected device.
  11. Loop — back to listening.
Pipeline diagram
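Step 5's emotion tags can be separated from speakable text with a simple tag scan. The regex and function below are illustrative assumptions, not the engine's parser; only the [EMOTION:…] tag convention comes from the pipeline description above:

```python
import re

# The [EMOTION:...] tag syntax is documented above; this parser is a
# hypothetical sketch, not the engine's implementation.
EMOTION_TAG = re.compile(r"\[EMOTION:([^\]]+)\]")

def split_emotions(streamed_text: str) -> tuple[str, list[str]]:
    """Separate speakable text from emotion cues in one LLM response."""
    cues = EMOTION_TAG.findall(streamed_text)
    spoken = EMOTION_TAG.sub("", streamed_text)
    return " ".join(spoken.split()), cues  # normalize leftover whitespace

spoken, cues = split_emotions("Hi there! [EMOTION:😊] Ready when you are.")
```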

Use cases

Mascot with lightbulb
  • VTubing & streaming — AI co-host, chat-reactive character, fully AI-driven persona.
  • Virtual assistant — animated desktop companion that actually talks back.
  • Interactive kiosks — guides for museums, trade shows, retail.
  • Education — language practice partner, historical-figure Q&A, tutor.
  • Games — more conversational NPCs and companions.
  • Character chatbots — immersive chats with fictional characters.

Deeper docs

  • INSTALLATION.md — profile picker, CLI flags, LLM + personality setup, overlay vs Spout, building from source, upgrading, bootstrapper troubleshooting.
  • CONFIGURATION.md — every appsettings.json field, annotated.
  • Live2D.md — rigging requirements and the VBridger parameter spec for custom avatars.

Community

Need help getting started? Want to try the fine-tuned LLM, trade rigging tips, or just chat with the engine live? Come say hi.

Join Discord


Bugs and feature requests live on GitHub Issues.

Contributing

PRs are welcome. The short version:

  1. For anything non-trivial, open an Issue first to align on direction.
  2. Fork, branch (feature/your-thing), code, commit, push.
  3. Open a PR against main with a clear description of the change.

Formatting is enforced in CI via CSharpier (dotnet csharpier check . from src/PersonaEngine/).

Support


Tip

Custom avatars → Live2D.md. Every config knob → CONFIGURATION.md. Full setup walkthrough → INSTALLATION.md.

