A beautiful web interface for interacting with Large Language Models via Hugging Face's router API.
- 🎨 Beautiful UI: Responsive web interface with gradient design
- 🔄 Multiple Models: Support for various LLMs, including Qwen, Llama, and Mistral
- ⚙️ Parameter Control: Adjust temperature, max tokens, top-p, and penalties
- 📊 Detailed Responses: View metadata including token usage and model info
- 🚀 Real-time: Live loading states and error handling
- 🔒 Secure: Environment variables for API keys
- Qwen/Qwen2.5-1.5B-Instruct:featherless-ai
- meta-llama/Meta-Llama-3-8B-Instruct
- microsoft/DialoGPT-medium
- mistralai/Mistral-7B-Instruct-v0.1
- HuggingFaceH4/zephyr-7b-beta
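Any of these models can be reached through the router's chat completions endpoint. The sketch below assumes an OpenAI-compatible `/v1/chat/completions` route on `router.huggingface.co`; the URL, payload shape, and response shape are assumptions, not code taken from this repo's `server.mjs`:

```javascript
// Sketch: query an LLM through the Hugging Face router
// (assumes an OpenAI-compatible chat completions endpoint).
const ROUTER_URL = "https://router.huggingface.co/v1/chat/completions";

// Pure helper: build the request body for a single-turn question.
function buildPayload(question, model, maxTokens = 150) {
  return {
    model,
    messages: [{ role: "user", content: question }],
    max_tokens: maxTokens,
  };
}

// Network call (requires HF_TOKEN in the environment; Node 18+ for global fetch).
async function ask(question, model = "Qwen/Qwen2.5-1.5B-Instruct:featherless-ai") {
  const res = await fetch(ROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.HF_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildPayload(question, model)),
  });
  if (!res.ok) throw new Error(`Router error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```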
- Node.js (v18 or higher)
- Hugging Face API token
1. Clone the repository:

   ```bash
   git clone https://github.com/herbowicz/huggingface-llm-chat.git
   cd huggingface-llm-chat
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up environment variables. Create a `.env` file in the root directory:

   ```
   HF_TOKEN=your_hugging_face_token_here
   ```

4. Start the server:

   ```bash
   npm start
   ```

5. Open your browser and navigate to `http://localhost:3000`.
```
├── index.html    # Web interface
├── server.mjs    # Express server
├── my-llm.mjs    # Original CLI script
├── test-llm.mjs  # Test script
├── package.json  # Dependencies
├── .env          # Environment variables (create this)
└── .gitignore    # Git ignore rules
```
- Temperature (0-2): Controls randomness in responses. Lower values (0.1-0.3) make responses more focused and predictable; higher values (0.7-1.0) make them more creative and varied.
- Max Tokens (1-2048): Maximum length of the response. Tokens are pieces of words: roughly 1 token ≈ 0.75 words, so 100 tokens ≈ 75 words.
- Top P (0-1): Nucleus sampling parameter. Controls diversity by sampling only from the smallest set of tokens whose cumulative probability reaches P.
- Frequency Penalty (-1 to 1): Reduces repetition. Positive values discourage the model from repeating words and phrases it has already used.
- Presence Penalty (-1 to 1): Encourages topic diversity. Positive values encourage the model to introduce new topics rather than staying on the same subject.
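On the wire, these sliders typically map onto the sampling fields of the request body. The field names below follow the OpenAI-compatible convention and are an assumption about this project's API, not code from the repo:

```javascript
// Hypothetical mapping from UI sliders to request-body sampling fields.
function samplingOptions(ui) {
  return {
    temperature: ui.temperature,            // 0-2: randomness
    max_tokens: ui.maxTokens,               // 1-2048: response length cap
    top_p: ui.topP,                         // 0-1: nucleus sampling mass
    frequency_penalty: ui.frequencyPenalty, // -1 to 1: penalize repeated tokens
    presence_penalty: ui.presencePenalty,   // -1 to 1: encourage new topics
  };
}
```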
- General questions: Temperature: 0.7, Max Tokens: 150, Top P: 0.9, Penalties: 0.0
- Creative writing: Temperature: 0.9, Max Tokens: 300, Top P: 0.9, Penalties: 0.0
- Factual answers: Temperature: 0.3, Max Tokens: 100, Top P: 0.9, Penalties: 0.0
- Detailed explanations: Temperature: 0.5, Max Tokens: 500, Top P: 0.9, Penalties: 0.0
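One way to keep these recommended settings handy is a small lookup table; the preset keys here are illustrative, not names used in the repo:

```javascript
// Recommended settings from above, keyed by use case (key names are illustrative).
const PRESETS = {
  general:  { temperature: 0.7, max_tokens: 150, top_p: 0.9, frequency_penalty: 0, presence_penalty: 0 },
  creative: { temperature: 0.9, max_tokens: 300, top_p: 0.9, frequency_penalty: 0, presence_penalty: 0 },
  factual:  { temperature: 0.3, max_tokens: 100, top_p: 0.9, frequency_penalty: 0, presence_penalty: 0 },
  detailed: { temperature: 0.5, max_tokens: 500, top_p: 0.9, frequency_penalty: 0, presence_penalty: 0 },
};
```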
- Go to Hugging Face Settings
- Create a new token with "Read" and "Inference" permissions
- Copy the token to your `.env` file
```
HF_TOKEN=hf_your_token_here   # Required: Your Hugging Face API token
PORT=3000                     # Optional: Server port (default: 3000)
```

- Enter your question in the text area
- Select a model from the dropdown
- Adjust parameters if needed (defaults work well)
- Click "Ask Question" and wait for the response
For direct API testing:

```bash
node my-llm.mjs
```

- API tokens are stored in environment variables
- `.env` file is excluded from git
- No sensitive data in client-side code
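A simple way to enforce the first point is to fail fast at startup when the token is missing. This is a sketch under the assumption that the server loads `.env` via dotenv, not code from `server.mjs`:

```javascript
// Sketch: validate that the API token is present before the server starts.
// With dotenv installed, `import "dotenv/config"` would populate process.env
// from the .env file before this check runs.
function requireToken(env) {
  const token = env.HF_TOKEN;
  if (!token) {
    throw new Error("HF_TOKEN is not set. Add it to your .env file.");
  }
  return token;
}

// Usage at server startup:
// const hfToken = requireToken(process.env);
```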
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - feel free to use this project however you'd like!
Made with ❤️ by Grzegorz Herbowicz