
local-ai-llm-playground

Experiments running offline LLMs in Python and Rust locally using Ollama and llama.cpp

Introduction

Collection of local AI experiments that should run on a recent home computer. The examples use local Ollama and llama.cpp servers for running completion and chat tasks with Gemma 3 and Mistral models. Code is written in Python and Rust, and each example has a short description detailing how to download the model and run the code.

Ollama is an open-source tool for downloading, managing, and running LLMs locally. Ollama:

  • provides a simple CLI for pulling and running models;
  • exposes a local HTTP API for completion and chat requests; and
  • handles model storage and management for you.

llama.cpp is a C/C++ engine for running LLaMA-family and other open LLMs. It:

  • is extremely memory-efficient through quantisation;
  • works well on CPU-only setups; and
  • is available as a library for integration with other applications.

Source: Running Local LLMs

Setup

Prerequisites

Examples run on Ollama or llama.cpp. Here’s a quick guide to getting those set up on macOS with Homebrew. Follow the links for more detailed instructions and for other operating systems. You will also need Rust or Python set up on your system (depending on which examples you want to run).

Ollama

brew install ollama

For other operating systems, or more details, see the Official Ollama Quickstart Guide.
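
Once Ollama is installed, you can start the local server and pull a model before running the Ollama-based examples. The model name below is illustrative; each example details which model it needs:

ollama serve            # start the local API server (listens on localhost:11434 by default)
ollama pull mistral     # download a model
ollama run mistral "Why run LLMs locally?"   # optional one-off sanity check from the CLI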

llama.cpp

brew install llama.cpp

For other operating systems, or more details, see the LLaMA.cpp HTTP Server Quick Start Guide.
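
Once llama.cpp is installed, you can start its HTTP server against a GGUF model you have downloaded. The model file name and port below are illustrative:

llama-server -m ./gemma-3-4b-it-qat-q4_0.gguf --port 8080

The server then accepts completion and chat requests at http://localhost:8080, which is what the llama.cpp examples below talk to.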

Installation

Nothing to install beyond the prerequisites.

Examples

llamacpp-gemma4-e4b-completion
Terminal animation: Gemma 3 LLM completion demo calling a local llama.cpp server from Rust code. The user runs 'cargo run --bin gemma-3-4b-it-qat-q4_0-gguf', enters the prompt 'Building a website can be done in 10 simple steps:', and after a short delay the model's ten-step completion streams back before the app presents a fresh prompt.
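
In essence, the demo POSTs the prompt to the llama.cpp server's /completion endpoint and prints the generated text. Below is a minimal non-streaming sketch of that call in Rust, assuming a server on localhost:8080; the reqwest and serde_json crates are illustrative choices here, not necessarily what the example itself uses:

// Illustrative dependencies: reqwest (blocking + json features) and serde_json
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes `llama-server` is already running with a model loaded
    let client = reqwest::blocking::Client::new();
    let response: serde_json::Value = client
        .post("http://localhost:8080/completion")
        .json(&json!({
            "prompt": "Building a website can be done in 10 simple steps:",
            "n_predict": 128 // cap the number of generated tokens
        }))
        .send()?
        .json()?;
    // llama.cpp returns the generated text in the `content` field
    println!("{}", response["content"].as_str().unwrap_or(""));
    Ok(())
}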
llamacpp_tts
Large Language Model text-to-speech (TTS) demo with voice cloning.
ollama-mistral-instruct-chat
Chat demo calling a local Ollama server running a Mistral instruct model (see the sketch below).
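
As a rough sketch of what such a chat exchange involves: Ollama exposes a /api/chat endpoint on localhost:11434. The Rust snippet below is a minimal non-streaming illustration, again with reqwest and serde_json as assumed crates rather than a description of the example's actual code:

use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes an Ollama server is running and the model has been pulled
    let client = reqwest::blocking::Client::new();
    let response: serde_json::Value = client
        .post("http://localhost:11434/api/chat")
        .json(&json!({
            "model": "mistral",
            "messages": [{ "role": "user", "content": "Why run LLMs locally?" }],
            "stream": false
        }))
        .send()?
        .json()?;
    // The assistant's reply is returned under message.content
    println!("{}", response["message"]["content"].as_str().unwrap_or(""));
    Ok(())
}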

🤔 Why run local LLMs?

  • Data sovereignty: you have more control over your data.
  • Offline support: great if you have an unstable connection or are temporarily offline.
  • Model fine-tuning: you also have more control over the model itself and how it is run.

You don't need the latest GPU: with llama.cpp or Ollama, smaller models (up to around 7 billion parameters) can run comfortably on a typical home computer; quantised to 4 bits, a 7-billion-parameter model needs only around 4 GB of memory for its weights.

For balance, however: running locally, you pay the one-off cost of downloading the model you want to run, and you might not be able to run the largest models, depending on your machine’s spec. Also, a cloud service would be more scalable if you needed to step up model usage.

☎️ Issues and Support

Open an issue if something does not work as expected or if you have suggestions for improvements.

Feel free to jump into the Rodney Lab matrix chat room.

Feature requests

New feature suggestions are always welcome and will be considered, though please keep in mind that some of them may be out of scope for what the project is trying to achieve (or is reasonably capable of). If you have an idea for a new feature and would like to share it, you can create a feature request.

Feature requests are tagged with one of the following:

  • Roadmap - will be implemented in a future release
  • Backlog - may be implemented in the future but needs further feedback or interest from the community
  • Icebox - no plans to implement, as it doesn't currently align with the project's goals or capabilities; may be revisited at a later date

Contributions

Contributions are welcome: write a short issue with your idea before spending too much time on more involved additions.

Contributing guidelines

  • Before working on a new feature it's preferable to submit a feature request first and state that you'd like to implement it yourself
  • Please don't submit PRs for feature requests that are either in the roadmap[1], backlog[2] or icebox[3]
  • Avoid introducing new dependencies
  • Avoid making backwards-incompatible configuration changes

[1] The feature likely already has work put into it that may conflict with your implementation

[2] The demand, implementation or functionality for this feature is not yet clear

[3] No plans to add this feature for the time being

Acknowledgements

Inspired by:

License

The project is licensed under the BSD 3-Clause License — see the LICENSE file for details.
