---
title: Lesson 2
weight: 10
draft: true
---

TO BE REVIEWED!

# 📘 Lesson 2 – LLM Access & Streaming Implementation

Welcome to the second lesson of the OpenLLM course with Apache Open Serverless! In this lesson, we'll learn how to actually communicate with Large Language Models by creating an LLM chat with streaming capabilities.


## 🚀 Step 1: Environment Setup

### Fork the Repository

1. Go to GitHub → MastroGPT: https://github.com/mastrogpt
2. Fork the repository to your account (creating your own copy)
3. Launch the Codespace from your forked repository

Why fork? This gives you write permission to save changes with Git version control, without depending on the original repository.

### Download Lesson Files

1. Select Lesson 2 from the lessons menu
2. This downloads all lesson files (both Markdown and PDF formats)
3. Each lesson is independent - you don't need files from previous lessons

## 🔧 Pro Tips

### Setting Up the Terminal Shortcut

Before starting the exercises, configure a keyboard shortcut to send text from the editor directly to the terminal. This is extremely useful throughout this course:

1. Open Settings: click the gear icon (⚙️) in the bottom left corner
2. Select Keyboard Shortcuts: choose "Keyboard Shortcuts" from the menu
3. Search for the command: type "run selected text in active terminal"
4. Set your shortcut: click on the command and set your preferred key combination
   - Recommended: Ctrl+Enter (as used in the course)
   - This lets you run text from lesson files directly in the terminal

Why this shortcut? It saves a lot of time when you need to run commands from the lesson files. This feature is specific to VS Code and is not enabled by default.
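If you prefer editing the configuration file directly, the same binding can be added to VS Code's `keybindings.json` (the command ID is `workbench.action.terminal.runSelectedText`; the `when` clause is optional and restricts the binding to the editor):

```jsonc
// keybindings.json
[
  {
    "key": "ctrl+enter",
    "command": "workbench.action.terminal.runSelectedText",
    "when": "editorTextFocus"
  }
]
```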

## 🔐 Step 2: Accessing LLMs & Managing Secrets

### Understanding Credentials

The LLMs are protected with the same credentials as Open Serverless:

- `ollama_host`: the URL of the LLM service
- `auth`: the authentication credentials from your Open Serverless login

### Secret Management Strategy

Always use this pattern in your code:

```python
import os

def get_secret(args, env_var_name):
    # Prefer values passed in args, fall back to environment variables
    return args.get(env_var_name) or os.environ.get(env_var_name)
```

This allows your code to work in both test and production environments.
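The fallback order matters: a value supplied in `args` (as in production) wins over the environment (as in test runs). A quick check with hypothetical values:

```python
import os

# Hypothetical values, just to show the fallback order
args = {"ollama_host": "from-args.example.com"}
os.environ["ollama_host"] = "from-env.example.com"

# A value supplied in args wins over the environment:
print(args.get("ollama_host") or os.environ.get("ollama_host"))  # from-args.example.com

# With no args value, the environment fills in:
print({}.get("ollama_host") or os.environ.get("ollama_host"))    # from-env.example.com
```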

### Testing LLM Access

1. Open a terminal and run: `ops ai cli`
2. Set the credentials and build the base URL:

   ```python
   ollama_host = args.get('ollama_host')
   auth = args.get('auth')
   base_url = f"https://{auth}@{ollama_host}"
   ```

3. Test the connection:

   ```sh
   curl "$base_url"
   # Should return: "Ollama is running"
   ```
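The `auth@host` form embeds HTTP basic-auth credentials directly in the URL. A quick look at how such a URL decomposes (hypothetical credentials):

```python
from urllib.parse import urlsplit

# Hypothetical credentials and host
base_url = "https://myuser:mypassword@ollama.example.com"
parts = urlsplit(base_url)

print(parts.username)  # myuser
print(parts.password)  # mypassword
print(parts.hostname)  # ollama.example.com
```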

## 💬 Step 3: Basic LLM Communication

### Simple API Call

Create a message with this structure:

```python
message = {
    "model": "llama-3.1-8b",  # Meta's Llama 3.1 model with 8B parameters
    "prompt": "Who are you?",
    "stream": False  # Start without streaming
}
```

### Making the Request

```python
import requests

url = f"{base_url}/api/generate"
response = requests.post(url, json=message)
result = response.json()
print(result['response'])
```

## 🌊 Step 4: Implementing Streaming

### What is Streaming?

Streaming lets you see the LLM's response in real time instead of waiting for the complete generation. This makes interactions feel much more immediate.

### Streaming from Command Line

```python
import json

message["stream"] = True  # Enable streaming
response = requests.post(url, json=message, stream=True)

# The response body is now an iterator of newline-delimited JSON chunks
for chunk in response.iter_lines():
    if chunk:
        data = json.loads(chunk.decode('utf-8'))
        print(data['response'], end='', flush=True)
```

### Stream Format

Each streaming chunk contains:

- `model`: the model name
- `response`: the text chunk
- `done`: a boolean flag (`true` when generation is complete)
- additional metadata (context, durations, etc.)
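The chunk format above can be exercised offline with fabricated payloads, no server required. This sketch accumulates the text fragments until `done` is set:

```python
import json

def collect_stream(lines):
    """Accumulate 'response' fragments from Ollama-style NDJSON chunks until done."""
    full = []
    for raw in lines:
        if not raw:
            continue  # skip keep-alive blank lines
        data = json.loads(raw.decode("utf-8"))
        full.append(data.get("response", ""))
        if data.get("done"):
            break
    return "".join(full)

# Fabricated chunks in the format described above
chunks = [
    b'{"model": "llama-3.1-8b", "response": "Hel", "done": false}',
    b'{"model": "llama-3.1-8b", "response": "lo!", "done": true}',
]
print(collect_stream(chunks))  # Hello!
```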

## 🔌 Step 5: Serverless Streaming Implementation

### The Challenge

Actions in serverless environments run asynchronously and lose contact with the web server, which makes streaming complex.

### Solution: The Streamer Component

The streamer component provides:

- `stream_host`: the host for the streaming connection
- `stream_port`: the port for the streaming connection
- a socket to receive intermediate results

### Implementation Pattern

```python
import socket

def stream_to_socket(iterator, args):
    stream_host = args.get('stream_host')
    stream_port = args.get('stream_port')

    # Connect to the socket provided by the streamer component
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((stream_host, int(stream_port)))

    # Forward each intermediate result as a newline-terminated line
    # (sendall retries until the whole buffer is written)
    for item in iterator:
        sock.sendall(f"{item}\n".encode())

    sock.close()
    return "Streaming completed"
```
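To feed `stream_to_socket`, the Ollama stream can be wrapped in a generator that yields only the text fragments. A sketch (the function name and the fabricated chunks are illustrative, not part of the streamer API):

```python
import json

def ollama_fragments(raw_lines):
    # Generator: turn raw NDJSON chunks into plain text fragments,
    # suitable for an iterator-consuming function like stream_to_socket
    for raw in raw_lines:
        if not raw:
            continue
        data = json.loads(raw.decode("utf-8"))
        text = data.get("response", "")
        if text:
            yield text
        if data.get("done"):
            return

fake = [b'{"response": "a", "done": false}', b'{"response": "b", "done": true}']
print(list(ollama_fragments(fake)))  # ['a', 'b']
```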

### Testing with Stream Mock

Use `stream_mock` for local testing:

```python
from test.stream_mock import StreamMock

args = {}
mock = StreamMock()
mock.start(args)  # populates args with stream_host/stream_port

# Your streaming function here (countdown is an example iterator)
result = stream_to_socket(countdown(10), args)

mock.stop()
print(result)
```
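The `countdown` iterator above is not shipped with the mock; a plausible stand-in that emits values slowly enough to watch them arrive on the socket:

```python
import time

def countdown(n):
    # Example iterator: yields n, n-1, ..., 1 with a short pause between items
    for i in range(n, 0, -1):
        yield str(i)
        time.sleep(0.1)

print(list(countdown(3)))  # ['3', '2', '1']
```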

## 🎯 Step 6: Practical Exercises

### Exercise 1: Add Authentication Parameters

Location: look for `TODO E2.1`

Task: add the required parameters to access and authorize Ollama

- Add `@perm ollama_host`
- Add `@perm auth`
- Use `args.get('ollama_host')` and `args.get('auth')`

### Exercise 2: Fix Streaming Implementation

Location: look for `TODO E2.2`

Task: fix the streaming to handle Ollama's response format

```python
# Decode the JSON chunk
data = json.loads(chunk.decode('utf-8'))
# Extract only the response part
response_text = data.get('response', 'error')
```

### Exercise 3: Model Switcher

Location: look for `TODO E2.3`

Task: add model switching functionality

```python
if input_text == "llama":
    model = "llama-3.1-8b"
elif input_text == "deepseek":
    model = "deepseek-coder:6.7b"
```
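The same switch can be written as a small helper with a default fallback (the helper name is illustrative; the model identifiers mirror the snippet above):

```python
def pick_model(input_text, default="llama-3.1-8b"):
    # Map user keywords to model identifiers; unknown input keeps the default
    models = {
        "llama": "llama-3.1-8b",
        "deepseek": "deepseek-coder:6.7b",
    }
    return models.get(input_text.strip().lower(), default)

print(pick_model("deepseek"))  # deepseek-coder:6.7b
print(pick_model("hello"))     # llama-3.1-8b
```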

## 🚀 Step 7: Deployment

### Deploy Commands

```sh
# Deploy all functions
ops deploy

# Deploy a specific function
ops ide deploy function-name

# Incremental deployment (recommended)
ops ide dev
```

### Important Notes

- Always return `streaming: true` for streaming functions
- Use `@perm` annotations to pass secrets
- The CLI sees the test environment variables
- Production uses the configuration and packages environment

## 🎉 What You've Learned

1. LLM Access: how to connect to and communicate with Large Language Models
2. Secret Management: proper handling of credentials in both test and production
3. Streaming Implementation: real-time response streaming for a better user experience
4. Model Switching: dynamic model selection in your applications
5. Testing: using mocks to test streaming functionality locally

### Next Steps

- Practice with the exercises
- Experiment with different models
- Build on this foundation for more complex LLM applications

## 🔧 Tips

- Environment Variables: use `ops config dump` to see all available configuration
- Incremental Deploy: use `ops ide dev` for automatic deployment on save
- Timeout Handling: be aware of execution time limits (default: 3 minutes)

Remember: Always test your streaming functions with the mock before deploying to production!