A production-grade observability stack for RHEL 10, deploying Grafana, InfluxDB, Prometheus, Loki, and Alloy using Podman Quadlets (systemd-managed containers).
- Overview
- Features
- Architecture
- Components
- Prerequisites
- Installation
- Configuration
- Integration Setup
- Health Check
- Uninstallation
- Troubleshooting
- Tuning and Scaling
- Security
- Documentation
- License
This project provides a complete, production-ready observability stack that unifies monitoring data from Zabbix, LibreNMS, and Prometheus into a single Grafana visualization platform, with centralized logging via Loki.
- ✅ Podman Quadlets - systemd-managed containers (no Docker, no docker-compose)
- ✅ Idempotent - safe to run the installation multiple times
- ✅ SELinux Enforcing - production-grade security
- ✅ Bind Mounts - persistent data storage under `/srv/obs`
- ✅ 1-Year Retention - configured for long-term data storage
- ✅ RHEL 10 Native - built for Red Hat Enterprise Linux 10
| Platform | Version | Status |
|---|---|---|
| RHEL 10 | 10.x | ✅ Tested |
| CentOS Stream | 9 | Untested |
- Single pane of glass for all monitoring data
- Integrates existing Zabbix and LibreNMS deployments
- Prometheus-based metrics collection
- Centralized log aggregation with Loki
- SELinux enforcing mode support
- Systemd service management
- Automatic container updates
- Health checks and monitoring
- Resource limits and quotas
- One-command installation
- Idempotent and safe to re-run
- Clean uninstallation with data preservation option
- Comprehensive health checking
- Zabbix - Via API and optional direct database access
- LibreNMS - Via InfluxDB push integration
- Prometheus - Native scraping of exporters
- Loki - Systemd journal log collection
┌──────────────────────────────────────────────────────────────────┐
│                       Grafana VM (RHEL 10)                       │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │            Podman Network: obs-net (bridge)              │    │
│  │                                                          │    │
│  │   Grafana      InfluxDB     Prometheus      Loki         │    │
│  │    :3000        :8086         :9090         :3100        │    │
│  │      │            ▲             ▲             ▲          │    │
│  │      │            │             └─── Alloy ──┘           │    │
│  │      │            │                (agent)               │    │
│  └──────┴────────────┴──────────────────────────────────────┘    │
│                                                                  │
│          /srv/obs/*  (bind mounts with SELinux labels)           │
└──────────────────────────────────────────────────────────────────┘
       ▲                               ▲
       │                               │
  ┌────┴───────┐               ┌───────┴───────┐
  │ Zabbix VM  │               │  LibreNMS VM  │
  │ (API + DB) │               │   (MariaDB)   │
  └────────────┘               └───────┬───────┘
                                       │
                                 Metrics Push
                               via InfluxDB API
External Systems → Grafana Stack → Visualization
────────────────────────────────────────────────

Zabbix VM
├─ API ──────────────► Grafana (alexanderzobnin-zabbix-app plugin)
└─ MariaDB (optional) ► Grafana (direct DB queries for history)

LibreNMS VM
└─ Metrics Push ─────► InfluxDB ──► Grafana (Flux queries)

Exporters (future)
└─ Metrics Scrape ───► Prometheus ──► Grafana

Host System
└─ Systemd Journal ──► Alloy ──► Loki ──► Grafana
| Component | Purpose | Port | Exposure |
|---|---|---|---|
| Grafana | Visualization & dashboards | 3000 | Public |
| InfluxDB 2.x | LibreNMS metrics storage | 8086 | Internal |
| Prometheus | Metrics collection & storage | 9090 | Configurable |
| Loki | Log aggregation & storage | 3100 | Internal |
| Alloy | Log & metrics agent | - | Internal |
| Component | Image | Base OS |
|---|---|---|
| Grafana | docker.io/grafana/grafana:latest | Ubuntu |
| InfluxDB | docker.io/influxdb:2.7 | Debian |
| Prometheus | quay.io/prometheus/prometheus:latest | Alpine* |
| Loki | docker.io/grafana/loki:latest | Alpine* |
| Alloy | docker.io/grafana/alloy:latest | Alpine* |
*Alpine images are acceptable per project requirements when CentOS Stream 9 or RHEL UBI alternatives are not available.
| Resource | Minimum | Recommended |
|---|---|---|
| OS | RHEL 10 | RHEL 10 |
| CPU | 4 vCPU | 8 vCPU |
| RAM | 16 GB | 24 GB |
| Disk | 250 GB | 500 GB SSD |
| Network | 1 Gbps | 10 Gbps |
# Required packages
- podman >= 4.0
- systemd >= 252
- policycoreutils-python-utils (for semanage)
- container-selinux
# Optional but recommended
- curl (for health checks)
- openssl (for token generation)

- RHEL 10 system with root access
- Podman installed and configured
- SELinux in enforcing mode
- Firewall configured (ports 3000, 8086, and optionally 9090 restricted to trusted sources)
- Network connectivity to Zabbix and LibreNMS VMs
- At least 500 GB available in `/srv`
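The prerequisites above can be checked with a short preflight sketch before running the installer (the command list mirrors the required packages; the 500 GB figure is the one quoted above, so adjust it to your environment):

```shell
# Preflight sketch: required tools, SELinux mode, and free space in /srv.
for cmd in podman systemctl semanage restorecon curl openssl; do
  command -v "$cmd" >/dev/null 2>&1 || echo "MISSING: $cmd"
done

# SELinux should report Enforcing
if command -v getenforce >/dev/null 2>&1; then
  echo "SELinux: $(getenforce)"
fi

# Free space in /srv, in GB (500 GB recommended above)
free_gb=$(df --output=avail -BG /srv 2>/dev/null | tail -1 | tr -dc '0-9')
if [ -n "$free_gb" ] && [ "$free_gb" -lt 500 ]; then
  echo "WARN: only ${free_gb}G free in /srv"
fi
```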
git clone https://github.com/yourusername/containerized-grafana-deploy.git
cd containerized-grafana-deploy

# Copy example environment file
cp .env.example .env
# Edit configuration with your values
vi .env

Important: Update these critical values in .env BEFORE running the install script:
- `INFLUXDB_TOKEN` - YOU MUST GENERATE THIS MANUALLY: `openssl rand -base64 32`
  This token is set during InfluxDB first-time initialization and cannot be auto-generated. Copy the output and paste it into your `.env` file.
- `GRAFANA_ADMIN_PASSWORD` - Strong password (16+ chars)
- `INFLUXDB_ADMIN_PASSWORD` - Strong password (16+ chars)
- `ZABBIX_URL` - Your Zabbix API endpoint (e.g., `http://zabbix.example.com/api_jsonrpc.php`)
- `ZABBIX_API_TOKEN` - Generate in the Zabbix UI (see the Zabbix Integration section below)
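Since the token must be generated by hand, one way to create it and drop it into the file in a single step is sketched below (it assumes `.env` already contains an `INFLUXDB_TOKEN=` line copied from `.env.example`; `ENV_FILE` is a convenience variable introduced here):

```shell
# Generate the token and patch it into .env in place.
ENV_FILE=${ENV_FILE:-.env}
if [ -f "$ENV_FILE" ]; then
  TOKEN=$(openssl rand -base64 32)
  # base64 output may contain '/', so use '|' as the sed delimiter
  sed -i "s|^INFLUXDB_TOKEN=.*|INFLUXDB_TOKEN=${TOKEN}|" "$ENV_FILE"
  grep '^INFLUXDB_TOKEN=' "$ENV_FILE"   # confirm the value landed
fi
```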
Firewall Configuration (Automatic):
The install script will automatically configure firewall rules if `CONFIGURE_FIREWALL=true`:

- `CONFIGURE_FIREWALL` - Set to `true` to auto-configure the firewall (default: `true`)
- `GRAFANA_ADMIN_SUBNET` - Subnet allowed to access Grafana (e.g., `10.1.10.0/24`)
- `LIBRENMS_VM_IP` - IP address of the LibreNMS VM (e.g., `10.2.2.100`)
Example configuration:
# Automatic firewall configuration
CONFIGURE_FIREWALL=true
GRAFANA_ADMIN_SUBNET=10.1.10.0/24
LIBRENMS_VM_IP=10.2.2.100

To skip automatic firewall configuration, set `CONFIGURE_FIREWALL=false` and configure manually later.
# Run as root
sudo ./scripts/install.sh

The installation script will:
- ✅ Check prerequisites
- ✅ Create the directory structure (`/srv/obs/*`)
- ✅ Copy configuration files
- ✅ Set permissions and SELinux labels
- ✅ Install Quadlet unit files
- ✅ Pull container images
- ✅ Start all services
- ✅ Configure firewall rules (if `CONFIGURE_FIREWALL=true`)
# Run health check
sudo ./scripts/health-check.sh

Open your browser and navigate to:
With HTTPS (if TLS_ENABLED=true):
https://grafana.lab:3000
With HTTP (if TLS_ENABLED=false or not set):
http://<your-server-ip>:3000
Note: Self-signed certificates will trigger a browser security warning. Accept the risk to proceed, or see HTTPS/TLS Configuration for more details.
Login with credentials from .env:
- Username: `${GRAFANA_ADMIN_USER}`
- Password: `${GRAFANA_ADMIN_PASSWORD}`
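Right after installation it can take a moment before Grafana starts answering. A small polling helper built on Grafana's `/api/health` endpoint can confirm it is up before you log in (host and port are the defaults used in this guide):

```shell
# Poll an HTTP endpoint until it answers 200 or we give up.
wait_for_http() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    code=$(curl -sk -o /dev/null -w '%{http_code}' --connect-timeout 2 "$url" || true)
    if [ "$code" = "200" ]; then
      echo "OK: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "gave up after $tries tries: $url" >&2
  return 1
}

wait_for_http "http://localhost:3000/api/health" 5 || echo "Grafana is not answering yet"
```

The `-k` flag lets the same helper work against the self-signed HTTPS setup described below.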
/srv/obs/
├── grafana/
│   ├── data/                  # Grafana database and plugins
│   ├── provisioning/          # Auto-provisioned datasources
│   │   ├── datasources/
│   │   │   └── datasources.yaml
│   │   └── plugins/
│   │       └── plugins.yaml
│   └── tls/                   # TLS certificates (if enabled)
│       ├── grafana.crt        # Self-signed certificate
│       └── grafana.key        # Private key
├── influxdb/
│   ├── data/                  # InfluxDB time-series data
│   └── config/                # InfluxDB configuration
├── prometheus/
│   ├── data/                  # Prometheus TSDB
│   └── config/
│       └── prometheus.yml     # Scrape configuration
├── loki/
│   ├── data/                  # Loki chunks and indexes
│   └── config/
│       └── loki.yaml          # Loki configuration
└── alloy/
    ├── data/                  # Alloy state
    └── config/
        └── config.alloy       # Log collection config
Located in /etc/containers/systemd/:
obs-network.network # Podman bridge network
grafana.container # Grafana service
influxdb.container # InfluxDB service
prometheus.container # Prometheus service
loki.container # Loki service
alloy.container # Alloy agent
# Check status
systemctl status grafana.service
systemctl status prometheus.service
systemctl status loki.service
systemctl status influxdb.service
systemctl status alloy.service
# View logs
journalctl -u grafana.service -f
journalctl -u prometheus.service -n 100
# Restart a service
systemctl restart grafana.service
# Stop/Start all services
systemctl stop grafana alloy loki prometheus influxdb
systemctl start influxdb prometheus loki alloy grafana

Grafana supports HTTPS using self-signed certificates for secure access.
The installation script automatically generates a 10-year self-signed certificate when TLS_ENABLED=true.
Certificate Specifications:
- Algorithm: RSA 4096-bit
- Hash: SHA-256
- Validity: 3650 days (10 years)
- Subject Alternative Names (SANs): Configurable via environment variables
Edit .env to enable HTTPS:
# Enable HTTPS
TLS_ENABLED=true
# Certificate Common Name (must be grafana.lab)
TLS_CERT_CN=grafana.lab
# Subject Alternative Names (comma-separated)
TLS_CERT_SANS=DNS:grafana.lab,DNS:grafana,DNS:localhost,IP:10.1.10.100
# Certificate storage location
TLS_DIR=/srv/obs/grafana/tls
# Certificate validity (days)
TLS_CERT_VALIDITY_DAYS=3650
# RSA key size
TLS_KEY_SIZE=4096

Generated certificates are stored in `${TLS_DIR}` (default: `/srv/obs/grafana/tls/`):
/srv/obs/grafana/tls/
├── grafana.crt    # Certificate (644)
└── grafana.key    # Private key (600)
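To confirm that the CN and SANs actually made it into the generated certificate, it can be inspected with `openssl` (the path is this guide's default; the `-ext` option needs OpenSSL 1.1.1 or newer):

```shell
# Print the subject (CN) and Subject Alternative Names of the certificate.
CRT=${CRT:-/srv/obs/grafana/tls/grafana.crt}
if [ -f "$CRT" ]; then
  openssl x509 -in "$CRT" -noout -subject
  openssl x509 -in "$CRT" -noout -ext subjectAltName
fi
```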
When TLS is enabled, access Grafana at:
https://grafana.lab:3000
Browser Warning: Self-signed certificates will trigger a browser security warning. You can:
- Accept the risk and proceed (recommended for lab/internal use)
- Import the certificate into your browser's trusted certificate store
- Use a proper CA-signed certificate for production environments
The installation script (scripts/install.sh) includes intelligent certificate management:
When you run scripts/install.sh:
- Certificate exists and is valid → skips generation, uses the existing certificate
- Certificate missing → generates a new certificate
- Certificate invalid → regenerates the certificate
- Certificate expires within 30 days → regenerates the certificate
This means you can safely re-run the installer without regenerating certificates unnecessarily. The TLS generation step is idempotent and will preserve valid certificates.
Example output when certificate already exists:
[INFO] TLS is enabled - generating self-signed certificates...
[SUCCESS] Valid certificate already exists
[INFO] Certificate details:
Subject: C=US, ST=Lab, L=Lab, O=Lab, OU=Observability, CN=grafana.lab
Valid Until: Jan 26 17:44:55 2036 GMT
Days Remaining: 3650
[INFO] No action needed - certificate is valid and not near expiry
To regenerate certificates manually (outside of the install script):
sudo bash -c 'set -a; source .env; set +a; scripts/generate-selfsigned-tls.sh'

Or if you need to force regeneration, delete the existing certificate first:
sudo rm -f /srv/obs/grafana/tls/grafana.{crt,key}
sudo bash -c 'set -a; source .env; set +a; scripts/generate-selfsigned-tls.sh'

To use HTTP instead:
# In .env file
TLS_ENABLED=false

Then re-run the installation:
sudo scripts/install.sh

Grafana will be accessible at `http://grafana.lab:3000` or `http://<your-ip>:3000`.
To check certificate expiry:
openssl x509 -in /srv/obs/grafana/tls/grafana.crt -noout -enddate

For automated monitoring, add a cron job or use Grafana's built-in certificate monitoring dashboards.
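A minimal cron-able sketch of such a check follows. The 30-day threshold mirrors the installer's renewal window described above; the script name and install path in the crontab comment are hypothetical:

```shell
# cert-expiry-check sketch: warn when the certificate is near expiry.
CRT=${CRT:-/srv/obs/grafana/tls/grafana.crt}
THRESHOLD_DAYS=30

if [ -f "$CRT" ]; then
  end=$(openssl x509 -in "$CRT" -noout -enddate | cut -d= -f2)
  days_left=$(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
  if [ "$days_left" -lt "$THRESHOLD_DAYS" ]; then
    echo "WARNING: certificate expires in ${days_left} days"
  else
    echo "OK: ${days_left} days remaining"
  fi
else
  echo "no certificate at $CRT" >&2
fi

# Example crontab entry (daily at 06:00):
# 0 6 * * * /usr/local/bin/cert-expiry-check.sh
```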
All configuration is managed through environment variables defined in the .env file.
This section provides a comprehensive reference of all variables used by the deployment.
These variables must be set in .env before running the installation:
| Variable | Used By | Purpose | Notes |
|---|---|---|---|
| `INFLUXDB_ADMIN_USER` | InfluxDB | Initial admin username | Set during first-time setup |
| `INFLUXDB_ADMIN_PASSWORD` | InfluxDB | Initial admin password | Minimum 16 characters recommended |
| `INFLUXDB_ORG` | InfluxDB, Grafana | Organization name | Default: `observability` |
| `INFLUXDB_BUCKET` | InfluxDB, Grafana | Bucket name for LibreNMS data | Default: `librenms` |
| `INFLUXDB_TOKEN` | InfluxDB, Grafana | Admin API token | Generate manually: `openssl rand -base64 32` |
| `GRAFANA_ADMIN_USER` | Grafana | Admin username | Default: `admin` |
| `GRAFANA_ADMIN_PASSWORD` | Grafana | Admin password | Change from default immediately |
| `GRAFANA_DOMAIN` | Grafana | Server domain or IP | Used for `root_url` configuration |
These variables configure the Zabbix integration via the alexanderzobnin-zabbix-app plugin:
| Variable | Required | Used By | Purpose | Notes |
|---|---|---|---|---|
| `GRAFANA_INSTALL_ZABBIX_PLUGIN` | Yes | install.sh, Grafana | Enable Zabbix plugin | Set to `true` or `false` |
| `GRAFANA_ZABBIX_PLUGIN_ID` | Conditional | install.sh, Grafana | Plugin identifier | `alexanderzobnin-zabbix-app` |
| `GRAFANA_ZABBIX_TRENDS_THRESHOLD_DAYS` | Conditional | Grafana | Trends threshold (days) | Recommended: `7` |
| `ZABBIX_URL` | Conditional | Grafana datasource | Zabbix API endpoint | `http://host/api_jsonrpc.php` |
| `ZABBIX_API_TOKEN` | Conditional | Grafana datasource | API auth token | Generate in the Zabbix UI |
Conditional: Required only if GRAFANA_INSTALL_ZABBIX_PLUGIN=true
Important: Username/password authentication is not supported. Zabbix integration requires API token authentication only. See Zabbix Integration for token generation instructions.
These variables control automatic firewall configuration during installation:
| Variable | Required | Used By | Purpose | Notes |
|---|---|---|---|---|
| `CONFIGURE_FIREWALL` | No | install.sh, uninstall.sh | Enable firewall automation | Skipped if not set |
| `GRAFANA_ADMIN_SUBNET` | Conditional | install.sh | Grafana access subnet | CIDR: `10.1.10.0/24` |
| `LIBRENMS_VM_IP` | Conditional | install.sh | InfluxDB access IP | Single IP: `10.2.2.100` |
Conditional: Required only if CONFIGURE_FIREWALL=true
These variables control HTTPS/TLS certificate generation and configuration for Grafana:
| Variable | Required | Used By | Purpose | Notes |
|---|---|---|---|---|
| `TLS_ENABLED` | No | install.sh, Grafana | Enable HTTPS | `true` or `false` (default: `false`) |
| `TLS_DIR` | Conditional | install.sh, Grafana | Certificate storage directory | Default: `/srv/obs/grafana/tls` |
| `TLS_CERT_CN` | Conditional | TLS script | Certificate Common Name | Must be: `grafana.lab` |
| `TLS_CERT_SANS` | Conditional | TLS script | Subject Alternative Names | Comma-separated DNS/IP list |
| `TLS_CERT_VALIDITY_DAYS` | Conditional | TLS script | Certificate validity period | Default: `3650` (10 years) |
| `TLS_KEY_SIZE` | Conditional | TLS script | RSA key size in bits | Default: `4096` |
Conditional: Required only if TLS_ENABLED=true
Important: The Common Name (`TLS_CERT_CN`) must be set to `grafana.lab` as per requirements. Include this in your `TLS_CERT_SANS` along with any additional DNS names or IP addresses.
These variables are not currently consumed by the deployment but exist for potential future extensions or as reference values:
| Variable | Purpose | Current Status |
|---|---|---|
| `INFLUXDB_URL` | External InfluxDB URL | Documentation only - configure on the LibreNMS side |
| `INFLUXDB_PORT` | InfluxDB port | Documentation only - hardcoded to 8086 |
| `ZABBIX_DB_HOST` | Zabbix MySQL hostname | Reserved for optional direct DB access (commented out) |
| `ZABBIX_DB_NAME` | Zabbix database name | Reserved for optional direct DB access (commented out) |
| `ZABBIX_DB_USER` | Zabbix database user | Reserved for optional direct DB access (commented out) |
| `ZABBIX_DB_PASSWORD` | Zabbix database password | Reserved for optional direct DB access (commented out) |
Note: The Zabbix database variables reference an optional datasource configuration that is commented out in `datasources.yaml`. The deployment uses API-only access by default. Direct database access can be enabled by uncommenting the `Zabbix-DB` datasource.
This table shows which components consume each variable:
| Variable | Quadlets | Configs | Scripts | Runtime |
|---|---|---|---|---|
| `INFLUXDB_*` (admin/org/bucket/token) | ✅ | ✅ | ✅ | ✅ |
| `GRAFANA_ADMIN_*` | ✅ | ❌ | ✅ | ✅ |
| `GRAFANA_DOMAIN` | ✅ | ❌ | ✅ | ✅ |
| `GRAFANA_INSTALL_ZABBIX_PLUGIN` | ✅ | ❌ | ✅ | ✅ |
| `GRAFANA_ZABBIX_PLUGIN_ID` | ✅ | ❌ | ✅ | ✅ |
| `GRAFANA_ZABBIX_TRENDS_THRESHOLD_DAYS` | ❌ | ✅ | ❌ | ✅ |
| `ZABBIX_URL` | ❌ | ✅ | ❌ | ✅ |
| `ZABBIX_API_TOKEN` | ❌ | ✅ | ❌ | ✅ |
| `TLS_ENABLED` | ✅ | ❌ | ✅ | ✅ |
| `TLS_DIR` | ✅ | ❌ | ✅ | ✅ |
| `TLS_CERT_CN` | ❌ | ❌ | ✅ | ❌ |
| `TLS_CERT_SANS` | ❌ | ❌ | ✅ | ❌ |
| `TLS_CERT_VALIDITY_DAYS` | ❌ | ❌ | ✅ | ❌ |
| `TLS_KEY_SIZE` | ❌ | ❌ | ✅ | ❌ |
| `CONFIGURE_FIREWALL` | ❌ | ❌ | ✅ | ❌ |
| `GRAFANA_ADMIN_SUBNET` | ❌ | ❌ | ✅ | ❌ |
| `LIBRENMS_VM_IP` | ❌ | ❌ | ✅ | ❌ |

Legend:
- ✅ = Used by this component
- ❌ = Not used by this component
┌──────────────────┐
│   .env.example   │  ← Template with defaults and documentation
└────────┬─────────┘
         │ User copies and customizes
         ▼
┌──────────────────┐
│       .env       │  ← SINGLE SOURCE OF TRUTH (not in git)
└────────┬─────────┘
         │
         ├──► scripts/install.sh (validates required vars)
         │
         ├──► quadlets/*.container (envsubst replaces ${VAR})
         │
         └──► Grafana provisioning (${VAR} interpolation)
- Never commit `.env` to version control - it's in `.gitignore` for security
- Generate strong tokens and passwords:
  openssl rand -base64 32   # For INFLUXDB_TOKEN
  openssl rand -base64 24   # For passwords
- Set restrictive permissions:
  chmod 600 .env
- Rotate credentials regularly - especially API tokens
- Use API tokens, not passwords - for the Zabbix integration
- Validate `.env` before deployment:
  diff .env .env.example   # Check for missing variables
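A plain `diff` against `.env.example` compares full lines, so every customized value shows up as noise. Comparing just the variable names is often more useful; a sketch using temporary files (the `[A-Z0-9_]+=` pattern assumes the usual `NAME=value` layout of these files):

```shell
# Print any variable NAME present in .env.example but missing from .env.
if [ -f .env ] && [ -f .env.example ]; then
  grep -oE '^[A-Z0-9_]+=' .env         | sort -u > /tmp/env-have.txt
  grep -oE '^[A-Z0-9_]+=' .env.example | sort -u > /tmp/env-want.txt
  comm -13 /tmp/env-have.txt /tmp/env-want.txt
fi
```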
The Zabbix datasource is auto-provisioned using API token authentication.
Configure in .env:
ZABBIX_URL=http://zabbix.example.com/api_jsonrpc.php
ZABBIX_API_TOKEN=<your-api-token>  # Generate in Zabbix UI

Generate Zabbix API Token:
- Login to Zabbix web interface as admin
- Navigate to: Administration β Users
- Select the user for Grafana integration (or create new user with read permissions)
- Go to "API tokens" tab
- Click "Create API token"
- Set description: `Grafana Integration`
- Set expiration: leave empty for no expiration (or set as needed)
- Click "Add" and copy the generated token
- Paste the token into `.env` as the `ZABBIX_API_TOKEN` value
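Once the token is in place it can be smoke-tested from any machine that can reach the Zabbix API. The sketch below uses placeholder URL and token values; note that Zabbix 6.4+ accepts the token in an `Authorization: Bearer` header, while older versions expect it in the request's `auth` field:

```shell
# Smoke-test the token: host.get with countOutput returns only a count,
# which is enough to prove the token and read permissions work.
ZABBIX_URL="http://zabbix.example.com/api_jsonrpc.php"   # placeholder
ZABBIX_API_TOKEN="changeme"                              # placeholder

payload='{"jsonrpc":"2.0","method":"host.get","params":{"countOutput":true},"id":1}'

if [ "$ZABBIX_API_TOKEN" != "changeme" ]; then
  curl -s --connect-timeout 5 -X POST "$ZABBIX_URL" \
    -H "Content-Type: application/json-rpc" \
    -H "Authorization: Bearer $ZABBIX_API_TOKEN" \
    -d "$payload"
fi
```

A response carrying a `"result"` field confirms the token; a "Not authorized" error means the token is wrong or expired.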
Important Notes:
- API token authentication is required (username/password is not supported)
- The user must have read permissions to the required host groups
- The token never expires if no expiration is set
- The trends threshold is set to 7 days for optimal performance
Plugin: alexanderzobnin-zabbix-app (auto-installed)
LibreNMS pushes metrics to InfluxDB on this VM.
Configure in .env:
INFLUXDB_URL=http://grafana.example.com:8086 # Change to your VM hostname/IP
INFLUXDB_PORT=8086
INFLUXDB_ORG=observability
INFLUXDB_BUCKET=librenms
INFLUXDB_TOKEN=<generated-token>

Configure LibreNMS:
- Navigate to LibreNMS Settings → Plugins → InfluxDB
- Enable InfluxDB export
- Configure:
  URL:          http://<your-grafana-vm-ip>:8086   # Use INFLUXDB_URL value
  Organization: observability                      # Use INFLUXDB_ORG value
  Bucket:       librenms                           # Use INFLUXDB_BUCKET value
  Token:        <INFLUXDB_TOKEN from .env>         # Copy from .env
- Test the connection and save
Verify Connectivity:
# From LibreNMS VM
curl http://<grafana-vm-ip>:8086/health
# Expected: {"status":"pass","message":"ready for queries and writes"}

Grafana Datasource: Auto-provisioned as `InfluxDB-LibreNMS`
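Beyond the health endpoint, a Flux query over InfluxDB's v2 HTTP API shows whether LibreNMS points are actually landing in the bucket (URL and token below are placeholders; the org and bucket names are this guide's defaults):

```shell
# Run a Flux query via the /api/v2/query endpoint (returns CSV by default).
INFLUX_URL="http://localhost:8086"
INFLUXDB_TOKEN="changeme"   # placeholder; use the token from .env
FLUX='from(bucket:"librenms") |> range(start: -1h) |> limit(n: 5)'

if [ "$INFLUXDB_TOKEN" != "changeme" ]; then
  curl -s --connect-timeout 5 -X POST "${INFLUX_URL}/api/v2/query?org=observability" \
    -H "Authorization: Token ${INFLUXDB_TOKEN}" \
    -H "Content-Type: application/vnd.flux" \
    --data "$FLUX"
fi
```

An empty result with no error usually means LibreNMS has not pushed anything yet; an auth error points at the token.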
Prometheus scrapes metrics from exporters.
Add Targets:
Edit /srv/obs/prometheus/config/prometheus.yml:
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['host.containers.internal:9100']
        labels:
          instance: 'grafana-vm'

Reload configuration:
systemctl restart prometheus.service

Alloy collects systemd journal logs automatically.
View logs in Grafana:
- Navigate to Explore
- Select "Loki" datasource
- Query example:
{unit="grafana.service"}
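The same query can also be issued against Loki's HTTP API directly, which is handy when Grafana itself is in doubt (the `/loki/api/v1/query_range` endpoint is Loki's standard query API; host and port are this guide's defaults):

```shell
# Query Loki over HTTP for recent grafana.service journal lines.
LOKI_URL="http://localhost:3100"
curl -sG --connect-timeout 5 "${LOKI_URL}/loki/api/v1/query_range" \
  --data-urlencode 'query={unit="grafana.service"}' \
  || echo "Loki not reachable at ${LOKI_URL}"
```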
Run the comprehensive health check script:
sudo ./scripts/health-check.sh

Checks performed:
- ✅ Systemd service status (all 6 services)
- ✅ Container running state
- ✅ Podman network connectivity
- ✅ Bind mount directories and permissions
- ✅ HTTP health endpoints (internal and external)
- ✅ Disk usage warnings
- ✅ Container resource usage
- ✅ Configuration file presence
- ✅ Port listening status
- ✅ SELinux labels
- ✅ Quadlet file presence
- ✅ Firewall rules
- ✅ Grafana datasources
- ✅ InfluxDB initialization
- ✅ Grafana plugins
- ✅ Service log errors
Example output:
┌──────────────────────────────────────────────────────────────┐
│           Observability Stack Health Check Report            │
└──────────────────────────────────────────────────────────────┘
[INFO] Checking systemd services...
[✓] Service obs-network-network is running
[✓] Service influxdb is running
[✓] Service prometheus is running
[✓] Service loki is running
[✓] Service alloy is running
[✓] Service grafana is running
[INFO] Checking HTTP endpoints...
[✓] Grafana endpoint is healthy (HTTP 200)
[✓] Prometheus endpoint is healthy
[✓] Loki endpoint is healthy
[✓] InfluxDB endpoint is healthy
...
Overall Status: HEALTHY (59/60 checks passed)
The uninstall script supports multiple modes for safe and flexible removal.
Preview what will be removed without making any changes:
sudo ./scripts/uninstall.sh --dry-run

Safe to run anytime. No confirmation required.
sudo ./scripts/uninstall.sh
# or explicitly:
sudo ./scripts/uninstall.sh --preserve-data

This will:
- Stop and remove all containers
- Remove Quadlet configuration files
- Remove Podman network
- Remove firewall rules for ports 3000, 8086, and 9090 (if configured)
- Preserve data in `/srv/obs` (Grafana dashboards, metrics, logs)
sudo ./scripts/uninstall.sh --remove-data

Removes everything above plus:
- All data in `/srv/obs/`
- SELinux file contexts
| Option | Short | Description |
|---|---|---|
| `--dry-run` | `-d` | Preview changes without executing |
| `--preserve-data` | `-p` | Keep data (default behavior) |
| `--remove-data` | `-r` | Permanently delete all data |
| `--help` | `-h` | Show usage information |
# Preview full uninstall with data removal
sudo ./scripts/uninstall.sh --dry-run --remove-data
# Quick reinstall (keep existing data)
sudo ./scripts/uninstall.sh && sudo ./scripts/install.sh
# Complete fresh start
sudo ./scripts/uninstall.sh --remove-data && sudo ./scripts/install.sh

# Check service status
systemctl status grafana.service
# View full logs
journalctl -u grafana.service -n 100
# Check container logs
podman logs grafana

# Verify SELinux labels
ls -lZ /srv/obs/
# Re-apply SELinux labels
sudo restorecon -Rv /srv/obs/

Symptoms:
- Import fails when entering a Dashboard ID
- Plugin installation hangs or fails
- "Too many open files" or process limit errors in logs
Solutions:
- Increase the Grafana PID limit (the default 2048 may be insufficient):
  Edit `quadlets/grafana.container` and add under Resource limits:
  PidsLimit=4096
  Then re-run install, or manually update `/etc/containers/systemd/grafana.container` and restart:
  sudo systemctl daemon-reload
  sudo systemctl restart grafana.service
- Verify connectivity from the container:
  podman exec -it grafana curl -I https://grafana.com
  Expected: `HTTP/2 200`
- Check Grafana logs for errors:
  journalctl -u grafana.service -n 100 --no-pager | grep -i "error\|fail"
# Check what's using port 3000
sudo ss -tulpn | grep 3000
# Stop conflicting service
sudo systemctl stop <conflicting-service>

# Verify network exists
podman network ls
# Inspect network
podman network inspect obs-net
# Recreate network
podman network rm obs-net
systemctl restart obs-network-network.service

Symptoms:
- No LibreNMS data in Grafana
- InfluxDB health check fails from LibreNMS VM
- Connection timeouts from LibreNMS
Solutions:
- Verify InfluxDB is running and healthy:
  # On Grafana VM
  systemctl status influxdb.service
  curl http://localhost:8086/health
  # Expected: {"status":"pass","message":"ready for queries and writes"}
- Test connectivity from the LibreNMS VM:
  # From LibreNMS VM (replace with your Grafana VM IP)
  curl http://<grafana-vm-ip>:8086/health
  # If it times out or the connection is refused, check the firewall
- Check that the firewall allows port 8086:
  # On Grafana VM
  sudo firewall-cmd --list-ports
  # Add the rule if missing (LibreNMS VM IP: x.x.x.x)
  sudo firewall-cmd --permanent \
    --add-rich-rule='rule family="ipv4" source address="x.x.x.x/32" \
    port port="8086" protocol="tcp" accept'
  sudo firewall-cmd --reload
- Verify the InfluxDB configuration in LibreNMS:
  - URL must be: `http://<grafana-vm-ip>:8086`
  - Organization: value from `INFLUXDB_ORG` in `.env`
  - Bucket: value from `INFLUXDB_BUCKET` in `.env`
  - Token: value from `INFLUXDB_TOKEN` in `.env`
- Check InfluxDB logs for errors:
  journalctl -u influxdb.service -f
- Test write access with curl:
  # From LibreNMS VM (replace values)
  INFLUXDB_URL="http://<grafana-vm-ip>:8086"
  WRITE_URL="${INFLUXDB_URL}/api/v2/write?org=observability&bucket=librenms"
  curl -X POST "${WRITE_URL}" \
    -H "Authorization: Token <your-influxdb-token>" \
    -H "Content-Type: text/plain" \
    --data-raw "test_metric value=1"
  # If successful, you should see HTTP 204
| Component | Log Command |
|---|---|
| Grafana | journalctl -u grafana.service -f |
| Prometheus | journalctl -u prometheus.service -f |
| Loki | journalctl -u loki.service -f |
| InfluxDB | journalctl -u influxdb.service -f |
| Alloy | journalctl -u alloy.service -f |
# List all containers
podman ps -a
# Inspect container
podman inspect grafana
# Check resource usage
podman stats
# Test HTTP endpoints
curl http://localhost:3000/api/health
curl http://localhost:9090/-/healthy
curl http://localhost:3100/ready
curl http://localhost:8086/health

See docs/TUNING.md for detailed tuning and scaling guidance.
Prometheus (in Quadlet file):
--storage.tsdb.retention.time=365d
--storage.tsdb.retention.size=200GB
Loki (in loki.yaml):
limits_config:
  retention_period: 8760h  # 365 days

InfluxDB (in environment):
DOCKER_INFLUXDB_INIT_RETENTION=8760h

Edit Quadlet files in /etc/containers/systemd/*.container:
[Container]
Memory=8G
MemorySwap=8G
CPUQuota=400% # 4 CPU cores
PidsLimit=4096   # Grafana: required for dashboard/plugin installs from grafana.com

Reload after changes:
systemctl daemon-reload
systemctl restart <service>.service

# Check usage
df -h /srv/obs
# Prometheus - Clean old data manually if needed
podman exec prometheus promtool tsdb clean --timestamp=<unix-timestamp> /prometheus
# Loki - Compaction happens automatically
# Check compactor logs
journalctl -u loki.service | grep compactor

- ✅ SELinux enforcing mode enabled
- ✅ No secrets in git - `.env` is gitignored
- ✅ Strong passwords - minimum 16 characters
- ✅ Token rotation - regularly rotate InfluxDB tokens
- ✅ Least privilege - containers run as non-root where possible
- ✅ Firewall - ports 3000 and 8086 restricted to trusted sources
- Grafana admin credentials
- InfluxDB admin credentials and token
- Zabbix API credentials
- Optional database credentials
File permissions:
chmod 600 .env
chown root:root .env

Externally Exposed Ports:
- `3000/tcp` - Grafana UI (restricted via firewall to `GRAFANA_ADMIN_SUBNET`)
- `8086/tcp` - InfluxDB API (restricted via firewall to `LIBRENMS_VM_IP/32`)
- `9090/tcp` - Prometheus UI/API (optional, restricted via firewall to `PROMETHEUS_ADMIN_SUBNET`)
Internal-only Ports:
- `3100/tcp` - Loki (bound to the container network only)
Firewall Example:
# Allow Grafana from admin subnet
firewall-cmd --permanent \
--add-rich-rule='rule family="ipv4" source address="${GRAFANA_ADMIN_SUBNET}" \
port port="3000" protocol="tcp" accept'
# Allow InfluxDB from LibreNMS VM only
firewall-cmd --permanent \
--add-rich-rule='rule family="ipv4" source address="${LIBRENMS_VM_IP}/32" \
port port="8086" protocol="tcp" accept'
# Allow Prometheus from monitoring subnet (optional)
firewall-cmd --permanent \
--add-rich-rule='rule family="ipv4" \
source address="${PROMETHEUS_ADMIN_SUBNET}" \
port port="9090" protocol="tcp" accept'
firewall-cmd --reload

All bind mounts use `container_file_t`:
semanage fcontext -a -t container_file_t "/srv/obs(/.*)?"
restorecon -Rv /srv/obs

- docs/TUNING.md - Performance tuning and scaling
- docs/requirements.md - Complete requirements specification
- docs/ai/CONTEXT.md - AI engineering standards
- Podman Documentation
- Grafana Documentation
- Prometheus Documentation
- Loki Documentation
- Alloy Documentation
- InfluxDB Documentation
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2026 Your Organization
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Contributions are welcome! Please ensure:
- All scripts follow bash standards (see `template/docs/ai/CONTEXT.md`)
- Run pre-commit hooks: `./scripts/run-precommit.sh`
- No secrets in commits
- Update documentation for new features
- Update documentation for new features
For issues, questions, or contributions:
- Open an issue on GitHub
- Review existing documentation
- Check troubleshooting section above