# Docker Monitoring Implementation

This document describes a monitoring implementation that addresses the "Docker monitoring problem" described in the Datadog blog post of the same name.

## Overview

The monitoring system collects metrics at three isolation levels:

1. **Process Level** - Individual process monitoring within containers
2. **Container Level** - Container-specific metrics and isolation monitoring
3. **Host Level** - System-wide host metrics and resource monitoring

## Architecture

The implementation addresses the visibility gap between isolation levels. The table below compares how each level is represented, following the description of the monitoring problem:

| Aspect | Process | Container | Host |
|--------|---------|-----------|------|
| Spec | Source | Dockerfile | Kickstart |
| On disk | .TEXT | `/var/lib/docker` | `/` |
| In memory | PID | Container ID | Hostname |
| In network | Socket | `veth*` | `eth*` |
| Runtime context | Server core | Host | Data center |
| Isolation | Moderate: memory space, etc. | Private OS view: own PID space, file system, network interfaces | Full: including own page caches and kernel |

## Usage

### Monitor Host Level

```bash
./basic-docker monitor host
```

Shows system-wide metrics including:
- Hostname and uptime
- Memory usage and availability
- CPU count and load average
- Disk usage
- Network interfaces (eth*)
- All containers on the host

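Two of the host-level metrics listed above, load average and memory availability, can be read directly from procfs. The following is a minimal sketch, not the project's actual code: the function names `parseLoadAvg` and `parseMemInfo` are hypothetical, and it assumes a Linux host with `/proc` mounted.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseLoadAvg extracts the 1-, 5-, and 15-minute load averages from the
// contents of /proc/loadavg, e.g. "0.42 0.30 0.25 1/123 4567".
// (Hypothetical helper, not taken from the project.)
func parseLoadAvg(s string) ([3]float64, error) {
	var out [3]float64
	fields := strings.Fields(s)
	if len(fields) < 3 {
		return out, fmt.Errorf("unexpected loadavg format: %q", s)
	}
	for i := 0; i < 3; i++ {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return out, err
		}
		out[i] = v
	}
	return out, nil
}

// parseMemInfo returns the numeric values from /proc/meminfo keyed by field
// name. Most fields are reported in kB; lines that do not parse are skipped.
func parseMemInfo(s string) map[string]uint64 {
	out := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		key, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			continue
		}
		out[key] = v
	}
	return out
}

func main() {
	if b, err := os.ReadFile("/proc/loadavg"); err == nil {
		if la, err := parseLoadAvg(string(b)); err == nil {
			fmt.Printf("load average: %.2f %.2f %.2f\n", la[0], la[1], la[2])
		}
	}
	if b, err := os.ReadFile("/proc/meminfo"); err == nil {
		m := parseMemInfo(string(b))
		fmt.Printf("MemTotal: %d kB, MemAvailable: %d kB\n", m["MemTotal"], m["MemAvailable"])
	}
}
```

Keeping the parsers separate from the file reads makes them testable with fixed input strings, without requiring a Linux host.
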
### Monitor Process Level

```bash
./basic-docker monitor process <PID>
```

Shows process-specific metrics including:
- Process ID, name, and status
- Memory usage (RSS and virtual)
- CPU time and percentage
- Thread count
- Open file descriptors
- Socket information

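Several of these process metrics are available as `Key: value` lines in `/proc/[pid]/status`, and the open file descriptors appear as entries under `/proc/[pid]/fd/`. The sketch below shows one way to read them; `parseStatus` and `countFDs` are hypothetical names, and the real `ProcessMonitor` may collect these values differently.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseStatus turns the "Key:\tvalue" lines of /proc/[pid]/status into a map.
// Values are kept as strings, e.g. "2048 kB" for VmRSS.
// (Hypothetical helper, not taken from the project.)
func parseStatus(s string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		out[key] = strings.TrimSpace(val)
	}
	return out
}

// countFDs counts the entries in /proc/[pid]/fd, one per open descriptor.
func countFDs(pid int) (int, error) {
	entries, err := os.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	pid := os.Getpid()
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		fmt.Println("procfs not available:", err)
		return
	}
	st := parseStatus(string(b))
	fmt.Printf("pid=%d name=%s state=%s\n", pid, st["Name"], st["State"])
	fmt.Printf("rss=%s vsz=%s threads=%s\n", st["VmRSS"], st["VmSize"], st["Threads"])
	if n, err := countFDs(pid); err == nil {
		fmt.Printf("open fds=%d\n", n)
	}
}
```
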
### Monitor Container Level

```bash
./basic-docker monitor container <container-id>
```

Shows container-specific metrics including:
- Container ID, name, and status
- Memory usage and limits
- Network statistics (veth interfaces)
- Process list within container
- Namespace information
- Docker storage path

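On Linux, the namespace information for a process is exposed as symbolic links under `/proc/[pid]/ns/`, whose targets look like `pid:[4026531836]`. Two processes share a namespace when the targets match. The sketch below illustrates that comparison under the assumption that the container's processes are visible in the host's procfs; the helper names are hypothetical and this is not necessarily how `ContainerMonitor` works.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// nsKinds lists the namespace types inspected in this sketch.
var nsKinds = []string{"pid", "mnt", "net", "uts", "ipc"}

// readNamespaces resolves the symlinks under <procRoot>/<pid>/ns and returns
// their targets keyed by namespace kind. Links that cannot be read (for
// example, for lack of permission) are omitted from the result.
func readNamespaces(procRoot string, pid int) map[string]string {
	out := make(map[string]string)
	for _, kind := range nsKinds {
		link := filepath.Join(procRoot, fmt.Sprint(pid), "ns", kind)
		target, err := os.Readlink(link)
		if err != nil {
			continue
		}
		out[kind] = target
	}
	return out
}

// sharedKinds reports which namespace kinds have identical, non-empty
// identifiers in both maps.
func sharedKinds(a, b map[string]string) []string {
	var shared []string
	for _, kind := range nsKinds {
		if a[kind] != "" && a[kind] == b[kind] {
			shared = append(shared, kind)
		}
	}
	return shared
}

func main() {
	self := readNamespaces("/proc", os.Getpid())
	pid1 := readNamespaces("/proc", 1) // usually requires root to read
	for _, kind := range nsKinds {
		fmt.Printf("%-4s self=%s pid1=%s\n", kind, self[kind], pid1[kind])
	}
	fmt.Println("shared with PID 1:", sharedKinds(self, pid1))
}
```

A containerized process would typically show different `pid`, `mnt`, and `net` identifiers from PID 1 on the host, which is one way to make the "private OS view" visible from outside the container.
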
### Monitor All Levels

```bash
./basic-docker monitor all
```

Aggregates metrics from all three monitoring levels into a single JSON document.

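The shape of that JSON output is not specified here. As an illustration only, the aggregation could be modeled with nested structs and `encoding/json`; every type and field name below is hypothetical and chosen for the example, not taken from the tool's real output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"runtime"
)

// The types below are illustrative; the real output schema may differ.

type HostMetrics struct {
	Hostname string `json:"hostname"`
	CPUCount int    `json:"cpu_count"`
}

type ContainerMetrics struct {
	ID     string `json:"id"`
	Status string `json:"status"`
	PIDs   []int  `json:"pids"`
}

type ProcessMetrics struct {
	PID  int    `json:"pid"`
	Name string `json:"name"`
}

// AllMetrics groups one snapshot from each monitoring level.
type AllMetrics struct {
	Host       HostMetrics        `json:"host"`
	Containers []ContainerMetrics `json:"containers"`
	Processes  []ProcessMetrics   `json:"processes"`
}

// encodeAll serializes the aggregate as compact JSON.
func encodeAll(m AllMetrics) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	hostname, _ := os.Hostname()
	m := AllMetrics{
		Host:       HostMetrics{Hostname: hostname, CPUCount: runtime.NumCPU()},
		Containers: []ContainerMetrics{{ID: "example", Status: "running", PIDs: []int{os.Getpid()}}},
		Processes:  []ProcessMetrics{{PID: os.Getpid(), Name: "example"}},
	}
	out, err := encodeAll(m)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Listing the container's PIDs alongside the process entries is one way to let a consumer join the levels after the fact.
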
### Gap Analysis

```bash
./basic-docker monitor gap
```

Analyzes monitoring gaps between isolation levels:
- Process to container correlation gaps
- Container to host visibility gaps
- Cross-level monitoring challenges

### Correlation Analysis

```bash
./basic-docker monitor correlation <container-id>
```

Shows correlation between monitoring levels for a specific container, displaying the mapping table and detailed metrics.

## Implementation Details

### Monitors

- `ProcessMonitor` - Reads from `/proc/[pid]/` files to gather process metrics
- `ContainerMonitor` - Combines process monitoring with container metadata
- `HostMonitor` - Aggregates system-wide statistics from `/proc/` and `/sys/`

### Metrics Collection

- **Process metrics**: Read from `/proc/[pid]/stat`, `/proc/[pid]/status`, and `/proc/[pid]/fd/`
- **Container metrics**: Combine process metrics with container directory information
- **Host metrics**: Read from `/proc/meminfo`, `/proc/loadavg`, `/proc/uptime`, and filesystem stats

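Parsing `/proc/[pid]/stat` has one well-known subtlety: the command name in the second field is wrapped in parentheses and may itself contain spaces or parentheses, so splitting the line on whitespace alone is unreliable. A common approach is to locate the last `)` and split only what follows. The sketch below uses hypothetical names (`statInfo`, `parseStat`, `cpuSeconds`) and assumes 100 clock ticks per second, which is typical on Linux but should really be obtained from `sysconf(_SC_CLK_TCK)`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// statInfo holds a few fields from /proc/[pid]/stat.
// (Hypothetical type, not taken from the project.)
type statInfo struct {
	PID   int
	Comm  string
	State string
	UTime uint64 // user-mode CPU time, in clock ticks (field 14)
	STime uint64 // kernel-mode CPU time, in clock ticks (field 15)
}

// parseStat parses one line of /proc/[pid]/stat. The command name is taken
// from between the first "(" and the last ")", so names containing spaces or
// parentheses are handled.
func parseStat(s string) (statInfo, error) {
	var info statInfo
	lp := strings.IndexByte(s, '(')
	rp := strings.LastIndexByte(s, ')')
	if lp < 0 || rp < lp {
		return info, fmt.Errorf("malformed stat line")
	}
	pid, err := strconv.Atoi(strings.TrimSpace(s[:lp]))
	if err != nil {
		return info, err
	}
	// rest[0] is field 3 (state), so field N is rest[N-3].
	rest := strings.Fields(s[rp+1:])
	if len(rest) < 13 {
		return info, fmt.Errorf("stat line has too few fields")
	}
	utime, err := strconv.ParseUint(rest[11], 10, 64)
	if err != nil {
		return info, err
	}
	stime, err := strconv.ParseUint(rest[12], 10, 64)
	if err != nil {
		return info, err
	}
	return statInfo{PID: pid, Comm: s[lp+1 : rp], State: rest[0], UTime: utime, STime: stime}, nil
}

// cpuSeconds converts accumulated ticks to seconds for a given tick rate.
func cpuSeconds(info statInfo, ticksPerSecond uint64) float64 {
	return float64(info.UTime+info.STime) / float64(ticksPerSecond)
}

func main() {
	b, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		fmt.Println("procfs not available:", err)
		return
	}
	info, err := parseStat(string(b))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	// 100 ticks per second is an assumption; see the note above.
	fmt.Printf("pid=%d comm=%q state=%s cpu=%.2fs\n", info.PID, info.Comm, info.State, cpuSeconds(info, 100))
}
```

A CPU percentage can be derived from two such samples by dividing the change in CPU seconds by the wall-clock time between them.
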
### Gap Analysis

The monitoring system identifies three categories of gaps:

1. **Process to Container**: PID mapping, namespace isolation visibility, resource limit enforcement
2. **Container to Host**: Network isolation vs visibility, filesystem overlay access, resource allocation
3. **Cross-Level**: Transaction tracing, performance correlation, security event correlation

## Testing

Run monitoring tests:

```bash
go test -v -run ".*Monitor.*"
```

Run benchmarks:

```bash
go test -bench=BenchmarkMonitoring
```

## References

- [The Docker Monitoring Problem](https://www.datadoghq.com/blog/the-docker-monitoring-problem/)
- Process isolation and namespace documentation
- Container runtime specifications