Commit e6877ea

Copilot and j143 committed
Implement Docker monitoring system addressing isolation level gaps
Co-authored-by: j143 <53068787+j143@users.noreply.github.com>
1 parent d099dee commit e6877ea

4 files changed

Lines changed: 1237 additions & 0 deletions

MONITORING.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
# Docker Monitoring Implementation

This document describes the monitoring implementation that addresses the "Docker monitoring problem" described in the Datadog blog post listed under References.

## Overview

The monitoring system collects metrics at three isolation levels:

1. **Process Level** - Individual process monitoring within containers
2. **Container Level** - Container-specific metrics and isolation monitoring
3. **Host Level** - System-wide host metrics and resource monitoring

## Architecture

The monitors address the gaps between isolation levels described in the Docker monitoring problem. The table below maps each aspect to its counterpart at each level:

| Aspect | Process | Container | Host |
|--------|---------|-----------|------|
| Spec | Source | Dockerfile | Kickstart |
| On disk | .TEXT | /var/lib/docker | / |
| In memory | PID | Container ID | Hostname |
| In network | Socket | veth* | eth* |
| Runtime context | server core | host | data center |
| Isolation | moderate: memory space, etc. | private OS view: own PID space, file system, network interfaces | full: including own page caches and kernel |

## Usage

### Monitor Host Level

```bash
./basic-docker monitor host
```

Shows system-wide metrics including:
- Hostname and uptime
- Memory usage and availability
- CPU count and load average
- Disk usage
- Network interfaces (eth*)
- All containers on the host

### Monitor Process Level

```bash
./basic-docker monitor process <PID>
```

Shows process-specific metrics including:
- Process ID, name, and status
- Memory usage (RSS and virtual)
- CPU time and percentage
- Thread count
- Open file descriptors
- Socket information

### Monitor Container Level

```bash
./basic-docker monitor container <container-id>
```

Shows container-specific metrics including:
- Container ID, name, and status
- Memory usage and limits
- Network statistics (veth interfaces)
- Process list within container
- Namespace information
- Docker storage path

### Monitor All Levels

```bash
./basic-docker monitor all
```

Aggregates metrics from all three monitoring levels into a single JSON document.

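The shape of that JSON output is not shown here. As a minimal sketch of how the three levels could be combined into one document, assuming hypothetical type and field names (`AllMetrics`, `HostMetrics`, `ContainerMetrics`, `ProcessMetrics`) that may not match the actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// All type and field names below are hypothetical; the real schema
// emitted by `monitor all` is not documented in this file.
type ProcessMetrics struct {
	PID     int    `json:"pid"`
	Name    string `json:"name"`
	Threads int    `json:"threads"`
}

type ContainerMetrics struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

type HostMetrics struct {
	Hostname string `json:"hostname"`
	CPUCount int    `json:"cpu_count"`
}

// AllMetrics groups one snapshot per isolation level.
type AllMetrics struct {
	Host       HostMetrics        `json:"host"`
	Containers []ContainerMetrics `json:"containers"`
	Processes  []ProcessMetrics   `json:"processes"`
}

// aggregate renders the combined snapshot as one indented JSON document.
func aggregate(m AllMetrics) (string, error) {
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	snapshot := AllMetrics{
		Host:       HostMetrics{Hostname: "example-host", CPUCount: 4},
		Containers: []ContainerMetrics{{ID: "abc123", Status: "running"}},
		Processes:  []ProcessMetrics{{PID: 42, Name: "sh", Threads: 1}},
	}
	doc, err := aggregate(snapshot)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(doc)
}
```

Keeping one top-level key per level lets a consumer read a single level without parsing the others.
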
77+
78+
### Gap Analysis
79+
80+
```bash
81+
./basic-docker monitor gap
82+
```
83+
84+
Analyzes monitoring gaps between isolation levels:
85+
- Process to container correlation gaps
86+
- Container to host visibility gaps
87+
- Cross-level monitoring challenges
88+
### Correlation Analysis

```bash
./basic-docker monitor correlation <container-id>
```

Shows how the process, container, and host levels correlate for a specific container, displaying the mapping table and detailed metrics.

## Implementation Details

### Monitors

- `ProcessMonitor` - Reads from `/proc/[pid]/` files to gather process metrics
- `ContainerMonitor` - Combines process monitoring with container metadata
- `HostMonitor` - Aggregates system-wide statistics from `/proc/` and `/sys/`

### Metrics Collection

- **Process metrics**: Read from `/proc/[pid]/stat`, `/proc/[pid]/status`, and `/proc/[pid]/fd/`
- **Container metrics**: Combine process metrics with container directory information
- **Host metrics**: Read from `/proc/meminfo`, `/proc/loadavg`, `/proc/uptime`, and filesystem stats

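To illustrate the host-level reads listed above, here is a minimal sketch of parsing `/proc/meminfo` and `/proc/loadavg`. The function names `parseMeminfo` and `parseLoadavg` are assumptions made for this example and are not taken from the `HostMonitor` code:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMeminfo (hypothetical helper) turns the text of /proc/meminfo into
// a map from field name to numeric value. Most fields are in kB; the unit
// suffix is dropped.
func parseMeminfo(text string) map[string]uint64 {
	fields := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		name, rest, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // not a "Name: value" line
		}
		parts := strings.Fields(rest)
		if len(parts) == 0 {
			continue
		}
		v, err := strconv.ParseUint(parts[0], 10, 64)
		if err != nil {
			continue
		}
		fields[strings.TrimSpace(name)] = v
	}
	return fields
}

// parseLoadavg (hypothetical helper) extracts the 1, 5, and 15 minute
// load averages from the text of /proc/loadavg.
func parseLoadavg(text string) ([3]float64, error) {
	var load [3]float64
	parts := strings.Fields(text)
	if len(parts) < 3 {
		return load, fmt.Errorf("unexpected loadavg format: %q", text)
	}
	for i := range load {
		v, err := strconv.ParseFloat(parts[i], 64)
		if err != nil {
			return load, err
		}
		load[i] = v
	}
	return load, nil
}

func main() {
	if data, err := os.ReadFile("/proc/meminfo"); err == nil {
		m := parseMeminfo(string(data))
		fmt.Printf("MemTotal=%d kB MemAvailable=%d kB\n", m["MemTotal"], m["MemAvailable"])
	}
	if data, err := os.ReadFile("/proc/loadavg"); err == nil {
		if load, err := parseLoadavg(string(data)); err == nil {
			fmt.Printf("load average: %.2f %.2f %.2f\n", load[0], load[1], load[2])
		}
	}
}
```

Separating the parsing from the file reads keeps the parsers testable with fixed input on systems that have no `/proc`.
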
### Gap Analysis

The monitoring system identifies three categories of gaps:

1. **Process to Container**: PID mapping, namespace isolation visibility, resource limit enforcement
2. **Container to Host**: Network isolation vs visibility, filesystem overlay access, resource allocation
3. **Cross-Level**: Transaction tracing, performance correlation, security event correlation

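The PID mapping gap in the first category arises because a process has a different PID in each PID namespace it belongs to. On Linux 4.1 and later, the `NSpid` line in `/proc/[pid]/status` lists those PIDs, outermost namespace first. The sketch below shows one way to read that mapping. It is an illustration only: this document does not say whether the gap analysis uses `NSpid`, and `parseNSpid` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid (hypothetical helper) returns the PIDs on the NSpid line of a
// /proc/[pid]/status file, outermost namespace first. It returns nil if
// the line is missing or malformed.
func parseNSpid(status string) []int {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []int
		for _, f := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			pid, err := strconv.Atoi(f)
			if err != nil {
				return nil
			}
			pids = append(pids, pid)
		}
		return pids
	}
	return nil
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	pids := parseNSpid(string(data))
	if len(pids) == 0 {
		fmt.Println("no NSpid line found")
		return
	}
	// The first entry is the PID in the namespace of the mounted procfs,
	// the last entry is the PID in the innermost namespace.
	fmt.Printf("outer PID %d, innermost PID %d\n", pids[0], pids[len(pids)-1])
}
```
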
## Testing

Run the monitoring tests:

```bash
go test -v -run ".*Monitor.*"
```

Run the benchmarks:

```bash
go test -bench=BenchmarkMonitoring
```

## References

- [The Docker Monitoring Problem](https://www.datadoghq.com/blog/the-docker-monitoring-problem/)
- Process isolation and namespace documentation
- Container runtime specifications
