Commit e125844

Style: Format code with ruff
1 parent 5dc43ad commit e125844

2 files changed

Lines changed: 281 additions & 1 deletion

Lines changed: 280 additions & 0 deletions
@@ -0,0 +1,280 @@
---
name: Logfire Integration Research
overview: Research findings on Pydantic Logfire capabilities and a comparison with the current abstract-validation-base tracking system, with recommendations for potential integration.
todos: []
---

# Pydantic Logfire Research: Feasibility for abstract-validation-base

## Executive Summary

Pydantic Logfire is a promising observability platform that could enhance or partially replace the current tracking system. However, there are important trade-offs to consider around local usage, data ownership, and the specific needs of this validation library.

---

## What is Pydantic Logfire?

Logfire is an observability platform built by the Pydantic team, designed for Python applications. Key characteristics:

- **Built on OpenTelemetry**: Uses open standards for traces, logs, and metrics
- **Automatic Pydantic instrumentation**: One-line integration with `logfire.instrument_pydantic()`
- **Structured logging**: Native support for logging Pydantic models
- **SQL query interface**: Query your observability data using SQL

---

## Deployment Options

| Mode | Description | Cost | Data Location |
|------|-------------|------|---------------|
| **Cloud (Free Tier)** | Data sent to Pydantic's servers | Free with limits | Pydantic-hosted |
| **Cloud (Paid)** | Higher limits, more features | Paid | Pydantic-hosted |
| **Self-Hosted** | Run on your infrastructure | Enterprise pricing | Your infrastructure |
| **Console-Only** | Local development mode, no cloud | Free | Local terminal only |

**Local Development**: Logfire can run in console-only mode using:

```python
logfire.configure(send_to_logfire=False)  # console output is enabled by default
```

This prints traces and logs to the terminal without sending data to any backend.

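A minimal sketch of how the library could turn on this local mode without making `logfire` a hard dependency. The helper name `configure_local_logfire` is hypothetical and not part of abstract-validation-base; the only Logfire call assumed is `logfire.configure(send_to_logfire=False)`.

```python
from importlib import import_module
from importlib.util import find_spec


def configure_local_logfire() -> bool:
    """Configure Logfire for terminal-only output, if it is installed.

    Hypothetical helper. Returns True when Logfire was configured and
    False when the optional package is missing.
    """
    if find_spec("logfire") is None:
        return False
    logfire = import_module("logfire")
    # send_to_logfire=False keeps all telemetry on the local machine.
    logfire.configure(send_to_logfire=False)
    return True
```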
---

## Current Tracking System in abstract-validation-base

The project currently has a well-designed custom tracking system:

### 1. Process Log ([`process_log.py`](src/abstract_validation_base/process_log.py))

- `ProcessEntry`: Individual cleaning/error entries with timestamps, field names, values
- `ProcessLog`: Aggregates cleaning operations and errors per model

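To make the shape of these two structures concrete, here is a rough sketch using standard-library dataclasses. The class names carry a `Sketch` suffix because the field names and the `add` method are assumptions for illustration; the real definitions live in `process_log.py` and may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class ProcessEntrySketch:
    """One cleaning or error record (field names are assumptions)."""

    entry_type: str  # "cleaning" or "error"
    field_name: str
    message: str
    value: Any = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ProcessLogSketch:
    """Aggregates the entries recorded for a single model instance."""

    cleaning: list[ProcessEntrySketch] = field(default_factory=list)
    errors: list[ProcessEntrySketch] = field(default_factory=list)

    def add(self, entry: ProcessEntrySketch) -> None:
        # Route the entry to the bucket that matches its type.
        target = self.errors if entry.entry_type == "error" else self.cleaning
        target.append(entry)


log = ProcessLogSketch()
log.add(ProcessEntrySketch("cleaning", "email", "stripped whitespace", " a@b.c "))
log.add(ProcessEntrySketch("error", "age", "must be positive", -1))
```

Because the log travels with the model instance, downstream code can inspect `log.errors` without querying any external system.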
### 2. Event System ([`events.py`](src/abstract_validation_base/events.py))

- `ValidationEventType`: ERROR_ADDED, CLEANING_ADDED, VALIDATION_STARTED/COMPLETED, ROW_PROCESSED, BATCH_STARTED/COMPLETED
- `ValidationEvent`: Event payload with type, source, and data dict
- `ObservableMixin`: Observer pattern for subscribing to events

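The observer pattern described above can be sketched in a few lines. Everything here is a simplified stand-in (`EventType`, `Event`, `Observable`, `Collector` are hypothetical names); only the `add_observer` and `on_event` method names are taken from the snippets in this document.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Protocol


class EventType(Enum):
    """Hypothetical subset of the event types listed above."""

    ERROR_ADDED = auto()
    CLEANING_ADDED = auto()


@dataclass
class Event:
    event_type: EventType
    source: object
    data: dict[str, Any] = field(default_factory=dict)


class Observer(Protocol):
    def on_event(self, event: Event) -> None: ...


class Observable:
    """Keeps a list of observers and notifies each one of every event."""

    def __init__(self) -> None:
        self._observers: list[Observer] = []

    def add_observer(self, observer: Observer) -> None:
        self._observers.append(observer)

    def notify(self, event: Event) -> None:
        for observer in self._observers:
            observer.on_event(event)


class Collector:
    """Example observer that records every event it receives."""

    def __init__(self) -> None:
        self.seen: list[Event] = []

    def on_event(self, event: Event) -> None:
        self.seen.append(event)


subject = Observable()
collector = Collector()
subject.add_observer(collector)
subject.notify(Event(EventType.ERROR_ADDED, subject, {"field": "age"}))
```

Any object with an `on_event` method can subscribe, which is why a Logfire-backed observer would not require changes to the emitting side.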
### 3. Runner Statistics ([`runner.py`](src/abstract_validation_base/runner.py))

- `RunnerStats`: Tracks total/valid/failed rows, timing, error counts
- Top errors analysis, failed sample collection
- `audit_report()` for comprehensive summaries

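As an illustration of the kind of bookkeeping involved, the sketch below counts rows and ranks errors with `collections.Counter`. `StatsSketch` and its fields are assumptions, not the actual `RunnerStats` implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StatsSketch:
    """Hypothetical stand-in for the counters described above."""

    total_rows: int = 0
    failed_rows: int = 0
    error_counts: Counter = field(default_factory=Counter)

    def record_row(self, errors: list[tuple[str, str]]) -> None:
        """Record one processed row; `errors` holds (field, message) pairs."""
        self.total_rows += 1
        if errors:
            self.failed_rows += 1
            self.error_counts.update(errors)

    def top_errors(self, n: int = 3) -> list[tuple[tuple[str, str], int]]:
        """Return the n most frequent (field, message) pairs with their counts."""
        return self.error_counts.most_common(n)


stats = StatsSketch()
stats.record_row([("age", "must be positive")])
stats.record_row([])
stats.record_row([("age", "must be positive"), ("email", "invalid format")])
```

With Logfire, the equivalent ranking would have to be computed by querying recorded events rather than read from an in-memory counter.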
### 4. Base Model Tracking ([`base.py`](src/abstract_validation_base/base.py))

- `ValidationBase`: Auto-tracks cleaning and errors on each model instance
- `audit_log()` / `audit_log_recursive()` for exporting to DataFrames

---

## How Logfire Could Help

### What Logfire Would Handle Automatically

1. **Pydantic Validation Instrumentation**

   ```python
   logfire.instrument_pydantic()  # Logs all model validations
   ```

   - Captures validation success/failure for all Pydantic models
   - No code changes needed in model definitions
   - Provides metrics: validation counts, durations, error rates

2. **Structured Logging with Pydantic Models**

   ```python
   logfire.info("Validation error", model=my_model, error=error_details)
   ```

   - Native support for logging Pydantic models
   - Automatic serialization and indexing

3. **System Metrics** (optional)

   ```python
   logfire.instrument_system_metrics()  # CPU, memory usage
   ```

4. **Tracing**

   - Automatic correlation of events across a validation run
   - Span-based tracking for timing analysis

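Span-based tracking for a validation run might look like the sketch below. It assumes `logfire.span(...)` used as a context manager and falls back to a no-op context when the package is not installed; `validation_span` and `validate_rows` are hypothetical names, and the "validation" is a toy check.

```python
from contextlib import nullcontext
from importlib import import_module
from importlib.util import find_spec


def validation_span(name: str, **attributes):
    """Return a Logfire span if the package is installed, else a no-op context."""
    if find_spec("logfire") is None:
        return nullcontext()
    logfire = import_module("logfire")
    return logfire.span(name, **attributes)


def validate_rows(rows: list[dict]) -> int:
    """Toy validation run: counts rows that have a truthy 'id'."""
    valid = 0
    with validation_span("validation run", row_count=len(rows)):
        for index, row in enumerate(rows):
            # Row spans nest under the run span, so their timings are correlated.
            with validation_span("validate row", row_index=index):
                if row.get("id"):
                    valid += 1
    return valid
```

Nesting the per-row span inside the run span is what gives the correlation: each row's duration shows up as a child of the overall run.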
### What Would Still Need Custom Implementation

1. **Cleaning Operation Tracking**: Logfire has no built-in concept of "data transformations", so the `add_cleaning_process()` tracking that is unique to this library would need custom spans or logs

2. **Error Pattern Aggregation**: The `top_errors()` functionality in `RunnerStats` would need to be reimplemented as Logfire queries

3. **Audit Log Export**: The `audit_log()` / `audit_log_recursive()` methods for DataFrame export would remain as-is (Logfire is for observability, not data export)

4. **Per-Model Process Logs**: The `ProcessLog` attached to each model instance is useful for downstream processing, whereas Logfire's logs live separately from the data

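For the first item, one possible shape for a custom cleaning record is sketched here. `record_cleaning` and the attribute names are hypothetical; the emitter is injected so the function can be exercised without Logfire, on the assumption that `logfire.info` accepts a message template followed by keyword attributes.

```python
from typing import Any, Callable

Emit = Callable[..., None]


def record_cleaning(
    emit: Emit,
    field_name: str,
    original: Any,
    cleaned: Any,
    reason: str,
) -> dict[str, Any]:
    """Emit one cleaning operation as a structured record.

    `emit` is any callable with a `(message, **attributes)` signature,
    for example `logfire.info` or a recording stand-in in tests.
    """
    attributes = {
        "field_name": field_name,
        "original_value": repr(original),
        "cleaned_value": repr(cleaned),
        "reason": reason,
    }
    emit("Data cleaned: {field_name}", **attributes)
    return attributes


captured: list[tuple[str, dict[str, Any]]] = []
record_cleaning(
    lambda message, **attrs: captured.append((message, attrs)),
    field_name="email",
    original=" A@B.C ",
    cleaned="a@b.c",
    reason="normalized case and whitespace",
)
```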
---

## Comparison: Current System vs Logfire

| Feature | Current System | With Logfire |
|---------|---------------|--------------|
| **Validation error tracking** | Manual via `add_error()` | Automatic instrumentation |
| **Cleaning/transformation logs** | `add_cleaning_process()` | Custom spans needed |
| **Event observation** | `ObservableMixin` pattern | Built-in with traces |
| **Statistics/metrics** | `RunnerStats` class | Dashboard + SQL queries |
| **Data export** | `audit_log()` to DataFrame | SQL API or manual export |
| **Local-only mode** | Yes (default) | Console mode available |
| **Dashboard/UI** | Rich observers | Logfire web UI (cloud) |
| **Dependencies** | None (pure Pydantic) | `logfire` package |
| **Data ownership** | Local | Local (console) or cloud |

---

## Recommendations

### Option A: Logfire as Optional Enhancement (Recommended)

Add Logfire as an **optional integration** rather than replacing the current system:

```python
# Optional logfire integration
from abstract_validation_base import ValidationRunner

runner = ValidationRunner(data, MyModel)

# If user has logfire configured, emit spans
if logfire_available:
    runner.add_observer(LogfireObserver())
```

**Pros**:

- Users get full observability if they want it
- No breaking changes to existing API
- Library works standalone without cloud dependency
- Best of both worlds: audit data stays attached to the models while traces go to Logfire

**Implementation**:

1. Add `logfire` as optional dependency: `pip install abstract-validation-base[logfire]`
2. Create `LogfireObserver` that implements `ValidationObserver` protocol
3. Emit Logfire spans for validation events
4. Add `logfire.instrument_pydantic()` call in observer setup

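One way the `logfire_available` flag could be derived is by checking whether the optional package is importable. This is only a sketch: `optional_observers` and its `factory` parameter are hypothetical, and note that it detects whether Logfire is installed, not whether the user has configured it.

```python
from importlib.util import find_spec

# True only when the optional `logfire` package can be imported.
logfire_available: bool = find_spec("logfire") is not None


def optional_observers(factory=None) -> list:
    """Return the observers to attach; empty when Logfire is not installed.

    `factory` is a hypothetical zero-argument callable such as `LogfireObserver`.
    """
    if logfire_available and factory is not None:
        return [factory()]
    return []
```

Checking with `find_spec` avoids importing `logfire` at module load time, so users who skip the extra pay no import cost.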
### Option B: Replace Event System with Logfire

Replace `ObservableMixin` and `ValidationEvent` with native Logfire spans:

**Pros**:

- Simpler codebase, fewer abstractions
- Industry-standard OpenTelemetry format

**Cons**:

- Requires Logfire for full functionality
- Breaking change for existing users
- Loss of standalone operation

### Option C: Keep Current System (Status Quo)

The existing system is well-designed and meets the library's needs:

**Pros**:

- No dependencies, works offline
- Purpose-built for validation workflows
- Full control over data format and storage

**Cons**:

- No built-in dashboard/UI (though Rich observers help)
- Manual instrumentation required

---

## Implementation Sketch for Option A

```mermaid
flowchart LR
    subgraph current [Current System]
        VB[ValidationBase]
        OBS[ObservableMixin]
        VR[ValidationRunner]
        RICH[RichDashboardObserver]
    end

    subgraph logfire_integration [Optional Logfire]
        LFO[LogfireObserver]
        SPANS[Logfire Spans]
        CLOUD[Logfire Cloud/Console]
    end

    VB --> OBS
    VR --> OBS
    OBS --> RICH
    OBS -.-> LFO
    LFO --> SPANS
    SPANS --> CLOUD
```

Example `LogfireObserver` implementation:

```python
# src/abstract_validation_base/logfire_support.py
from abstract_validation_base.events import ValidationEvent, ValidationEventType


class LogfireObserver:
    """Forward validation events to Logfire as structured log records."""

    def on_event(self, event: ValidationEvent) -> None:
        # Imported lazily so this module loads without the optional dependency.
        import logfire

        if event.event_type == ValidationEventType.VALIDATION_STARTED:
            logfire.info("Validation started", **event.data)
        elif event.event_type == ValidationEventType.ERROR_ADDED:
            logfire.warn("Validation error", **event.data)
        elif event.event_type == ValidationEventType.CLEANING_ADDED:
            logfire.info("Data cleaned", **event.data)
        # ... etc
```

---

## Next Steps

If you want to proceed with integration:

1. **Add optional dependency** in `pyproject.toml`:

   ```toml
   [project.optional-dependencies]
   logfire = ["logfire>=2.0"]
   ```

2. **Create logfire_support.py** with `LogfireObserver`

3. **Add integration tests** that verify Logfire spans are emitted

4. **Document usage** in README and AGENTS.md

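For step 3, a test could inject a recording backend in place of the real `logfire` module and assert on what the observer emits. The sketch below is self-contained and uses stand-ins throughout (`EventType`, `RecordingBackend`, and `BackendObserver` are hypothetical); a real integration test would more likely rely on the helpers in `logfire.testing`.

```python
from enum import Enum, auto
from types import SimpleNamespace


class EventType(Enum):
    """Stand-in for ValidationEventType so the sketch runs on its own."""

    VALIDATION_STARTED = auto()
    ERROR_ADDED = auto()


class RecordingBackend:
    """Fake with the same `info`/`warn` call shape as the `logfire` module."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, dict]] = []

    def info(self, message: str, **attributes) -> None:
        self.calls.append(("info", message, attributes))

    def warn(self, message: str, **attributes) -> None:
        self.calls.append(("warn", message, attributes))


class BackendObserver:
    """Observer under test; the backend is injected instead of imported."""

    def __init__(self, backend) -> None:
        self.backend = backend

    def on_event(self, event) -> None:
        if event.event_type is EventType.VALIDATION_STARTED:
            self.backend.info("Validation started", **event.data)
        elif event.event_type is EventType.ERROR_ADDED:
            self.backend.warn("Validation error", **event.data)


def test_error_event_becomes_warning() -> None:
    backend = RecordingBackend()
    observer = BackendObserver(backend)
    event = SimpleNamespace(event_type=EventType.ERROR_ADDED, data={"field": "age"})
    observer.on_event(event)
    assert backend.calls == [("warn", "Validation error", {"field": "age"})]
```

Injecting the backend keeps the test independent of whether `logfire` is installed, which matters for a library where it is an optional extra.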
---

## Key Takeaways

1. **Logfire CAN work locally** via `logfire.configure(send_to_logfire=False)`, which keeps console output and sends nothing to the cloud
2. **Self-hosting is enterprise-only** (not free for local deployment)
3. **Current tracking system is solid**: Logfire would enhance it, not replace it
4. **Best approach**: Optional integration via observer pattern
5. **No architectural changes needed**: the observer pattern already supports this

tests/test_runner.py

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ def test_error_summary_uses_type_when_msg_missing(self) -> None:
 
     def test_error_summary_handles_empty_loc_tuple(self) -> None:
         """Test error_summary handles Pydantic errors with empty loc tuple.
-
+
         This can occur with model_validator failures on nested models where
         Pydantic may return an empty location tuple.
         """
