diff --git a/docs/implementation/PHASE_3_ALERT_BASELINE.md b/docs/implementation/PHASE_3_ALERT_BASELINE.md
new file mode 100644
index 0000000..84daff1
--- /dev/null
+++ b/docs/implementation/PHASE_3_ALERT_BASELINE.md
@@ -0,0 +1,33 @@
+# Phase 3 Alert Baseline
+
+This baseline defines starter alerts for runtime reliability. Thresholds should be tuned with production traffic history.
+
+## SLO-aligned Starter Alerts
+
+1. Error rate alert
+- Condition: 5xx rate > 1% over 5 minutes
+- Signal: `tinyurl.http.server.requests.total` with `status_class=5xx`
+- Action: check recent deploy, downstream DB state, and app logs by `correlation_id`
+
+2. Latency alert
+- Condition: P99 latency > 500 ms over 10 minutes
+- Signal: `http.server.requests` percentile metrics from Actuator/Prometheus
+- Action: inspect slow endpoints, DB pool pressure, and host CPU/memory
+
+3. Readiness degradation
+- Condition: `/actuator/health/readiness` not `UP` for 2+ checks
+- Signal: readiness endpoint health status
+- Action: inspect DB connectivity and Flyway startup validation state
+
+4. Liveness instability
+- Condition: frequent restarts or liveness failures in 10 minutes
+- Signal: container restart count + `/actuator/health/liveness`
+- Action: inspect fatal exceptions, memory pressure, and image/runtime mismatches
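+
+The first two conditions above can be sketched as Prometheus alerting rules. This is a starting point under stated assumptions, not a tuned rule set: the group name, alert names, and `severity` values are hypothetical, the `application="tinyurl"` label assumes `spring.application.name` resolves to `tinyurl`, and `http_server_requests_seconds_bucket` assumes the histogram for `http.server.requests` is exported under Micrometer's default Prometheus naming.
+
+```yaml
+groups:
+  - name: tinyurl-starter-alerts        # hypothetical group name
+    rules:
+      # 5xx share of all requests above 1% over a 5-minute window
+      - alert: TinyUrlHighErrorRate     # hypothetical alert name
+        expr: |
+          sum(rate(tinyurl_http_server_requests_total{status_class="5xx"}[5m]))
+            /
+          sum(rate(tinyurl_http_server_requests_total[5m]))
+            > 0.01
+        labels:
+          severity: page                # assumption: adapt to your routing
+        annotations:
+          summary: "5xx rate above 1% over 5 minutes"
+      # P99 latency above 500 ms over a 10-minute window
+      - alert: TinyUrlHighP99Latency    # hypothetical alert name
+        expr: |
+          histogram_quantile(
+            0.99,
+            sum by (le) (rate(http_server_requests_seconds_bucket{application="tinyurl"}[10m]))
+          ) > 0.5
+        labels:
+          severity: ticket              # assumption: adapt to your routing
+        annotations:
+          summary: "P99 latency above 500 ms over 10 minutes"
+```
+
+A `for:` hold duration can be added to each rule if short spikes should not fire the alert.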
+
+## Triage Flow
+
+1. Confirm user impact from error and latency graphs.
+2. Filter logs by `correlation_id` and endpoint path.
+3. Validate dependency health (`db`, readiness group).
+4. Roll back recent deploy if regression is confirmed.
+5. Add a post-incident action item, including a metric threshold adjustment if needed.
diff --git a/docs/implementation/PHASE_3_IMPLEMENTATION_EXPLANATION.md b/docs/implementation/PHASE_3_IMPLEMENTATION_EXPLANATION.md
new file mode 100644
index 0000000..655dc9d
--- /dev/null
+++ b/docs/implementation/PHASE_3_IMPLEMENTATION_EXPLANATION.md
@@ -0,0 +1,149 @@
+# Phase 3 Implementation Explanation
+
+This document explains what was implemented in Phase 3 (Observability), why it was added, and how to verify it before committing.
+
+## Goal of Phase 3
+
+Add production-grade observability for:
+
+1. Structured logging with correlation id
+2. Request/error metrics and latency visibility
+3. Readiness and liveness health visibility
+4. Basic alerting and log-aggregation operational guidance
+
+## What Was Implemented
+
+### 1) Structured JSON logging
+
+Files:
+
+- tinyurl/src/main/resources/logback-spring.xml
+- tinyurl/src/main/java/com/tinyurl/observability/RequestObservabilityFilter.java
+
+What changed:
+
+- Logging output is now JSON using logstash-logback-encoder.
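+- A correlation id is read from the X-Correlation-Id request header, generated as a UUID when the header is missing or blank, and echoed back in the response header.
+- The correlation id is added to the MDC as correlation_id for the duration of the request.
+- Request metadata is logged once per request as key=value pairs in the message (method, route, status, duration_ms, client_ip, user_agent).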
+
+Why:
+
+- Makes logs machine-parseable and searchable.
+- Enables request-level traceability across services and failures.
+
+### 2) Metrics instrumentation
+
+Files:
+
+- tinyurl/build.gradle.kts
+- tinyurl/src/main/java/com/tinyurl/observability/RequestObservabilityFilter.java
+- tinyurl/src/main/java/com/tinyurl/controller/GlobalExceptionHandler.java
+- tinyurl/src/main/resources/application.yaml
+
+What changed:
+
+- Added Prometheus registry dependency.
+- Added request counter metric:
+  - tinyurl.http.server.requests.total
+- Added request duration metric:
+  - tinyurl.http.server.request.duration
+- Added explicit error counter metric in exception handlers:
+  - tinyurl.http.server.errors.total
+- Enabled actuator metrics and prometheus endpoints.
+- Enabled percentiles/histogram for http.server.requests.
+- Added common metric tag: application=tinyurl.
+
+Why:
+
+- Supports request rate, latency, and error-rate monitoring.
+- Enables baseline alert conditions (error and p99 latency).
+
+### 3) Health model hardening
+
+File:
+
+- tinyurl/src/main/resources/application.yaml
+
+What changed:
+
+- Health components and details are shown only to authorized callers (show-components and show-details set to when_authorized).
+- Defined explicit health groups:
+  - readiness: readinessState, db, diskSpace, ping
+  - liveness: livenessState, ping
+- Exposed endpoints (default profile; application-prod.yaml restricts exposure to health only):
+  - /actuator/health
+  - /actuator/metrics
+  - /actuator/prometheus
+
+Why:
+
+- Improves dependency-aware readiness behavior.
+- Makes startup/degraded dependency states visible and actionable.
+
+### 4) Operational documentation for alerting and logs
+
+Files:
+
+- docs/implementation/PHASE_3_ALERT_BASELINE.md
+- docs/implementation/PHASE_3_LOG_AGGREGATION_BASELINE.md
+- docs/implementation/PHASE_3_OBSERVABILITY.md
+
+What changed:
+
+- Added starter alert thresholds and triage flow.
+- Added baseline log aggregation architecture and checklist.
+- Linked the baseline documents from the Phase 3 observability execution guide.
+
+Why:
+
+- Ensures implementation is operable, not only code-complete.
+- Gives clear next actions for production rollout.
+
+## Verification Steps Used
+
+1. Unit/integration tests:
+
+- run the app context, service logic, and encoder tests
+- expected result: all passing
+
+2. Runtime verification:
+
+- docker compose up -d --build
+- check readiness endpoint
+- check liveness endpoint
+- check prometheus endpoint exports:
+  - tinyurl_http_server_requests_total
+  - tinyurl_http_server_errors_total
+  - hikaricp metrics
+
+3. Error metric trigger check:
+
+- send a known invalid request (for example, an unknown short code format)
+- confirm tinyurl_http_server_errors_total increments with the expected status and error_code tags
+
+## Commit Scope (Phase 3)
+
+Code/config files:
+
+- tinyurl/build.gradle.kts
+- tinyurl/src/main/resources/application.yaml
+- tinyurl/src/main/resources/application-prod.yaml
+- tinyurl/src/main/resources/logback-spring.xml
+- tinyurl/src/main/java/com/tinyurl/observability/RequestObservabilityFilter.java
+- tinyurl/src/main/java/com/tinyurl/controller/GlobalExceptionHandler.java
+
+Docs:
+
+- docs/implementation/PHASE_3_OBSERVABILITY.md
+- docs/implementation/PHASE_3_ALERT_BASELINE.md
+- docs/implementation/PHASE_3_LOG_AGGREGATION_BASELINE.md
+- docs/implementation/PHASE_3_LOG_STORAGE_MIGRATION_LOKI_PROMTAIL_GRAFANA.md
+- docs/implementation/PHASE_3_IMPLEMENTATION_EXPLANATION.md
+
+## Suggested Commit Message
+
+feat(observability): implement phase 3 logging, metrics, health groups, and operational baselines
+
+## Notes
+
+- This implementation intentionally stops at Phase 3 baseline level.
+- Full external dashboard platform rollout and distributed tracing are still out of scope for this phase.
diff --git a/docs/implementation/PHASE_3_LOG_AGGREGATION_BASELINE.md b/docs/implementation/PHASE_3_LOG_AGGREGATION_BASELINE.md
new file mode 100644
index 0000000..d94f761
--- /dev/null
+++ b/docs/implementation/PHASE_3_LOG_AGGREGATION_BASELINE.md
@@ -0,0 +1,63 @@
+# Phase 3 Log Aggregation Baseline
+
+This document defines a minimal production-ready approach for collecting and querying structured application logs.
+
+## Goal
+
+Ensure logs from all services can be searched by time range, severity, endpoint, and correlation id.
+
+## Required Log Fields
+
+All application logs should include at least:
+
+- `@timestamp`
+- `level`
+- `message`
+- `service`
+- `correlation_id`
+- `logger_name`
+- request metadata fields when available (`method`, `route`, `status`, `duration_ms`)
+
+## Recommended Pipeline (Baseline)
+
+1. App emits JSON logs to stdout.
+2. Container runtime captures stdout/stderr.
+3. Log shipper (CloudWatch Agent, Fluent Bit, Filebeat, or Vector) forwards logs.
+4. Central store indexes logs (CloudWatch Logs, ELK/OpenSearch, Grafana Loki).
+5. Dashboards and alerts query centralized logs.
+
+## Minimum Alerts for Logs
+
+1. Error volume spike
+- Trigger when ERROR logs exceed baseline over 5 minutes.
+
+2. Correlation-id missing rate
+- Trigger when log events without `correlation_id` exceed 1% of all events in the evaluation window.
+
+3. Exception signature surge
+- Trigger on sudden spikes for repeated exception signatures.
+
+## Triage Query Examples
+
+1. Correlate one failing request
+- Filter by `correlation_id` and inspect all matching events.
+
+2. Endpoint failure analysis
+- Filter by `route` and `status>=500`, group by exception/error code.
+
+3. Slow request analysis
+- Filter by high `duration_ms`, group by route and time window.
+
+## Security and Privacy Notes
+
+- Never log credentials, tokens, or full secrets.
+- Mask sensitive fields before logging.
+- Keep retention and access controls aligned with security policy.
+
+## Rollout Checklist
+
+- [ ] JSON logs enabled in all environments
+- [ ] Log shipping configured for app containers
+- [ ] Correlation id searchable in central logs
+- [ ] Error and latency dashboards created
+- [ ] Alert rules validated with test events
diff --git a/docs/implementation/PHASE_3_LOG_STORAGE_MIGRATION_LOKI_PROMTAIL_GRAFANA.md b/docs/implementation/PHASE_3_LOG_STORAGE_MIGRATION_LOKI_PROMTAIL_GRAFANA.md
new file mode 100644
index 0000000..2eeec5d
--- /dev/null
+++ b/docs/implementation/PHASE_3_LOG_STORAGE_MIGRATION_LOKI_PROMTAIL_GRAFANA.md
@@ -0,0 +1,230 @@
+# Phase 3 Log Storage Migration Plan (Free): Loki + Promtail + Grafana
+
+This document provides a production-oriented migration plan for storing logs securely at zero license cost using the recommended stack:
+
+- Grafana Loki (log store)
+- Promtail (log shipper)
+- Grafana (query, dashboards, alerting)
+
+## 1) Why this stack
+
+### Free options considered
+
+1. Loki + Promtail + Grafana (recommended)
+- Pros: low-cost indexing model, simple operations, good Docker support, strong Grafana integration
+- Cons: the query language (LogQL) differs from the Elasticsearch ecosystem
+
+2. OpenSearch + Fluent Bit/Filebeat
+- Pros: powerful full-text search, mature ecosystem
+- Cons: heavier memory/storage footprint and higher ops complexity
+
+3. ELK (Elasticsearch + Logstash + Kibana)
+- Pros: mature and widely known
+- Cons: the most resource-heavy option for small teams
+
+### Recommendation
+
+Use Loki + Promtail + Grafana for v1/v2 scale because it is easiest to operate and secure on a single-host or small-cluster deployment.
+
+---
+
+## 2) Target architecture (production)
+
+1. TinyURL app writes structured JSON logs to stdout.
+2. Container runtime writes stdout/stderr to local log files.
+3. Promtail tails container logs and ships to Loki over private network.
+4. Loki stores log streams on encrypted disk.
+5. Grafana queries Loki and provides dashboards/alerts.
+
+Logical flow:
+
+`App -> stdout -> Promtail -> Loki -> Grafana`
+
+---
+
+## 3) Security controls (minimum baseline)
+
+1. Network isolation
+- Run Loki and Promtail on private network only.
+- Do not expose Loki directly to public internet.
+
+2. Transport security
+- Use TLS for Grafana access.
+- If Loki is remote, use TLS/mTLS between Promtail and Loki.
+
+3. Authentication and authorization
+- Enable Grafana login (no anonymous access in production).
+- Use strong admin password and role-based access.
+- Restrict datasource edit rights to admins only.
+
+4. Data at rest
+- Store Loki data on encrypted volume.
+- Restrict filesystem permissions for log directories.
+
+5. Retention and lifecycle
+- Set finite retention (for example 14-30 days to start).
+- Enforce deletion and compaction to control risk/cost.
+
+6. Sensitive data handling
+- Never log secrets, tokens, or passwords.
+- Use Promtail pipeline stages to drop/mask sensitive patterns if needed.
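+
+For the retention control in item 5, a Loki configuration fragment might look like the sketch below. The 14-day value and the paths are assumptions, and the fragment assumes the filesystem-backed, single-instance Loki 3.x setup described in this plan; check option names against the Loki version you deploy.
+
+```yaml
+# Fragment of a Loki config file: finite retention enforced by the compactor.
+limits_config:
+  retention_period: 336h              # 14 days (assumed starting value)
+
+compactor:
+  working_directory: /loki/compactor  # assumed path inside the Loki data volume
+  retention_enabled: true             # without this, retention_period is not enforced
+  delete_request_store: filesystem    # Loki 3.x expects this when retention is enabled
+```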
+
+---
+
+## 4) Migration strategy (phased)
+
+### Phase A: Prepare
+
+1. Confirm JSON logging is enabled in application.
+2. Define required labels: `service`, `env`, `level`. Keep `correlation_id` as a parsed field in the log body rather than a label, because it is high-cardinality.
+3. Define retention target and incident triage queries.
+
+### Phase B: Deploy logging stack
+
+1. Deploy Loki with persistent encrypted storage.
+2. Deploy Promtail on same host(s) as app containers.
+3. Deploy Grafana and add Loki datasource.
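+
+Adding the Loki datasource can be done through Grafana provisioning instead of the UI. The sketch below assumes the file is mounted under `/etc/grafana/provisioning/datasources/` and that Loki is reachable as `loki` on its default port 3100; both are assumptions based on the Compose layout in this plan.
+
+```yaml
+# Grafana datasource provisioning file (file name is up to you)
+apiVersion: 1
+
+datasources:
+  - name: Loki
+    type: loki
+    access: proxy            # Grafana's backend queries Loki, not the browser
+    url: http://loki:3100    # assumed service name and default Loki port
+    isDefault: true
+    editable: false          # keeps the datasource read-only in the UI
+```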
+
+### Phase C: Connect TinyURL logs
+
+1. Configure Promtail scrape job for container logs.
+2. Parse JSON fields from app log lines.
+3. Map key labels (`service=tinyurl`, `env=prod`, `level`, optional `route`).
+
+### Phase D: Validate
+
+1. Generate synthetic requests and errors.
+2. Search by `correlation_id` end-to-end.
+3. Verify alert rules trigger for error spikes.
+
+### Phase E: Harden
+
+1. Enable TLS and auth on Grafana endpoint.
+2. Restrict network ingress to admin CIDRs/VPN.
+3. Tune retention and label cardinality.
+
+---
+
+## 5) Example production Compose skeleton
+
+Use this as a conceptual baseline and adapt to your deployment model.
+
+```yaml
+services:
+  loki:
+    image: grafana/loki:3.0.0
+    command: -config.file=/etc/loki/config.yaml
+    volumes:
+      - ./observability/loki/config.yaml:/etc/loki/config.yaml:ro
+      - loki-data:/loki
+    networks: [observability]
+    restart: unless-stopped
+
+  promtail:
+    image: grafana/promtail:3.0.0
+    command: -config.file=/etc/promtail/config.yaml
+    volumes:
+      - ./observability/promtail/config.yaml:/etc/promtail/config.yaml:ro
+      - /var/lib/docker/containers:/var/lib/docker/containers:ro
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+    networks: [observability]
+    restart: unless-stopped
+
+  grafana:
+    image: grafana/grafana:11.0.0
+    environment:
+      - GF_SECURITY_ADMIN_USER=admin
+      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
+      - GF_AUTH_ANONYMOUS_ENABLED=false
+    volumes:
+      - grafana-data:/var/lib/grafana
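+    ports:
+      # Bind to loopback only; a TLS reverse proxy on the host forwards to this port.
+      - "127.0.0.1:3000:3000"
+    # Docker does not publish ports for containers attached only to an internal
+    # network, so Grafana also joins a non-internal network ("edge" is a placeholder name).
+    networks: [observability, edge]
+    restart: unless-stopped
+
+volumes:
+  loki-data:
+  grafana-data:
+
+networks:
+  observability:
+    internal: true
+  edge: {}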
+```
+
+Notes:
+
+- Do not publish the Loki port; Promtail and Grafana reach Loki over the private network.
+- Publish Grafana through a reverse proxy that terminates TLS.
+
+---
+
+## 6) Promtail parsing recommendations
+
+1. Use docker/container scrape configs.
+2. Apply JSON pipeline stage to extract:
+- `level`
+- `correlation_id`
+- `logger_name`
+- `message`
+
+3. Keep label cardinality low:
+- Good labels: service, env, level
+- Avoid high-cardinality labels such as user_id or request_id
+- Keep high-cardinality fields in log body, not labels
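+
+The recommendations above can be sketched as a Promtail configuration. This is an illustrative baseline, not a tested production config: the listen port, positions path, Loki URL, and the `service`/`env` values are assumptions, and it relies on the Docker socket mount shown in the Compose skeleton.
+
+```yaml
+server:
+  http_listen_port: 9080              # assumed port
+
+positions:
+  filename: /tmp/positions.yaml       # assumed path; use a persistent volume in production
+
+clients:
+  - url: http://loki:3100/loki/api/v1/push   # assumed Loki service name and default port
+
+scrape_configs:
+  - job_name: docker
+    # Discover containers through the Docker socket
+    docker_sd_configs:
+      - host: unix:///var/run/docker.sock
+        refresh_interval: 5s
+    relabel_configs:
+      # Docker reports names with a leading slash; strip it
+      - source_labels: ['__meta_docker_container_name']
+        regex: '/(.*)'
+        target_label: container
+    pipeline_stages:
+      # Extract fields from the JSON log line emitted by the app
+      - json:
+          expressions:
+            level: level
+            correlation_id: correlation_id
+            logger_name: logger_name
+            message: message
+      # Promote only the low-cardinality field to a label
+      - labels:
+          level:
+      # Fixed labels for this deployment (assumed values)
+      - static_labels:
+          service: tinyurl
+          env: prod
+```
+
+`correlation_id` is extracted but deliberately not promoted to a label; it stays searchable in the log body through a query-time JSON parser.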
+
+---
+
+## 7) Dashboard and alert baseline
+
+Create Grafana panels for:
+
+1. Error volume by level and route
+2. Top exception signatures
+3. Missing-correlation-id count
+4. Slow-request logs (`duration_ms` threshold)
+
+Starter alerts:
+
+1. Error log rate spike in 5 minutes
+2. Repeated exception signature spike
+3. Promtail ingestion failures
+4. Loki disk usage threshold breach
+
+---
+
+## 8) Operational runbook (minimum)
+
+1. Incident lookup flow
+- Start with alert time window
+- Filter `service=tinyurl`
+- Pivot by `correlation_id`
+- Correlate with metrics and health endpoints
+
+2. Capacity checks
+- Loki disk growth/day
+- Query latency in Grafana Explore
+- Promtail backlog/retry behavior
+
+3. Backup/restore
+- Backup Loki persistent volume snapshots
+- Test restore at least once per quarter
+
+---
+
+## 9) Acceptance criteria for migration completion
+
+1. Logs searchable in Grafana for all app instances.
+2. Correlation-id trace works for one request across full flow.
+3. Security controls enabled (auth, TLS, network restrictions).
+4. Retention policy active and verified.
+5. At least three log-based alerts configured and tested.
+
+---
+
+## 10) Future upgrades (optional)
+
+1. Move from Promtail to Grafana Alloy when standardizing telemetry agents.
+2. Add long-term object storage backend for Loki if retention grows.
+3. Add SSO for Grafana access control.
+4. Add trace correlation once distributed tracing is introduced.
diff --git a/docs/implementation/PHASE_3_OBSERVABILITY.md b/docs/implementation/PHASE_3_OBSERVABILITY.md
index 911ff5e..eb62e1f 100644
--- a/docs/implementation/PHASE_3_OBSERVABILITY.md
+++ b/docs/implementation/PHASE_3_OBSERVABILITY.md
@@ -75,6 +75,12 @@ Tasks:
 - Define target alert thresholds (error and latency)
 - Document triage paths for common failures
 
+Reference baseline:
+
+- [PHASE_3_ALERT_BASELINE.md](PHASE_3_ALERT_BASELINE.md)
+- [PHASE_3_LOG_AGGREGATION_BASELINE.md](PHASE_3_LOG_AGGREGATION_BASELINE.md)
+- [PHASE_3_LOG_STORAGE_MIGRATION_LOKI_PROMTAIL_GRAFANA.md](PHASE_3_LOG_STORAGE_MIGRATION_LOKI_PROMTAIL_GRAFANA.md)
+
 ## Deliverables
 
 - Structured logs enabled and validated
diff --git a/tinyurl/build.gradle.kts b/tinyurl/build.gradle.kts
index d0be12f..6fd08ae 100644
--- a/tinyurl/build.gradle.kts
+++ b/tinyurl/build.gradle.kts
@@ -31,6 +31,8 @@ dependencies {
 	implementation("org.springframework.boot:spring-boot-starter-web")
 	implementation("org.flywaydb:flyway-core")
 	implementation("org.flywaydb:flyway-database-postgresql")
+	implementation("io.micrometer:micrometer-registry-prometheus")
+	implementation("net.logstash.logback:logstash-logback-encoder:7.4")
 	compileOnly("org.projectlombok:lombok")
 	developmentOnly("org.springframework.boot:spring-boot-devtools")
 	runtimeOnly("org.postgresql:postgresql")
diff --git a/tinyurl/src/main/java/com/tinyurl/controller/GlobalExceptionHandler.java b/tinyurl/src/main/java/com/tinyurl/controller/GlobalExceptionHandler.java
index 075d301..4c90c97 100644
--- a/tinyurl/src/main/java/com/tinyurl/controller/GlobalExceptionHandler.java
+++ b/tinyurl/src/main/java/com/tinyurl/controller/GlobalExceptionHandler.java
@@ -3,6 +3,8 @@
 import com.tinyurl.dto.ErrorResponse;
 import com.tinyurl.exception.GoneException;
 import com.tinyurl.exception.NotFoundException;
+import io.micrometer.core.instrument.Counter;
+import io.micrometer.core.instrument.MeterRegistry;
 import jakarta.validation.ConstraintViolationException;
 import jakarta.persistence.PersistenceException;
 import org.springframework.dao.DataAccessException;
@@ -17,6 +19,12 @@
 @RestControllerAdvice
 public class GlobalExceptionHandler {
 
+    private final MeterRegistry meterRegistry;
+
+    public GlobalExceptionHandler(MeterRegistry meterRegistry) {
+        this.meterRegistry = meterRegistry;
+    }
+
     @ExceptionHandler(MethodArgumentNotValidException.class)
     public ResponseEntity<ErrorResponse> handleValidation(MethodArgumentNotValidException ex) {
         String code = "INVALID_REQUEST";
@@ -24,11 +32,13 @@ public ResponseEntity<ErrorResponse> handleValidation(MethodArgumentNotValidExce
         if (fieldError != null && fieldError.getDefaultMessage() != null) {
             code = fieldError.getDefaultMessage();
         }
+        incrementErrorMetric(HttpStatus.BAD_REQUEST, code);
         return ResponseEntity.badRequest().body(new ErrorResponse(code, messageForCode(code)));
     }
 
     @ExceptionHandler(ConstraintViolationException.class)
     public ResponseEntity<ErrorResponse> handleConstraintViolation(ConstraintViolationException ex) {
+        incrementErrorMetric(HttpStatus.BAD_REQUEST, "INVALID_URL");
         return ResponseEntity.badRequest()
             .body(new ErrorResponse("INVALID_URL", messageForCode("INVALID_URL")));
     }
@@ -39,6 +49,7 @@ public ResponseEntity<ErrorResponse> handleIllegalArgument(IllegalArgumentExcept
         HttpStatus status = "INVALID_EXPIRY".equals(code) || "INVALID_URL".equals(code)
             ? HttpStatus.BAD_REQUEST
             : HttpStatus.INTERNAL_SERVER_ERROR;
+        incrementErrorMetric(status, code);
         return ResponseEntity.status(status).body(new ErrorResponse(code, messageForCode(code)));
     }
 
@@ -46,6 +57,7 @@ public ResponseEntity<ErrorResponse> handleIllegalArgument(IllegalArgumentExcept
     public ResponseEntity<ErrorResponse> handleServiceUnavailable(Exception ex) {
         HttpHeaders headers = new HttpHeaders();
         headers.add("Retry-After", "30");
+        incrementErrorMetric(HttpStatus.SERVICE_UNAVAILABLE, "SERVICE_UNAVAILABLE");
         return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
             .headers(headers)
             .body(new ErrorResponse("SERVICE_UNAVAILABLE", "The service is temporarily unavailable. Please try again."));
@@ -53,22 +65,46 @@ public ResponseEntity<ErrorResponse> handleServiceUnavailable(Exception ex) {
 
     @ExceptionHandler(NotFoundException.class)
     public ResponseEntity<ErrorResponse> handleNotFound(NotFoundException ex) {
+        incrementErrorMetric(HttpStatus.NOT_FOUND, "NOT_FOUND");
         return ResponseEntity.status(HttpStatus.NOT_FOUND)
             .body(new ErrorResponse("NOT_FOUND", ex.getMessage()));
     }
 
     @ExceptionHandler(GoneException.class)
     public ResponseEntity<ErrorResponse> handleGone(GoneException ex) {
+        incrementErrorMetric(HttpStatus.GONE, "GONE");
         return ResponseEntity.status(HttpStatus.GONE)
             .body(new ErrorResponse("GONE", ex.getMessage()));
     }
 
     @ExceptionHandler(Exception.class)
     public ResponseEntity<ErrorResponse> handleUnexpected(Exception ex) {
+        incrementErrorMetric(HttpStatus.INTERNAL_SERVER_ERROR, "INTERNAL_ERROR");
         return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
             .body(new ErrorResponse("INTERNAL_ERROR", "An unexpected error occurred. Please try again."));
     }
 
+    private void incrementErrorMetric(HttpStatus status, String errorCode) {
+        String normalizedCode = normalizeErrorCode(errorCode);
+        Counter.builder("tinyurl.http.server.errors.total")
+            .tag("status", Integer.toString(status.value()))
+            .tag("error_code", normalizedCode)
+            .register(meterRegistry)
+            .increment();
+    }
+
+    private String normalizeErrorCode(String errorCode) {
+        if (errorCode == null || errorCode.isBlank()) {
+            return "UNKNOWN";
+        }
+        // Map to bounded set of known error codes
+        return switch (errorCode) {
+            case "INVALID_URL", "INVALID_EXPIRY", "INVALID_REQUEST" -> errorCode;
+            case "SERVICE_UNAVAILABLE", "NOT_FOUND", "GONE", "INTERNAL_ERROR" -> errorCode;
+            default -> "UNKNOWN_ERROR";
+        };
+    }
+
     private String messageForCode(String code) {
         return switch (code) {
             case "INVALID_URL" -> "URL must be a valid HTTP or HTTPS address (max 2048 characters).";
diff --git a/tinyurl/src/main/java/com/tinyurl/observability/RequestObservabilityFilter.java b/tinyurl/src/main/java/com/tinyurl/observability/RequestObservabilityFilter.java
new file mode 100644
index 0000000..29c0281
--- /dev/null
+++ b/tinyurl/src/main/java/com/tinyurl/observability/RequestObservabilityFilter.java
@@ -0,0 +1,102 @@
+package com.tinyurl.observability;
+
+import io.micrometer.core.instrument.Counter;
+import io.micrometer.core.instrument.MeterRegistry;
+import io.micrometer.core.instrument.Timer;
+import jakarta.servlet.FilterChain;
+import jakarta.servlet.ServletException;
+import jakarta.servlet.http.HttpServletRequest;
+import jakarta.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.util.UUID;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.MDC;
+import org.springframework.core.Ordered;
+import org.springframework.core.annotation.Order;
+import org.springframework.stereotype.Component;
+import org.springframework.web.filter.OncePerRequestFilter;
+import org.springframework.web.servlet.HandlerMapping;
+
+@Component
+@Order(Ordered.HIGHEST_PRECEDENCE)
+public class RequestObservabilityFilter extends OncePerRequestFilter {
+
+    private static final Logger log = LoggerFactory.getLogger(RequestObservabilityFilter.class);
+    private static final String CORRELATION_ID_HEADER = "X-Correlation-Id";
+    private static final String CORRELATION_ID_MDC_KEY = "correlation_id";
+
+    private final MeterRegistry meterRegistry;
+
+    public RequestObservabilityFilter(MeterRegistry meterRegistry) {
+        this.meterRegistry = meterRegistry;
+    }
+
+    @Override
+    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
+        throws ServletException, IOException {
+
+        String correlationId = request.getHeader(CORRELATION_ID_HEADER);
+        if (correlationId == null || correlationId.isBlank()) {
+            correlationId = UUID.randomUUID().toString();
+        }
+
+        MDC.put(CORRELATION_ID_MDC_KEY, correlationId);
+        response.setHeader(CORRELATION_ID_HEADER, correlationId);
+
+        long startNanos = System.nanoTime();
+        int statusCode = HttpServletResponse.SC_INTERNAL_SERVER_ERROR;
+
+        try {
+            filterChain.doFilter(request, response);
+            statusCode = response.getStatus();
+        } finally {
+            try {
+                String method = request.getMethod();
+                String route = (String) request.getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE);
+                if (route == null || route.isBlank()) {
+                    route = "UNMAPPED";
+                }
+
+                String status = Integer.toString(statusCode);
+                String statusClass = (statusCode / 100) + "xx";
+                String outcome = statusCode >= 400 ? "error" : "success";
+
+                long durationNanos = System.nanoTime() - startNanos;
+                Timer.builder("tinyurl.http.server.request.duration")
+                    .tag("method", method)
+                    .tag("route", route)
+                    .tag("status", status)
+                    .register(meterRegistry)
+                    .record(durationNanos, java.util.concurrent.TimeUnit.NANOSECONDS);
+
+                Counter.builder("tinyurl.http.server.requests.total")
+                    .tag("method", method)
+                    .tag("route", route)
+                    .tag("status_class", statusClass)
+                    .tag("outcome", outcome)
+                    .register(meterRegistry)
+                    .increment();
+
+                log.info(
+                    "http_request method={} route={} status={} duration_ms={} client_ip={} user_agent={}",
+                    method,
+                    route,
+                    statusCode,
+                    durationNanos / 1_000_000,
+                    request.getRemoteAddr(),
+                    sanitize(request.getHeader("User-Agent"))
+                );
+            } finally {
+                MDC.remove(CORRELATION_ID_MDC_KEY);
+            }
+        }
+    }
+
+    private String sanitize(String value) {
+        if (value == null) {
+            return "unknown";
+        }
+        return value.replaceAll("[\r\n]", " ");
+    }
+}
diff --git a/tinyurl/src/main/resources/application-prod.yaml b/tinyurl/src/main/resources/application-prod.yaml
new file mode 100644
index 0000000..9a3dc92
--- /dev/null
+++ b/tinyurl/src/main/resources/application-prod.yaml
@@ -0,0 +1,7 @@
+# Production-specific configuration
+management:
+  endpoints:
+    web:
+      exposure:
+        # Production: only expose health endpoint. Metrics/prometheus require separate admin port + authentication
+        include: health
diff --git a/tinyurl/src/main/resources/application.yaml b/tinyurl/src/main/resources/application.yaml
index 31f2cdc..04137e9 100644
--- a/tinyurl/src/main/resources/application.yaml
+++ b/tinyurl/src/main/resources/application.yaml
@@ -19,11 +19,26 @@ management:
     health:
       probes:
         enabled: true
+      show-components: when_authorized
       show-details: when_authorized
+      group:
+        readiness:
+          include: readinessState,db,diskSpace,ping
+        liveness:
+          include: livenessState,ping
   endpoints:
     web:
       exposure:
-        include: health
+        # Default: expose metrics for dev/test. Restricted in production via application-prod.yaml
+        include: health,metrics,prometheus
+  metrics:
+    tags:
+      application: ${spring.application.name}
+    distribution:
+      percentiles-histogram:
+        http.server.requests: true
+      percentiles:
+        http.server.requests: 0.95,0.99
 
 tinyurl:
   base-url: ${TINYURL_BASE_URL:http://localhost}
diff --git a/tinyurl/src/main/resources/logback-spring.xml b/tinyurl/src/main/resources/logback-spring.xml
new file mode 100644
index 0000000..7e34c6b
--- /dev/null
+++ b/tinyurl/src/main/resources/logback-spring.xml
@@ -0,0 +1,14 @@
+<configuration>
+    <springProperty scope="context" name="appName" source="spring.application.name" defaultValue="app"/>
+
+    <appender name="JSON_CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
+        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
+            <customFields>{"service":"${appName}"}</customFields>
+            <includeMdcKeyName>correlation_id</includeMdcKeyName>
+        </encoder>
+    </appender>
+
+    <root level="INFO">
+        <appender-ref ref="JSON_CONSOLE"/>
+    </root>
+</configuration>