Architecture Overview

System Design

Mission

platform-java is a Java Application Platform for running multiple isolated Java applications within a single JVM, similar to a JEE application server but designed for any Java application (web servers, batch processors, message consumers, etc.). Think of it as running multiple Java applications in separate terminal windows, but all within one JVM with comprehensive isolation and management capabilities.

Core Principles

  1. Isolation - Complete application isolation at ClassLoader, thread pool, security, and resource levels
  2. Simplicity - Drop-in deployment without requiring code changes
  3. Flexibility - Support for platform-aware and legacy applications
  4. Observability - Comprehensive monitoring and metrics (Prometheus, JMX, OpenTelemetry)
  5. Extensibility - Pluggable deployment, clustering, and service discovery mechanisms

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Management Layer                          │
│  ┌──────────────┬─────────────┬──────────────┬─────────┐   │
│  │  REST API    │  Web Console│ Swing Desktop│Terminal │   │
│  │  (Netty)     │ (Browser)   │    UI        │   UI    │   │
│  └──────────────┴─────────────┴──────────────┴─────────┘   │
└────────────────────────┬────────────────────────────────────┘
                         │
┌─────────────────────────┴────────────────────────────────────┐
│                   ApplicationManager                         │
│  • Deployment (YAML/JSON/API)                              │
│  • Lifecycle Management                                     │
│  • Application Registry                                     │
└────────┬──────────────────────────────────────────────────┬─┘
         │                                                  │
    ┌────┴─────┐                                  ┌────────┴─────────┐
    │ Clustering│ (Hazelcast, Consul, etcd)       │  Service Registry │
    └───────────┘                                  │ (Consul, etcd, Eureka)
                                                   └──────────────────┘

┌─────────────────────────────────────────────────────────────┐
│              Per-Application Isolation Sandbox               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    App 1     │  │    App 2     │  │    App N     │      │
│  │  Instance    │  │  Instance    │  │  Instance    │      │
│  ├──────────────┤  ├──────────────┤  ├──────────────┤      │
│  │ClassLoader   │  │ClassLoader   │  │ClassLoader   │      │
│  │Isolation     │  │Isolation     │  │Isolation     │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │Thread Pool   │  │Thread Pool   │  │Thread Pool   │      │
│  │Management    │  │Management    │  │Management    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │Security      │  │Security      │  │Security      │      │
│  │Policy        │  │Policy        │  │Policy        │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │Resource      │  │Resource      │  │Resource      │      │
│  │Monitor       │  │Monitor       │  │Monitor       │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│              Platform Services (Optional)                    │
│  ┌──────────────┬─────────────┬─────────────┬────────────┐  │
│  │Message Bus   │  Volumes    │  Observability│  Metrics │  │
│  │(Event Flow)  │ (Storage)   │  (OpenTelemetry)         │  │
│  └──────────────┴─────────────┴─────────────┴────────────┘  │
└─────────────────────────────────────────────────────────────┘

Module Organization (43 Core + 8 Extended Modules)

Core Platform Modules

API & Contracts

  • platform-api - Public API interfaces and contracts for all platform features

Core Implementation

  • platform-core - ApplicationManager, ApplicationContext, lifecycle coordination
  • platform-classloader - Isolated ClassLoader implementation with parent-last delegation
  • platform-threadpool - Per-application thread pool management with configurable limits
  • platform-security - Security policy enforcement (modern StackWalker-based, no deprecated SecurityManager)
  • platform-monitoring - Resource monitoring and quota enforcement (CPU, memory, threads)

Messaging & Service Discovery

  • platform-messaging - In-memory event bus (MessageBus) and ServiceRegistry
  • platform-messaging-jms - JMS-backed MessageBus for distributed multi-node messaging

Configuration & Deployment

  • platform-config - YAML/JSON descriptor parsing and validation
  • platform-deployment - Deployment SPI and lifecycle hooks
  • platform-fs-watcher - Filesystem monitoring for auto-deployment with debouncing
  • platform-launcher - Platform bootstrap and interactive console

Management & Monitoring

  • platform-rest-api - REST API server (HTTP endpoints for deployment/lifecycle/metrics)
  • platform-rest-api-netty - Netty-based REST API implementation
  • platform-web-console - Browser-based management UI with real-time metrics (Chart.js)
  • platform-swing-ui - Native desktop management UI with graphical controls
  • platform-terminal-ui - Full-screen terminal interface (curses-like keyboard controls)
  • platform-metrics-jmx - JMX metrics exporter for JConsole/VisualVM
  • platform-metrics-prometheus - Prometheus metrics exporter for modern monitoring stacks

Advanced Features

  • platform-storage - Persistent volume management (ephemeral and durable storage)
  • platform-storage-s3 - S3-backed volume storage
  • platform-storage-database - Database-backed volume storage
  • platform-storage-redis - Redis-backed volume storage
  • platform-vm-management - Virtual machine orchestration (KVM/QEMU via libvirt)
  • platform-otel - OpenTelemetry integration for metrics and tracing
  • platform-jvmti-agent - Optional native agent for precise heap measurement
  • platform-rag - RAG infrastructure (document Q&A with vector databases)
    • platform-rag-core - RAG implementation core
    • platform-rag-embedding - Vector embedding services
    • platform-rag-ingest - Document ingestion pipeline
    • platform-rag-query - Query processing
    • platform-rag-vectordb - Vector database integration
  • platform-mlops - ML workflow orchestration (MLOps platform)
    • platform-mlops-core - MLOps implementation core
    • platform-mlops-deployment - Model deployment
    • platform-mlops-experiments - Experiment tracking
    • platform-mlops-registry - Model registry
    • platform-mlops-serving - Model serving infrastructure
    • platform-mlops-training - Training pipeline orchestration
  • platform-ai-marketplace - AI model marketplace
    • platform-ai-marketplace-core - Marketplace infrastructure
    • platform-ai-marketplace-deployment - Model deployment from marketplace
    • platform-ai-marketplace-optimization - Model optimization

Clustering & Service Discovery Modules

Clustering Implementations (multi-node deployment coordination)

  • platform-cluster - Base clustering abstraction (Hazelcast)
  • platform-cluster-consul - Consul-based clustering
  • platform-cluster-etcd - etcd-based clustering
  • platform-cluster-redis - Redis-based clustering
  • platform-cluster-zookeeper - ZooKeeper-based clustering

Service Registry Implementations (inter-application service lookup)

  • platform-registry-consul - Consul service registry
  • platform-registry-etcd - etcd service registry
  • platform-registry-eureka - Eureka service registry

Configuration Sources (dynamic configuration management)

  • platform-config-consul - Consul configuration source
  • platform-config-etcd - etcd configuration source
  • platform-config-vault - Vault configuration source
  • platform-config-zookeeper - ZooKeeper configuration source

Support Modules

  • platform-bom - Bill of Materials for dependency management
  • platform-samples - Example applications demonstrating platform features

Key Components

ApplicationManager

Central orchestrator for application lifecycle and registry. Maintains the collection of deployed applications and coordinates all isolation subsystems.

Responsibilities:

  • Application deployment and undeployment
  • Lifecycle management (start, stop, reload)
  • Application state tracking
  • Hot code reload coordination
  • Clustering and distribution (in multi-node setups)

Key Methods:

ApplicationManager manager = new ApplicationManager();
manager.deploy(descriptor);
manager.start("app-id");
manager.stop("app-id");
manager.undeploy("app-id");
ApplicationContext context = manager.getApplicationContext("app-id");

ApplicationContext

Container for all application-specific resources. Provides isolated execution environment and access to platform services.

Contains:

  • Isolated ClassLoader
  • Managed ThreadPool
  • SecurityPolicy
  • ResourceMonitor
  • Optional: MessageBus, ServiceRegistry, StorageVolumes

Key Features:

  • Application lifecycle hooks (start, stop, reload)
  • Access to platform features via optional services
  • Resource monitoring and tracking
  • State preservation support

IsolatedClassLoader

Parent-last ClassLoader delegation for maximum application isolation while maintaining compatibility with platform APIs.

Design:

Bootstrap ClassLoader
  └── Platform ClassLoader
      └── System ClassLoader
          └── PlatformSharedClassLoader (platform-api)
              ├── App1ClassLoader (parent-last) -> App1 JARs
              ├── App2ClassLoader (parent-last) -> App2 JARs
              └── App3ClassLoader (parent-last) -> App3 JARs

Behavior:

  • Platform APIs (org.flossware.platform-java.api.*) loaded once in shared ClassLoader
  • Application classes use parent-last delegation to achieve maximum isolation
  • Prevents ClassCastExceptions on shared interfaces
  • Garbage collection support through reference counting
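The delegation order described above can be sketched as a small URLClassLoader subclass. This is an illustration only, not the platform's actual IsolatedClassLoader: the class name ParentLastClassLoader and the sharedPrefix parameter are assumptions introduced here.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Sketch of a parent-last (child-first) class loader; hypothetical, not the platform class. */
final class ParentLastClassLoader extends URLClassLoader {
    // Assumed package prefix for API classes that must always come from the shared parent.
    private final String sharedPrefix;

    ParentLastClassLoader(URL[] appJars, ClassLoader parent, String sharedPrefix) {
        super(appJars, parent);
        this.sharedPrefix = sharedPrefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.") || name.startsWith(sharedPrefix)) {
                    // JDK and shared API classes: normal parent-first delegation,
                    // so every application sees the same Class objects.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        c = findClass(name);              // application JARs first
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, false); // then the parent chain
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Keeping the shared API parent-first is what avoids ClassCastExceptions on shared interfaces: two applications that each bundled their own copy of an API class would otherwise end up with incompatible types.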

ManagedThreadPool

Per-application ThreadPoolExecutor with isolation and monitoring.

Configuration:

ThreadPoolConfig config = ThreadPoolConfig.builder()
    .corePoolSize(4)
    .maxPoolSize(20)
    .queueCapacity(100)
    .build();

Features:

  • Named threads tagged with application ID
  • Graceful shutdown coordination
  • Monitoring hooks for resource tracking
  • Thread quota enforcement with grace periods
  • Configurable rejection policies
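As a rough sketch of how these features map onto java.util.concurrent, the following builds a bounded ThreadPoolExecutor whose threads carry the application ID in their names. AppThreadPools, the "-worker-" naming scheme, and the 60-second keep-alive are assumptions made for this example, not the platform's ManagedThreadPool.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; illustrates named threads, a bounded queue, and graceful shutdown. */
final class AppThreadPools {
    private AppThreadPools() {}

    static ThreadPoolExecutor create(String appId, int core, int max, int queueCapacity) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory factory = r -> {
            // Tag each thread with the application ID so it can be attributed later.
            Thread t = new Thread(r, appId + "-worker-" + seq.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded queue
                factory,
                new ThreadPoolExecutor.AbortPolicy());    // reject when saturated
    }

    /** Waits for running tasks, then interrupts whatever is left. Returns true on a clean stop. */
    static boolean shutdownGracefully(ThreadPoolExecutor pool, long timeoutMillis)
            throws InterruptedException {
        pool.shutdown();
        if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        pool.shutdownNow();
        return false;
    }
}
```

Note that a ThreadPoolExecutor only grows beyond its core size once the queue is full, so with a queue capacity of 100 the pool stays at the core size under light load.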

SecurityPolicy

Configurable permission model supporting multiple permission types.

Permission Types:

  • File permissions (read, write, delete, execute)
  • Socket permissions (connect, resolve)
  • Runtime permissions (getenv, setProperty)
  • Reflection control
  • Native code control

Implementation Strategy:

  • Enforcement uses the StackWalker API (available since Java 9)
  • SecurityManager, deprecated for removal in Java 17 (JEP 411), is not used; earlier support for it has been removed

Configuration:

security:
  allowReflection: false
  allowNativeCode: false
  filePermissions:
    - path: "/tmp/*"
      actions: "read,write"
  socketPermissions:
    - host: "localhost:5432"
      actions: "connect"

ResourceMonitor

Tracks per-application resource usage with quota enforcement.

Tracked Resources:

  • CPU time (via ThreadMXBean per thread)
  • Heap usage (estimated via ClassLoader object tracking)
  • Thread count
  • I/O metrics
  • Custom metrics
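Per-thread CPU time can be read through the standard ThreadMXBean. The sketch below sums it over threads whose names start with a given prefix. Attributing threads to an application by name prefix is an assumption of this example; the platform may track thread ownership differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical sampler; sums CPU time of live threads selected by name prefix. */
final class CpuTimeSampler {
    private CpuTimeSampler() {}

    static long cpuNanosFor(String threadNamePrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return 0L; // measurement unavailable on this JVM
        }
        long total = 0L;
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info == null || !info.getThreadName().startsWith(threadNamePrefix)) {
                continue; // thread has died or belongs to someone else
            }
            long nanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
            if (nanos > 0) {
                total += nanos;
            }
        }
        return total;
    }
}
```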

Enforcement Levels:

  • NOTIFY (default) - Log violations without action
  • THROTTLE - Slow down execution to reduce resource pressure
  • SHUTDOWN - Gracefully stop application
  • KILL - Immediately terminate without cleanup

Grace Periods:

  • Configurable violation threshold (default: 3 consecutive violations)
  • Per-resource-type enforcement policies
  • Violation count reset when usage returns to normal
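The grace-period rule (act only after N consecutive violations, reset when usage returns to normal) can be sketched as a small counter. ViolationTracker is a hypothetical name for this illustration.

```java
/** Hypothetical tracker: fires once the limit is exceeded on `threshold` consecutive samples. */
final class ViolationTracker {
    private final int threshold;
    private int consecutive;

    ViolationTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /** Records one sample and returns true when the enforcement action should run. */
    synchronized boolean record(long usage, long limit) {
        if (usage <= limit) {
            consecutive = 0;   // back to normal: forget earlier violations
            return false;
        }
        consecutive++;
        return consecutive >= threshold;
    }
}
```

With a threshold of 3 and a thread limit of 50, samples of 60, 60, 40, 60, 60, 60 trigger enforcement only on the final sample, because the 40 resets the count.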

Design Patterns

1. Isolation Pattern

Each application runs in its own sandbox with:

  • Independent ClassLoader preventing class conflicts
  • Dedicated thread pool preventing thread exhaustion
  • Isolated security policy preventing unauthorized access
  • Individual resource quotas preventing resource hogging

Benefits: One faulty application cannot crash or degrade others

2. Application Lifecycle Pattern

Standardized lifecycle for all applications:

UNDEPLOYED -> DEPLOYED -> RUNNING -> STOPPED -> UNDEPLOYED
                             ↓
                        (RELOADING)

Hooks Available:

  • Application.start(ApplicationContext) - Startup logic
  • Application.stop() - Cleanup logic
  • ReloadableApplication.captureState() - State before reload
  • ReloadableApplication.restoreState() - State after reload
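One way to enforce this lifecycle is a transition table that rejects anything the diagram does not show. The sketch below encodes only the arrows drawn above, plus a return from RELOADING to RUNNING, which is this example's reading of the diagram. AppState and Lifecycle are hypothetical names.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum AppState { UNDEPLOYED, DEPLOYED, RUNNING, RELOADING, STOPPED }

/** Hypothetical state holder that only permits the transitions listed in ALLOWED. */
final class Lifecycle {
    private static final Map<AppState, Set<AppState>> ALLOWED = new EnumMap<>(AppState.class);
    static {
        ALLOWED.put(AppState.UNDEPLOYED, EnumSet.of(AppState.DEPLOYED));
        ALLOWED.put(AppState.DEPLOYED, EnumSet.of(AppState.RUNNING));
        ALLOWED.put(AppState.RUNNING, EnumSet.of(AppState.STOPPED, AppState.RELOADING));
        ALLOWED.put(AppState.RELOADING, EnumSet.of(AppState.RUNNING)); // assumed return path
        ALLOWED.put(AppState.STOPPED, EnumSet.of(AppState.UNDEPLOYED));
    }

    private AppState state = AppState.UNDEPLOYED;

    synchronized AppState state() {
        return state;
    }

    synchronized void transitionTo(AppState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}
```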

3. Pluggable Architecture Pattern

Core platform is independent of deployment mechanisms:

  • Deployment SPI allows any deployment source (CLI, REST, filesystem, programmatic)
  • Service Registry SPI supports multiple backends (Consul, etcd, Eureka)
  • Clustering SPI supports multiple implementations (Hazelcast, Consul, etcd, Redis, ZooKeeper)
  • Config Source SPI supports multiple backends (Consul, etcd, Vault, ZooKeeper)
  • Storage SPI supports multiple backends (filesystem, S3, database, Redis)

Benefits: Add new deployment methods without modifying core

4. Optional Services Pattern

Platform services are completely optional:

  • Applications can be entirely unaware of MessageBus and ServiceRegistry
  • Services disabled by default, enabled in configuration
  • Applications opt-in via ApplicationContext methods
  • Graceful degradation if services unavailable

Example:

context.getMessageBus().ifPresent(bus -> {
    bus.subscribe("topic", message -> handle(message));
});

5. Configuration as Code Pattern

YAML/JSON descriptors define complete application configuration:

  • No code changes required
  • Version control friendly
  • Validation before deployment
  • Support for templating and variable substitution

Example:

applicationId: my-app
mainClass: com.example.MyApp
resources:
  maxHeapMB: 512
  maxThreads: 50
security:
  allowReflection: false
messaging:
  enabled: true

ClassLoader Isolation Strategy

Problem Solved

Multiple Java applications in one JVM suffer from:

  • Class Version Conflicts - App1 needs commons-lang 3.0, App2 needs 2.6
  • Static State Pollution - Shared static fields from different apps interfere
  • ClassLoader Leaks - Undeployed apps leave garbage in memory

Solution: Isolated ClassLoader Hierarchy

Each Application
  ↓
IsolatedClassLoader (parent-last)
  ↓
Loads application JARs FIRST
  ↓
Falls back to PlatformSharedClassLoader ONLY for platform APIs
  ↓
platform-api classes loaded once, shared across all apps

Key Features

  1. Parent-Last Delegation

    • Application classes are attempted FIRST
    • Platform APIs fall back to shared ClassLoader
    • Different applications can use different versions of the same library
  2. Garbage Collection Support

    • Reference counting tracks active class usage
    • ClassLoader becomes eligible for GC when app undeployed
    • Temporary retention during reload for safe transitions
  3. Best Practices

    • Avoid ThreadLocal where possible; if you use it, call remove() in stop()
    • Clear static collections in stop() method
    • Implement ReloadableApplication for state preservation
    • Document thread creation (should use managed thread pool)

See: ClassLoader Best Practices


Thread Pool Management

Design

Each application has its own isolated ThreadPoolExecutor:

Application Context
  ↓
ManagedThreadPool (ThreadPoolExecutor)
  ├── Core Threads: Configurable (default: 4)
  ├── Max Threads: Configurable (default: 20)
  ├── Queue Capacity: Configurable (default: 100)
  └── Monitoring: CPU, memory, thread count per-app

Configuration

threadPool:
  corePoolSize: 4
  maxPoolSize: 20
  queueCapacity: 100
  rejectionPolicy: "ABORT"  # ABORT, CALLER_RUNS, DISCARD, DISCARD_OLDEST

Enforcement

Resource enforcement prevents thread exhaustion:

ResourceConfig config = ResourceConfig.builder()
    .maxThreads(50)
    .threadEnforcementAction(EnforcementAction.SHUTDOWN)
    .violationGracePeriod(3)  // 3 violations before shutdown
    .build();

Best Practices

  • Use Managed Thread Pool - Always submit tasks to context.getThreadPool()
  • Avoid Direct Thread Creation - Using new Thread() bypasses isolation
  • Configure Appropriately - Match pool size to workload
  • Monitor Usage - Check metrics to identify leaks or misconfigurations

See: Resource Enforcement


Security Model

Modern Approach (No SecurityManager)

platform-java uses the StackWalker API (Java 9+) instead of deprecated SecurityManager:

StackWalker walker = StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);
// Find the first stack frame loaded by the application's ClassLoader and check its permission.
boolean allowed = walker.walk(frames -> {
    StackWalker.StackFrame securityFrame = frames
        .filter(frame -> frame.getDeclaringClass().getClassLoader() == appClassLoader)
        .findFirst()
        .orElse(null);

    if (securityFrame != null) {
        // Check if class has permission for operation
        return enforcePermission(securityFrame.getDeclaringClass(), operation);
    }
    return true;  // No app frame found, allow
});

Permission Types

  1. File Permissions

    • read - Read files
    • write - Write files
    • delete - Delete files
    • execute - Execute files
  2. Socket Permissions

    • connect - Create outbound connections
    • resolve - DNS resolution
  3. Runtime Permissions

    • getenv.VARIABLE_NAME - Read environment variables
    • setProperty.PROPERTY_NAME - Set system properties
    • loadLibrary - Load native libraries

Configuration Examples

Strict Isolation:

security:
  allowReflection: false
  allowNativeCode: false
  filePermissions: []
  socketPermissions: []

Web Application:

security:
  allowReflection: false
  filePermissions:
    - path: "/app/data/*"
      actions: "read,write"
  socketPermissions:
    - host: "database:5432"
      actions: "connect"

Legacy Application:

security:
  allowReflection: true
  allowNativeCode: true
  filePermissions:
    - path: "/*"
      actions: "read,write,delete"
  socketPermissions:
    - host: "*"
      actions: "connect"
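How a filePermissions entry might be evaluated is sketched below. The matching semantics are assumptions of this example, not documented platform behavior: a trailing "/*" is treated as "this directory and everything beneath it", paths are normalized so ".." cannot escape the base, and anything not explicitly allowed is denied. FilePermissionRule is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical rule type mirroring one `path` / `actions` pair from the descriptor. */
final class FilePermissionRule {
    private final Path base;
    private final Set<String> actions = new HashSet<>();

    FilePermissionRule(String pathPattern, String actionList) {
        String p = pathPattern.endsWith("/*")
                ? pathPattern.substring(0, pathPattern.length() - 2)
                : pathPattern;
        this.base = Paths.get(p.isEmpty() ? "/" : p).normalize();
        for (String a : actionList.split(",")) {
            actions.add(a.trim());
        }
    }

    boolean allows(String path, String action) {
        // Normalize first so "/app/data/../secret" is compared as "/app/secret".
        Path candidate = Paths.get(path).normalize();
        return actions.contains(action) && candidate.startsWith(base);
    }

    /** Default deny: a request passes only if at least one rule allows it. */
    static boolean anyAllows(List<FilePermissionRule> rules, String path, String action) {
        for (FilePermissionRule rule : rules) {
            if (rule.allows(path, action)) {
                return true;
            }
        }
        return false;
    }
}
```

This sketch does not resolve symbolic links; a production check would also need to handle those before comparing paths.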

See: Security Guide


Clustering Approach

Multi-Node Architecture

Node 1                          Node 2
┌─────────────────┐            ┌─────────────────┐
│ ApplicationMgr  │◄──────────>│ ApplicationMgr  │
│  App A: RUNNING │ Clustering │  App A: RUNNING │
│  App B: RUNNING │ (Hazelcast)│  App B: RUNNING │
└────────┬────────┘            └────────┬────────┘
         │                              │
         └──────────┬───────────────────┘
                    │
            ┌────────────────┐
            │Service Registry│
            │ (Consul/etcd)  │
            └────────────────┘

Clustering Implementations

  • Hazelcast (default) - In-memory data grid with built-in clustering
  • Consul - Service mesh with clustering support
  • etcd - Distributed configuration and clustering
  • Redis - In-memory data store with clustering
  • ZooKeeper - Distributed coordination

Features

  1. Application Distribution

    • Deploy an app once and it runs on all nodes
    • Automatic state synchronization
    • Consistent ordering across cluster
  2. Service Discovery

    • Services from any application accessible across cluster
    • Automatic registration/deregistration on deploy/undeploy
    • Health checking and failover
  3. Distributed Messaging

    • Event bus topics replicated across cluster
    • JMS integration for external messaging systems
    • Message ordering guarantees

Configuration

clustering:
  enabled: true
  provider: "hazelcast"  # hazelcast, consul, etcd, redis, zookeeper
  hazelcast:
    multicastEnabled: true
    multicastGroup: "224.2.2.3"
    multicastPort: 54327
  consul:
    host: "consul-server"
    port: 8500

See: Module documentation in platform-cluster-* and platform-registry-* directories.


Advanced Features

Hot Code Reload

Zero-downtime application updates with optional state preservation:

  • Update JAR while application is running
  • Old ClassLoader retained until safe to collect
  • State captured and restored via ReloadableApplication
  • Automatic rollback on failure
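The reload sequence can be sketched as "start the new version before stopping the old one", which makes rollback trivial because the old instance is never stopped on failure. The Reloadable interface below is a hypothetical stand-in; the real ReloadableApplication signatures are not shown in this document and may differ. CounterApp exists only to exercise the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for the platform's reload contract; real signatures may differ. */
interface Reloadable {
    void start() throws Exception;
    void stop() throws Exception;
    Map<String, Object> captureState();
    void restoreState(Map<String, Object> state);
}

final class Reloader {
    private Reloader() {}

    /** Returns whichever instance is running afterwards: the new one, or the old one on failure. */
    static Reloadable reload(Reloadable current, Supplier<? extends Reloadable> newVersion) {
        Map<String, Object> state = current.captureState();
        Reloadable next;
        try {
            next = newVersion.get();
            next.restoreState(state);
            next.start();
        } catch (Exception e) {
            return current; // rollback: the old version was never stopped
        }
        try {
            current.stop();
        } catch (Exception e) {
            // A real platform would log this; the new version is already running.
        }
        return next;
    }
}

/** Tiny demo application used only to exercise the sketch. */
final class CounterApp implements Reloadable {
    int count;
    boolean running;
    private final boolean failOnStart;

    CounterApp(boolean failOnStart) {
        this.failOnStart = failOnStart;
    }

    @Override public void start() throws Exception {
        if (failOnStart) throw new Exception("simulated startup failure");
        running = true;
    }

    @Override public void stop() {
        running = false;
    }

    @Override public Map<String, Object> captureState() {
        Map<String, Object> state = new HashMap<>();
        state.put("count", count);
        return state;
    }

    @Override public void restoreState(Map<String, Object> state) {
        count = (Integer) state.getOrDefault("count", 0);
    }
}
```

The captured state should contain only types visible to both versions (JDK or shared API types), since objects of application classes loaded by the old ClassLoader would not be assignable in the new one.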

See: Hot Code Reload

Resource Enforcement

Automatic action when applications exceed resource quotas:

  • NOTIFY, THROTTLE, SHUTDOWN, or KILL actions
  • Grace periods for transient spikes
  • Per-resource configuration (CPU, memory, threads)

See: Resource Enforcement

Application Dependencies

Declare and manage inter-application dependencies:

  • Topologically-ordered startup
  • REQUIRED and OPTIONAL dependency types
  • Circular dependency detection
  • Service version tracking
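Topologically ordered startup with cycle detection can be sketched with Kahn's algorithm. StartupOrder is a hypothetical name, and the input shape (application ID mapped to the IDs it requires) is an assumption of this example; the distinction between REQUIRED and OPTIONAL dependencies is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical resolver: returns application IDs in an order where dependencies start first. */
final class StartupOrder {
    private StartupOrder() {}

    static List<String> resolve(Map<String, List<String>> requires) {
        Map<String, Integer> unmet = new TreeMap<>();            // app -> unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>();  // dependency -> apps waiting on it
        for (Map.Entry<String, List<String>> e : requires.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                unmet.putIfAbsent(dep, 0);
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String id = ready.poll();
            order.add(id);
            for (String waiting : dependents.getOrDefault(id, new ArrayList<>())) {
                if (unmet.merge(waiting, -1, Integer::sum) == 0) {
                    ready.add(waiting);
                }
            }
        }

        // Any application left unprocessed is part of, or blocked by, a cycle.
        if (order.size() != unmet.size()) {
            throw new IllegalStateException("Circular dependency detected");
        }
        return order;
    }
}
```

For example, if web requires db and cache, and cache requires db, the only valid start order is db, cache, web.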

See: Application Dependencies

Persistent Storage Volumes

Per-application isolated storage:

  • Persistent volumes survive restarts
  • Ephemeral volumes cleaned up on undeploy
  • Size limits and enforcement
  • Multiple storage backends (filesystem, S3, database, Redis)

See: Volumes

Native Binary Support

Platform-specific native library loading:

  • Automatic platform detection
  • Per-application library isolation
  • Version conflict prevention

See: Native Binaries

Observability

Comprehensive monitoring and metrics:

  • OpenTelemetry - Distributed tracing (OTLP)
  • Prometheus - Time-series metrics
  • JMX - Java Management Extensions
  • Structured Logging - MDC context (app_id, trace_id, span_id)

See: Observability


Deployment Methods

| Method | Interactive | Automatic | Real-time | Use Case |
|---|---|---|---|---|
| CLI | Yes | No | Yes | Manual testing, development |
| YAML/JSON Descriptors | Yes | Yes | Yes | Configuration-driven, GitOps |
| Filesystem Watcher | No | Yes | Yes | Auto-deployment, watch directories |
| REST API | No | Yes | Yes | Programmatic, CI/CD integration |
| Web Console | Yes | Yes | Yes | Browser-based UI with metrics |
| Swing Desktop UI | Yes | No | Yes | Native desktop application |
| Terminal UI | Yes | No | Yes | SSH/remote management |
| Programmatic API | No | Yes | Yes | Embedded platform usage |



Application Packaging

Descriptor-Based Deployment

applicationId: my-app
name: My Application
version: "1.0"
mainClass: com.example.MyApp

dependencies:
  - classpath entries (JARs)

threadPool:
  corePoolSize: 4
  maxPoolSize: 20
  queueCapacity: 100

security:
  allowReflection: false
  filePermissions:
    - path: "/tmp/my-app"
      actions: "read,write"

resources:
  maxHeapMB: 512
  maxThreads: 50

messaging:
  enabled: true

volumes:
  - id: data
    type: persistent
    sizeGB: 10
    mountPath: /data

hotReloadEnabled: true
preserveState: true

Platform-Aware Application (Optional)

Implement the Application interface to access platform features:

public class MyApp implements Application {
    @Override
    public void start(ApplicationContext context) throws Exception {
        // Use managed thread pool
        context.getThreadPool().submit(() -> { /* work */ });
        
        // Optional: Use messaging
        context.getMessageBus().ifPresent(bus -> {
            bus.subscribe("topic", message -> { /* handle */ });
        });
        
        // Optional: Use service registry
        context.getServiceRegistry().ifPresent(registry -> {
            registry.registerService(MyService.class, new MyServiceImpl());
        });
    }
    
    @Override
    public void stop() throws Exception {
        // Cleanup resources
    }
}

Legacy Application (Plain main() method)

Any Java application with a main method works unchanged:

public class LegacyApp {
    public static void main(String[] args) {
        System.out.println("Running isolated in platform-java");
    }
}
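A plausible way for a host to run such a class unchanged is to load it through the application's ClassLoader and invoke main reflectively on a dedicated thread. The sketch below illustrates that idea under stated assumptions: LegacyLauncher and DemoLegacyApp are hypothetical names, and the platform's actual launch mechanism is not shown in this document.

```java
import java.lang.reflect.Method;

/** Hypothetical launcher: runs a plain main() class inside a given ClassLoader. */
final class LegacyLauncher {
    private LegacyLauncher() {}

    static Thread launch(String mainClass, ClassLoader appLoader, String... args)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(mainClass, true, appLoader);
        Method main = cls.getMethod("main", String[].class);
        main.setAccessible(true); // the declaring class itself may not be public

        Thread t = new Thread(() -> {
            try {
                // Cast to Object so the array is passed as one argument, not spread as varargs.
                main.invoke(null, (Object) args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("main() failed for " + mainClass, e);
            }
        }, mainClass + "-main");

        // Libraries that consult the context ClassLoader then resolve against the app's JARs.
        t.setContextClassLoader(appLoader);
        t.start();
        return t;
    }
}

/** Tiny demo class used only to exercise the sketch. */
final class DemoLegacyApp {
    public static void main(String[] args) {
        System.setProperty("demo.legacy.ran", String.join(",", args));
    }
}
```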

Further Reading

Core Documentation

Getting Started

Core Features

Deployment & Management

Advanced Topics

VM & Infrastructure

Project Status

AI/ML Features (Phase 4+)

Examples


Implementation Status

Production Ready

Core Platform:

  • API definitions, core implementation, ClassLoader isolation, thread pool management
  • Resource monitoring with quota enforcement
  • Security policy enforcement (StackWalker-based)
  • Messaging and service registry (in-memory and JMS-backed)
  • Platform launcher with interactive console

Deployment & Management:

  • YAML/JSON descriptor parsing
  • Filesystem watcher for auto-deployment
  • REST API server (Netty-based)
  • Web console with real-time metrics
  • Swing desktop UI with graphical controls
  • Terminal UI with keyboard shortcuts
  • JMX and Prometheus metrics exporters

Advanced Features:

  • Hot code reload with state preservation
  • Resource enforcement (NOTIFY, THROTTLE, SHUTDOWN, KILL)
  • Application dependencies with topological ordering
  • Persistent volumes with multiple backends
  • Native binary support (platform-specific libraries)
  • Observability via OpenTelemetry, Prometheus, JMX
  • Clustering (Hazelcast) with failover support
  • Container orchestration (Docker, Podman, LXC)
  • Virtual machine management (KVM/QEMU via libvirt)

In Development / Planned

  • JVMTI agent for precise heap monitoring
  • Additional clustering backends (Consul, etcd, Redis, ZooKeeper)
  • MLOps platform for ML workflow orchestration
  • AI model marketplace with curated catalog
  • RAG infrastructure for document Q&A
  • LLM serving with vLLM/TGI integration
  • Advanced monitoring dashboards and alerting

Getting Help

  • Issues: GitHub Issues
  • Documentation: See links above
  • Examples: Check platform-samples/ directory
  • Contributing: Contributions welcome!