A Physics-Informed Deep Learning (PIDL) framework for modeling complex material behavior while enforcing thermodynamic consistency. This repository implements a novel approach that combines HOPE (Nested Learning) architecture with Feed-Forward Neural Networks (FFNNs) to predict the mechanical response of nanoparticle-filled epoxy composites under varying ambient conditions.
This framework addresses the challenge of modeling complex material behavior by:
- Enforcing Physics: Incorporating thermodynamic principles directly into the neural network architecture
- Capturing History: Using HOPE blocks (Titans Memory + CMS) to model history-dependent material behavior through internal state variables
- Ensuring Consistency: Guaranteeing thermodynamic consistency through physics-based loss terms
- Handling Complexity: Managing multi-scale effects from temperature, moisture, and nanoparticle content variations
The model uniquely combines:
- HOPE (Nested Learning) Architecture → Based on the paper "Nested Learning: The Illusion of Deep Learning Architecture" (Behrouz et al.)
- TitansL2 Memory Module → Adaptive memory with Delta Rule updates for capturing temporal dependencies
- Continuum Memory System (CMS) → Multi-frequency memory consolidation for persistent knowledge storage
- FFNN for Free Energy → Approximates the material's thermodynamic state
- Automatic Differentiation → Derives stress from free energy (∂Ψ/∂C)
- Physics Constraints → Enforce non-negative dissipation and thermodynamic laws
This section provides a detailed overview of the HOPE (Nested Learning) implementation in src/hope_layer.py, which replaces traditional LSTM layers with a more expressive memory system inspired by neuroscience principles.
```
src/hope_layer.py
├── DynamicDense       # GLU-style gated projections
├── TitansL2           # Base memory with Delta Rule (standard projections)
├── TitansL2Dynamic    # Memory with Dynamic (gated) projections
├── CMSBlock           # Simple MLP for persistent storage
├── CMSLayer           # Chunk-based memory with slower updates
├── HopeBlock          # TitansL2 + CMSBlock
├── HopeBlockDynamic   # TitansL2Dynamic + CMSBlock
└── FullHOPEBlock      # TitansL2Dynamic + CMSLayer (used in model)
```
GLU-style gated projection that modulates the output based on input content:

```
y = (x @ W_static) * SiLU(x @ W_gate)
```

| Component | Description |
|---|---|
| `W_static` | Static projection weights |
| `W_gate` | Gating weights with SiLU activation |
| Output | Element-wise product of projection and gate |
Purpose: Provides input-dependent gating for more expressive Q/K/V projections.
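For illustration, a minimal Keras sketch of this gating pattern (bias-free projections assumed; the actual layer in `src/hope_layer.py` may differ in initialization and bias handling):

```python
import tensorflow as tf

class DynamicDense(tf.keras.layers.Layer):
    """GLU-style gated projection: y = (x @ W_static) * SiLU(x @ W_gate)."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        d_in = int(input_shape[-1])
        self.w_static = self.add_weight(
            name="w_static", shape=(d_in, self.units), initializer="glorot_uniform")
        self.w_gate = self.add_weight(
            name="w_gate", shape=(d_in, self.units), initializer="glorot_uniform")

    def call(self, x):
        # Content-dependent gate modulates the static projection elementwise.
        return tf.matmul(x, self.w_static) * tf.nn.silu(tf.matmul(x, self.w_gate))
```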
The core associative memory with Delta Rule updates. Processes sequences step-by-step using tf.scan.
```
M_new = M_prev - α · forget_term + β · write_term

# where:
forget_term = (M @ k) @ k^T   # Selective forgetting
write_term  = v @ k^T         # New information
```
| Parameter | Description | Value |
|---|---|---|
| `head_dim` | Dimension per attention head | `units // n_head` |
| `n_head` | Number of parallel memory heads | 6 (configurable) |
| α (alpha) | Forgetting rate | `sigmoid(α_raw) × 0.8 ∈ [0, 0.8]` |
| β (beta) | Writing rate | `sigmoid(β_raw) × 0.8 ∈ [0, 0.8]` |
| Memory shape | Per-head memory matrix | `[batch, n_head, head_dim, head_dim]` |
1. Project inputs → Q, K, V (via Dense or DynamicDense)
2. L2-normalize K and Q (gradient stability)
3. Reshape to multi-head `[B, T, n_head, head_dim]`
4. `tf.scan` over timesteps:
   - Read: `y = M @ q`
   - Forget: `M -= α · (M @ k) @ k^T`
   - Write: `M += β · v @ k^T`
5. Project output (`c_proj`)
When use_momentum=True, updates are smoothed via exponential moving average:
```
δ = -α · forget_term + β · write_term
momentum_new = β_m · momentum_prev + (1 - β_m) · δ
M_new = M_prev + momentum_new
```
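To make the recurrence concrete, here is a single-head sketch of the scan loop including the momentum variant; the fixed `alpha`, `beta`, and `beta_m` constants stand in for the layer's learnable, sigmoid-clamped rates, and q/k are assumed already L2-normalized:

```python
import tensorflow as tf

def delta_rule_scan(q_seq, k_seq, v_seq, alpha=0.1, beta=0.5, beta_m=0.9):
    """Delta Rule memory over a sequence (single head, time-major sketch).

    q_seq, k_seq, v_seq: [T, batch, d, 1]; batch may be dynamic, d must be static.
    Returns the per-timestep reads y_t = M_t @ q_t, shape [T, batch, d, 1].
    """
    batch = tf.shape(q_seq)[1]
    d = q_seq.shape[2]
    M0 = tf.zeros([batch, d, d])    # associative memory matrix
    mom0 = tf.zeros([batch, d, d])  # momentum buffer (use_momentum=True)
    y0 = tf.zeros([batch, d, 1])    # placeholder for the read output

    def step(carry, qkv):
        M, mom, _ = carry
        q, k, v = qkv
        y = tf.matmul(M, q)                                       # read: M @ q
        forget = tf.matmul(tf.matmul(M, k), k, transpose_b=True)  # (M @ k) @ k^T
        write = tf.matmul(v, k, transpose_b=True)                 # v @ k^T
        delta = -alpha * forget + beta * write
        mom = beta_m * mom + (1.0 - beta_m) * delta               # EMA smoothing
        return (M + mom, mom, y)

    _, _, y_seq = tf.scan(step, (q_seq, k_seq, v_seq), initializer=(M0, mom0, y0))
    return y_seq
```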
Same as TitansL2 but uses DynamicDense for Q/K/V projections instead of standard Dense layers. This provides input-dependent gating for more expressive memory operations.
Persistent knowledge storage via a standard MLP:
```python
Sequential([
    Dense(4 * units, activation='gelu'),
    Dense(units),
    Dropout(rate),
])
```

Purpose: Stores static knowledge learned during training (like the original Transformer FFN).
Continuum Memory System with multi-frequency updates inspired by brain oscillations.
Instead of updating memory at every timestep (like TitansL2), CMSLayer accumulates updates and applies them at chunk boundaries:
```
# At each timestep:
pending_forget += (M @ k) @ k^T
pending_write  += v @ k^T

# At chunk boundaries (every chunk_size steps):
M += -α · (pending_forget / chunk_size) + β · (pending_write / chunk_size)
```

```
Input ──┬── MLP Path ───────────────── x_static
        │   c_fc (4×units, gelu)
        │   c_proj (units)
        │
        └── Memory Path ────────────── y_mem
            c_key, c_val projections
            Chunk-based memory updates

Output = x_static + y_mem  (combined knowledge)
```
| Parameter | Description | Default |
|---|---|---|
| `chunk_size` | Steps between memory updates | 16 |
| `α, β` | Forget/write rates | learnable |
Purpose: Lower-frequency updates consolidate information over longer timescales.
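Because the memory is frozen between chunk boundaries, the per-step accumulations collapse into two batched matrix products. A minimal sketch of one boundary update (constant `alpha`/`beta` stand in for the learnable rates):

```python
import tensorflow as tf

def cms_chunk_update(M, k_chunk, v_chunk, alpha=0.1, beta=0.5):
    """Apply one chunk-boundary update to the CMS memory.

    M: [batch, d, d]; k_chunk, v_chunk: [batch, chunk_size, d].
    Within a chunk M is constant, so
      sum_t (M @ k_t) @ k_t^T = (M @ K) @ K^T  and  sum_t v_t @ k_t^T = V @ K^T,
    with K, V the chunk's keys/values stacked column-wise.
    """
    chunk_size = tf.cast(tf.shape(k_chunk)[1], M.dtype)
    K = tf.transpose(k_chunk, [0, 2, 1])  # [batch, d, chunk_size]
    V = tf.transpose(v_chunk, [0, 2, 1])  # [batch, d, chunk_size]
    pending_forget = tf.matmul(tf.matmul(M, K), K, transpose_b=True)  # Σ (M k)kᵀ
    pending_write = tf.matmul(V, K, transpose_b=True)                 # Σ v kᵀ
    return M + (-alpha * pending_forget + beta * pending_write) / chunk_size
```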
Combines high-frequency (TitansL2Dynamic) and low-frequency (CMSLayer) memory systems:
```
Input
  │
  ├──► LayerNorm ──► TitansL2Dynamic ──┐
  │                        + (residual)
  ◄────────────────────────────────────┘
  │
  ├──► LayerNorm ──► CMSLayer ─────────┐
  │                        + (residual)
  ◄────────────────────────────────────┘
  │
Output
```
| Component | Update Frequency | Purpose |
|---|---|---|
| TitansL2Dynamic | Every timestep | Fast adaptation, short-term dependencies |
| CMSLayer | Every 16 timesteps | Slow consolidation, long-term patterns |
This mirrors brain oscillation theory where different frequencies handle different cognitive functions.
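Structurally, the block is two pre-LayerNorm residual sublayers. A sketch assuming `TitansL2Dynamic` and `CMSLayer` are importable from `hope_layer.py` with the constructor arguments shown (the signatures are assumptions, not verified):

```python
import tensorflow as tf
from hope_layer import TitansL2Dynamic, CMSLayer  # assumed import and signatures

class FullHOPEBlockSketch(tf.keras.layers.Layer):
    """Fast (per-step) + slow (per-chunk) memory in two residual sublayers."""

    def __init__(self, units, n_head=6, chunk_size=16, **kwargs):
        super().__init__(**kwargs)
        self.ln1 = tf.keras.layers.LayerNormalization()
        self.titans = TitansL2Dynamic(units=units, n_head=n_head)  # high frequency
        self.ln2 = tf.keras.layers.LayerNormalization()
        self.cms = CMSLayer(units=units, chunk_size=chunk_size)    # low frequency

    def call(self, x):
        x = x + self.titans(self.ln1(x))  # fast sublayer + residual
        x = x + self.cms(self.ln2(x))     # slow sublayer + residual
        return x
```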
In src/model.py, two FullHOPEBlock layers replace traditional LSTM:
```python
# Input projection to match HOPE dimensions
self.input_proj = TimeDistributed(Dense(layer_size))

# Two stacked HOPE blocks
self.hope_block1 = FullHOPEBlock(units=layer_size, n_head=6, chunk_size=16)
self.hope_block2 = FullHOPEBlock(units=layer_size, n_head=6, chunk_size=16)
```

The HOPE blocks capture history-dependent material behavior, which is then used to predict internal state variables (z_i) for the thermodynamic model.
| Aspect | LSTM | HOPE |
|---|---|---|
| Memory Type | Vector state | Matrix associative memory |
| Update Rule | Gated cell state | Delta Rule (gradient descent on memory) |
| Capacity | Fixed hidden size | O(head_dim²) per head |
| Multi-scale | Single timescale | High + low frequency systems |
| Interpretability | Opaque gates | Key-value associations |
- Thermodynamic Consistency: Automatic enforcement of physical laws through custom loss functions
- Multi-Scale Modeling: Handles effects from molecular (moisture) to macro (fiber orientation) scales
- Environmental Sensitivity: Accounts for temperature, moisture content, and nanoparticle volume fraction
- History Dependence: Captures path-dependent material behavior through internal variables
- Experimental Data Driven: Trained directly on experimental stress-strain data
- Physics-Informed Architecture: Custom neural network layers that respect continuum mechanics
- Automatic Stress Derivation: Stress computed as σ = 2∂Ψ/∂C using TensorFlow's automatic differentiation
- Dissipation Monitoring: Real-time calculation and enforcement of non-negative energy dissipation
- Free Energy Learning: Neural network approximation of Helmholtz free energy function
- Modular Design: Easily adaptable to different material systems
- HOPE Integration: Advanced memory architecture for temporal modeling
- Comprehensive Logging: Detailed training metrics and physics constraint monitoring
- GPU Acceleration: Optimized for high-performance computing environments
- Python: 3.8 or higher
- GPU: NVIDIA GPU with CUDA support (recommended)
- Memory: Minimum 8GB RAM, 16GB+ recommended for large datasets
```
# Core dependencies
tensorflow >= 2.8.0
numpy >= 1.21.0
scipy >= 1.7.0
matplotlib >= 3.5.0

# Optional but recommended
nvidia-cudnn-cu11  # For GPU acceleration
```

```bash
# Clone the repository
git clone https://github.com/BBahtiri/Deep-Learning-Constitutive-Model.git
cd Deep-Learning-Constitutive-Model

# Create virtual environment (recommended)
python -m venv physics_ai_env
source physics_ai_env/bin/activate  # On Windows: physics_ai_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

Alternatively, install the dependencies directly:

```bash
pip install tensorflow numpy scipy matplotlib
# Add nvidia-cudnn-cu11 for GPU support
```

```
PHYSICS-AI/
├── 📄 Main_ML.py                    # Main training and evaluation script
├── 🧠 DL_model.py                   # Core neural network with HOPE architecture
├── 🔮 hope_layer.py                 # HOPE module implementation (TitansL2, CMS, HopeBlock)
├── 🔧 misc.py                       # Data loading and preprocessing utilities
├── 📊 data_experiments_train/       # Training experimental data (.mat files)
├── 📊 data_experiments_validation/  # Validation experimental data
├── 📁 experiment_outputs_pinn/      # Generated results and outputs
├── 📁 checkpoints/                  # Model checkpoints during training
├── 📄 extracted_paper_content.txt   # Nested Learning paper reference
├── 📄 README.md                     # This file
└── 🖼️ pinn.PNG                      # Architecture diagram
```
Organize your experimental data in the following structure:
```
data_experiments_train/
├── epoxy_1_1_1_001.mat
├── epoxy_1_1_2_001.mat
└── ... (more .mat files)

data_experiments_validation/
├── epoxy_2_1_1_001.mat
└── ... (validation .mat files)
```
Expected `.mat` file contents:
- `expStress`: Experimental stress data
- `trueStrain`: True strain measurements
- `timeVec`: Time vector for the experiment
Edit the hyperparameters in Main_ML.py:
```python
# Network Architecture
layer_size = 24          # HOPE and Dense layer units (must be divisible by n_head)
layer_size_fenergy = 24  # Free energy network units
internal_variables = 6   # Number of internal state variables
n_head = 6               # Number of attention heads in HOPE blocks

# Training Parameters
learning_rate = 0.001    # Initial learning rate
num_epochs = 2000        # Maximum training epochs
batch_size = 32          # Training batch size
timesteps = 500          # Sequence length for HOPE processing
```

Then run the training script:

```bash
python Main_ML.py
```

Training outputs are saved to structured directories:
- `./final_predictions/` - Model predictions and internal states
- `./stress_exact/` - Ground truth stress data
- `./weights/` - Final trained model weights
- `./checkpoints/` - Training checkpoints
Your .mat files should contain:
| Variable | Description | Shape | Units |
|---|---|---|---|
| `expStress` | Experimental stress | `[n_timesteps, 1]` | MPa |
| `trueStrain` | True strain | `[n_timesteps, 1]` | dimensionless |
| `timeVec` | Time vector | `[n_timesteps, 1]` | seconds |
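For smoke-testing the data pipeline without real experiments, a minimal sketch that writes one synthetic `.mat` file with these keys (the linear stress ramp and the modulus value are purely illustrative):

```python
import numpy as np
import scipy.io

n = 1000
time_vec = np.linspace(0.0, 10.0, n).reshape(-1, 1)     # seconds
true_strain = np.linspace(0.0, 0.05, n).reshape(-1, 1)  # dimensionless
exp_stress = 2500.0 * true_strain                       # MPa, illustrative linear response

# Keys match the expected .mat structure above.
scipy.io.savemat(
    "data_experiments_train/epoxy_1_1_2_001.mat",
    {"expStress": exp_stress, "trueStrain": true_strain, "timeVec": time_vec},
)
```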
The code expects filenames in the format `epoxy_X_Y_Z_*.mat` (a parsing sketch follows the list):
- X: Nanoparticle content indicator (1=0%, 2=5%, 3=10%)
- Y: Moisture condition (1=dry, 2=saturated)
- Z: Temperature condition (1=-20°C, 2=23°C, 3=50°C, 4=60°C)
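A hypothetical decoding of this scheme (the actual parsing lives in `misc.py` and may differ; `decode_conditions` is illustrative, not a repository function):

```python
import re

NP_CONTENT = {1: "0%", 2: "5%", 3: "10%"}
MOISTURE = {1: "dry", 2: "saturated"}
TEMPERATURE = {1: "-20°C", 2: "23°C", 3: "50°C", 4: "60°C"}

def decode_conditions(filename):
    """Map an epoxy_X_Y_Z_*.mat filename to its experimental conditions."""
    m = re.match(r"epoxy_(\d)_(\d)_(\d)_.*\.mat$", filename)
    if m is None:
        raise ValueError(f"Unexpected filename: {filename}")
    x, y, z = (int(g) for g in m.groups())
    return NP_CONTENT[x], MOISTURE[y], TEMPERATURE[z]

print(decode_conditions("epoxy_1_1_2_001.mat"))  # ('0%', 'dry', '23°C')
```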
```mermaid
graph TD
    A[Input Sequence] --> B[Input Projection]
    B --> C[FullHOPE Block 1]
    C --> D[FullHOPE Block 2]
    D --> E[Dense Layers]
    E --> F[Internal Variables z_i]
    F --> G[Free Energy Network]
    A --> G
    G --> H[Free Energy Ψ]
    H --> I[Automatic Differentiation]
    I --> J[Stress σ = 2∂Ψ/∂C]
    F --> K[Dissipation Calculation]
    J --> L[Physics-Informed Loss]
    K --> L
```
- TitansL2Dynamic: Multi-head associative memory with dynamic projections
- CMSLayer: Continuum Memory System with chunk-based updates
- Architecture: Pre-LayerNorm → TitansL2 → Residual → LayerNorm → CMS → Residual
```
# Delta Rule Memory Update
M_new = M_prev - α * forget_term + β * write_term

# where:
# forget_term = (M @ k) @ k^T  (selective forgetting)
# write_term  = v @ k^T        (new information)
```

- Input: HOPE hidden states
- Architecture: Time-distributed dense layers with swish activation
- Output: Evolution of internal state variables (z_i)
- Input: Internal variables + strain measure
- Architecture: Dense layers with physics constraints
- Constraints: Non-negative weights, softplus activation
- Output: Helmholtz free energy (Ψ)
- Stress Derivation: σ = 2∂Ψ/∂C via automatic differentiation
- Dissipation: D = ∑τ_i·ż_i where τ_i = -∂Ψ/∂z_i
- Constraints: D ≥ 0 (thermodynamic consistency; a sketch of this computation follows)
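A hedged sketch of how the physics core fits together in TensorFlow; `psi_net`, the shapes, and the use of a plain hinge penalty are illustrative assumptions, not the repository's exact API:

```python
import tensorflow as tf

def stress_and_dissipation(psi_net, C, z, z_dot):
    """Stress via autodiff of the free-energy network, plus a hinge
    penalty enforcing non-negative dissipation.

    psi_net:  callable (C, z) -> scalar free energy per sample (assumed)
    C:        strain measure, shape [batch, n_c]
    z, z_dot: internal variables and their rates, shape [batch, n_z]
    """
    with tf.GradientTape(persistent=True) as tape:
        tape.watch([C, z])
        psi = psi_net(C, z)                    # Helmholtz free energy Ψ(C, z)
    sigma = 2.0 * tape.gradient(psi, C)        # σ = 2 ∂Ψ/∂C
    tau = -tape.gradient(psi, z)               # thermodynamic forces τ = -∂Ψ/∂z
    del tape

    dissipation = tf.reduce_sum(tau * z_dot, axis=-1)   # D = Σ τ_i ż_i
    penalty = tf.reduce_mean(tf.nn.relu(-dissipation))  # nonzero only where D < 0
    return sigma, dissipation, penalty
```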
To adapt for different materials:
- Modify data loading in `misc.py`:

```python
def getData_exp(input_mat_file_path, target_sequence_length=1000):
    # Adapt for your data format
    mat_contents = scipy.io.loadmat(input_mat_file_path)
    # Modify key names as needed
    stress_raw = mat_contents['your_stress_key']
    # ... rest of implementation
```

- Adjust the HOPE architecture in `DL_model.py`:

```python
# Modify number of heads (must divide layer_size evenly)
n_head = 4  # or 6, 8, etc.

# Adjust chunk size for CMS
self.hope_block1 = FullHOPEBlock(units=layer_size, n_head=n_head, chunk_size=32)
```

- Update the network architecture in `DL_model.py`:

```python
# Adjust number of internal variables for your physics
internal_variables = 12  # Example: more complex material
```

- Add domain-specific physics in `DL_model.py`:

```python
def call(self, normalized_inputs_seq):
    # ... existing code ...

    # Add your custom physics constraint
    custom_physics_penalty = your_physics_function(psi_final_full_sequence)
    self.add_loss(custom_physics_penalty * weight_factor)

    return norm_pred_stress_for_loss
```

Key hyperparameters to optimize:
```python
# Architecture
layer_size = [24, 30, 48]           # Network capacity (must be divisible by n_head)
n_head = [4, 6, 8]                  # Number of attention heads
internal_variables = [6, 8, 12]     # Complexity of internal state
layer_size_fenergy = [20, 30, 50]   # Free energy network size

# Training
learning_rate = [1e-4, 1e-3, 5e-3]  # Learning rate schedule
batch_size = [16, 32, 64]           # Memory vs. gradient quality
timesteps = [250, 500, 1000]        # Sequence length vs. memory
```

The framework automatically tracks:
- Primary Loss: Mean Absolute Error on stress prediction
- Physics Penalties: Dissipation and free energy constraints
- Validation Performance: Generalization metrics
Post-training analysis includes:
- Stress-Strain Curves: Compare predictions vs. experiments
- Internal Variable Evolution: Track material state changes
- Free Energy Landscapes: Visualize thermodynamic surfaces
- Dissipation Monitoring: Verify physics compliance
```python
import matplotlib.pyplot as plt
import numpy as np

# Load results
stress_pred = np.loadtxt('./final_predictions/stress_pred_unnorm_0.txt')
stress_true = np.loadtxt('./stress_exact/stress_unnorm_0.txt')
strain = np.loadtxt('./strain/strain_unnorm_0.txt')

# Plot stress-strain comparison
plt.figure(figsize=(10, 6))
plt.plot(strain[1:], stress_true, 'b-', label='Experimental', linewidth=2)
plt.plot(strain[1:], stress_pred, 'r--', label='PIDL-HOPE Prediction', linewidth=2)
plt.xlabel('Strain')
plt.ylabel('Stress (MPa)')
plt.legend()
plt.grid(True)
plt.title('PIDL-HOPE Model Performance')
plt.show()
```

This implementation is based on:
- Thermodynamically Consistent Framework (a compact derivation follows this list):
  - Helmholtz Free Energy: Ψ(C, z_i, θ) defines the material's thermodynamic state
  - Stress Derivation: σ = 2∂Ψ/∂C (from continuum mechanics)
  - Evolution Laws: ż_i governed by thermodynamic forces τ_i = -∂Ψ/∂z_i
  - Dissipation: D = ∑τ_i·ż_i ≥ 0 (second law of thermodynamics)
- Nested Learning Theory (Behrouz et al.):
  - Multi-frequency memory updates inspired by brain oscillations
  - Delta Rule associative memory for temporal dependencies
  - Continuum Memory System for knowledge consolidation
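For reference, the standard Coleman–Noll argument behind these relations (isothermal case):

```latex
% Isothermal Clausius--Duhem inequality with internal variables z_i:
\mathcal{D} \;=\; \tfrac{1}{2}\,\mathbf{S} : \dot{\mathbf{C}} \;-\; \dot{\Psi} \;\ge\; 0,
\qquad
\dot{\Psi} \;=\; \frac{\partial \Psi}{\partial \mathbf{C}} : \dot{\mathbf{C}}
              \;+\; \sum_i \frac{\partial \Psi}{\partial z_i}\,\dot{z}_i .

% \dot{C} can be chosen arbitrarily, so its coefficient must vanish,
% leaving the stress relation and a purely dissipative remainder:
\mathbf{S} \;=\; 2\,\frac{\partial \Psi}{\partial \mathbf{C}},
\qquad
\mathcal{D} \;=\; -\sum_i \frac{\partial \Psi}{\partial z_i}\,\dot{z}_i
            \;=\; \sum_i \tau_i\,\dot{z}_i \;\ge\; 0,
\qquad \tau_i := -\frac{\partial \Psi}{\partial z_i}.
```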
- Physics Consistency: Guaranteed satisfaction of thermodynamic laws
- Interpretability: Internal variables have physical meaning
- Generalization: Physics constraints improve extrapolation
- Data Efficiency: Physics guidance reduces data requirements
- Advanced Memory: HOPE architecture captures complex temporal patterns
If you use this code in your research, please cite:
```bibtex
@article{bahtiri2024thermodynamically,
  title={A thermodynamically consistent physics-informed deep learning material model for short fiber/polymer nanocomposites},
  author={Bahtiri, Betim and Arash, Behrouz and Scheffler, Sven and Jux, Maximilian and Rolfes, Raimund},
  journal={Computer Methods in Applied Mechanics and Engineering},
  volume={427},
  pages={117038},
  year={2024},
  publisher={Elsevier},
  doi={10.1016/j.cma.2024.117038}
}
```
```bibtex
@article{behrouz2025nested,
  title={Nested Learning: The Illusion of Deep Learning Architecture},
  author={Behrouz, Ali and Razaviyayn, Meisam and Zhong, Peilin and Mirrokni, Vahab},
  journal={Neural Information Processing Systems (NeurIPS)},
  year={2025},
  url={https://arxiv.org/abs/2512.24695}
}
```

- New Material Systems: Adapt the framework for metals, ceramics, and biological materials
- Enhanced Physics: Add new thermodynamic constraints
- HOPE Extensions: Experiment with different memory configurations
- Optimization: Improve computational efficiency
- Visualization: Enhanced plotting and analysis tools
```bash
# Fork and clone your fork
git clone https://github.com/BBahtiri/Deep-Learning-Constitutive-Model.git
cd Deep-Learning-Constitutive-Model

# Create development environment
python -m venv dev_env
source dev_env/bin/activate

# Install in development mode
pip install -e .
pip install -r requirements-dev.txt  # Include testing dependencies

# Run tests
python -m pytest tests/
```

**GPU Memory Errors**
```python
# In Main_ML.py, reduce batch size or sequence length
batch_size = 16  # Reduce from 32
timesteps = 250  # Reduce from 500
```

**Layer Size / Head Compatibility**
```python
# layer_size must be divisible by n_head
layer_size = 24  # Works with n_head = 4, 6, 8
n_head = 6       # 24 / 6 = 4 (valid)
```

**Convergence Issues**
```python
# Try different learning rates or architectures
learning_rate = 5e-4  # Adjust learning rate
layer_size = 30       # Try different capacity
```

**Data Loading Errors**
- Verify `.mat` file structure matches the expected format
- Check filename conventions match the parsing logic
- Ensure sufficient data files in the train/validation directories
- 📧 Email: betimbahtiri@outlook.de
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
⭐ Star this repository if you find it useful!
