Detection of outliers in multivariate functional time series

This directory contains the implementation of outlier detection methods for multivariate functional time series data using bootstrap procedures. This is the multivariate extension of the methods described in the original_proposal directory.

Overview

The multivariate functional outlier detection methods extend the univariate bootstrap-based approaches to handle multiple functional variables simultaneously. These methods:

Use multivariate functional depth measures (e.g., multiMBD) that combine information across all variables
Support weighted combinations of variables to emphasize certain components
Apply moving block bootstrap (MBBo) to preserve temporal dependence structure
Implement iterative outlier removal procedures

Key difference from univariate methods: The multivariate approach considers the joint behavior across all variables when detecting outliers, which can identify outliers that might not be apparent when analyzing variables separately.

Prerequisites

Before using these functions, ensure you have the required packages loaded:

library(fda)
library(roahd)
library(fda.usc)
library(mrfDepth)

Also, make sure to source the necessary files:

source("R/utils.R")
source("R/simulated-models.R")  # Required for data generation
source("R/depths.R")             # Required for multiMBD depth function
source("R/multivariate_process/bootstrap-procedures.R")
source("R/multivariate_process/outlier-detection-procedures.R")

Data Structure

Multivariate functional data should be provided as a list, where each element is a functional data object (or matrix) representing one variable. All variables must have:

The same number of observations (rows)
Consistent row names (if provided)
The same time grid (columns)

Example Data Structure

# Create multivariate functional data
n_obs <- 100
n_time <- 50
n_vars <- 3  # Number of variables

# Option 1: List of fdata objects (recommended)
mfdataobj <- list()
for (v in 1:n_vars) {
  data_matrix <- matrix(rnorm(n_obs * n_time), nrow = n_obs, ncol = n_time)
  mfdataobj[[v]] <- fdata(data_matrix)
}

# Option 2: List of matrices (also works)
mfdataobj <- list(
  variable1 = matrix(rnorm(n_obs * n_time), nrow = n_obs, ncol = n_time),
  variable2 = matrix(rnorm(n_obs * n_time), nrow = n_obs, ncol = n_time),
  variable3 = matrix(rnorm(n_obs * n_time), nrow = n_obs, ncol = n_time)
)

Main Functions

1. `outlier_multivariate_bootstrap()` - Bootstrap-based Outlier Detection

This function implements outlier detection for multivariate functional time series using bootstrap procedures, following the same iterative algorithm as the univariate version.

Function Signature

outlier_multivariate_bootstrap(
    mfdataobj,
    nb = 200,
    smo = 0.05,
    quan = 0.5,
    dfunc = multiMBD,
    l = 4,
    p = 0.1,
    ns = 0.01,
    boot = multiMBBo,
    weights = c(1/3, 1/3, 1/3)
)

Parameters

mfdataobj: List containing multivariate functional data objects. Each element should be an fdata object or matrix representing one variable. All variables must have the same number of observations (rows).
nb: Number of bootstrap samples to generate (default: 200). More samples provide more stable cutoff estimates but increase computation time.
smo: Smoothness parameter (default: 0.05). Currently not actively used but kept for compatibility with function signature.
quan: Quantile used for cutoff estimation from bootstrap distribution (default: 0.5, i.e., median). Values between 0 and 1.
dfunc: Depth function to use for outlier detection (default: multiMBD). Must be a multivariate depth function that:
- Accepts a list of functional data objects
- Accepts a weights parameter
- Returns a vector of depth values (one per observation)
- Common options: multiMBD (Multivariate Modified Band Depth)
l: Block size for moving block bootstrap (default: 4). Should be chosen based on the temporal dependence structure of your data.
p: Success probability for geometric distribution (default: 0.1). Currently only multiMBBo is implemented, so this parameter is not actively used.
ns: Quantile to calculate in each bootstrap iteration (default: 0.01). This is the quantile of depth values used within each bootstrap sample to estimate the cutoff.
boot: Bootstrap procedure to use (default: multiMBBo). Currently only multiMBBo (Moving Block Bootstrap for multivariate data) is implemented.
weights: Vector of weights for each univariate process/variable (default: c(1/3, 1/3, 1/3) for equal weights).
- Must have length equal to the number of variables
- Weights should sum to 1 (or will be normalized)
- Higher weights emphasize certain variables in the depth calculation
- Example: c(0.5, 0.3, 0.2) gives more weight to the first variable

Return Value

The function returns a list with the following components:

outliers: Vector of detected outlier indices (as row names from the original data)
outlier_depths: Vector of depth values for the detected outliers
iteration: Vector indicating the iteration when each outlier was found
quantile: Estimated cutoff value used for outlier detection
Dep: Vector of depth values for all curves (after iterative process)

Example Usage

# Prepare multivariate functional data
n_obs <- 100
n_time <- 50
n_vars <- 3

mfdataobj <- list()
for (v in 1:n_vars) {
  data_matrix <- matrix(rnorm(n_obs * n_time), nrow = n_obs, ncol = n_time)
  mfdataobj[[v]] <- fdata(data_matrix)
}

# Run outlier detection with equal weights
result <- outlier_multivariate_bootstrap(
    mfdataobj = mfdataobj,
    nb = 200,
    quan = 0.5,
    dfunc = multiMBD,
    l = 4,
    ns = 0.01,
    boot = multiMBBo,
    weights = c(1/3, 1/3, 1/3)  # Equal weights
)

# Access results
print(paste("Number of outliers detected:", length(result$outliers)))
print(paste("Outlier indices:", result$outliers))
print(paste("Cutoff value:", result$quantile))

2. `multivariate_functional_boxplot()` - Functional Boxplot Method

This function implements outlier detection using functional boxplots for multivariate functional data.

Function Signature

multivariate_functional_boxplot(
    mfdataobj,
    dfunc = multiMBD,
    weights = NULL,
    ...
)

Parameters

mfdataobj: List containing multivariate functional data objects
dfunc: Depth function to use (default: multiMBD)
weights: Vector of weights for each variable (default: NULL, which sets equal weights automatically)

Return Value

The function returns a list with:

outliers: Vector of detected outlier indices

Example Usage

# Run functional boxplot outlier detection
result <- multivariate_functional_boxplot(
    mfdataobj = mfdataobj,
    dfunc = multiMBD,
    weights = c(1/3, 1/3, 1/3)
)

print(paste("Number of outliers detected:", length(result$outliers)))

Bootstrap Methods

multiMBBo (Multivariate Moving Block Bootstrap)

When to use: Recommended for multivariate functional time series with temporal dependence.

How it works:

Calculates multivariate functional depth for all observations
Removes outliers using a naive approach (functional boxplot) to get a trimmed sample
Divides the time series into overlapping blocks of size l
Samples blocks with replacement to create bootstrap samples (preserving temporal structure across all variables)
Calculates depth quantiles from bootstrap samples to estimate cutoff

Key features:

Preserves temporal dependence structure across all variables
Uses the same block indices for all variables, maintaining multivariate structure
Supports weighted depth calculations

Example usage:

# Prepare multivariate functional time series data
mfdataobj <- list(
  var1 = fdata(matrix(rnorm(100*50), nrow = 100, ncol = 50)),
  var2 = fdata(matrix(rnorm(100*50), nrow = 100, ncol = 50)),
  var3 = fdata(matrix(rnorm(100*50), nrow = 100, ncol = 50))
)

# Run outlier detection with multiMBBo
result <- outlier_multivariate_bootstrap(
    mfdataobj = mfdataobj,
    nb = 200,
    quan = 0.5,
    dfunc = multiMBD,
    l = 4,
    ns = 0.01,
    boot = multiMBBo,
    weights = c(1/3, 1/3, 1/3)
)

Key parameters for multiMBBo:

l: Block size. Should reflect the temporal dependence structure:
- Small values (2-4): For weak dependence
- Medium values (5-10): For moderate dependence
- Large values (10+): For strong dependence
nb: Number of bootstrap samples. Increase for more stable estimates.
weights: Variable weights. Use equal weights if all variables are equally important, or adjust based on domain knowledge.

Understanding Weights

The weights parameter allows you to control the relative importance of each variable in the multivariate depth calculation:

Equal Weights (Default)

# All variables equally important
weights = c(1/3, 1/3, 1/3)  # For 3 variables
weights = c(0.25, 0.25, 0.25, 0.25)  # For 4 variables

Unequal Weights

# Emphasize first variable
weights = c(0.5, 0.3, 0.2)

# Emphasize last variable
weights = c(0.2, 0.3, 0.5)

# Single variable focus (weight one variable heavily)
weights = c(0.8, 0.1, 0.1)

When to Adjust Weights

Domain knowledge: If certain variables are more important for outlier detection
Variable quality: If some variables are more reliable or have less noise
Outlier patterns: If outliers tend to manifest more strongly in certain variables
Experimental tuning: Try different weight combinations and compare results

Note: Weights should generally sum to 1, though the function may normalize them automatically.

Complete Examples

Example 1: Basic Multivariate Outlier Detection

# Load required libraries
library(fda)
library(roahd)
library(fda.usc)
library(mrfDepth)

# Source required files
source("R/utils.R")
source("R/simulated-models.R")
source("R/depths.R")
source("R/multivariate_process/bootstrap-procedures.R")
source("R/multivariate_process/outlier-detection-procedures.R")

# Create multivariate functional data
set.seed(123)
n_obs <- 100
n_time <- 50
n_vars <- 3

mfdataobj <- list()
for (v in 1:n_vars) {
  data_matrix <- matrix(rnorm(n_obs * n_time), 
                       nrow = n_obs, 
                       ncol = n_time)
  mfdataobj[[v]] <- fdata(data_matrix)
}

# Add row names
for (v in 1:n_vars) {
  row.names(mfdataobj[[v]]) <- 1:n_obs
}

# Run outlier detection
result <- outlier_multivariate_bootstrap(
    mfdataobj = mfdataobj,
    nb = 200,
    quan = 0.5,
    dfunc = multiMBD,
    l = 4,
    ns = 0.01,
    boot = multiMBBo,
    weights = c(1/3, 1/3, 1/3)
)

# Analyze results
cat("Total outliers detected:", length(result$outliers), "\n")
cat("Outlier indices:", result$outliers, "\n")
cat("Cutoff value:", result$quantile, "\n")
cat("Outlier depths:", result$outlier_depths, "\n")

Example 2: Using Simulated Models

# Use built-in simulation models
set.seed(1234)

# Generate multivariate functional data with contamination
fit <- multifdata(
    rho = 0.8,
    k = 10,  # Contamination level
    model = magnitude,  # Magnitude outliers
    plot = FALSE,
    dim = 3  # 3 variables
)

mfdataobj <- fit$mfdataobj
generated_outliers <- fit$outliers

# Run outlier detection
result <- outlier_multivariate_bootstrap(
    mfdataobj = mfdataobj,
    nb = 200,
    quan = 0.5,
    dfunc = multiMBD,
    l = 4,
    ns = 0.01,
    boot = multiMBBo,
    weights = c(1/3, 1/3, 1/3)
)

# Compare with known outliers
detected_outliers <- as.numeric(result$outliers)
true_positives <- sum(detected_outliers %in% generated_outliers)
false_positives <- sum(!(detected_outliers %in% generated_outliers))

cat("True positives:", true_positives, "out of", length(generated_outliers), "\n")
cat("False positives:", false_positives, "\n")

Example 3: Comparing Different Weight Schemes

# Test different weight schemes
weight_schemes <- list(
  equal = c(1/3, 1/3, 1/3),
  emphasize_first = c(0.6, 0.2, 0.2),
  emphasize_last = c(0.2, 0.2, 0.6)
)

results_list <- list()

for (scheme_name in names(weight_schemes)) {
  result <- outlier_multivariate_bootstrap(
      mfdataobj = mfdataobj,
      nb = 200,
      quan = 0.5,
      dfunc = multiMBD,
      l = 4,
      ns = 0.01,
      boot = multiMBBo,
      weights = weight_schemes[[scheme_name]]
  )
  
  results_list[[scheme_name]] <- result
  cat(scheme_name, "weights detected", length(result$outliers), "outliers\n")
}

# Compare outlier sets
if (length(results_list) > 1) {
  outlier_sets <- lapply(results_list, function(r) r$outliers)
  common_outliers <- Reduce(intersect, outlier_sets)
  cat("Common outliers across all weight schemes:", length(common_outliers), "\n")
}

Example 4: Functional Boxplot Method

# Run functional boxplot outlier detection
result_fbplot <- multivariate_functional_boxplot(
    mfdataobj = mfdataobj,
    dfunc = multiMBD,
    weights = c(1/3, 1/3, 1/3)
)

# Compare with bootstrap method
result_bootstrap <- outlier_multivariate_bootstrap(
    mfdataobj = mfdataobj,
    nb = 200,
    quan = 0.5,
    dfunc = multiMBD,
    l = 4,
    boot = multiMBBo,
    weights = c(1/3, 1/3, 1/3)
)

cat("Functional boxplot detected", length(result_fbplot$outliers), "outliers\n")
cat("Bootstrap method detected", length(result_bootstrap$outliers), "outliers\n")

Parameter Tuning Guidelines

nb: Start with 200. Increase to 500-1000 for more stable results, especially with small sample sizes.
quan:
- 0.5 (median): Balanced approach
- Lower values (0.25): More conservative (fewer outliers)
- Higher values (0.75): More liberal (more outliers)
ns: Typically kept at 0.01. This is the quantile used within each bootstrap iteration.
l: Block size for multiMBBo. Choose based on temporal dependence:
- Weak dependence: l = 2-4
- Moderate dependence: l = 5-10
- Strong dependence: l = 10+
weights:
- Start with equal weights: rep(1/n_vars, n_vars)
- Adjust based on domain knowledge or experimental results
- Consider the relative importance and reliability of each variable
dfunc: Currently multiMBD is the default and recommended choice.

Differences from Univariate Methods

Aspect	Univariate	Multivariate
Data structure	Single `fdata` object or matrix	List of `fdata` objects/matrices
Depth function	`MBD`, `MD`, etc.	`multiMBD` (multivariate depth)
Weights	Not applicable	Can weight variables
Bootstrap	`MBBo`, `SmBoD`, `StBo`	`multiMBBo` (only)
Outlier detection	Based on single variable	Based on joint behavior across variables

Advantages of Multivariate Approach

Joint detection: Can identify outliers that are not apparent in individual variables
Correlation structure: Considers relationships between variables
Weighted emphasis: Allows prioritizing certain variables
Temporal coherence: Preserves temporal structure across all variables simultaneously

Limitations

Computational cost: More expensive than univariate methods (multiple variables)
Weight selection: Requires careful consideration or experimentation
Data requirements: All variables must have same number of observations
Limited bootstrap options: Currently only multiMBBo is implemented (no SmBoD or StBo variants)

Troubleshooting

Error: "object must be a list": Ensure mfdataobj is a list, not a single object.
Error: "ERROR IN THE DATA DIMENSIONS": Ensure all variables have the same number of rows (observations).
Error related to weights: Ensure weights vector has length equal to the number of variables.
No outliers detected: Try adjusting quan to a higher value or check if your data actually contains outliers.
Too many outliers detected: Try adjusting quan to a lower value or increase nb for more stable cutoff estimation.
Error related to multiMBD: Ensure the multiMBD function is available (source R/depths.R).

Algorithm Details

Multivariate Bootstrap Algorithm

Bootstrap cutoff estimation:
- Uses multiMBBo to generate bootstrap samples
- Calculates multivariate depth for each bootstrap sample
- Estimates cutoff from the distribution of depth quantiles
Iterative outlier removal:
- Calculates multivariate depths for all observations
- Identifies observations with depth < cutoff
- Removes identified outliers from all variables simultaneously
- Recalculates depths on remaining data
- Repeats until no more outliers are found or more than 20% of data is removed
Stopping criteria: The algorithm stops when:
- No outliers are found in an iteration, OR
- More than 20% of the original data has been identified as outliers

Functional Boxplot Algorithm

Iterative process:
- Calculates multivariate depths for all observations
- For each variable, uses functional boxplot to detect outliers
- Combines outliers detected across all variables
- Removes identified outliers from all variables
- Repeats until no more outliers are found

Best Practices

Data preparation: Ensure all variables have consistent structure and row names
Weight selection: Start with equal weights, then adjust based on results or domain knowledge
Block size: Choose l based on temporal dependence structure
Validation: Test on data with known outliers if available
Compare methods: Try both bootstrap and functional boxplot methods
Check results: Examine outlier depths and iteration information to understand detection process

References

For the original univariate methods, see:

Raña, P., Aneiros, G. and Vilar, J. (2015) Detection of outliers in functional time series. Environmetrics, 26, 178–191.

For multivariate functional depth, see:

General literature on multivariate functional data analysis and depth measures

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detection of outliers in multivariate functional time series

Overview

Prerequisites

Data Structure

Example Data Structure

Main Functions

1. `outlier_multivariate_bootstrap()` - Bootstrap-based Outlier Detection

Function Signature

Parameters

Return Value

Example Usage

2. `multivariate_functional_boxplot()` - Functional Boxplot Method

Function Signature

Parameters

Return Value

Example Usage

Bootstrap Methods

multiMBBo (Multivariate Moving Block Bootstrap)

Understanding Weights

Equal Weights (Default)

Unequal Weights

When to Adjust Weights

Complete Examples

Example 1: Basic Multivariate Outlier Detection

Example 2: Using Simulated Models

Example 3: Comparing Different Weight Schemes

Example 4: Functional Boxplot Method

Parameter Tuning Guidelines

Differences from Univariate Methods

Advantages of Multivariate Approach

Limitations

Troubleshooting

Algorithm Details

Multivariate Bootstrap Algorithm

Functional Boxplot Algorithm

Best Practices

References

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Detection of outliers in multivariate functional time series

Overview

Prerequisites

Data Structure

Example Data Structure

Main Functions

1. outlier_multivariate_bootstrap() - Bootstrap-based Outlier Detection

Function Signature

Parameters

Return Value

Example Usage

2. multivariate_functional_boxplot() - Functional Boxplot Method

Function Signature

Parameters

Return Value

Example Usage

Bootstrap Methods

multiMBBo (Multivariate Moving Block Bootstrap)

Understanding Weights

Equal Weights (Default)

Unequal Weights

When to Adjust Weights

Complete Examples

Example 1: Basic Multivariate Outlier Detection

Example 2: Using Simulated Models

Example 3: Comparing Different Weight Schemes

Example 4: Functional Boxplot Method

Parameter Tuning Guidelines

Differences from Univariate Methods

Advantages of Multivariate Approach

Limitations

Troubleshooting

Algorithm Details

Multivariate Bootstrap Algorithm

Functional Boxplot Algorithm

Best Practices

References

1. `outlier_multivariate_bootstrap()` - Bootstrap-based Outlier Detection

2. `multivariate_functional_boxplot()` - Functional Boxplot Method