Cross-Validation for NMF-SEM: nmf.sem.cv (nmfkc)

Performs K-fold cross-validation to evaluate the equilibrium mapping of the NMF-SEM model.

For each fold, nmf.sem is fitted on the training samples, yielding an equilibrium mapping \(\hat Y_1 = M_{\mathrm{model}} Y_2\). The held-out endogenous variables \(Y_1\) are then predicted from \(Y_2\) using this mapping, and the mean absolute error (MAE) over all entries in the test block is computed. The returned value is the average MAE across folds.
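
The per-fold procedure described above can be sketched in base R as follows. This is an illustration only, not the package's implementation: `cv_mae_sketch` and `fit_mapping_ls` are hypothetical names, and the fitting step is an ordinary least-squares stand-in for nmf.sem (it ignores non-negativity and the penalties), chosen so the sketch runs without the package.

```r
# Stand-in for the fitting step (NOT nmf.sem): ordinary least squares
# returning M such that Y1 is approximated by M %*% Y2.
fit_mapping_ls <- function(Y1_train, Y2_train) {
  t(solve(tcrossprod(Y2_train), tcrossprod(Y2_train, Y1_train)))
}

# fold_id: integer vector giving the fold label of each sample (column).
cv_mae_sketch <- function(Y1, Y2, fold_id, fit_fun = fit_mapping_ls) {
  stopifnot(ncol(Y1) == ncol(Y2), length(fold_id) == ncol(Y1))
  fold_mae <- vapply(sort(unique(fold_id)), function(k) {
    test <- fold_id == k
    # Fit the mapping on the training columns only.
    M <- fit_fun(Y1[, !test, drop = FALSE], Y2[, !test, drop = FALSE])
    # Predict the held-out Y1 block from the held-out Y2 block.
    Y1_hat <- M %*% Y2[, test, drop = FALSE]
    # MAE over all entries of the test block.
    mean(abs(Y1[, test, drop = FALSE] - Y1_hat))
  }, numeric(1))
  mean(fold_mae)  # average MAE across folds
}

# Toy data in which Y1 is an exact linear function of Y2.
set.seed(1)
Y2 <- matrix(runif(3 * 40), nrow = 3)
M_true <- matrix(c(1, 0, 2, 0.5, 1, 0), nrow = 2)
Y1 <- M_true %*% Y2
fold_id <- rep(1:5, length.out = 40)
cv_mae_sketch(Y1, Y2, fold_id)
```

Because the toy Y1 is generated exactly from Y2, the stand-in recovers the mapping and the averaged MAE is zero up to floating-point error. With nmf.sem as the fitting step, the same loop would score the fitted equilibrium mapping instead.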

This implements the hyperparameter selection strategy described in the paper: hyperparameters are chosen by predictive cross-validation rather than direct inspection of the internal structural matrices.
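
One way to use a CV score for selection is a plain grid search that keeps the setting with the smallest mean MAE. The helper below is a hypothetical sketch (`select_by_cv` is not part of nmfkc); it accepts any scoring function whose argument names match the grid columns, and the demonstration uses a toy score so that it runs without the package.

```r
# grid:   data frame with one row per hyperparameter setting.
# cv_fun: function taking the grid columns as named arguments and
#         returning a single numeric score (lower is better).
select_by_cv <- function(grid, cv_fun) {
  grid$mae <- vapply(seq_len(nrow(grid)), function(i) {
    do.call(cv_fun, as.list(grid[i, , drop = FALSE]))
  }, numeric(1))
  grid[which.min(grid$mae), , drop = FALSE]
}

# Toy score standing in for a real CV run (minimum at rank = 3, C1.L1 = 0.5).
toy_cv <- function(rank, C1.L1) abs(rank - 3) + abs(C1.L1 - 0.5)

grid <- expand.grid(rank = 2:4, C1.L1 = c(0.1, 0.5, 1))
best <- select_by_cv(grid, toy_cv)
best

# With nmfkc loaded, the scoring function could wrap nmf.sem.cv, e.g.
# select_by_cv(grid, function(rank, C1.L1)
#   nmf.sem.cv(Y1, Y2, rank = rank, C1.L1 = C1.L1, seed = 1))
```

Passing the same seed for every grid row is a design choice assumed here: it should keep the comparison between settings from being confounded by different fold splits.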

Usage

nmf.sem.cv(
  Y1,
  Y2,
  rank = NULL,
  X.init = NULL,
  X.L2.ortho = 100,
  C1.L1 = 0.5,
  C2.L1 = 0,
  epsilon = 1e-04,
  maxit = 50000,
  seed = NULL,
  div = 5,
  shuffle = TRUE,
  ...
)

Arguments

Y1: A non-negative numeric matrix of endogenous variables with rows = variables (P1), columns = samples (N).
Y2: A non-negative numeric matrix of exogenous variables with rows = variables (P2), columns = samples (N). Must satisfy ncol(Y1) == ncol(Y2).
rank: Integer; rank (number of latent factors) passed to nmf.sem. If NULL, nmf.sem decides the effective rank (via ... or nrow(Y2)).
X.init: Optional initialization for X (as in nmf.sem).
X.L2.ortho: L2 orthogonality penalty for X.
C1.L1: L1 sparsity penalty for C1 (\(\Theta_1\)).
C2.L1: L1 sparsity penalty for C2 (\(\Theta_2\)).
epsilon: Convergence threshold for nmf.sem.
maxit: Maximum number of iterations for nmf.sem.
seed: Master random seed used both for the CV split and for the fold-specific calls to nmf.sem. If NULL, the RNG is not controlled within folds.
div: Number of CV folds. (Default: 5)
shuffle: Logical; if TRUE, samples are randomly permuted before assigning to folds. (Default: TRUE)
...: Additional arguments passed to nmf.sem (except for rank, seed, div, shuffle, which are handled here).
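
To make the roles of div, shuffle and seed concrete, the following sketch shows one common way to assign samples to folds. It is an assumption about how such a split can be built, not a description of what nmf.sem.cv does internally; `make_folds` is a hypothetical helper.

```r
# Returns an integer vector of length n with fold labels in 1..div.
make_folds <- function(n, div = 5, shuffle = TRUE, seed = NULL) {
  # Note: set.seed() changes the global RNG state.
  if (!is.null(seed)) set.seed(seed)
  # Optionally permute the sample order before dealing out fold labels.
  idx <- if (shuffle) sample.int(n) else seq_len(n)
  fold_id <- integer(n)
  fold_id[idx] <- rep(seq_len(div), length.out = n)
  fold_id
}

folds <- make_folds(23, div = 5, seed = 42)
table(folds)  # fold sizes differ by at most one
```

With shuffle = FALSE the labels simply cycle through 1..div in sample order, which can be useful when the column order carries no structure and a deterministic split is wanted.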

Value

A numeric scalar: mean MAE across CV folds.