Three-Layer Non-negative Matrix Factorization (NMF-AE)

nmfae fits a three-layer nonnegative matrix factorization model \(Y_1 \approx X_1 \Theta X_2 Y_2\), where \(X_1\) is a decoder basis (column sum 1), \(\Theta\) is a bottleneck parameter matrix, \(X_2\) is an encoder basis (row sum 1), and \(Y_2\) is the input matrix.

When Y2 = Y1, the model acts as a non-negative autoencoder. When Y1 != Y2, it acts as a heteroencoder.

Initialization uses a three-step NMF procedure via nmfkc: (1) nmfkc(Y1, rank=Q) to obtain \(X_1\), (2) nmfkc(Y1, A=Y2, rank=Q) with fixed \(X_1\) to obtain \(C = \Theta X_2\), (3) nmfkc(Y2, rank=R) to factor \(C\) into \(\Theta\) and \(X_2\).

Usage

nmfae(
  Y1,
  Y2 = Y1,
  rank = 2,
  rank.encoder = rank,
  epsilon = 1e-04,
  maxit = 5000,
  verbose = FALSE,
  ...
)

Source

Satoh, K. (2025). Applying Non-negative Matrix Factorization with Covariates to Multivariate Time Series. Japanese Journal of Statistics and Data Science.

Arguments

Y1

Output matrix \(Y_1\) (P1 x N). Non-negative. May contain NAs (handled via Y1.weights).

Y2

Input matrix \(Y_2\) (P2 x N). Non-negative. Default is Y1 (autoencoder).

rank

Integer. Rank of the decoder basis \(X_1\) (P1 x Q). Default is 2. For backward compatibility, Q is accepted via ....

rank.encoder

Integer. Rank of the encoder basis \(X_2\) (R x P2). Default is rank. For backward compatibility, R is accepted via ....

epsilon

Positive convergence tolerance. Default is 1e-4.

maxit

Maximum number of multiplicative update iterations. Default is 5000.

verbose

Logical. If TRUE, prints progress messages during fitting. Default is FALSE.

...

Additional arguments:

Y1.weights: Optional non-negative weight matrix (P1 x N) or vector for \(Y_1\), analogous to the weights argument of lm. Loss becomes \(\sum W_{ij} \, (Y_{1,ij} - \hat Y_{1,ij})^2\) (lm()-style, linear in \(W\)). Logical matrices (TRUE / FALSE) are also accepted. Typical ECV / CV usage passes a binary mask \(W \in \{0,1\}\) for held-out elements; real-valued weights for importance weighting are also supported. Default: if Y1 has NA, a binary mask is auto-generated (0 for NA, 1 elsewhere).
C.L1: L1 regularization parameter for \(C\). Default is 0.
X1.L2.ortho: L2 orthogonality regularization for \(X_1\) columns. Default is 0.
X2.L2.ortho: L2 orthogonality regularization for \(X_2\) rows. Default is 0.
seed: Integer seed for reproducibility. Default is 123.
print.trace: Logical. If TRUE, prints progress. Default is FALSE.

Value

An object of class "nmfae", a list with components:

X1: Decoder basis matrix (P1 x Q), column sum 1.
C: Parameter matrix (Q x R).
X2: Encoder basis matrix (R x P2), row sum 1.
Y1hat: Fitted values \(X_1 \Theta X_2 Y_2\) (P1 x N).
rank: Named integer vector c(Q, R).
objfunc: Final objective value.
objfunc.iter: Objective values by iteration.
r.squared: \(\mathrm{cor}(Y, \widehat Y)^2\) (Pearson; in \([0,1]\)).
r.squared.uncentered: Uncentered \(R^2 = 1 - \|Y - \widehat Y\|_F^2 / \|Y\|_F^2\) (baseline = zero matrix).
r.squared.centered: Row-mean centered \(1 - \|Y - \widehat Y\|_F^2 / \|Y - \bar Y_{p\cdot}\|_F^2\).
niter: Number of iterations performed.
runtime: Elapsed time as a difftime object.
n.missing: Number of missing (or zero-weighted) elements in \(Y_1\).
n.total: Total number of elements in \(Y_1\) (P1 x N).

Lifecycle

This function is experimental. The interface may change in future versions.

References

Lee, D. D. and Seung, H. S. (2001). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems, 13.

Saha, S. et al. (2022). Hierarchical Deep Learning Neural Network (HiDeNN): An Artificial Intelligence (AI) Framework for Computational Science and Engineering. Computer Methods in Applied Mechanics and Engineering, 399.

Examples

# Autoencoder example
Y <- matrix(c(1,0,1,0, 0,1,0,1, 1,1,0,0), nrow=3, byrow=TRUE)
res <- nmfae(Y, rank=2, rank.encoder=2)
res$r.squared
#> [1] 0.7842934

# Heteroencoder example
Y1 <- matrix(c(1,0,0,1), nrow=2)
Y2 <- matrix(runif(8), nrow=4)
res2 <- nmfae(Y1, Y2, rank=2, rank.encoder=2)