Skip to contents

nmfkc fits a nonnegative matrix factorization with kernel covariates under the tri-factorization model \(Y \approx X C A = X B\).

This function supports two major input modes:

  1. Matrix Mode (Existing): nmfkc(Y=matrix, A=matrix, ...)

  2. Formula Mode (New): nmfkc(formula=Y_vars ~ A_vars, data=df, rank=Q, ...)

The rank of the basis matrix can be specified using either the rank argument (preferred for formula mode) or the hidden Q argument (for backward compatibility).

Usage

nmfkc(
  Y,
  A = NULL,
  rank = NULL,
  data,
  epsilon = 1e-04,
  maxit = 5000,
  verbose = TRUE,
  ...
)

Arguments

Y

Observation matrix (P x N), OR a formula object for Formula Mode. In Formula Mode, use Y1 + Y2 ~ A1 + A2 with data, or Y_matrix ~ A_matrix for direct matrix evaluation. Supports dot notation (. ~ A1 + A2) when data is supplied.

A

Covariate matrix. Default is NULL (no covariates). Ignored when Y is a formula.

rank

Integer. The rank of the basis matrix \(X\) (Q). Preferred over Q.

data

Optional. A data frame from which variables in the formula should be taken.

epsilon

Positive convergence tolerance.

maxit

Maximum number of iterations.

verbose

Logical. If TRUE (default), prints matrix dimensions and elapsed time.

...

Additional arguments passed for fine-tuning regularization, initialization, constraints, and output control. This includes the backward-compatible arguments Q and method.

  • Y.weights: Optional weight matrix (P x N) or vector (length N) with non-negative entries, analogous to the weights argument of lm. When supplied, the objective becomes \(\sum W_{ij} \, (Y_{ij} - (XB)_{ij})^2\) (i.e.\ linear in \(W\); lm()-style weighted least squares). Logical matrices (TRUE / FALSE) are also accepted and coerced to 1 / 0. The primary use case is missing-value masking for ECV / CV, where \(W_{ij} \in \{0, 1\}\) (FALSE / TRUE) indicates held-out vs.\ used elements; real-valued weights for observation-level importance weighting are also supported. Default NULL: if Y contains NA a binary mask is auto-constructed (0 for NA, 1 elsewhere); otherwise no weighting.

  • X.L2.ortho: Nonnegative penalty parameter for the orthogonality of \(X\) (default: 0). It minimizes the off-diagonal elements of the Gram matrix \(X^\top X\), reducing the correlation between basis vectors (conceptually minimizing \(\| X^\top X - \mathrm{diag}(X^\top X) \|_F^2\)). (Formerly lambda.ortho).

  • B.L1: Nonnegative penalty parameter for L1 regularization on \(B = C A\) (default: 0). Promotes sparsity in the coefficients. (Formerly gamma).

  • C.L1: Nonnegative penalty parameter for L1 regularization on \(C\) (default: 0). Promotes sparsity in the parameter matrix. (Formerly lambda).

  • Q: Backward-compatible name for the rank of the basis matrix (Q).

  • method: Objective function: Euclidean distance "EU" (default) or Kullback–Leibler divergence "KL".

  • X.restriction: Constraint for columns of \(X\). Options: "colSums" (default), "colSqSums", "totalSum", "none", or "fixed". "none" applies no normalization to \(X\) after each update, allowing it to absorb the scale freely.

  • X.init: Method for initializing the basis matrix \(X\). Options: "kmeans" (default), "kmeansar", "runif", "nndsvd", or a user-specified matrix. "kmeansar" applies \(k\)-means initialization and then fills zero entries with Uniform(0, mean(Y)/100), analogous to NNDSVDar.

  • nstart: Number of random starts for initialization of \(X\) (default: 1). Used by kmeans (when X.init = "kmeans" or "kmeansar") and by the multi-start evaluation (when X.init = "runif").

  • seed: Integer seed for reproducibility (default: 123).

  • C.init: Optional numeric matrix giving the initial value of the parameter matrix \(C\) (i.e., \(\Theta\)). If A is NULL, C has dimension \(Q \times N\) (equivalently \(B\)); otherwise, C has dimension \(Q \times K\) where \(K = nrow(A)\). Default initializes all entries to 1.

  • Y.symmetric: Removed. Symmetric NMF (\(Y \approx X X^\top\) or \(X C X^\top\)) has moved to the dedicated nmfkc.net function (types "tri", "bi", "signed"), which uses the correct Frobenius bilateral-gradient updates. Passing Y.symmetric to nmfkc() now stops with a message pointing to nmfkc.net().

  • prefix: Prefix for column names of \(X\) and row names of \(B\) (default: "Basis").

  • print.trace: Logical. If TRUE, prints progress every 10 iterations (default: FALSE).

  • print.dims: Deprecated. Use verbose instead.

  • detail: Level of post-fit criterion computation. "full" computes all criteria including silhouette, CPCC, dist.cor; "fast" skips expensive distance-based criteria; "minimal" returns only information criteria. Default is "full". For backward compatibility, save.time = TRUE maps to "fast" and save.memory = TRUE maps to "minimal".

Value

A list with components:

call

The matched call, as captured by match.call().

dims

A character string summarizing the matrix dimensions of the model.

runtime

A character string summarizing the computation time.

X

Basis matrix. Column normalization depends on X.restriction.

B

Coefficient matrix \(B = C A\).

XB

Fitted values for \(Y\).

C

Parameter matrix.

B.prob

Soft-clustering probabilities derived from columns of \(B\).

B.cluster

Hard-clustering labels (argmax over \(B.prob\) for each column).

X.prob

Row-wise soft-clustering probabilities derived from \(X\).

X.cluster

Hard-clustering labels (argmax over \(X.prob\) for each row).

A.attr

List of attributes of the input covariate matrix A, containing metadata like lag order and intercept status if created by nmfkc.ar or nmfkc.kernel.

formula.meta

If fitted via Formula Mode, a list with formula, Y_cols, and A_cols; otherwise NULL.

objfunc

Final objective value.

objfunc.iter

Objective values by iteration.

r.squared

\(R^2 = \mathrm{cor}(Y, XB)^2\) (Pearson; scale-invariant; \([0,1]\)).

r.squared.uncentered

Uncentered \(R^2 = 1 - \|Y - XB\|_F^2 / \|Y\|_F^2\) (baseline = zero matrix; natural for non-negative factorizations without an intercept).

r.squared.centered

Row-mean centered \(R^2 = 1 - \|Y - XB\|_F^2 / \|Y - \bar Y_{p\cdot}\|_F^2\), the multivariate regression \(R^2\).

method

Character string indicating the optimization method used ("EU" or "KL").

n.missing

Number of missing (or zero-weighted) elements in \(Y\).

n.total

Total number of elements in \(Y\).

rank

The rank \(Q\) used in the factorization.

sigma

The residual standard error, representing the typical deviation of the observed values \(Y\) from the fitted values \(X B\).

mae

Mean Absolute Error between \(Y\) and \(X B\).

criterion

A list of selection criteria: silhouette (mean silhouette width of the hard clustering, computed in the original data space dist(t(Y)) with the per-sample labels), CPCC (cophenetic correlation of a hierarchical clustering of the coefficient distances dist(t(B))), dist.cor (correlation between original-data and coefficient distances), B.prob.max.mean (clustering crispness: mean dominant-cluster membership, in \([1/Q, 1]\); meaningful at fixed \(Q\) as a confidence check before using B.cluster as hard labels), and effective.rank. The last is the effective rank: \(\exp\) of the Shannon entropy of the explained-variance distribution \(p_k = \mathrm{var}(B_{k\cdot}) / \sum_j \mathrm{var}(B_{j\cdot})\). By the trace identity \(\sum_k \mathrm{var}(B_{k\cdot}) = \mathrm{tr}(\mathrm{Cov}(B))\), \(p_k\) is the exact fraction of the total coefficient variance carried by factor \(k\), so the entropy measures how that variance is spread across factors. It ranges in \([1, Q]\) (1 when one factor carries all the variance, \(Q\) when all contribute equally) and counts the number of latent factors that actively shape across-sample variation. This is the PCA-style explained-variance / effective-dimensionality measure and reuses the \(\exp(\mathrm{entropy})\) functional form of Roy & Vetterli (2007).

References

Satoh, K. (2024). Applying Non-negative Matrix Factorization with Covariates to the Longitudinal Data as Growth Curve Model. arXiv:2403.05359. https://arxiv.org/abs/2403.05359

Satoh, K. (2025). Applying non-negative matrix factorization with covariates to multivariate time series data as a vector autoregression model. Japanese Journal of Statistics and Data Science. arXiv:2501.17446. doi:10.1007/s42081-025-00314-0

Satoh, K. (2025). Applying non-negative matrix factorization with covariates to label matrix for classification. arXiv:2510.10375. https://arxiv.org/abs/2510.10375

Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering. In Proc. 12th ACM SIGKDD (pp. 126–135). doi:10.1145/1150402.1150420

Roy, O., & Vetterli, M. (2007). The effective rank: A measure of effective dimensionality. In 15th European Signal Processing Conference (EUSIPCO) (pp. 606–610).

Examples

# Example 1. Matrix Mode (Existing)
X <- cbind(c(1,0,1),c(0,1,0))
B <- cbind(c(1,0),c(0,1),c(1,1))
Y <- X %*% B
rownames(Y) <- paste0("P",1:nrow(Y))
colnames(Y) <- paste0("N",1:ncol(Y))
print(X); print(B); print(Y)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> [3,]    1    0
#>      [,1] [,2] [,3]
#> [1,]    1    0    1
#> [2,]    0    1    1
#>    N1 N2 N3
#> P1  1  0  1
#> P2  0  1  1
#> P3  1  0  1
res <- nmfkc(Y,rank=2,epsilon=1e-6)
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res$X
#>    Basis1      Basis2
#> P1      0 0.498047869
#> P2      1 0.003904261
#> P3      0 0.498047869
res$B
#>              N1           N2        N3
#> Basis1 0.000000 0.9999995176 0.9920988
#> Basis2 2.007838 0.0001206861 2.0079012

# Example 2. Formula Mode
set.seed(1)
dummy_data <- data.frame(Y1=rpois(10,5), Y2=rpois(10,10),
                         A1=abs(rnorm(10,5)), A2=abs(rnorm(10,3)))
res_f <- nmfkc(Y1 + Y2 ~ A1 + A2, data=dummy_data, rank=2)
#> A created using columns: A1, A2
#> Y(2,10)~X(2,2)C(2,2)A(2,10)=XB(2,10)...
#> 0sec

# For symmetric NMF (Y approximated by X X^T or X C X^T),
# use \code{\link{nmfkc.net}()} instead.