Skip to contents

nmfkc fits a nonnegative matrix factorization with kernel covariates under the tri-factorization model \(Y \approx X C A = X B\).

This function supports two major input modes:

  1. Matrix Mode (Existing): nmfkc(Y=matrix, A=matrix, ...)

  2. Formula Mode (New): nmfkc(formula=Y_vars ~ A_vars, data=df, rank=Q, ...)

The rank of the basis matrix can be specified using either the rank argument (preferred for formula mode) or the hidden Q argument (for backward compatibility).

Usage

nmfkc(
  Y,
  A = NULL,
  rank = NULL,
  data,
  epsilon = 1e-04,
  maxit = 5000,
  verbose = TRUE,
  ...
)

Arguments

Y

Observation matrix (P x N), OR a formula object for Formula Mode. In Formula Mode, use Y1 + Y2 ~ A1 + A2 with data, or Y_matrix ~ A_matrix for direct matrix evaluation. Supports dot notation (. ~ A1 + A2) when data is supplied.

A

Covariate matrix. Default is NULL (no covariates). Ignored when Y is a formula.

rank

Integer. The rank of the basis matrix \(X\) (Q). Preferred over Q.

data

Optional. A data frame from which variables in the formula should be taken.

epsilon

Positive convergence tolerance.

maxit

Maximum number of iterations.

verbose

Logical. If TRUE (default), prints matrix dimensions and elapsed time.

...

Additional arguments passed for fine-tuning regularization, initialization, constraints, and output control. This includes the backward-compatible arguments Q and method.

  • Y.weights: Optional numeric matrix (P x N) or vector (length N). 0 indicates missing/ignored values. If NULL (default), weights are automatically set to 0 for NAs in Y, and 1 otherwise.

  • X.L2.ortho: Nonnegative penalty parameter for the orthogonality of \(X\) (default: 0). It minimizes the off-diagonal elements of the Gram matrix \(X^\top X\), reducing the correlation between basis vectors (conceptually minimizing \(\| X^\top X - \mathrm{diag}(X^\top X) \|_F^2\)). (Formerly lambda.ortho).

  • B.L1: Nonnegative penalty parameter for L1 regularization on \(B = C A\) (default: 0). Promotes sparsity in the coefficients. (Formerly gamma).

  • C.L1: Nonnegative penalty parameter for L1 regularization on \(C\) (default: 0). Promotes sparsity in the parameter matrix. (Formerly lambda).

  • Q: Backward-compatible name for the rank of the basis matrix (Q).

  • method: Objective function: Euclidean distance "EU" (default) or Kullback–Leibler divergence "KL".

  • X.restriction: Constraint for columns of \(X\). Options: "colSums" (default), "colSqSums", "totalSum", "none", or "fixed". "none" applies no normalization to \(X\) after each update, allowing it to absorb the scale freely. This is automatically set when Y.symmetric = "bi" or "tri", because column normalization would prevent \(X X^\top\) (or \(X C X^\top\)) from approximating \(Y\) at the correct scale.

  • X.init: Method for initializing the basis matrix \(X\). Options: "kmeans" (default), "kmeansar", "runif", "nndsvd", or a user-specified matrix. "kmeansar" applies \(k\)-means initialization and then fills zero entries with Uniform(0, mean(Y)/100), analogous to NNDSVDar.

  • nstart: Number of random starts for kmeans when initializing \(X\) (default: 1).

  • seed: Integer seed for reproducibility (default: 123).

  • C.init: Optional numeric matrix giving the initial value of the parameter matrix \(C\) (i.e., \(\Theta\)). If A is NULL, C has dimension \(Q \times N\) (equivalently \(B\)); otherwise, C has dimension \(Q \times K\) where \(K = nrow(A)\). Default initializes all entries to 1.

  • Y.symmetric: Character string specifying the type of symmetric NMF. "none" (default): standard NMF (\(Y \approx XB\)). "bi": 2-factor symmetric NMF (\(Y \approx X X^\top\)). Internally implemented as the "tri" model with \(C = I_Q\) (identity matrix) held fixed, so that \(X\) is updated freely without column normalization. The multiplicative update for \(X\) uses cube-root damping (\(X \leftarrow X \circ (numerator / denominator)^{1/3}\)) to prevent oscillation, since \(X\) appears in both factors of the decomposition (He et al., 2011, Proposition 1). "tri": 3-factor symmetric NMF (\(Y \approx X C X^\top\)) where \(C\) is a \(Q \times Q\) matrix representing cluster interactions (Ding et al., 2006). Both "bi" and "tri" require Y to be square and cannot be used with covariate matrix A. When Y.symmetric = "bi", X.restriction is automatically set to "none" (no column normalization), because \(C = I_Q\) is fixed and cannot absorb the scale. For "tri", column normalization is retained (default "colSums") because the free parameter matrix \(C\) absorbs the scale.

  • prefix: Prefix for column names of \(X\) and row names of \(B\) (default: "Basis").

  • print.trace: Logical. If TRUE, prints progress every 10 iterations (default: FALSE).

  • print.dims: Deprecated. Use verbose instead.

  • detail: Level of post-fit criterion computation. "full" computes all criteria including silhouette, CPCC, dist.cor; "fast" skips expensive distance-based criteria; "minimal" returns only information criteria. Default is "full". For backward compatibility, save.time = TRUE maps to "fast" and save.memory = TRUE maps to "minimal".

Value

A list with components:

call

The matched call, as captured by match.call().

dims

A character string summarizing the matrix dimensions of the model.

runtime

A character string summarizing the computation time.

X

Basis matrix. Column normalization depends on X.restriction.

B

Coefficient matrix \(B = C A\).

XB

Fitted values for \(Y\).

C

Parameter matrix.

B.prob

Soft-clustering probabilities derived from columns of \(B\).

B.cluster

Hard-clustering labels (argmax over \(B.prob\) for each column).

X.prob

Row-wise soft-clustering probabilities derived from \(X\).

X.cluster

Hard-clustering labels (argmax over \(X.prob\) for each row).

A.attr

List of attributes of the input covariate matrix A, containing metadata like lag order and intercept status if created by nmfkc.ar or nmfkc.kernel.

formula.meta

If fitted via Formula Mode, a list with formula, Y_cols, and A_cols; otherwise NULL.

objfunc

Final objective value.

objfunc.iter

Objective values by iteration.

r.squared

Coefficient of determination \(R^2\) between \(Y\) and \(X B\).

method

Character string indicating the optimization method used ("EU" or "KL").

n.missing

Number of missing (or zero-weighted) elements in \(Y\).

n.total

Total number of elements in \(Y\).

rank

The rank \(Q\) used in the factorization.

sigma

The residual standard error, representing the typical deviation of the observed values \(Y\) from the fitted values \(X B\).

mae

Mean Absolute Error between \(Y\) and \(X B\).

criterion

A list of selection criteria, including ICp (ICp1, ICp2, ICp3), CPCC, silhouette, AIC, BIC, dist.cor, B.prob.sd.min, B.prob.max.mean, and B.prob.entropy.mean.

References

Satoh, K. (2024). Applying Non-negative Matrix Factorization with Covariates to the Longitudinal Data as Growth Curve Model. arXiv:2403.05359. https://arxiv.org/abs/2403.05359

Satoh, K. (2025). Applying non-negative matrix factorization with covariates to multivariate time series data as a vector autoregression model. Japanese Journal of Statistics and Data Science. arXiv:2501.17446. doi:10.1007/s42081-025-00314-0

Satoh, K. (2025). Applying non-negative matrix factorization with covariates to label matrix for classification. arXiv:2510.10375. https://arxiv.org/abs/2510.10375

Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering. In Proc. 12th ACM SIGKDD (pp. 126–135). doi:10.1145/1150402.1150420

Examples

# Example 1. Matrix Mode (Existing)
X <- cbind(c(1,0,1),c(0,1,0))
B <- cbind(c(1,0),c(0,1),c(1,1))
Y <- X %*% B
rownames(Y) <- paste0("P",1:nrow(Y))
colnames(Y) <- paste0("N",1:ncol(Y))
print(X); print(B); print(Y)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> [3,]    1    0
#>      [,1] [,2] [,3]
#> [1,]    1    0    1
#> [2,]    0    1    1
#>    N1 N2 N3
#> P1  1  0  1
#> P2  0  1  1
#> P3  1  0  1
res <- nmfkc(Y,rank=2,epsilon=1e-6)
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res$X
#>    Basis1      Basis2
#> P1      0 0.498047869
#> P2      1 0.003904261
#> P3      0 0.498047869
res$B
#>              N1           N2        N3
#> Basis1 0.000000 0.9999995176 0.9920988
#> Basis2 2.007838 0.0001206861 2.0079012

# Example 2. Formula Mode
set.seed(1)
dummy_data <- data.frame(Y1=rpois(10,5), Y2=rpois(10,10),
                         A1=abs(rnorm(10,5)), A2=abs(rnorm(10,3)))
res_f <- nmfkc(Y1 + Y2 ~ A1 + A2, data=dummy_data, rank=2)
#> A created using columns: A1, A2
#> Y(2,10)~X(2,2)C(2,2)A(2,10)=XB(2,10)...
#> 0sec

# Example 3. Symmetric NMF (bi: Y ~ X X^T)
S <- matrix(c(3,0,2, 0,3,1, 2,1,2), nrow=3)
res_bi <- nmfkc(S, rank=2, Y.symmetric="bi")
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res_bi$X   # basis matrix (no column normalization)
#>         Basis1     Basis2
#> [1,] 0.0000000 1.71131965
#> [2,] 1.7274149 0.01105598
#> [3,] 0.5977175 1.22664897
res_bi$XB  # reconstruction X %*% t(X)
#>            [,1]       [,2]     [,3]
#> [1,] 2.92861495 0.01892032 2.099188
#> [2,] 0.01892032 2.98408448 1.046068
#> [3,] 2.09918848 1.04606801 1.861934

# Example 4. Symmetric NMF (tri: Y ~ X C X^T)
res_tri <- nmfkc(S, rank=2, Y.symmetric="tri")
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res_tri$C  # Q x Q cluster interaction matrix
#>                 1          2
#> Basis1 0.03785993 8.61053418
#> Basis2 5.41314704 0.03785993
res_tri$XB # reconstruction X %*% C %*% t(X)
#>            [,1]       [,2]     [,3]
#> [1,] 2.92953729 0.01662336 2.098360
#> [2,] 0.01662336 2.98320036 1.047195
#> [3,] 2.09836042 1.04719548 1.862305