nmfkc fits a nonnegative matrix factorization with kernel covariates
under the tri-factorization model \(Y \approx X C A = X B\).
This function supports two major input modes:
Matrix Mode (Existing):
nmfkc(Y=matrix, A=matrix, ...)Formula Mode (New):
nmfkc(formula=Y_vars ~ A_vars, data=df, rank=Q, ...)
The rank of the basis matrix can be specified using either the rank argument
(preferred for formula mode) or the hidden Q argument (for backward compatibility).
Arguments
- Y
Observation matrix (P x N), OR a formula object for Formula Mode. In Formula Mode, use
Y1 + Y2 ~ A1 + A2withdata, orY_matrix ~ A_matrixfor direct matrix evaluation. Supports dot notation (. ~ A1 + A2) whendatais supplied.- A
Covariate matrix. Default is
NULL(no covariates). Ignored whenYis a formula.- rank
Integer. The rank of the basis matrix \(X\) (Q). Preferred over
Q.- data
Optional. A data frame from which variables in the formula should be taken.
- epsilon
Positive convergence tolerance.
- maxit
Maximum number of iterations.
- verbose
Logical. If
TRUE(default), prints matrix dimensions and elapsed time.- ...
Additional arguments passed for fine-tuning regularization, initialization, constraints, and output control. This includes the backward-compatible arguments
Qandmethod.Y.weights: Optional numeric matrix (P x N) or vector (length N). 0 indicates missing/ignored values. If NULL (default), weights are automatically set to 0 for NAs in Y, and 1 otherwise.X.L2.ortho: Nonnegative penalty parameter for the orthogonality of \(X\) (default: 0). It minimizes the off-diagonal elements of the Gram matrix \(X^\top X\), reducing the correlation between basis vectors (conceptually minimizing \(\| X^\top X - \mathrm{diag}(X^\top X) \|_F^2\)). (Formerlylambda.ortho).B.L1: Nonnegative penalty parameter for L1 regularization on \(B = C A\) (default: 0). Promotes sparsity in the coefficients. (Formerlygamma).C.L1: Nonnegative penalty parameter for L1 regularization on \(C\) (default: 0). Promotes sparsity in the parameter matrix. (Formerlylambda).Q: Backward-compatible name for the rank of the basis matrix (Q).method: Objective function: Euclidean distance"EU"(default) or Kullback–Leibler divergence"KL".X.restriction: Constraint for columns of \(X\). Options:"colSums"(default),"colSqSums","totalSum","none", or"fixed"."none"applies no normalization to \(X\) after each update, allowing it to absorb the scale freely. This is automatically set whenY.symmetric = "bi"or"tri", because column normalization would prevent \(X X^\top\) (or \(X C X^\top\)) from approximating \(Y\) at the correct scale.X.init: Method for initializing the basis matrix \(X\). Options:"kmeans"(default),"kmeansar","runif","nndsvd", or a user-specified matrix."kmeansar"applies \(k\)-means initialization and then fills zero entries withUniform(0, mean(Y)/100), analogous to NNDSVDar.nstart: Number of random starts forkmeanswhen initializing \(X\) (default: 1).seed: Integer seed for reproducibility (default: 123).C.init: Optional numeric matrix giving the initial value of the parameter matrix \(C\) (i.e., \(\Theta\)). IfAisNULL,Chas dimension \(Q \times N\) (equivalently \(B\)); otherwise,Chas dimension \(Q \times K\) where \(K = nrow(A)\). Default initializes all entries to 1.Y.symmetric: Character string specifying the type of symmetric NMF."none"(default): standard NMF (\(Y \approx XB\))."bi": 2-factor symmetric NMF (\(Y \approx X X^\top\)). Internally implemented as the"tri"model with \(C = I_Q\) (identity matrix) held fixed, so that \(X\) is updated freely without column normalization. The multiplicative update for \(X\) uses cube-root damping (\(X \leftarrow X \circ (numerator / denominator)^{1/3}\)) to prevent oscillation, since \(X\) appears in both factors of the decomposition (He et al., 2011, Proposition 1)."tri": 3-factor symmetric NMF (\(Y \approx X C X^\top\)) where \(C\) is a \(Q \times Q\) matrix representing cluster interactions (Ding et al., 2006). Both"bi"and"tri"requireYto be square and cannot be used with covariate matrixA. WhenY.symmetric = "bi",X.restrictionis automatically set to"none"(no column normalization), because \(C = I_Q\) is fixed and cannot absorb the scale. For"tri", column normalization is retained (default"colSums") because the free parameter matrix \(C\) absorbs the scale.prefix: Prefix for column names of \(X\) and row names of \(B\) (default: "Basis").print.trace: Logical. IfTRUE, prints progress every 10 iterations (default:FALSE).print.dims: Deprecated. Useverboseinstead.detail: Level of post-fit criterion computation."full"computes all criteria including silhouette, CPCC, dist.cor;"fast"skips expensive distance-based criteria;"minimal"returns only information criteria. Default is"full". For backward compatibility,save.time = TRUEmaps to"fast"andsave.memory = TRUEmaps to"minimal".
Value
A list with components:
- call
The matched call, as captured by
match.call().- dims
A character string summarizing the matrix dimensions of the model.
- runtime
A character string summarizing the computation time.
- X
Basis matrix. Column normalization depends on
X.restriction.- B
Coefficient matrix \(B = C A\).
- XB
Fitted values for \(Y\).
- C
Parameter matrix.
- B.prob
Soft-clustering probabilities derived from columns of \(B\).
- B.cluster
Hard-clustering labels (argmax over \(B.prob\) for each column).
- X.prob
Row-wise soft-clustering probabilities derived from \(X\).
- X.cluster
Hard-clustering labels (argmax over \(X.prob\) for each row).
- A.attr
List of attributes of the input covariate matrix
A, containing metadata like lag order and intercept status if created bynmfkc.arornmfkc.kernel.- formula.meta
If fitted via Formula Mode, a list with
formula,Y_cols, andA_cols; otherwiseNULL.- objfunc
Final objective value.
- objfunc.iter
Objective values by iteration.
- r.squared
Coefficient of determination \(R^2\) between \(Y\) and \(X B\).
- method
Character string indicating the optimization method used (
"EU"or"KL").- n.missing
Number of missing (or zero-weighted) elements in \(Y\).
- n.total
Total number of elements in \(Y\).
- rank
The rank \(Q\) used in the factorization.
- sigma
The residual standard error, representing the typical deviation of the observed values \(Y\) from the fitted values \(X B\).
- mae
Mean Absolute Error between \(Y\) and \(X B\).
- criterion
A list of selection criteria, including
ICp(ICp1,ICp2,ICp3),CPCC,silhouette,AIC,BIC,dist.cor,B.prob.sd.min,B.prob.max.mean, andB.prob.entropy.mean.
References
Satoh, K. (2024). Applying Non-negative Matrix Factorization with Covariates to the Longitudinal Data as Growth Curve Model. arXiv:2403.05359. https://arxiv.org/abs/2403.05359
Satoh, K. (2025). Applying non-negative matrix factorization with covariates to multivariate time series data as a vector autoregression model. Japanese Journal of Statistics and Data Science. arXiv:2501.17446. doi:10.1007/s42081-025-00314-0
Satoh, K. (2025). Applying non-negative matrix factorization with covariates to label matrix for classification. arXiv:2510.10375. https://arxiv.org/abs/2510.10375
Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering. In Proc. 12th ACM SIGKDD (pp. 126–135). doi:10.1145/1150402.1150420
Examples
# Example 1. Matrix Mode (Existing)
X <- cbind(c(1,0,1),c(0,1,0))
B <- cbind(c(1,0),c(0,1),c(1,1))
Y <- X %*% B
rownames(Y) <- paste0("P",1:nrow(Y))
colnames(Y) <- paste0("N",1:ncol(Y))
print(X); print(B); print(Y)
#> [,1] [,2]
#> [1,] 1 0
#> [2,] 0 1
#> [3,] 1 0
#> [,1] [,2] [,3]
#> [1,] 1 0 1
#> [2,] 0 1 1
#> N1 N2 N3
#> P1 1 0 1
#> P2 0 1 1
#> P3 1 0 1
res <- nmfkc(Y,rank=2,epsilon=1e-6)
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res$X
#> Basis1 Basis2
#> P1 0 0.498047869
#> P2 1 0.003904261
#> P3 0 0.498047869
res$B
#> N1 N2 N3
#> Basis1 0.000000 0.9999995176 0.9920988
#> Basis2 2.007838 0.0001206861 2.0079012
# Example 2. Formula Mode
set.seed(1)
dummy_data <- data.frame(Y1=rpois(10,5), Y2=rpois(10,10),
A1=abs(rnorm(10,5)), A2=abs(rnorm(10,3)))
res_f <- nmfkc(Y1 + Y2 ~ A1 + A2, data=dummy_data, rank=2)
#> A created using columns: A1, A2
#> Y(2,10)~X(2,2)C(2,2)A(2,10)=XB(2,10)...
#> 0sec
# Example 3. Symmetric NMF (bi: Y ~ X X^T)
S <- matrix(c(3,0,2, 0,3,1, 2,1,2), nrow=3)
res_bi <- nmfkc(S, rank=2, Y.symmetric="bi")
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res_bi$X # basis matrix (no column normalization)
#> Basis1 Basis2
#> [1,] 0.0000000 1.71131965
#> [2,] 1.7274149 0.01105598
#> [3,] 0.5977175 1.22664897
res_bi$XB # reconstruction X %*% t(X)
#> [,1] [,2] [,3]
#> [1,] 2.92861495 0.01892032 2.099188
#> [2,] 0.01892032 2.98408448 1.046068
#> [3,] 2.09918848 1.04606801 1.861934
# Example 4. Symmetric NMF (tri: Y ~ X C X^T)
res_tri <- nmfkc(S, rank=2, Y.symmetric="tri")
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res_tri$C # Q x Q cluster interaction matrix
#> 1 2
#> Basis1 0.03785993 8.61053418
#> Basis2 5.41314704 0.03785993
res_tri$XB # reconstruction X %*% C %*% t(X)
#> [,1] [,2] [,3]
#> [1,] 2.92953729 0.01662336 2.098360
#> [2,] 0.01662336 2.98320036 1.047195
#> [3,] 2.09836042 1.04719548 1.862305