nmfkc fits a nonnegative matrix factorization with kernel covariates
under the tri-factorization model \(Y \approx X C A = X B\).
This function supports two major input modes:
Matrix Mode (Existing): nmfkc(Y=matrix, A=matrix, ...)
Formula Mode (New): nmfkc(formula=Y_vars ~ A_vars, data=df, rank=Q, ...)
The rank of the basis matrix can be specified using either the rank argument
(preferred for formula mode) or the hidden Q argument (for backward compatibility).
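The tri-factorization can be illustrated with small matrices in base R. The sketch below does not call nmfkc. The dimension names P, Q, R, N and the toy values are assumptions chosen for illustration, with \(Y\) being P x N, \(X\) being P x Q, \(C\) being Q x R, and \(A\) being R x N.

```r
# Toy dimensions (assumed for illustration): P rows, N columns, rank Q, R covariates
P <- 3; N <- 4; Q <- 2; R <- 2

X <- matrix(c(0.5, 0.0,
              0.0, 1.0,
              0.5, 0.0), nrow = P, ncol = Q, byrow = TRUE)  # basis, P x Q
C <- matrix(c(1, 2,
              3, 0), nrow = Q, ncol = R, byrow = TRUE)      # parameters, Q x R
A <- rbind(1, c(0, 1, 2, 3))                                # covariates, R x N (first row acts as an intercept)

B <- C %*% A          # coefficient matrix, Q x N
Y_hat <- X %*% B      # fitted values, P x N

dim(B)
dim(Y_hat)
all(Y_hat >= 0)       # nonnegative factors give a nonnegative product
```

Because \(B = C A\), the covariates enter the fit only through the coefficient matrix, while \(X\) is shared by all columns of \(Y\).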
nmfkc(Y, A = NULL, rank = NULL, data, epsilon = 1e-04, maxit = 5000, ...)
Satoh, K. (2024). Applying Non-negative Matrix Factorization with Covariates to the Longitudinal Data as Growth Curve Model. arXiv:2403.05359. https://arxiv.org/abs/2403.05359
Y: Observation matrix, or a formula object if data is supplied.
A: Covariate matrix. Default is NULL (no covariates).
rank: Integer. The rank Q of the basis matrix \(X\). Preferred over the Q argument.
data: Optional. A data frame from which the variables in the formula are taken.
epsilon: Positive convergence tolerance.
maxit: Maximum number of iterations.
...: Additional arguments for fine-tuning regularization, initialization, constraints,
and output control. This includes the backward-compatible arguments Q and method.
Y.weights: Optional numeric matrix (P x N) or vector (length N).
A weight of 0 marks a missing or ignored value. If NULL (default), weights are automatically
set to 0 where Y is NA and to 1 otherwise.
X.L2.ortho: Nonnegative penalty parameter for the orthogonality of \(X\) (default: 0).
It penalizes the off-diagonal elements of the Gram matrix \(X^\top X\), reducing the correlation
between basis vectors (conceptually penalizing \(\| X^\top X - \mathrm{diag}(X^\top X) \|_F^2\)).
(Formerly lambda.ortho).
B.L1: Nonnegative penalty parameter for L1 regularization on \(B = C A\) (default: 0).
Promotes sparsity in the coefficients. (Formerly gamma).
C.L1: Nonnegative penalty parameter for L1 regularization on \(C\) (default: 0).
Promotes sparsity in the parameter matrix. (Formerly lambda).
Q: Backward-compatible name for the rank of the basis matrix (Q).
method: Objective function: Euclidean distance "EU" (default) or Kullback–Leibler divergence "KL".
X.restriction: Constraint for columns of \(X\). Options: "colSums" (default), "colSqSums", "totalSum", or "fixed".
X.init: Method for initializing the basis matrix \(X\). Options: "kmeans" (default), "runif", "nndsvd", or a user-specified matrix.
nstart: Number of random starts for kmeans when initializing \(X\) (default: 1).
seed: Integer seed for reproducibility (default: 123).
prefix: Prefix for column names of \(X\) and row names of \(B\) (default: "Basis").
print.trace: Logical. If TRUE, prints progress every 10 iterations (default: FALSE).
print.dims: Logical. If TRUE (default), prints matrix dimensions and elapsed time.
save.time: Logical. If TRUE (default), skips some post-computations (e.g., CPCC, silhouette) to save time.
save.memory: Logical. If TRUE, performs only essential computations (implies save.time = TRUE) to reduce memory usage (default: FALSE).
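Two of the options above can be made concrete in base R. The sketch below shows how the default weights described for Y.weights follow from the NA pattern of Y, and evaluates the quantity that X.L2.ortho is described as penalizing. The helper name `ortho_penalty` and the toy matrices are hypothetical, and the code only mirrors the documented descriptions; it is not the package's internal implementation.

```r
# Toy observation matrix with one missing entry (values are illustrative)
Y <- matrix(c(1, NA, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

# Default weights as documented: 0 where Y is NA, 1 otherwise
W <- ifelse(is.na(Y), 0, 1)
W

# Hypothetical helper: squared Frobenius norm of the off-diagonal part of t(X) %*% X
ortho_penalty <- function(X) {
  G <- crossprod(X)              # Gram matrix t(X) %*% X
  sum((G - diag(diag(G)))^2)
}

X_orth    <- cbind(c(1, 0, 0), c(0, 1, 1))  # columns with disjoint support
X_overlap <- cbind(c(1, 1, 0), c(0, 1, 1))  # columns sharing one row

ortho_penalty(X_orth)     # 0: the columns are orthogonal
ortho_penalty(X_overlap)  # 2: each off-diagonal Gram entry is 1
```

A larger X.L2.ortho would push a fit toward bases like `X_orth`, whose columns do not overlap.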
A list with components:
The matched call, as captured by match.call().
A character string summarizing the matrix dimensions of the model.
A character string summarizing the computation time.
Basis matrix. Column normalization depends on X.restriction.
Coefficient matrix \(B = C A\).
Fitted values for \(Y\).
Parameter matrix.
Soft-clustering probabilities derived from columns of \(B\).
Hard-clustering labels (argmax over B.prob for each column).
Row-wise soft-clustering probabilities derived from \(X\).
Hard-clustering labels (argmax over X.prob for each row).
List of attributes of the input covariate matrix A. It contains metadata such as lag order and intercept status when A was created by nmfkc.ar or nmfkc.kernel.
Final objective value.
Objective values by iteration.
Coefficient of determination \(R^2\) between \(Y\) and \(X B\).
The residual standard error, representing the typical deviation of the observed values \(Y\) from the fitted values \(X B\).
A list of selection criteria, including ICp, CPCC, silhouette, AIC, and BIC.
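The clustering and fit summaries above can be approximated by hand from \(X\), \(B\), and \(Y\). The sketch below assumes that the soft-clustering probabilities are the columns of \(B\) rescaled to sum to one, and that \(R^2\) is the squared correlation between the entries of \(Y\) and \(X B\). Both are assumptions about how these summaries are computed, not statements taken from the package source, and the matrices are toy values.

```r
X <- cbind(c(0.5, 0, 0.5), c(0, 1, 0))     # toy basis, 3 x 2
B <- cbind(c(2, 0), c(0, 1), c(2, 1))      # toy coefficients, 2 x 3
Y <- X %*% B                               # exact fit, for illustration only

# Assumed soft clustering: each column of B rescaled to sum to 1
B_prob <- sweep(B, 2, colSums(B), "/")

# Hard clustering: index of the largest probability in each column
B_cluster <- apply(B_prob, 2, which.max)

# Assumed R^2: squared correlation between observed and fitted entries
XB <- X %*% B
r_squared <- cor(as.vector(Y), as.vector(XB))^2

B_prob
B_cluster
r_squared
```

With these toy values the third column of `B_prob` is split 2/3 to 1/3 between the two bases, so its hard label is 1 even though both bases contribute.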
Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 126–135). doi:10.1145/1150402.1150420
Potthoff, R. F., & Roy, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika, 51, 313–326. doi:10.2307/2334137
# install.packages("remotes")
# remotes::install_github("ksatohds/nmfkc")
# Example 1. Matrix Mode (Existing)
library(nmfkc)
X <- cbind(c(1,0,1),c(0,1,0))
B <- cbind(c(1,0),c(0,1),c(1,1))
Y <- X %*% B
rownames(Y) <- paste0("P",1:nrow(Y))
colnames(Y) <- paste0("N",1:ncol(Y))
print(X); print(B); print(Y)
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> [3,]    1    0
#>      [,1] [,2] [,3]
#> [1,]    1    0    1
#> [2,]    0    1    1
#>    N1 N2 N3
#> P1  1  0  1
#> P2  0  1  1
#> P3  1  0  1
res <- nmfkc(Y,Q=2,epsilon=1e-6)
#> Y(3,3)~X(3,2)B(2,3)...
#> 0sec
res$X
#>    Basis1      Basis2
#> P1      0 0.498047869
#> P2      1 0.003904261
#> P3      0 0.498047869
res$B
#>              N1           N2        N3
#> Basis1 0.000000 0.9999995176 0.9920988
#> Basis2 2.007838 0.0001206861 2.0079012
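As a quick sanity check on Example 1, the printed factors can be multiplied back together in base R. The values below are copied from the output above, so this sketch runs without the package. The names `X_hat` and `B_hat` are illustrative.

```r
# Factors copied from the printed output of Example 1
X_hat <- matrix(c(0, 0.498047869,
                  1, 0.003904261,
                  0, 0.498047869), nrow = 3, byrow = TRUE,
                dimnames = list(paste0("P", 1:3), c("Basis1", "Basis2")))
B_hat <- matrix(c(0.000000, 0.9999995176, 0.9920988,
                  2.007838, 0.0001206861, 2.0079012), nrow = 2, byrow = TRUE,
                dimnames = list(c("Basis1", "Basis2"), paste0("N", 1:3)))

# Original data from Example 1
Y <- cbind(c(1, 0, 1), c(0, 1, 0)) %*% cbind(c(1, 0), c(0, 1), c(1, 1))

# Column sums of the basis are approximately 1, consistent with the default "colSums" restriction
colSums(X_hat)

# Largest absolute reconstruction error
max(abs(Y - X_hat %*% B_hat))
```

The recovered basis differs from the X used to generate Y by a column permutation and rescaling. This is expected, because a nonnegative factorization is only identified up to such transformations.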
# Example 2. Formula Mode (New)
# dummy_data <- data.frame(Y1=rpois(10,5), Y2=rpois(10,10), A1=1:10, A2=rnorm(10,5))
# res_f <- nmfkc(Y1 + Y2 ~ A1 + A2, data=dummy_data, rank=2)
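To make the formula-mode call above easier to picture, the following base R sketch shows one way a formula such as Y1 + Y2 ~ A1 + A2 could be turned into the Y and A matrices of matrix mode. It assumes that each row of data is one observation and therefore becomes one column of Y and A. This mapping is an assumption for illustration and is not taken from the package source.

```r
set.seed(1)
dummy_data <- data.frame(Y1 = rpois(10, 5), Y2 = rpois(10, 10),
                         A1 = 1:10, A2 = rnorm(10, 5))

f <- Y1 + Y2 ~ A1 + A2
y_vars <- all.vars(f[[2]])   # variables on the left-hand side
a_vars <- all.vars(f[[3]])   # variables on the right-hand side

# Assumed layout: variables in rows, observations in columns
Y <- t(as.matrix(dummy_data[, y_vars]))
A <- t(as.matrix(dummy_data[, a_vars]))

dim(Y)
dim(A)
```

Under that assumption, the formula call would correspond to passing these Y and A matrices with rank = 2 in matrix mode.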