nmfkc.ecv performs k-fold cross-validation by randomly holding out
individual elements of the data matrix (element-wise), assigning them a
weight of 0 via Y.weights, and evaluating the reconstruction error on
those held-out elements.
This approach, also known as Wold's CV (Wold, 1978), is well suited to choosing
the rank (Q) in NMF: held-out elements never enter the fit, so the error measured
on them reflects prediction rather than goodness of fit. The rank argument accepts
a vector, so several ranks are evaluated simultaneously on the same folds.
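The mechanism above can be sketched in base R. This is a minimal illustration under stated assumptions, not the nmfkc implementation: the helpers weighted_nmf() and ecv_sketch() are hypothetical, a plain weighted multiplicative-update NMF (Euclidean loss) stands in for nmfkc(), and the per-fold MSEs are simply averaged for each rank.

```r
# Hypothetical stand-in for nmfkc(): weighted NMF with Euclidean loss,
# fitted by multiplicative updates. Entries with weight 0 in M do not
# contribute to the fit. Returns the reconstruction W %*% H.
weighted_nmf <- function(Y, M, Q, iter = 500, eps = 1e-10) {
  W <- matrix(runif(nrow(Y) * Q), nrow(Y), Q)
  H <- matrix(runif(Q * ncol(Y)), Q, ncol(Y))
  MY <- M * Y
  for (it in seq_len(iter)) {
    W <- W * (MY %*% t(H)) / ((M * (W %*% H)) %*% t(H) + eps)
    H <- H * (t(W) %*% MY) / (t(W) %*% (M * (W %*% H)) + eps)
  }
  W %*% H
}

# Hypothetical sketch of element-wise (Wold-style) k-fold CV.
ecv_sketch <- function(Y, ranks, nfolds = 5, seed = 123) {
  set.seed(seed)
  # Randomly split the linear indices 1..length(Y) into nfolds groups;
  # the same folds are reused for every rank.
  fold_id <- sample(rep_len(seq_len(nfolds), length(Y)))
  folds <- split(seq_along(Y), fold_id)
  per_fold <- lapply(ranks, function(Q) {
    vapply(folds, function(test) {
      M <- matrix(1, nrow(Y), ncol(Y))
      M[test] <- 0                        # held-out elements get weight 0
      Yhat <- weighted_nmf(Y, M, Q)
      mean((Y[test] - Yhat[test])^2)      # MSE on held-out elements only
    }, numeric(1))
  })
  names(per_fold) <- paste0("Q=", ranks)
  list(objfunc = vapply(per_fold, mean, numeric(1)),  # assumption: mean of fold MSEs
       objfunc.fold = per_fold,
       folds = unname(folds))
}

Y <- t(as.matrix(iris[1:30, 1:4]))
res <- ecv_sketch(Y, ranks = 1:2, nfolds = 3)
res$objfunc
```

Because set.seed(seed) is called before the folds are drawn, every rank is scored on identical held-out elements. The numbers will not match nmfkc.ecv, since the fitting algorithm and the aggregation across folds are assumptions here.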
For symmetric (network) data, use nmfkc.net.ecv instead: it builds folds on
the upper triangle, so a held-out entry cannot leak back into training through
its mirror entry (\(Y_{ij} = Y_{ji}\)). Passing the former Y.symmetric
argument to nmfkc.ecv is no longer supported; it stops with an error that
redirects to nmfkc.net.ecv.
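One way to build such leakage-free folds is sketched below in base R. It illustrates the idea only and is not the code inside nmfkc.net.ecv; the helper symmetric_folds() is hypothetical. Each unordered pair (i, j) with i <= j is assigned to exactly one fold, and both Y_{ij} and Y_{ji} are held out together.

```r
# Hypothetical helper (not part of nmfkc): element-wise folds for a
# symmetric n x n matrix. Folds are drawn over upper-triangle pairs
# (i <= j, diagonal included) and each held-out (i, j) is mirrored to
# (j, i), so a held-out entry's twin never stays in the training data.
symmetric_folds <- function(n, nfolds = 5, seed = 123) {
  set.seed(seed)
  ut <- which(upper.tri(matrix(0, n, n), diag = TRUE), arr.ind = TRUE)
  fold_id <- sample(rep_len(seq_len(nfolds), nrow(ut)))
  lapply(seq_len(nfolds), function(k) {
    pairs <- ut[fold_id == k, , drop = FALSE]
    upper <- (pairs[, 2] - 1) * n + pairs[, 1]   # linear index of (i, j)
    lower <- (pairs[, 1] - 1) * n + pairs[, 2]   # linear index of (j, i)
    sort(unique(c(upper, lower)))                # diagonal counted once
  })
}

folds <- symmetric_folds(6, nfolds = 3)
```

Each element of the result is a vector of held-out linear indices, the same form as the folds component listed under Value, and every fold's mask is itself symmetric.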
Arguments
- Y
Observation matrix, or a formula (see nmfkc for Formula Mode).
- A
Covariate matrix. Ignored when Y is a formula.
- rank
Vector of ranks to evaluate (e.g., 1:5). For backward compatibility, Q is accepted via ....
- data
A data frame (required when Y is a formula with column names).
- ...
Additional arguments passed to nmfkc (e.g., method = "EU"). Also accepts: nfolds (number of folds, default 5; div is also accepted) and seed (integer seed, default 123).
Value
A list with components:
- objfunc
Numeric vector of the mean squared error (MSE) on the held-out elements, one value per rank.
- sigma
Numeric vector of the residual standard error (root mean squared error, RMSE) for each rank. Only available when method = "EU".
- objfunc.fold
List with one element per rank. Each element contains the MSE values for the k folds.
- folds
A list of length nfolds (div) containing the linear indices of the held-out elements for each fold (shared across all ranks).
References
Wold, S. (1978). Cross-validatory estimation of the number of
components in factor and principal components models.
Technometrics, 20(4), 397–405.
doi:10.1080/00401706.1978.10489693
Owen, A. B., & Perry, P. O. (2009). Bi-cross-validation of the SVD
and the nonnegative matrix factorization. The Annals of Applied Statistics,
3(2), 564–594. doi:10.1214/08-AOAS227
(cross-validation of the NMF rank; see also nmfkc.bicv).
See also
nmfkc, nmfkc.cv; other rank-selection
criteria: nmfkc.rank, nmfkc.bicv,
nmfkc.consensus, nmfkc.ard.
Examples
# Element-wise CV to select rank
library(nmfkc)
Y <- t(iris[1:30, 1:4])
res <- nmfkc.ecv(Y, rank = 1:2, nfolds = 3)
#> Performing Element-wise CV for Q = 1,2 (3-fold)...
res$objfunc
#> Q=1 Q=2
#> 0.2597611 0.4376291