Changelog
Source: NEWS.md
nmfkc 0.8.2
nmfkc.net.DOT(): default layout is now "neato"
- The layout choices are reordered by recommendation (neato, fdp, twopi, circo, dot), so the default changes from "fdp" to "neato", which separates community graphs more clearly. Raising threshold (e.g. 0.2–0.3) further declutters weak membership edges.
Bug fix: nmfkc.net.DOT() mis-detected type = "bi" as "tri"
- The bi-vs-tri auto-detection ignored the result’s $type field and fell back to all.equal(C, diag(Q)), which fails when C carries dimnames (it reports a names mismatch). A type = "bi" fit was therefore treated as "tri", drawing the inter-class interaction layer that the bi model (with C = I) should not have. Detection now uses $type first (falling back to the dimnames-safe identity check), so "bi" correctly draws no inter-class edges.
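The dimnames pitfall described above is easy to reproduce in base R. The following is a minimal sketch, not the package's internal code; the matrix C and its "Basis" names are made up for illustration.

```r
# A 3 x 3 identity matrix that carries dimnames, as a fitted C might.
Q <- 3
C <- diag(Q)
dimnames(C) <- list(paste0("Basis", 1:Q), paste0("Basis", 1:Q))

# Naive check: all.equal() also compares attributes, so it returns a
# character description of the mismatch instead of TRUE.
naive <- isTRUE(all.equal(C, diag(Q)))

# Dimnames-safe check: strip the names before comparing.
safe <- isTRUE(all.equal(unname(C), diag(Q)))

print(c(naive = naive, safe = safe))
```

Passing check.attributes = FALSE to all.equal() would be another way to ignore the names.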
nmfkc.bicv() / nmfkc.consensus(): leaner signatures
- Fine-tuning arguments move into ... (same safe defaults): nmfkc.bicv() is now nmfkc.bicv(Y, rank, ...) (nfolds = 2 per Owen & Perry, plus seed, nnls.maxit, via ...), and nmfkc.consensus() is nmfkc.consensus(Y, A, rank, nrun, keep.consensus, ...) (seed, pac.range via ...). Existing named-argument calls are unaffected.
nmfkc.ard(): simpler, safer interface
- The signature is trimmed to the essentials nmfkc.ard(Y, rank, nrun, plot, ...); everything else (prior, seed, a, b, maxit, epsilon, tol) moves into ... with the same safe defaults, so a typical call is just nmfkc.ard(Y, rank = K).
- nrun now defaults to 10 (was 1): ARD is a sensitive point estimate, and several restarts give a stable modal rank by default.
- The help now states explicitly that the implementation is the Euclidean (β = 2) case of Tan & Fevotte (2013) and that the default b is an empirical energy scale, not the paper’s method-of-moments value (Eq. 38).
nmfkc.ard(): better default prior scale
- The default b is now the initial per-component energy scale (nrow(Y) + ncol(Y)) / K * mean(Y) instead of a fixed 0.001 * mean(Y). The old fixed fraction over-pruned (winner-take-all collapse onto one dominant component) when (F + N)/K was large; the new scale-aware default recovers genuine low-rank structure stably (e.g. a clean rank-3 signal: relevance 1, 0.99, 0.87, 0, ..., all restarts agree).
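To see how far apart the two defaults can be, the sketch below evaluates both formulas on a toy matrix. The data and the starting rank K are arbitrary assumptions; only the two expressions for b come from the entry above.

```r
set.seed(1)
Y <- matrix(runif(50 * 200), nrow = 50, ncol = 200)  # toy non-negative data
K <- 10                                              # over-complete starting rank

b_old <- 0.001 * mean(Y)                    # previous fixed fraction
b_new <- (nrow(Y) + ncol(Y)) / K * mean(Y)  # new scale-aware default

# The ratio depends only on the dimensions: 1000 * (F + N) / K.
ratio <- b_new / b_old
print(c(b_old = b_old, b_new = b_new, ratio = ratio))
```

With F = 50, N = 200 and K = 10 the new default is 25000 times the old one, which illustrates why a fixed fraction cannot suit all problem sizes.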
New nmfkc.ard(): ARD rank determination (Tan & Fevotte 2013, prototype)
- Automatic Relevance Determination for the NMF rank (Euclidean). Fits NMF once at an over-complete rank and prunes automatically: each component carries a relevance weight with an inverse-gamma prior, and the multiplicative updates gain a penalty (L2 half-normal / L1 exponential) that drives unsupported components to zero. The number of surviving components is the estimated rank, with no rank scan. Returns an "nmfkc.ard" object with print and a relevance-bar plot. Plain NMF only; a sensitive point estimate (depends on prior / start / init), so a complement to the CV / consensus engines, not a sole criterion.
New nmfkc.consensus(): consensus-clustering rank selection (Brunet 2004)
- The bioinformatics-standard stability approach, as a lightweight engine like nmfkc.ecv / nmfkc.bicv. For each rank it runs NMF nrun times from random initializations (X.init = "runif"), builds the consensus matrix from the per-run hard clusterings, and returns two stability scores per rank: cophenetic (cophenetic correlation coefficient, Brunet et al. 2004) and dispersion (Kim & Park 2007, in [0,1]). Unlike the CV engines, a good rank maximizes stability. Optional keep.consensus = TRUE returns the consensus matrices.
- Also reports pac, the Proportion of Ambiguous Clustering (Senbabaoglu et al. 2014; fraction of consensus entries in the ambiguous interval pac.range, default (0.1, 0.9)). Lower is better and it is more sensitive than the often-saturated cophenetic. The print method and the criteria plot show all three metrics.
- Returns an "nmfkc.consensus" object with print and plot methods: plot(cs) (type = "criteria") draws the stability curves; plot(cs, type = "heatmap", rank = ...) draws the consensus matrix heatmap(s) reordered by hierarchical clustering (default = all ranks in a multi-panel mfrow grid; mfrow overridable).
New nmfkc.bicv(): bi-cross-validation for rank selection
- Owen & Perry’s (2009) bi-cross-validation (BCV), a lightweight CV engine in the spirit of nmfkc.ecv: it returns the held-out error per rank (objfunc, sigma) and nothing more. Holds out a row-block and a column-block at once, fits NMF only on the retained block, and predicts the held-out block by folding the held-out rows/columns onto the fixed factors via non-negative regression (no information leakage, unlike element-wise nmfkc.ecv). nfolds = 2 (leave out half rows / half columns) per Owen & Perry’s recommendation.
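The block hold-out idea can be sketched with the SVD form of BCV from Owen & Perry, which predicts the held-out block A from the retained block D as B D_k^+ C. Note that this swaps in a truncated SVD for the NMF fit and non-negative regression that the entry above describes; the function name and the toy data are assumptions.

```r
# Partition Y = [A B; C D]; hold out A (rows x cols), fit rank k on D only.
bcv_error <- function(Y, rows, cols, k) {
  A <- Y[rows, cols, drop = FALSE]
  B <- Y[rows, -cols, drop = FALSE]
  C <- Y[-rows, cols, drop = FALSE]
  D <- Y[-rows, -cols, drop = FALSE]
  s <- svd(D)
  idx <- seq_len(k)
  # Pseudo-inverse of the rank-k truncation of D.
  D_pinv <- s$v[, idx, drop = FALSE] %*% diag(1 / s$d[idx], k) %*%
    t(s$u[, idx, drop = FALSE])
  sqrt(mean((A - B %*% D_pinv %*% C)^2))  # held-out RMSE
}

# Noise-free rank-2 toy matrix: half the rows and half the columns held out.
set.seed(1)
Y <- matrix(runif(8 * 2), 8, 2) %*% matrix(runif(2 * 6), 2, 6)
err <- sapply(1:2, function(k) bcv_error(Y, rows = 1:4, cols = 1:3, k = k))
print(err)
```

On exactly rank-2 data the held-out error at k = 2 is zero up to rounding, while k = 1 under-fits, which is the behaviour a rank-selection curve relies on.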
*.rank: eff.rank.idx shown for context (no best marker)
- The broken-stick-corrected effective-rank index (eff.rank.idx, green) is drawn for context only and no longer carries a “Best (Max)” marker: it is a factor-utilization diagnostic (most even relative to the random null), not a predictive rank optimum. The recommended rank is driven solely by the ECV minimum and the R-squared elbow.
*.rank: broken-stick-corrected effective-rank index
- The *.rank criteria table gains effective.rank.expected (the broken-stick / uniform-Dirichlet null exp(H_Q - 1), H_Q = the Q-th harmonic number) and effective.rank.index, the [0, 1] index (effective.rank - expected) / (Q - expected) (clamped). The index anchors 0 at the random null and 1 at perfect evenness, removing the small-rank inflation of the raw effective.rank / Q. Its maximum is a meaningful rank, so the diagnostics plot now draws this corrected index (green, eff.rank.idx) with a restored “Best (Max)” marker in place of the raw ratio.
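The corrected index follows directly from the two formulas quoted above. A minimal sketch, with a hypothetical function name and assuming Q >= 2 (at Q = 1 the denominator is zero):

```r
effective_rank_index <- function(effective.rank, Q) {
  H_Q <- sum(1 / seq_len(Q))            # Q-th harmonic number
  expected <- exp(H_Q - 1)              # broken-stick null value
  idx <- (effective.rank - expected) / (Q - expected)
  pmin(pmax(idx, 0), 1)                 # clamp to [0, 1]
}

Q <- 3
expected <- exp(sum(1 / seq_len(Q)) - 1)  # about 2.30 for Q = 3
print(effective_rank_index(c(1, expected, 2.7, 3), Q))
```

A raw effective rank of about 2.30 out of 3 looks like 77% utilization, yet it is exactly what the random null produces, so the index maps it to 0.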
*.rank results gain plot() / print() methods
- The rank-selection functions (nmfkc.rank(), nmfkc.net.rank(), nmfkc.signed.rank(), nmfae.rank(), nmfae.signed.rank()) now return a classed object ("nmf.rank"). plot() redraws the three-criterion diagnostics plot (honouring main, xlab, ylab, lwd) and print() shows the recommended rank, the per-criterion best ranks, and the criteria table. As before the constructor draws immediately when plot = TRUE; the $rank.best and $criteria fields are unchanged, so existing code keeps working.
New nmf.cluster.flow(): cluster-flow diagram across ranks
- nmf.cluster.flow() and nmf.cluster.criteria() now treat the supplied fits as a generic list of results (kept in the given order, not sorted by rank), so the same rank fitted as different models is also supported. Both gain a names argument for the x-axis tick labels (default: each result’s $rank), and in nmf.cluster.flow() the reference argument is now the index (1-based position) of the result that defines the colours, not a rank value, defaulting to the central result floor(length(fits) / 2) + 1 (e.g. the 2nd of 2 or 3 results).
- The adjusted Rand index (ARI) between each pair of adjacent ranks is now computed and printed along the top of the figure (and returned in $ARI, one value per adjacent pair), summarizing how much the hard clustering changes from one rank to the next.
- Each cluster box is now tinted by the dominant reference colour among the individuals it contains (the colour shared by the most member lines); ties are broken in favour of the earliest palette entry (the smallest reference-cluster id). This shows at a glance which reference cluster dominates each box at each rank.
- nmf.cluster.flow() now inserts a gap of one average cluster between clusters in the per-rank layout and sizes each grey box exactly to the minimum/maximum position of its members, so the cluster boxes are clearly separated with the gaps maximized. Each rank is normalized to the full height independently.
- The cluster number is the dominant-factor index (argmax of the coefficient) of each fit, kept as-is so it matches the factor/basis numbering of the supplied models. A factor that never dominates any individual leaves an empty, unused cluster number (a gap, e.g. labels 2, 3 with no 1); this is correct and consistent with the fit, and the labels are not renumbered.
- nmf.cluster.flow() now returns a classed object with a dedicated plot() method, so the diagram can be (re)drawn with plot(fl, col = , lwd = , xlab = , ylab = , main = ): the colour vector (indexed by reference cluster), line width, axis labels and title are all honoured. The constructor still draws immediately by default (plot = TRUE) and forwards graphical arguments to the plot method; use plot = FALSE to build the object and plot it later. Its print() method shows the adjacent-rank ARI and the full cluster table.
- nmf.cluster.flow(fits, reference = ) takes a list of models fitted at different ranks (any non-negative MU family) and draws an alluvial / Sankey-style diagram of how the hard sample clustering changes with the rank: each individual flows left-to-right across the ranks (x-axis), its vertical position is set by its cluster (clusters reordered per rank by a barycenter heuristic to reduce crossings), and lines are coloured by the cluster at the reference rank, so one can watch the reference clusters split or merge. At every rank a translucent grey box is drawn around each cluster’s members with the cluster number centred inside, so the grouping and labels are visible at all ranks (not only the reference). The default line palette is now a strong, well-separated qualitative set (ColorBrewer, no pale colours) and can be overridden. Returns (invisibly) the table with rows = individuals, columns = rank, entries = cluster number.
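The hard labels and the adjacent-rank ARI described in the bullets above can be sketched in base R. The helper names and the two toy coefficient matrices (factors in rows, individuals in columns) are assumptions for illustration, not the package's internals.

```r
# Hard label = index of the dominant factor for each individual (column).
hard_labels <- function(B) apply(B, 2, which.max)

# Adjusted Rand index between two hard clusterings (Hubert & Arabie form).
adjusted_rand_index <- function(x, y) {
  tab <- table(x, y)
  sum_ij <- sum(choose(tab, 2))
  sum_a <- sum(choose(rowSums(tab), 2))
  sum_b <- sum(choose(colSums(tab), 2))
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  max_index <- (sum_a + sum_b) / 2
  if (max_index == expected) return(1)  # degenerate: trivial partitions
  (sum_ij - expected) / (max_index - expected)
}

# Toy coefficient matrices at rank 2 and rank 3 for six individuals.
B2 <- rbind(c(0.9, 0.8, 0.7, 0.1, 0.2, 0.3),
            c(0.1, 0.2, 0.3, 0.9, 0.8, 0.7))
B3 <- rbind(c(0.90, 0.8, 0.1, 0.1, 0.1, 0.1),
            c(0.05, 0.1, 0.2, 0.8, 0.7, 0.6),
            c(0.05, 0.1, 0.7, 0.1, 0.2, 0.3))
labels <- lapply(list(B2, B3), hard_labels)

# One ARI per adjacent pair, and the default (central) reference position.
ari <- sapply(seq_len(length(labels) - 1), function(i) {
  adjusted_rand_index(labels[[i]], labels[[i + 1]])
})
reference <- floor(length(labels) / 2) + 1
print(list(labels = labels, ARI = ari, reference = reference))
```

Here the third individual leaves cluster 1 when the rank grows from 2 to 3, so the ARI drops below 1 while the other five individuals keep their grouping.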
New nmf.cluster.criteria(): sample-clustering quality across ranks
- nmf.cluster.criteria(fits, Y) takes a list of fits (one per rank; a single fit is also accepted) and reports the clustering-quality criteria silhouette, CPCC, and dist.cor for each rank, returning a per-rank $criteria table (mirroring nmf.cluster.flow()). It has plot() (line plot of the three criteria vs rank) and print() (the table) methods, and draws immediately when plot = TRUE. Works for any family (nmfkc, nmfkc.signed, nmfae, nmfae.signed, nmfkc.net, nmfre, nmf.sem / nmf.ffb; the last needs the exogenous block via Y2). These are clustering-stability diagnostics, deliberately separate from the rank-selection *.rank functions (r.squared / effective rank / ECV).
- Hard sample clustering needs a non-negative coefficient/score matrix (a valid membership simplex). nmf.cluster.criteria() detects this from the actual coefficient: when it is non-negative the hard-label silhouette (and cluster sizes) are returned; when it is signed, silhouette is NA while the distance-based CPCC and dist.cor are still computed. (ARI is not reported here: it compares two clusterings, e.g. across ranks or resamples, so it is not a single-fit quantity.)
- nmfkc.rank() no longer carries ARI, silhouette, CPCC, or dist.cor in its criteria table; those clustering-stability metrics now live in nmf.cluster.criteria(). All five *.rank functions return the same five columns (rank, effective.rank, effective.rank.ratio, r.squared, sigma.ecv). Per-rank fits use detail = "fast", so the expensive O(N^2) distance computations are skipped during rank selection. rank.best is unchanged. The *.rank functions now emit a one-line message pointing to nmf.cluster.criteria() for clustering quality.
Rank-selection functions for the other NMF families
- New nmfkc.net.rank(), nmfkc.signed.rank(), nmfae.rank() (paired) and nmfae.signed.rank() (paired) bring nmfkc.rank-style rank selection to the other multiplicative-update models. Each reports the three criteria that are well defined for every family (r.squared, the effective rank (utilization), and the element-wise CV error sigma.ecv) and returns list(rank.best, criteria). (nmf.ffb / nmfre are not covered: they do not support the element masking that ECV needs.)
- nmfkc.rank() plot simplified and unified. All *.rank functions now share one back-end .rank.finish() and draw the same concise three-criterion figure: r.squared (red), eff.rank (green), and sigma.ecv (blue, right axis), each as a line with points, rank-number labels, and a highlighted best marker: “Best (Elbow)” for the R-squared knee, “Best (Peak)” for the effective-rank utilization, and “Best (Min)” for the CV minimum. nmfkc.rank() still computes ARI, silhouette, CPCC, and dist.cor into its criteria table, but no longer plots them.
- The four new *.rank functions gain a detail argument matching nmfkc.rank: "full" (default) runs the element-wise CV and reports sigma.ecv; "fast" skips the (expensive) CV, so the plot shows only r.squared and eff.rank and the recommended rank falls back to the R-squared elbow.
Internal: shared element-wise CV helpers
- The four element-wise cross-validation functions (nmfkc.ecv(), nmfae.ecv(), nmfkc.signed.ecv(), nmfae.signed.ecv()) now build their folds through a single internal helper .ecv.make.folds(), removing four near-identical copies of the fold-partitioning loop. nmfkc.net.ecv() keeps its symmetric upper-triangle folds.
- All five element-wise CV functions now share one config-indexed loop driver .ecv.run(labels, nfolds, run_one, progress): the single-rank ones (nmfkc.ecv(), nmfkc.net.ecv(), nmfkc.signed.ecv()) and the rank-grid ones (nmfae.ecv(), nmfae.signed.ecv()). Each supplies a model-specific run_one(i, k) closure (mask fold, refit config i, return held-out loss) and an optional progress callback; .ecv.run() handles the config-by-fold loop, the objfunc / sigma / objfunc.fold aggregation, and naming. This removes the last copies of the CV-loop machinery, including the per-grid reshaping in nmfae.ecv().
- The refactor is behaviour-preserving: for the same seed the folds and all CV values (objfunc, sigma, objfunc.fold, names/labels) are byte-for-byte identical to before, verified across EU and KL losses, the symmetric (upper-triangle) case, and both paired and full grids.
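Element-wise fold partitioning of the kind described above might look as follows. This is only a sketch: make_folds() and make_folds_symmetric() are hypothetical stand-ins, the internal .ecv.make.folds() may differ in detail, and including the diagonal in the symmetric folds is an assumption.

```r
# Assign every cell of an nr x nc matrix to one of nfolds folds at random.
make_folds <- function(nr, nc, nfolds, seed = 1) {
  set.seed(seed)
  matrix(sample(rep_len(seq_len(nfolds), nr * nc)), nr, nc)
}

# Symmetric variant: draw folds on the upper triangle and mirror them, so a
# held-out cell (i, j) is never visible through its twin (j, i).
make_folds_symmetric <- function(n, nfolds, seed = 1) {
  set.seed(seed)
  fold <- matrix(0L, n, n)
  up <- upper.tri(fold, diag = TRUE)
  fold[up] <- sample(rep_len(seq_len(nfolds), sum(up)))
  fold[lower.tri(fold)] <- t(fold)[lower.tri(fold)]
  fold
}

folds <- make_folds(4, 5, nfolds = 5)
sym <- make_folds_symmetric(6, nfolds = 3)

# Weight matrix for fold k: held-out cells get weight 0, the rest weight 1.
k <- 1
W <- (folds != k) * 1
print(table(folds))
```

Each fold's weight matrix W can then be passed as a 0/1 mask to the fit, and the held-out loss is evaluated on the cells where W is 0.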
Unified summary print blocks
- New shared internal helpers .print.fit.statistics() and .print.structure.diagnostics() render the “Statistics” / “Goodness of fit” and “Structure Diagnostics” blocks for summary.nmfkc(), summary.nmfae(), and summary.nmfkc.net() (incl. the signed variant). Labels are padded to a common width so values are column-aligned, fields absent from a given model are skipped automatically (e.g. nmfkc.net has no residual SE), and any future fit statistic or sparsity row is now added in one place instead of per-summary.
Effective Rank in all five MU-family summaries
- summary() now reports the Effective Rank as x.xx / Q (NN.N%), i.e. the absolute value, the nominal rank, and the utilization ratio effective.rank / Q as a percentage, for nmfkc(), nmfkc.net(), nmfae(), nmf.ffb() / nmf.sem(), and nmfre(); previously only nmfkc() showed it. Each is computed by the new shared internal helper .effective.rank(B) from the model’s natural coefficient/score matrix: the coefficients (nmfkc), the latent encoding (nmfae), the node membership (nmfkc.net), the latent scores (nmf.ffb), and the BLUP scores (nmfre). NA at .
Rank-selection diagnostics: silhouette / CPCC fixed, IC removed
- silhouette is now computed in the original data space. It used to be evaluated on the rank-Q B.prob simplex, whose dimension changes with Q; that made it monotone in Q (always favouring the smallest rank) and hid genuine cluster structure. It is now the standard mean silhouette width over dist(t(Y)) (the fixed original-data sample distances) with the per-sample hard labels (the k-means convention). On data with real clusters it now shows an interior optimum (e.g. the road-OD network peaks at the same rank as the cross-validation minimum).
- CPCC is now the classic cophenetic correlation of dist(t(B)). It used to be computed from the soft co-membership t(B.prob) %*% B.prob, which was nearly flat across Q. It is now cor(dist(t(B)), cophenetic(hclust(dist(t(B))))), i.e. how well a hierarchical clustering of the rank-Q coefficient distances reproduces those distances (Sokal & Rohlf). It now varies with Q and recovers an interior optimum.
- Removed ICp, AIC, and BIC from nmfkc()’s criterion list, from summary.nmfkc(), and from nmfkc.rank()’s table. Empirically (across three real datasets) ICp was monotone increasing (always selecting the smallest Q) and AIC monotone decreasing (always selecting the largest Q); for NMF, where the parameter count grows with Q, these information criteria do not have a usable interior optimum, so they were misleading rather than informative.
- The internal helper .silhouette.simple() (centroid-approximate, took a B.prob matrix) was replaced by .silhouette.mean(D, labels), which returns the exact mean silhouette width from a distance matrix and labels.
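Both revised diagnostics can be reproduced with base R. The sketch below uses a hypothetical silhouette_mean() that follows the textbook definition; it is not the internal .silhouette.mean(), and the tiny Y and B matrices are invented, with samples in columns.

```r
# Exact mean silhouette width from a distance matrix and hard labels.
silhouette_mean <- function(D, labels) {
  D <- as.matrix(D)
  if (length(unique(labels)) < 2) return(NA_real_)
  s <- numeric(length(labels))
  for (i in seq_along(labels)) {
    own <- labels == labels[i]
    own[i] <- FALSE
    if (!any(own)) next                  # singleton cluster: s stays 0
    other <- labels != labels[i]
    a <- mean(D[i, own])                 # mean distance within own cluster
    b <- min(tapply(D[i, other], labels[other], mean))  # nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Silhouette in the original data space: distances between columns of Y.
Y <- matrix(c(0, 1, 10, 11), nrow = 1)
labels <- c(1, 1, 2, 2)
sil <- silhouette_mean(dist(t(Y)), labels)

# CPCC: cophenetic correlation of the coefficient distances.
B <- rbind(c(0.9, 0.8, 0.1, 0.2),
           c(0.1, 0.2, 0.9, 0.8))
d <- dist(t(B))
cpcc <- cor(d, cophenetic(hclust(d)))
print(c(silhouette = sil, CPCC = cpcc))
```

Because the silhouette distances come from Y, they stay fixed as the rank changes; only the labels vary, which is what lets the curve show an interior optimum.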
Breaking change: symmetric NMF removed from nmfkc()
- The Y.symmetric = "bi" / "tri" option (deprecated in v0.7.x) has been removed from nmfkc() and nmfkc.ecv(). Symmetric NMF of network data now lives exclusively in the dedicated nmfkc.net() / nmfkc.net.ecv() functions, which use the correct Frobenius bilateral-gradient updates. Passing Y.symmetric to nmfkc() or nmfkc.ecv() now stops with a message pointing to the replacement: nmfkc.net(Y, rank, type = "tri") (types "tri", "bi", "signed"). This also removes the bi/tri code branches (cube-root damping, fixed C = I, tri C-update, upper-triangle CV folds) from nmfkc(), simplifying the core function.
New diagnostic: effective rank
- nmfkc() now reports criterion$effective.rank, the effective rank of the fit: exp of the Shannon entropy of the explained-variance distribution p_k = var(B[k, ]) / sum_j var(B[j, ]). By the trace identity sum_k var(B[k, ]) = tr(Cov(B)), each p_k is the exact fraction of the total coefficient variance carried by factor k, so the entropy is a genuine additive decomposition (variances add; standard deviations do not, which is why variance, not sd, is the natural partner for the entropy here). It ranges in [1, Q] and counts how many latent factors actively shape across-sample variation (dead, zero-variance factors drop out). This is the PCA-style explained-variance / effective-dimensionality measure and reuses the exp(entropy) functional form of Roy & Vetterli (2007).
- summary.nmfkc() prints Effective Rank: x.xx / Q.
- nmfkc.rank() adds an effective.rank column to its criteria table. When effective rank plateaus well below the nominal rank, the extra factors are not carrying additional coefficient variance, a signal that the rank is over-specified.
- nmfkc.rank(plot = TRUE) overlays an eff.rank curve (effective rank divided by nominal rank, in [0, 1], solid green line) on the diagnostics plot. A peak in this utilization curve marks the rank at which the latent factors carry the most evenly distributed variance.
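The definition above translates almost line for line into R. A minimal sketch with a hypothetical function name and a toy coefficient matrix (factors in rows); it assumes at least one factor has non-zero variance.

```r
effective_rank <- function(B) {
  v <- apply(B, 1, var)   # per-factor variance across samples
  p <- v / sum(v)         # explained-variance distribution p_k
  p <- p[p > 0]           # dead (zero-variance) factors drop out
  exp(-sum(p * log(p)))   # exp of the Shannon entropy
}

# Two active factors with equal variance and one constant (dead) factor.
B <- rbind(c(1, 2, 3),
           c(3, 1, 2),
           c(5, 5, 5))
er <- effective_rank(B)
utilization <- er / nrow(B)
print(c(effective.rank = er, utilization = utilization))
```

The nominal rank is 3, but the third factor never varies, so the effective rank is 2 and the utilization ratio is 2/3.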
Diagnostics cleanup: B.prob crispness metrics
- Removed B.prob.sd.min and B.prob.entropy.mean from nmfkc()’s criterion list, from summary.nmfkc(), and from nmfkc.rank()’s criteria table and plot. All three B.prob.* peakedness metrics are monotone in the rank Q, so they carry no peak/elbow signal for rank selection (verified empirically); the principled rank signals are ECV, the R-squared elbow, and the new effective.rank utilization.
- B.prob.max.mean (clustering crispness) is retained, but only in summary.nmfkc() (“Clustering Crispness”) and the criterion list. At a fixed Q it remains a useful confidence check (the mean dominant-cluster membership) before treating B.cluster as hard labels. It is no longer shown in nmfkc.rank() (cross-Q), where its 1/Q baseline shift makes it misleading.
- summary.nmfkc() no longer prints “Clustering Entropy” (it duplicated the crispness information).
Improvements
- Unified three-variant R² across all NMF functions. Every NMF variant (nmfkc(), nmfae(), nmfae.signed(), nmfkc.net(), nmfkc.signed(), nmfre()) now returns three goodness-of-fit summaries on the same scale, computed by the new internal helper .r.squared.all():
  - r.squared: Pearson cor(Y, Yhat)^2 (scale-invariant, in [0, 1]). Unchanged from before.
  - r.squared.uncentered: 1 - sum((Y - Yhat)^2) / sum(Y^2). Baseline = the zero matrix (natural for non-negative factorizations without an intercept); matches the “uncentered R²” of intercept-free regression.
  - r.squared.centered: 1 - sum((Y - Yhat)^2) / sum((Y - rowMeans(Y))^2). Baseline = per-row mean; the standard (“centered”) multivariate-regression R²; equals 0 when the model predicts the row mean.

  All three exclude entries with Y.weights == 0 (masking, the standard NA-hold-out convention). For nmfre() the same three variants are also reported on the fixed-only prediction as r.squared.fixed.*. Displayed by all summary.* methods.
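The three variants differ only in their baseline, which a short sketch makes concrete. r2_all() is a hypothetical stand-in for the internal .r.squared.all(); it assumes Y contains no NA (masked cells are flagged through W) and takes the row mean over unmasked cells only.

```r
r2_all <- function(Y, Yhat, W = matrix(1, nrow(Y), ncol(Y))) {
  keep <- W != 0                                 # drop weight-zero cells
  row_mean <- rowSums(Y * keep) / rowSums(keep)  # per-row mean of kept cells
  baseline <- matrix(row_mean, nrow(Y), ncol(Y))
  ss_res <- sum((Y - Yhat)[keep]^2)
  c(pearson    = cor(Y[keep], Yhat[keep])^2,
    uncentered = 1 - ss_res / sum(Y[keep]^2),
    centered   = 1 - ss_res / sum((Y - baseline)[keep]^2))
}

# A "model" that only predicts each row's mean.
Y <- rbind(c(1, 2, 3),
           c(4, 6, 8))
Yhat <- matrix(rowMeans(Y), nrow(Y), ncol(Y))
r2 <- r2_all(Y, Yhat)
print(round(r2, 4))
```

The row-mean predictor scores 0 on the centered variant but 12/13 on the uncentered one, which is why the three values should be read together.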
Bug Fixes
- nmfkc.net(): r.squared now correctly excludes weight-zero (NA-masked) entries when Y.weights is supplied or auto-masking is in effect, matching the convention used by nmfkc(), nmfae(), nmfae.signed(), and nmfkc.signed(). Previously the correlation was computed over the full matrix including replaced-NA cells, giving a distorted r.squared.
Documentation
- nmfkc(): removed Examples 3 & 4 (deprecated Y.symmetric = "bi"/"tri"); the documentation now points users to \link{nmfkc.net}() for symmetric NMF.
- summary.nmf.sem(): example code, @param, and @seealso updated to use the canonical nmf.ffb name (the S3 method continues to dispatch correctly via c("nmf.ffb", "nmf.sem") inheritance).
nmfkc 0.7.3
CRAN release: 2026-05-13
Documentation
- README and nmf-sem-with-nmfkc.Rmd vignette code now reference the canonical nmf.ffb.* aliases (nmf.ffb(), nmf.ffb.cv(), nmf.ffb.DOT()) instead of the legacy nmf.sem.* names. Both names continue to work; the change only affects what users see on the GitHub Pages homepage and in the vignette source.
nmfkc 0.7.2
Headline: NMF-FFB rebrand and full bootstrap inference
- nmf.ffb* family added as the canonical alias for nmf.sem* (Satoh 2025, arXiv:2512.18250 adopts “NMF-FFB”, Non-negative Matrix Factorization with Feed-Forward + Feedback, as the model’s canonical name). nmf.sem* continues to work and shares the same return classes (c("nmf.ffb", "nmf.sem") and c("nmf.ffb.inference", "nmf.sem.inference", ...)), so existing scripts are unaffected.
- nmf.sem.inference() / nmf.ffb.inference(): replaced the legacy 1-step Newton wild bootstrap with a full X-fixed pair bootstrap. Resamples columns of (Y1, Y2), refits (C1, C2) with X held at the original fit, and reports per-element support_rate = mean(|c_b| > threshold) together with percentile CIs. Significance markers (* / ** / *** at sup > 0.95 / 0.99 / 0.999) follow the lavaan convention. Both Theta_1 (feedback) and Theta_2 (exogenous) are inference targets (previous version covered only Theta_2).
- nmf.sem() / nmf.ffb(): now runs nmfkc(Y1, A = Y2) internally by default when X.init is a string method, forwarding X.init, X.L2.ortho, epsilon, maxit, seed. The feedforward fit is used both as the X warm-start and as the baseline for SC.map. nmfkc.baseline = FALSE opts out.
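How a support rate, a percentile interval and the star markers follow from bootstrap replicates of one coefficient can be sketched as below. summarise_boot(), its threshold and level defaults, and the replicate values are all illustrative assumptions; only the support-rate formula and the star cut-offs come from the entry above.

```r
summarise_boot <- function(c_boot, threshold = 0, level = 0.95) {
  alpha <- 1 - level
  support <- mean(abs(c_boot) > threshold)  # share of replicates above threshold
  ci <- quantile(c_boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  stars <- if (support > 0.999) {
    "***"
  } else if (support > 0.99) {
    "**"
  } else if (support > 0.95) {
    "*"
  } else {
    ""
  }
  data.frame(CI_low = ci[1], CI_high = ci[2], support = support, sig = stars)
}

# Made-up replicates: the coefficient is non-zero in 97 of 100 refits.
c_boot <- c(rep(0.5, 97), rep(0, 3))
res <- summarise_boot(c_boot)
print(res)
```

A coefficient that survives in 97% of the refits earns one star; it would need to survive in more than 99.9% of them to earn three.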
Bug Fixes
- nmf.sem.inference(): fixed dimension bug in the Leontief identity matrix (I_mat <- diag(Q) should have been diag(P1)); previously every replicate was silently marked invalid when P1 != Q.
- nmfkc.net(): now auto-masks NA entries of Y (parity with the other four NMF variants); previously errored at the min(Y) < 0 check when Y contained NA.
- nmfkc(): Fixed C matrix asymmetry in tri-symmetric NMF (Y.symmetric = "tri"). The C update was using stale B and XB computed from the old X; now B and XB are recomputed after X is updated. Also fixed column reordering to permute both rows and columns of C. Previously the relative asymmetry could reach ~46%; now it is at machine precision (~1e-14).
Improvements
- Y.weights semantics unified to lm()-style weighted least squares across nmfkc(), nmfae(), nmfkc.net(), nmfkc.signed(), nmfae.signed(): loss is now sum(W * (Y - Yhat)^2) (linear in W, matching lm()’s weights argument). Binary masks (W ∈ {0, 1}; the standard ECV / NA-mask case) are unaffected since W = W^2.
- All MU functions now emit a "maximum iterations (N) reached..." warning when maxit is exhausted without meeting the relative-tolerance criterion (previously silent in nmfae, nmfae.signed, nmfkc.net, nmfkc.signed, nmfre, and nmf.sem).
- All MU functions now share maxit = 5000 as the default (was 5000 / 20000 / 50000 inconsistently). Together with the maxit warning above, users see explicit feedback when 5000 is insufficient and can opt into a larger cap.
- New shared internal helper .init_X_method() for X initialization via "nndsvd" / "kmeans" / "kmeansar" / "runif" / numeric matrix. All NMF families now use the same dispatch logic; previous ad-hoc inline implementations are removed.
- nmf.sem() returns SC.map (input-output structural fidelity: correlation between the equilibrium operator and the feedforward baseline mapping; Satoh 2025 §4.SC.map) automatically when nmfkc.baseline is supplied or computed internally.
- summary.nmf.sem(): rewritten to display the full-bootstrap inference output: separate Theta_1 / Theta_2 blocks with Estimate | CI_low | CI_high | support | Pr(>0) | sig, plus a bootstrap meta-info header.
- coef.nmf.sem(): now returns a long-format data frame with rows for every entry of both C1 and C2 (Type | Basis | Covariate | Estimate); previously returned only the C2 matrix when no inference had been run. Schema matches the inference-augmented output for uniformity.
- plot.nmf.sem(): default trace is now objfunc.full (loss + penalties, the actual monotonically-decreasing quantity that the multiplicative updates minimize) instead of objfunc (reconstruction only). New argument which = "full" | "reconstruction" | "both".
- nmf.sem.DOT(): significance stars now appear on Theta_1 (feedback Y1 → F) edges in addition to Theta_2 (exogenous Y2 → F); X (F → Y1) edges remain unstarred since the basis is not the inference target.
- plot.nmfae.ecv(): Heatmap cell text color is now always black for better readability on light-colored cells.
- nmfkc(): X.init = "runif" now supports nstart > 1 for multi-start initialization. Multiple random starting points are evaluated with 10 standard NMF iterations, and the best (lowest Frobenius error) is selected.
- nmfae(), nmfre(): r.squared is now computed as cor(Y, fitted)^2 (squared correlation between observed and fitted values), consistent with nmfkc(). Previously nmfae() used 1 - SS_res/SS_tot and nmfre() used the same regression-style R-squared, which can behave unexpectedly for intercept-free non-negative models.
- nmfkc.kernel.beta.nearest.med(): added a candidates argument controlling the bandwidth grid. Options: "7points" (new default, t = {-1,-2/3,-1/3,0,1/3,2/3,1}), "4points" (t = {-1/2, 0, 1/2, 1}), or a user-supplied numeric vector of values. Previously the grid silently differed between the no-landmark (Uk = NULL; 4 points) and landmark (7 points) branches.
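The first item in this list, the move to weights that enter the loss linearly, can be checked numerically. A minimal sketch with a hypothetical weighted_loss() helper and made-up matrices:

```r
# lm()-style weighted least squares: weights enter the loss linearly.
weighted_loss <- function(Y, Yhat, W) sum(W * (Y - Yhat)^2)

Y <- matrix(1:6, 2, 3)
Yhat <- Y - 1                        # every residual equals 1

# Binary mask: W and W^2 coincide, so both conventions give the same loss.
W_bin <- matrix(c(1, 0, 1, 1, 0, 1), 2, 3)
loss_linear <- weighted_loss(Y, Yhat, W_bin)
loss_squared <- sum(W_bin^2 * (Y - Yhat)^2)

# Non-binary weights: the two conventions differ.
W_two <- matrix(2, 2, 3)
loss_linear_two <- weighted_loss(Y, Yhat, W_two)   # 2 * 6 residuals
loss_squared_two <- sum(W_two^2 * (Y - Yhat)^2)    # 4 * 6 residuals
print(c(loss_linear, loss_squared, loss_linear_two, loss_squared_two))
```

Only users who pass non-binary weights see a change; 0/1 masks used for ECV or NA handling give identical losses under either convention.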
New Functions (Signed NMF family)
- nmfkc.signed(): NMF-KC with signed covariate/coefficient. Model with , (signed), real-valued. Uses Ding et al. (2010) sign-splitting + Direct MU; may also contain negative entries (semi-NMF regression). Supports Y.weights for element-wise masking.
- nmfkc.signed.cv(), nmfkc.signed.ecv(): column-wise and element-wise k-fold CV for rank selection on signed data.
- nmfae.signed(): Three-layer autoencoder with . preserve soft clustering on both decoder and encoder sides while the bottleneck can carry negative weights (e.g., anti-correlated properties). Hybrid warm-start (from nmfae()) + Direct MU with multi-restart.
- nmfae.signed.ecv(): element-wise CV for (decoder-rank, encoder-rank) selection.
- nmfae.signed.inference(): sandwich SE + wild bootstrap for (no non-negativity projection on since it is signed).
- S3 methods predict.*.signed(), plot.*.signed(), summary.*.signed(), and nmfae.signed.rename() helper.
New Functions (Network NMF family)
- nmfkc.net(): Single unified entry point for symmetric NMF of network data, with type = "tri" | "bi" | "signed". All three variants use the Frobenius-full bilateral gradient (supersedes the one-sided approximation in nmfkc(Y.symmetric = ...)). type = "signed" supports signed via Ding et al. (2010) sign-splitting, preserving for soft clustering while allowing inter-cluster repulsion. The returned object’s fields are uniform across types: and are for tri/bi, and populated matrices for signed. is always populated (identity for bi, non-negative for tri, signed for signed).
- nmfkc.net.ecv(): Element-wise cross-validation with upper-triangle folds (mirrored to the lower triangle to prevent symmetry leakage). Unified entry point for type = "tri" | "bi" | "signed" (calls nmfkc.net() with the matching type for each fold).
- nmfkc.net.DOT(): Graphviz DOT visualization for symmetric NMF networks. Displays basis-to-node membership edges and inter-basis interaction edges (C matrix) with significance stars. Now has a signed parameter (auto-detected from class) to render negative C entries as dashed edges.
- nmfkc.net.inference(): Statistical inference for symmetric NMF. Wrapper around nmfkc.inference() with A = t(X). Returns off-diagonal C coefficients with sandwich SE and wild bootstrap.
Deprecations
- nmfkc(Y, Y.symmetric = "bi"|"tri"): Deprecated in favor of nmfkc.net(Y, type = "bi"|"tri"). The old implementation uses a one-sided gradient approximation that empirically converges for but is theoretically incorrect and does not extend to signed . The deprecated branch still works in v0.6.8 (with a deprecation warning) and will be removed in a future release.
Parameter Renames (old names remain usable for backward compatibility)
- nmf.sem.DOT(): weight_scale_y2f → weight_scale_c2, weight_scale_fy1 → weight_scale_x1 (matrix-name-based naming, consistent with nmfae.DOT() and nmfkc.DOT()).
- nmf.sem.DOT(): sig.level moved to after threshold for consistency with other .DOT functions.
Documentation
- README, vignettes, and roxygen @title / @description updated to use NMF-FFB as the canonical model name (with “(formerly NMF-SEM)” attached on first mention for discoverability of the legacy term). File names (R/nmf.sem.R, vignettes/nmf-sem-with-nmfkc.Rmd, man/nmf.sem.Rd), function names (nmf.sem*), and S3 classes ("nmf.sem") are unchanged so URLs and existing scripts continue to work.
nmfkc 0.6.7
CRAN release: 2026-04-15
Bug Fixes
- Added fitted.nmfae() and residuals.nmfae() S3 methods; previously fitted() on an nmfae object silently returned NULL because the wrong field name ($XB instead of $Y1hat) was used.
Naming Unification (old names remain usable for backward compatibility)
- Coefficient tables: all inference functions now use Basis / Covariate columns (was Factor / Exogenous in nmf.sem.inference(), Decoder / Encoder in nmfae.inference()).
- Wild bootstrap defaults unified: wild.B = 500, wild.seed = 123 across all inference functions.
- First argument of all .DOT functions renamed to result for consistency.
- CV tuning parameters (nfolds, seed, shuffle) moved to ... in nmfkc.ecv(), nmfae.ecv(), nmfae.cv(), nmf.sem.cv(); div also accepted for backward compatibility.
nmfkc 0.6.6
New Functions
- nmfkc.criterion(): Extracted criterion computation from nmfkc() as a standalone exported function. Supports detail = "full" / "fast" / "minimal" to control computation cost.
- nmfre.inference(): Separated statistical inference from nmfre() optimization. Returns coefficient table with SE, z-values, and p-values via wild bootstrap.
- nmf.sem.inference(): Statistical inference for the C2 parameter matrix in NMF-SEM. Uses sandwich SE and wild bootstrap.
- S3 methods coef(), fitted(), residuals() for all model classes (nmfkc, nmfae, nmfre, nmf.sem).
- S3 methods plot() for nmfre and nmf.sem (convergence diagnostics).
- summary.nmf.sem(): Stability diagnostics, fit statistics, and C2 coefficient table.
Parameter Renames (old names remain usable for backward compatibility)
- nmfkc(), nmfkc.rank(): save.time / save.memory → detail
- nmfae(): Q → rank, R → rank.encoder
- nmfre(): Q → rank, dfU.cap.rate → df.rate
- nmfre.dfU.scan(), nmfkc.ar.degree.cv(): Q → rank
- nmfkc.residual.plot(): Y_XB_palette → fitted.palette, E_palette → residual.palette
- nmfkc.kernel.beta.nearest.med(): block_size → block.size, sample_size → sample.size
Other Improvements
- hide.isolated option added to all .DOT functions (default TRUE).
- nmf.sem.DOT(): Added sig.level parameter; C2 edges decorated with significance stars.
- nmfkc(): Added X.restriction = "none" option and X.init = "kmeansar" initialization.
- Added arXiv/DOI references to roxygen documentation for all main functions.
- @section Lifecycle: Experimental added to nmfae().
- Removed mc.cores parallel option from nmfae.ecv() for CRAN compliance.
nmfkc 0.6.0
Bug Fixes
- Fixed variable T shadowing TRUE in information criterion computation.
- Fixed nmfkc.ecv() to use KL divergence for evaluation when method="KL".
- Added performance flags (save.time=TRUE) to nmfkc.ecv() inner calls.
- Fixed zero-division in nmfkc.rank() elbow normalization when R-squared values are identical.
- Fixed parameter name mismatch (rank → Q) in nmfkc.rank() call to nmfkc.ecv().
- Fixed descending loop in nmf.sem.split() when P=2.
- Added input validation for n.exogenous in nmf.sem.split().
Documentation
- Added roxygen documentation for summary.nmfkc() and print.summary.nmfkc().
- Added @return for plot.nmfkc() and predict.nmfkc().
- Added missing @return items (method, n.missing, n.total, rank, mae) to nmfkc().
Code Quality
- Replaced T/F with TRUE/FALSE.
- Replaced 1:length() with seq_along().
- Changed default font from Meiryo to Arial in DOT functions.
- Aligned nmf.sem.cv() defaults with nmf.sem().
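The first two Code Quality items guard against well-known base R pitfalls. The snippet below is a generic illustration in base R, not code taken from the package; all variable names are made up for the example.

```r
# Illustration only; not package code.
x <- numeric(0)

# 1:length(x) expands to c(1, 0) for an empty vector, so the loop body runs twice.
n_bad <- 0
for (i in 1:length(x)) n_bad <- n_bad + 1

# seq_along(x) is integer(0), so the loop body never runs.
n_good <- 0
for (i in seq_along(x)) n_good <- n_good + 1

# T is an ordinary variable that merely defaults to TRUE, so it can be
# overwritten (for example by a variable holding a series length);
# TRUE is a reserved word and cannot be reassigned.
shadowed <- local({
  T <- 0
  isTRUE(T)
})
```

Here `n_bad` ends up as 2 while `n_good` stays 0, and `shadowed` is FALSE because `T` no longer refers to the logical constant inside the local scope.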
nmfkc 0.5.8
Graphviz DOT Output Consolidation and Cleanup
- Harmonized all DOT-generating functions (nmf.sem.DOT, nmfkc.DOT, nmfkc.ar.DOT) for consistent structure, naming conventions, and visualization logic.
- Standardized node and edge formatting rules, including unified cluster behavior, color schemes, and edge-scaling conventions.
- Implemented threshold-aware coefficient labeling so that displayed numerical precision aligns with the visualization threshold, preventing misleadingly detailed labels.
- Removed unused or redundant DOT fragments and improved compatibility across Graphviz engines.
- Enhanced layout readability through consistent indentation, node grouping, and suppression of isolated nodes in specific visualization modes (e.g., type = "YA" in nmfkc.DOT).
- Refactored and expanded internal DOT helper functions (.nmfkc_dot_format_coef, .nmfkc_dot_digits_from_threshold, .nmfkc_dot_cluster_nodes, etc.) for better maintainability and uniform behavior.
- New Function: Implemented nmfkc.ecv() for Element-wise Cross-Validation (Wold’s CV).
  - This function randomly masks elements of the observation matrix to evaluate structural reconstruction error.
  - It provides a statistically robust criterion for rank selection, avoiding the monotonic error decrease often seen in standard column-wise CV.
  - Supports vector input for rank to evaluate multiple ranks simultaneously.
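The element-wise masking idea can be sketched in base R as follows. This is a minimal illustration of how matrix cells might be partitioned into folds, not the implementation used by nmfkc.ecv(); the helper name ecv_masks and its arguments are hypothetical.

```r
# Hypothetical helper, not part of nmfkc: build one 0/1 weight matrix per fold.
ecv_masks <- function(Y, nfolds = 5, seed = 123) {
  set.seed(seed)
  # Randomly assign every matrix element to one of nfolds folds.
  fold <- matrix(sample(rep_len(seq_len(nfolds), length(Y))), nrow = nrow(Y))
  # In fold k, held-out elements get weight 0 and training elements weight 1.
  lapply(seq_len(nfolds), function(k) (fold != k) * 1)
}

Y <- matrix(runif(4 * 6), nrow = 4)
masks <- ecv_masks(Y, nfolds = 3)

# Count how often each element is held out across the folds.
held_out_count <- Reduce(`+`, lapply(masks, function(W) 1 - W))
```

Each element is held out in exactly one fold, so a model fitted with the fold's weights can be scored on the cells it never saw, which is what lets the error rise again once the rank is too large.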
- Missing Value & Weight Support:
  - nmfkc() and nmfkc.cv() now fully support missing values (NA) and observation weights via the hidden argument Y.weights (passed through ...).
  - If Y contains NAs, they are automatically detected and masked (assigned a weight of 0) during optimization.
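As a rough picture of what masking with zero weights means, the base R sketch below derives a weight matrix from the NA pattern and evaluates a weighted squared error. The matrices X and B are toy stand-ins for a fitted basis and coefficient matrix, and the loss shown is an assumption for illustration rather than the package's internal objective.

```r
# Illustrative sketch only; toy values, not package internals.
Y <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)

# Weight 0 for missing cells, 1 for observed cells (same shape as Y).
W <- ifelse(is.na(Y), 0, 1)

# Replace NA by a finite placeholder; the zero weight removes its influence.
Y0 <- Y
Y0[is.na(Y0)] <- 0

# Toy nonnegative factors standing in for a fitted basis X and coefficients B.
X <- matrix(c(1, 2), nrow = 2)
B <- matrix(c(1, 1.5, 2), nrow = 1)

# Weighted squared error: only observed cells contribute.
loss <- sum(W * (Y0 - X %*% B)^2)
```

Because the placeholder value is multiplied by a zero weight, any finite number could be used in place of the missing cells without changing the loss.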
- Rank Selection Diagnostics (nmfkc.rank):
  - Dual-Axis Visualization: The plot now displays fitting metrics (\(R^2\), etc.) on the left axis and ECV Sigma (RMSE) on the right axis (blue line).
  - Automatic Best Rank labeling: The plot explicitly marks the “Best” rank based on two criteria:
    - Elbow: Geometric elbow point of the \(R^2\) curve.
    - Min: Minimum error point of the Element-wise CV.
  - save.time defaults to FALSE, enabling the robust Element-wise CV calculation by default.
- Argument Standardization:
  - Unified the rank argument name to rank across all functions (nmfkc, nmfkc.cv, nmfkc.ecv, nmfkc.rank).
  - The legacy argument Q is still supported for backward compatibility but internally mapped to rank.
- Summary Improvements:
- Other Improvements:
  - Added a validation check in nmfkc.ar() to ensure the input Y has no missing values (as they cannot be propagated to the covariate matrix A in VAR models).
  - Refined nmfkc.residual.plot() layout margins for better visibility of titles.
  - Updated documentation to reflect all changes.
- Regularization Update: The regularization scheme has been revised from L2 (ridge) to L1 (lasso-type) penalties.
  - gamma now controls the L1 penalty on the coefficient matrix \(B = CA\), promoting sparsity in sample-wise coefficients.
  - A new argument lambda has been added to control the L1 penalty on the parameter matrix \(C\), encouraging sparsity in the shared template structure.
  Both parameters can be passed through the ellipsis (...) to nmfkc() and related functions.
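For orientation, one way to write a penalized objective of this kind is shown below. It is a sketch that assumes a squared Euclidean loss and the factorization \(Y \approx XB\) with \(B = CA\); the exact scaling constants and any normalization constraints used by the package are not stated in this changelog and may differ.

\[
\min_{X \ge 0,\; C \ge 0} \; \lVert Y - XCA \rVert_F^2 \;+\; \gamma \,\lVert CA \rVert_1 \;+\; \lambda \,\lVert C \rVert_1 ,
\qquad \lVert M \rVert_1 = \sum_{i,j} \lvert M_{ij} \rvert .
\]

In this form, gamma shrinks individual entries of the sample-wise coefficients \(B\), while lambda acts on \(C\) directly and therefore affects every sample through the shared structure.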
- Function Signature Simplification: Many less-frequently used arguments in nmfkc() (e.g., gamma, X.restriction, X.init) and in nmfkc.cv() (e.g., div, seed) have been moved into the ellipsis (...) for a cleaner function signature.
- Performance Improvement: The internal function .silhouette.simple was vectorized and optimized to reduce computational cost, particularly for the calculation of a(i) and b(i).
- Removed the fast.calc option from the nmfkc() function.
- Added the X.init argument to the nmfkc() function, allowing selection between 'kmeans' and 'nndsvd' initialization methods.
- The penalty term has been changed from tr(CC') to tr(BB') = tr(CAA'C').
- Implemented the internal .z and xnorm functions.
- Added the fast.calc option to the nmfkc() function.
- Optimized internal calculations for improved performance.
- Updated citation("nmfkc") and added AIC/BIC to the output.
- Implemented the nmfkc.ar.stationarity() function.
- Modified the z() function.
- Used crossprod() for faster matrix multiplication.
- Implemented the nmfkc.ar.DOT() function.
- Added logic to sort the columns of X to form a unit matrix in special cases.
- Implemented nmfkc.kernel.beta.cv() and nmfkc.ar.degree.cv() functions.
- Set the default column names of X to Basis1, Basis2, etc.
- Added X.prob and X.cluster to the return object.
- Skipped CPCC and silhouette calculations when save.time = TRUE.
- Added a prototype for the nmfkc.ar() function.
- Added the criterion argument to the nmfkc() function to support multiple criteria.
- Updated the nmfkc.rank() function.
- Added the criterion argument to the nmfkc.rank() function.
- Implemented the save.time argument.
- Implemented the nmfkc.rank() function.
- Implemented the nstart option from the kmeans() function.
- Added an experimental implementation of the nmfkc.rank() function.
- Removed zero-variance columns and rows with a warning.
- Added source and references to the documentation.
- Renamed several components for clarity:
  - nmfkcreg to nmfkc
  - create.kernel to nmfkc.kernel
  - nmfkcreg.cv to nmfkc.cv
  - P to B.prob
  - cluster to B.cluster
  - unit to X.column
  - trace to print.trace
  - dims to print.dims
- Added the r.squared argument to the nmfkcreg.cv() function.
- In nmfkcreg():
  - Added the dims argument to check matrix sizes.
  - Added the unit argument to normalize the basis matrix columns.
- Modified the create.kernel() function to support prediction.
- Updated examples on GitHub.
- Removed the YHAT return value; use XB instead.
- Added the cluster return value for hard clustering.