Statistical inference for NMF-FFB via X-fixed full pair bootstrap
Source:R/nmf.ffb.R, R/nmf.sem.R
nmf.sem.inference.Rdnmf.sem.inference performs statistical inference on the structural
coefficient matrices \(C_1\) (latent feedback, \(\Theta_1\)) and
\(C_2\) (exogenous loading, \(\Theta_2\)) from a fitted
nmf.sem model.
The procedure is a full pair bootstrap that holds the basis matrix \(\hat X\) from the original fit fixed across all replicates (which avoids label switching and gives a clean conditional interpretation: “uncertainty of the structural coefficients given the measurement model”):
For each replicate \(b = 1, \dots, B\), resample column indices \((i_1, \dots, i_N)\) with replacement from \(\{1, \dots, N\}\) and form \(Y_1^{(b)} = Y_1[, i]\), \(Y_2^{(b)} = Y_2[, i]\).
Re-estimate \((C_1^{(b)}, C_2^{(b)})\) by running the
nmf.semmultiplicative updates with \(X = \hat X\) held fixed (no \(X\) update; no centroid sort), using the sameC1.L1,C2.L1as the original fit.Discard replicates that violate stationarity (\(\rho(X C_1^{(b)}) \ge 1\)) or have an amplification ratio exceeding the geometric-series bound by more than 1\
Because \(C_1, C_2 \ge 0\) are non-negative by construction, exact
zeros are essentially never observed in the bootstrap distribution.
Significance is assessed via a support rate at a small display
threshold \(\delta\) (default 0.01):
$$
\mathrm{sup}(c) \;=\; \frac{1}{|\mathrm{valid}|}
\sum_{b \in \mathrm{valid}} \mathbf{1}\!\left( \hat c^{(b)} > \delta \right).
$$
This is a one-sided counterpart of the classical \(p\)-value:
large support_rate indicates strong evidence that the entry is
meaningfully positive. Significance markers follow the lavaan
convention with the natural correspondence \(p = 1 - \mathrm{sup}\):
* (sup > 0.95), ** (sup > 0.99), ***
(sup > 0.999). Cutoffs use strict greater-than so the rule
mirrors the standard R convention for p-values (p < 0.05 / 0.01 /
0.001 → //), translated to support_rate via
\(\mathrm{sup} = 1 - p\).
Usage
nmf.ffb.inference(
object,
Y1,
Y2,
B = 1000L,
threshold = 0.01,
ci.level = 0.95,
C1.L1 = 1,
C2.L1 = 0.1,
seed = 123L,
...
)
nmf.sem.inference(
object,
Y1,
Y2,
B = 1000L,
threshold = 0.01,
ci.level = 0.95,
C1.L1 = 1,
C2.L1 = 0.1,
seed = 123L,
...
)Arguments
- object
A fitted object returned by
nmf.sem. Must containX,C1,C2.- Y1
Endogenous variable matrix (P1 x N). Must match the data used in
nmf.sem().- Y2
Exogenous variable matrix (P2 x N). Same.
- B
Number of bootstrap replicates. Default
1000; required for the***threshold (sup > 0.999). Reduce to 500 for exploratory speed (only*/**stay reliable).- threshold
Display threshold \(\delta\) for the support rate \(\Pr_{\mathrm{boot}}(\hat c^{(b)} > \delta)\). Default
0.01; entries below this magnitude are treated as effectively zero in the path diagram.- ci.level
Confidence level for the percentile bootstrap CI. Default
0.95.- C1.L1, C2.L1
L1 sparsity penalties used by the original
nmf.semfit. These must match the fit's hyperparameters for the bootstrap to estimate the correct model. Defaults (1.0,0.1) matchnmf.sem's defaults but you should pass the actual values used.- seed
Base RNG seed for the bootstrap. Each replicate uses
seed + b(resampling) andseed + 1000 + b(\(C_1, C_2\) initialization). Default123.- ...
Hidden options:
epsilonConvergence tolerance for the inner fixed-X MU loop. Default
1e-6.maxitMaximum iterations for the inner MU loop. Default
5000.ncoresNumber of parallel workers. Default
1(serial). Cross-platform: usesparallel::mclapplyon Linux/macOS andparallel::parLapply(PSOCK cluster) on Windows.print.traceLogical, print progress. Default
FALSE.
Value
The input object with additional bootstrap inference
components:
- coefficients
Data frame with rows for every entry of \(C_1\) and \(C_2\) and columns
Type("C1" / "C2"),Basis,Covariate,Estimate,CI_low,CI_high,support_rate,p_value(\(= 1 - \mathrm{support\_rate}\), for compatibility with downstream consumers such asnmf.sem.DOT), andsig.- C1.support.rate, C2.support.rate
Per-element support rates (Q x P1 and Q x P2 matrices).
- C1.ci.lower, C1.ci.upper, C2.ci.lower, C2.ci.upper
Per-element percentile CI bounds.
- C1.array, C2.array
Bootstrap distributions: 3D arrays of shape B x Q x P1 (and B x Q x P2). Invalid replicates contain
NA.- rho.boot, AR.boot, iter.boot
Per-replicate spectral radius, amplification ratio, and inner-loop iteration count.
- bootstrap.B, bootstrap.threshold, bootstrap.ci.level
Inputs recorded for reproducibility.
- bootstrap.n.valid, bootstrap.n.invalid
Validity counts.
Lifecycle
This function's interface changed at v0.6.8: the legacy 1-step Newton
wild bootstrap (with sandwich SE) has been replaced by the full pair
bootstrap described above, following the paper revision. The fields
sigma2.used, C2.se, C2.se.boot, C2.p.side
that the previous implementation produced are no longer present.
References
Satoh, K. (2025). Applying non-negative matrix factorization with covariates to structural equation modeling for blind input-output analysis. arXiv:2512.18250. https://arxiv.org/abs/2512.18250
Examples
# \donttest{
Y <- t(iris[, -5])
Y1 <- Y[1:2, ]; Y2 <- Y[3:4, ]
res <- nmf.sem(Y1, Y2, rank = 2)
#> Warning: maximum iterations (5000) reached...
res2 <- nmf.sem.inference(res, Y1, Y2, B = 200) # quick demo
head(res2$coefficients)
#> Type Basis Covariate Estimate CI_low CI_high support_rate
#> 1 C1 Factor1 Sepal.Length 9.641115e-01 9.520505e-01 9.721157e-01 1
#> 2 C1 Factor2 Sepal.Length 4.158689e-03 3.642629e-03 5.222247e-03 0
#> 3 C1 Factor1 Sepal.Width 4.184298e-02 3.289576e-02 5.725921e-02 1
#> 4 C1 Factor2 Sepal.Width 9.900868e-01 9.878874e-01 9.912207e-01 1
#> 5 C2 Factor1 Petal.Length 2.255319e-02 1.777278e-02 2.936270e-02 1
#> 6 C2 Factor2 Petal.Length 4.262750e-08 4.116256e-10 6.088379e-07 0
#> p_value sig
#> 1 0 ***
#> 2 1
#> 3 0 ***
#> 4 0 ***
#> 5 0 ***
#> 6 1
# }