Statistical inference for NMF-FFB via X-fixed full pair bootstrap

nmf.sem.inference performs statistical inference on the structural coefficient matrices $C_1$ (latent feedback, $\Theta_1$) and $C_2$ (exogenous loading, $\Theta_2$) from a fitted nmf.sem model.

The procedure is a full pair bootstrap that holds the basis matrix $\hat X$ from the original fit fixed across all replicates (which avoids label switching and gives a clean conditional interpretation: “uncertainty of the structural coefficients given the measurement model”):

For each replicate $b = 1, \dots, B$, resample column indices $(i_1, \dots, i_N)$ with replacement from $\{1, \dots, N\}$ and form $Y_1^{(b)} = Y_1[, i]$, $Y_2^{(b)} = Y_2[, i]$.
Re-estimate $(C_1^{(b)}, C_2^{(b)})$ by running the nmf.sem multiplicative updates with $X = \hat X$ held fixed (no $X$ update; no centroid sort), using the same C1.L1, C2.L1 as the original fit.
Discard replicates that violate stationarity ($\rho(X C_1^{(b)}) \ge 1$) or have an amplification ratio exceeding the geometric-series bound by more than 1\

Because $C_1, C_2 \ge 0$ are non-negative by construction, exact zeros are essentially never observed in the bootstrap distribution. Significance is assessed via a support rate at a small display threshold $\delta$ (default 0.01): $$ \mathrm{sup}(c) \;=\; \frac{1}{|\mathrm{valid}|} \sum_{b \in \mathrm{valid}} \mathbf{1}\!\left( \hat c^{(b)} > \delta \right). $$ This is a one-sided counterpart of the classical $p$-value: large support_rate indicates strong evidence that the entry is meaningfully positive. Significance markers follow the lavaan convention with the natural correspondence $p = 1 - \mathrm{sup}$: * (sup > 0.95), ** (sup > 0.99), *** (sup > 0.999). Cutoffs use strict greater-than so the rule mirrors the standard R convention for p-values (p < 0.05 / 0.01 / 0.001 → //), translated to support_rate via $\mathrm{sup} = 1 - p$.

Usage

nmf.ffb.inference(
  object,
  Y1,
  Y2,
  B = 1000L,
  threshold = 0.01,
  ci.level = 0.95,
  C1.L1 = 1,
  C2.L1 = 0.1,
  seed = 123L,
  ...
)

nmf.sem.inference(
  object,
  Y1,
  Y2,
  B = 1000L,
  threshold = 0.01,
  ci.level = 0.95,
  C1.L1 = 1,
  C2.L1 = 0.1,
  seed = 123L,
  ...
)

Arguments

object

A fitted object returned by nmf.sem. Must contain X, C1, C2.

Y1

Endogenous variable matrix (P1 x N). Must match the data used in nmf.sem().

Y2

Exogenous variable matrix (P2 x N). Same.

B

Number of bootstrap replicates. Default 1000; required for the *** threshold (sup > 0.999). Reduce to 500 for exploratory speed (only * / ** stay reliable).

threshold

Display threshold $\delta$ for the support rate $\Pr_{\mathrm{boot}}(\hat c^{(b)} > \delta)$. Default 0.01; entries below this magnitude are treated as effectively zero in the path diagram.

ci.level

Confidence level for the percentile bootstrap CI. Default 0.95.

C1.L1, C2.L1

L1 sparsity penalties used by the original nmf.sem fit. These must match the fit's hyperparameters for the bootstrap to estimate the correct model. Defaults (1.0, 0.1) match nmf.sem's defaults but you should pass the actual values used.

seed

Base RNG seed for the bootstrap. Each replicate uses seed + b (resampling) and seed + 1000 + b ($C_1, C_2$ initialization). Default 123.

...

Hidden options:

epsilon: Convergence tolerance for the inner fixed-X MU loop. Default 1e-6.
maxit: Maximum iterations for the inner MU loop. Default 5000.
ncores: Number of parallel workers. Default 1 (serial). Cross-platform: uses parallel::mclapply on Linux/macOS and parallel::parLapply (PSOCK cluster) on Windows.
print.trace: Logical, print progress. Default FALSE.

Value

The input object with additional bootstrap inference components:

coefficients: Data frame with rows for every entry of $C_1$ and $C_2$ and columns Type ("C1" / "C2"), Basis, Covariate, Estimate, CI_low, CI_high, support_rate, p_value ($= 1 - \mathrm{support\_rate}$, for compatibility with downstream consumers such as nmf.sem.DOT), and sig.
C1.support.rate, C2.support.rate: Per-element support rates (Q x P1 and Q x P2 matrices).
C1.ci.lower, C1.ci.upper, C2.ci.lower, C2.ci.upper: Per-element percentile CI bounds.
C1.array, C2.array: Bootstrap distributions: 3D arrays of shape B x Q x P1 (and B x Q x P2). Invalid replicates contain NA.
rho.boot, AR.boot, iter.boot: Per-replicate spectral radius, amplification ratio, and inner-loop iteration count.
bootstrap.B, bootstrap.threshold, bootstrap.ci.level: Inputs recorded for reproducibility.
bootstrap.n.valid, bootstrap.n.invalid: Validity counts.

Lifecycle

This function's interface changed at v0.6.8: the legacy 1-step Newton wild bootstrap (with sandwich SE) has been replaced by the full pair bootstrap described above, following the paper revision. The fields sigma2.used, C2.se, C2.se.boot, C2.p.side that the previous implementation produced are no longer present.

References

Satoh, K. (2025). Applying non-negative matrix factorization with covariates to structural equation modeling for blind input-output analysis. arXiv:2512.18250. https://arxiv.org/abs/2512.18250

Examples

# \donttest{
Y <- t(iris[, -5])
Y1 <- Y[1:2, ]; Y2 <- Y[3:4, ]
res  <- nmf.sem(Y1, Y2, rank = 2)
#> Warning: maximum iterations (5000) reached...
res2 <- nmf.sem.inference(res, Y1, Y2, B = 200)  # quick demo
head(res2$coefficients)
#>   Type   Basis    Covariate     Estimate       CI_low      CI_high support_rate
#> 1   C1 Factor1 Sepal.Length 9.641115e-01 9.520505e-01 9.721157e-01            1
#> 2   C1 Factor2 Sepal.Length 4.158689e-03 3.642629e-03 5.222247e-03            0
#> 3   C1 Factor1  Sepal.Width 4.184298e-02 3.289576e-02 5.725921e-02            1
#> 4   C1 Factor2  Sepal.Width 9.900868e-01 9.878874e-01 9.912207e-01            1
#> 5   C2 Factor1 Petal.Length 2.255319e-02 1.777278e-02 2.936270e-02            1
#> 6   C2 Factor2 Petal.Length 4.262750e-08 4.116256e-10 6.088379e-07            0
#>   p_value sig
#> 1       0 ***
#> 2       1    
#> 3       0 ***
#> 4       0 ***
#> 5       0 ***
#> 6       1    
# }