Skip to contents

Estimates the NMF-RE model $$Y = X(\Theta A + U) + \mathcal{E}$$ where \(Y\) (\(P \times N\)) is a non-negative observation matrix, \(X\) (\(P \times Q\)) is a non-negative basis matrix learned from the data, \(\Theta\) (\(Q \times K\)) is a non-negative coefficient matrix capturing systematic covariate effects on latent scores, \(A\) (\(K \times N\)) is a covariate matrix, and \(U\) (\(Q \times N\)) is a random effects matrix capturing unit-specific deviations in the latent score space.

NMF-RE can be viewed as a mixed-effects latent-variable model defined on a reconstruction (mean) structure. The non-negativity constraint on \(X\) induces sparse, parts-based loadings, achieving measurement-side variable selection without an explicit sparsity penalty. Inference on \(\Theta\) provides covariate-side variable selection by identifying which covariates significantly affect which components.

Estimation alternates ridge-type BLUP-like closed-form updates for \(U\) with multiplicative non-negative updates for \(X\) and \(\Theta\). The effective degrees of freedom consumed by \(U\) are monitored and a df-based cap can be enforced to prevent near-saturated fits.

When wild.bootstrap = TRUE, inference on \(\Theta\) is performed conditional on \((\hat{X}, \hat{U})\) via asymptotic linearization, a one-step Newton update, and a multiplier (wild) bootstrap, yielding standard errors, z-values, p-values, and confidence intervals without repeated constrained re-optimization.

Usage

nmfre(
  Y,
  A = NULL,
  rank = 2,
  df.rate = NULL,
  wild.bootstrap = TRUE,
  epsilon = 1e-05,
  maxit = 50000,
  ...
)

Arguments

Y

Observation matrix (P x N), non-negative.

A

Covariate matrix (K x N). Default is a row of ones (intercept only).

rank

Integer. Rank of the basis matrix \(X\). Default is 2. For backward compatibility, Q is accepted via ....

df.rate

Rate for computing the dfU cap (cap = rate * N * Q). For backward compatibility, dfU.cap.rate is accepted via .... If NULL (default), runs nmfre.dfU.scan internally and selects the minimum rate where the cap is not binding. Use nmfre.dfU.scan beforehand to examine dfU behavior across rates and choose an appropriate value.

wild.bootstrap

Logical. If TRUE (default), perform wild bootstrap inference on \(\Theta\).

epsilon

Convergence tolerance for relative change in objective (default 1e-5).

maxit

Maximum number of iterations (default 50000).

...

Additional arguments for initialization, variance control, dfU control, optimization, and inference settings.

  • X.init: Initial basis matrix (P x Q), or NULL. When NULL, nmfkc is called internally to generate initial values.

  • C.init: Initial coefficient matrix (Q x K), or NULL. When NULL, nmfkc is called internally to generate initial values.

  • U.init: Initial random effects matrix (Q x N), or NULL (all zeros).

  • prefix: Prefix for basis names (default "Basis").

  • sigma2: Initial residual variance (default 1).

  • sigma2.update: Logical. Update \(\sigma^2\) during iterations (default TRUE).

  • tau2: Initial random effect variance (default 1).

  • tau2.update: Logical. Update \(\tau^2\) by moment matching (default TRUE). Disabled when dfU cap is active.

  • dfU.control: Either "cap" (default) to enforce a cap on dfU, or "off" for no cap.

  • print.trace: Logical. If TRUE, print progress every 100 iterations (default FALSE).

  • seed: Integer seed for reproducibility (default 1).

  • C.p.side: P-value sidedness: "one.sided" (default, for boundary null H0: C=0 vs H1: C>0) or "two.sided".

  • wild.B: Number of wild bootstrap replicates (default 500).

  • wild.seed: Seed for wild bootstrap (default 123).

Value

A list of class "nmfre" with components. The model is \(Y = X(\Theta A + U) + \mathcal{E}\).

Core matrices

X

Basis matrix \(X\) (\(P \times Q\)), columns normalized to sum to 1.

C

Coefficient matrix \(\Theta\) (\(Q \times K\)).

U

Random effects matrix \(U\) (\(Q \times N\)).

Variance components

sigma2

Residual variance \(\hat{\sigma}^2\).

tau2

Random effect variance \(\hat{\tau}^2\).

lambda

Ridge penalty \(\lambda = \sigma^2 / \tau^2\).

Convergence diagnostics

converged

Logical. Whether the algorithm converged.

stop.reason

Character string describing why iteration stopped.

iter

Number of iterations performed.

maxit

Maximum iterations setting used.

epsilon

Convergence tolerance used.

objfunc

Final objective function value \(\|Y - X(\Theta A + U)\|^2 + \lambda \|U\|^2\).

rel.change.final

Final relative change in objective.

objfunc.iter

Numeric vector of objective values per iteration.

rss.trace

Numeric vector of \(\|Y - X(\Theta A + U)\|^2\) per iteration.

Effective degrees of freedom (dfU) diagnostics

dfU

Final effective degrees of freedom \(\mathrm{df}_U = N \sum_q d_q / (d_q + \lambda)\), where \(d_q\) are eigenvalues of \(X'X\).

dfU.cap

Upper bound imposed on \(\mathrm{df}_U\).

dfU.cap.rate

Rate used to compute the cap.

dfU.cap.scan

Result of nmfre.dfU.scan, or NULL.

lambda.enforced

Final \(\lambda\) enforced to satisfy the cap.

dfU.hit.cap

Logical. Whether the cap was binding.

dfU.hit.iter

Iteration at which the cap first bound.

dfU.frac

\(\mathrm{df}_U / (NQ)\), fraction of maximum df.

dfU.cap.frac

\(\mathrm{df}_U^{\mathrm{cap}} / (NQ)\).

Fitted matrices

B

Fixed-effect scores \(\Theta A\) (\(Q \times N\)).

B.prob

Column-normalized probabilities from \(\max(\Theta A, 0)\).

B.blup

BLUP scores \(\Theta A + U\) (\(Q \times N\)).

B.blup.pos

Non-negative BLUP scores \(\max(\Theta A + U, 0)\) (\(Q \times N\)).

B.blup.prob

Column-normalized probabilities from \(\max(\Theta A + U, 0)\).

XB

Fitted values from fixed effects \(X \Theta A\) (\(P \times N\)).

XB.blup

Fitted values including random effects \(X(\Theta A + U)\) (\(P \times N\)).

Fit statistics

r.squared

Coefficient of determination \(R^2\) for \(Y\) vs \(X(\Theta A + U)\) (BLUP).

r.squared.fixed

Coefficient of determination \(R^2\) for \(Y\) vs \(X \Theta A\) (fixed effects only).

ICC

Trace-based Intraclass Correlation Coefficient. In the NMF-RE model, the conditional covariance of the \(n\)-th observation column is \(\mathrm{Var}(Y_n) = \tau^2 X X^\top + \sigma^2 I_P\), a \(P \times P\) matrix. Unlike a standard random intercept model where the design matrix \(Z\) is a simple indicator (so the ICC reduces to \(\tau^2 / (\sigma^2 + \tau^2)\)), the basis matrix \(X\) plays the role of \(Z\) in a random slopes model, making the variance contribution of \(U\) depend on \(X\). To obtain a scalar summary, we take the trace of each component: $$\mathrm{ICC} = \frac{\tau^2 \, \mathrm{tr}(X^\top X)} {\tau^2 \, \mathrm{tr}(X^\top X) + \sigma^2 P}.$$ This equals the average (over \(P\) dimensions) proportion of per-column variance attributable to the random effects.

Inference on \(\Theta\) (wild bootstrap)

sigma2.used

\(\hat{\sigma}^2\) used for inference.

C.vec.cov

Variance-covariance matrix for \(\mathrm{vec}(\Theta)\) (\(QK \times QK\)).

C.se

Standard error matrix for \(\Theta\) (\(Q \times K\)).

C.se.hess

Sandwich (Hessian-based) SE matrix for \(\Theta\).

C.se.boot

Bootstrap SE matrix for \(\Theta\).

coefficients

Data frame with columns Estimate, Std. Error, z value, Pr(>|z|), and confidence interval bounds for each element of \(\Theta\).

C.ci.lower

Lower confidence interval matrix for \(\Theta\) (\(Q \times K\)).

C.ci.upper

Upper confidence interval matrix for \(\Theta\) (\(Q \times K\)).

C.boot.sd

Bootstrap standard deviation matrix for \(\Theta\) (\(Q \times K\)).

C.p.side

P-value sidedness used: "one.sided" or "two.sided".

References

Satoh, K. (2026). Wild Bootstrap Inference for Non-Negative Matrix Factorization with Random Effects. arXiv:2603.01468. https://arxiv.org/abs/2603.01468

Examples

# Example 1. cars data
Y <- matrix(cars$dist, nrow = 1)
A <- rbind(intercept = 1, speed = cars$speed)
res <- nmfre(Y, A, rank = 1, maxit = 5000)
summary(res)
#> NMF-RE: Y(1,50) = X(1,1) [C(1,2) A + U(1,50)]
#> Iterations: 4 (converged, epsilon = 1e-05)
#> R-squared: 0.9047 (XB+blup), 0.6005 (XB)
#> 
#> Variance components:
#>   sigma2 = 1  (residual)
#>   tau2   = 1  (random effect)
#>   lambda = 1  (sigma2 / tau2)
#>   ICC    = 0.5000  (tau2*tr(X'X) / (tau2*tr(X'X) + sigma2*P))
#>   dfU    = 25.00 <= 30.00 (cap rate = 0.60)
#> 
#> Coefficients:
#>                   Estimate Std. Error (Boot) z value   Pr(>z) 
#> intercept:Basis1     0.014      6.427  3.254    0.00   0.4991  
#>     speed:Basis1     2.851      0.456  0.432    6.25 2.118e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# \donttest{
# Example 2. Orthodont data (nlme)
if (requireNamespace("nlme", quietly = TRUE)) {
  Y <- matrix(nlme::Orthodont$distance, 4, 27)
  male <- ifelse(nlme::Orthodont$Sex[seq(1, 108, 4)] == "Male", 1, 0)
  A <- rbind(intercept = 1, male = male)

  # Scan dfU cap rates to choose an appropriate value
  nmfre.dfU.scan(1:10/10, Y, A, rank = 1)

  # Fit with chosen rate
  res <- nmfre(Y, A, rank = 1, df.rate = 0.2)
  summary(res)
}# }
#> NMF-RE: Y(4,27) = X(4,1) [C(1,2) A + U(1,27)]
#> Iterations: 2 (converged, epsilon = 1e-05)
#> R-squared: 0.5654 (XB+blup), 0.4167 (XB)
#> 
#> Variance components:
#>   sigma2 = 1  (residual)
#>   tau2   = 0.9961  (random effect)
#>   lambda = 1.004  (sigma2 / tau2)
#>   ICC    = 0.0588  (tau2*tr(X'X) / (tau2*tr(X'X) + sigma2*P))
#>   dfU    = 5.40 <= 5.40 (cap rate = 0.20)
#> 
#> Coefficients:
#>                   Estimate Std. Error (Boot) z value   Pr(>z) 
#> intercept:Basis1    90.393      2.471  2.452   36.58   <2e-16 ***
#>      male:Basis1     9.606      3.056  2.977    3.14 0.0008356 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1