Non-negative Matrix Factorization with Random Effects

Estimates the NMF-RE model $$Y = X(\Theta A + U) + \mathcal{E}$$ where $Y$ ($P \times N$) is a non-negative observation matrix, $X$ ($P \times Q$) is a non-negative basis matrix learned from the data, $\Theta$ ($Q \times K$) is a non-negative coefficient matrix capturing systematic covariate effects on latent scores, $A$ ($K \times N$) is a covariate matrix, and $U$ ($Q \times N$) is a random effects matrix capturing unit-specific deviations in the latent score space.

NMF-RE can be viewed as a mixed-effects latent-variable model defined on a reconstruction (mean) structure. The non-negativity constraint on $X$ induces sparse, parts-based loadings, achieving measurement-side variable selection without an explicit sparsity penalty. Inference on $\Theta$ provides covariate-side variable selection by identifying which covariates significantly affect which components.

Estimation alternates ridge-type BLUP-like closed-form updates for $U$ with multiplicative non-negative updates for $X$ and $\Theta$. The effective degrees of freedom consumed by $U$ are monitored and a df-based cap can be enforced to prevent near-saturated fits.

When wild.bootstrap = TRUE, inference on $\Theta$ is performed conditional on $(\hat{X}, \hat{U})$ via asymptotic linearization, a one-step Newton update, and a multiplier (wild) bootstrap, yielding standard errors, z-values, p-values, and confidence intervals without repeated constrained re-optimization.

Usage

nmfre(
  Y,
  A = NULL,
  rank = 2,
  df.rate = NULL,
  wild.bootstrap = TRUE,
  epsilon = 1e-05,
  maxit = 5000,
  ...
)

Arguments

Y

Observation matrix (P x N), non-negative.

A

Covariate matrix (K x N). Default is a row of ones (intercept only).

rank

Integer. Rank of the basis matrix $X$. Default is 2. For backward compatibility, Q is accepted via ....

df.rate

Rate for computing the dfU cap (cap = rate * N * Q). For backward compatibility, dfU.cap.rate is accepted via .... If NULL (default), runs nmfre.dfU.scan internally and selects the minimum rate where the cap is not binding. Use nmfre.dfU.scan beforehand to examine dfU behavior across rates and choose an appropriate value.

wild.bootstrap

Logical. If TRUE (default), perform wild bootstrap inference on $\Theta$.

epsilon

Convergence tolerance for relative change in objective (default 1e-5).

maxit

Maximum number of iterations. Default 5000 (matches nmfkc and the other MU functions in the package). When the cap is hit without meeting the relative- tolerance criterion, a "maximum iterations (...) reached..." warning is emitted so users notice unconverged fits.

...

Additional arguments for initialization, variance control, dfU control, optimization, and inference settings.

X.init: Initial basis matrix (P x Q), or NULL. When NULL, nmfkc is called internally to generate initial values.
C.init: Initial coefficient matrix (Q x K), or NULL. When NULL, nmfkc is called internally to generate initial values.
U.init: Initial random effects matrix (Q x N), or NULL (all zeros).
prefix: Prefix for basis names (default "Basis").
sigma2: Initial residual variance (default 1).
sigma2.update: Logical. Update $\sigma^2$ during iterations (default TRUE).
tau2: Initial random effect variance (default 1).
tau2.update: Logical. Update $\tau^2$ by moment matching (default TRUE). Disabled when dfU cap is active.
dfU.control: Either "cap" (default) to enforce a cap on dfU, or "off" for no cap.
print.trace: Logical. If TRUE, print progress every 100 iterations (default FALSE).
seed: Integer seed for reproducibility (default 1).
C.p.side: P-value sidedness: "one.sided" (default, for boundary null H0: C=0 vs H1: C>0) or "two.sided".
wild.B: Number of wild bootstrap replicates (default 500).
wild.seed: Seed for wild bootstrap (default 123).

Value

A list of class "nmfre" with components. The model is $Y = X(\Theta A + U) + \mathcal{E}$.

Core matrices

X: Basis matrix $X$ ($P \times Q$), columns normalized to sum to 1.
C: Coefficient matrix $\Theta$ ($Q \times K$).
U: Random effects matrix $U$ ($Q \times N$).

Variance components

sigma2: Residual variance $\hat{\sigma}^2$.
tau2: Random effect variance $\hat{\tau}^2$.
lambda: Ridge penalty $\lambda = \sigma^2 / \tau^2$.

Convergence diagnostics

converged: Logical. Whether the algorithm converged.
stop.reason: Character string describing why iteration stopped.
iter: Number of iterations performed.
maxit: Maximum iterations setting used.
epsilon: Convergence tolerance used.
objfunc: Final objective function value $\|Y - X(\Theta A + U)\|^2 + \lambda \|U\|^2$.
rel.change.final: Final relative change in objective.
objfunc.iter: Numeric vector of objective values per iteration.
rss.trace: Numeric vector of $\|Y - X(\Theta A + U)\|^2$ per iteration.

Effective degrees of freedom (dfU) diagnostics

dfU: Final effective degrees of freedom $\mathrm{df}_U = N \sum_q d_q / (d_q + \lambda)$, where $d_q$ are eigenvalues of $X'X$.
dfU.cap: Upper bound imposed on $\mathrm{df}_U$.
dfU.cap.rate: Rate used to compute the cap.
dfU.cap.scan: Result of nmfre.dfU.scan, or NULL.
lambda.enforced: Final $\lambda$ enforced to satisfy the cap.
dfU.hit.cap: Logical. Whether the cap was binding.
dfU.hit.iter: Iteration at which the cap first bound.
dfU.frac: $\mathrm{df}_U / (NQ)$, fraction of maximum df.
dfU.cap.frac: $\mathrm{df}_U^{\mathrm{cap}} / (NQ)$.

Fitted matrices

B: Fixed-effect scores $\Theta A$ ($Q \times N$).
B.prob: Column-normalized probabilities from $\max(\Theta A, 0)$.
B.blup: BLUP scores $\Theta A + U$ ($Q \times N$).
B.blup.pos: Non-negative BLUP scores $\max(\Theta A + U, 0)$ ($Q \times N$).
B.blup.prob: Column-normalized probabilities from $\max(\Theta A + U, 0)$.
XB: Fitted values from fixed effects $X \Theta A$ ($P \times N$).
XB.blup: Fitted values including random effects $X(\Theta A + U)$ ($P \times N$).

Fit statistics

r.squared: Pearson $\mathrm{cor}(Y, X(\Theta A + U))^2$ (BLUP prediction).
r.squared.uncentered: Uncentered $1 - \|Y - X(\Theta A + U)\|_F^2 / \|Y\|_F^2$ (BLUP; baseline = zero matrix).
r.squared.centered: Row-mean centered $1 - \|Y - X(\Theta A + U)\|_F^2 / \|Y - \bar Y_{p\cdot}\|_F^2$ (BLUP; baseline = per-row mean).
r.squared.fixed: Pearson $\mathrm{cor}(Y, X\Theta A)^2$ (fixed-only prediction).
r.squared.fixed.uncentered, r.squared.fixed.centered: Uncentered and centered $R^2$ for the fixed-only prediction.
ICC: Trace-based Intraclass Correlation Coefficient. In the NMF-RE model, the conditional covariance of the $n$-th observation column is $\mathrm{Var}(Y_n) = \tau^2 X X^\top + \sigma^2 I_P$, a $P \times P$ matrix. Unlike a standard random intercept model where the design matrix $Z$ is a simple indicator (so the ICC reduces to $\tau^2 / (\sigma^2 + \tau^2)$), the basis matrix $X$ plays the role of $Z$ in a random slopes model, making the variance contribution of $U$ depend on $X$. To obtain a scalar summary, we take the trace of each component: $$\mathrm{ICC} = \frac{\tau^2 \, \mathrm{tr}(X^\top X)} {\tau^2 \, \mathrm{tr}(X^\top X) + \sigma^2 P}.$$ This equals the average (over $P$ dimensions) proportion of per-column variance attributable to the random effects.

Inference on $\Theta$ (wild bootstrap)

sigma2.used: $\hat{\sigma}^2$ used for inference.
C.vec.cov: Variance-covariance matrix for $\mathrm{vec}(\Theta)$ ($QK \times QK$).
C.se: Standard error matrix for $\Theta$ ($Q \times K$).
C.se.hess: Sandwich (Hessian-based) SE matrix for $\Theta$.
C.se.boot: Bootstrap SE matrix for $\Theta$.
coefficients: Data frame with columns Estimate, Std. Error, z value, Pr(>|z|), and confidence interval bounds for each element of $\Theta$.
C.ci.lower: Lower confidence interval matrix for $\Theta$ ($Q \times K$).
C.ci.upper: Upper confidence interval matrix for $\Theta$ ($Q \times K$).
C.boot.sd: Bootstrap standard deviation matrix for $\Theta$ ($Q \times K$).
C.p.side: P-value sidedness used: "one.sided" or "two.sided".

References

Satoh, K. (2026). Wild Bootstrap Inference for Non-Negative Matrix Factorization with Random Effects. arXiv:2603.01468. https://arxiv.org/abs/2603.01468

Examples

# Example 1. cars data
Y <- matrix(cars$dist, nrow = 1)
A <- rbind(intercept = 1, speed = cars$speed)
res <- nmfre(Y, A, rank = 1, maxit = 5000)
summary(res)
#> NMF-RE: Y(1,50) = X(1,1) [C(1,2) A + U(1,50)]
#> Iterations: 4 (converged, epsilon = 1e-05)
#> R-squared (cor^2):    0.9515 (XB+blup), 0.6511 (XB)
#> R-squared (uncentered):     0.9752 (XB+blup), 0.8959 (XB)
#> R-squared (centered): 0.9047 (XB+blup), 0.6005 (XB)
#> 
#> Variance components:
#>   sigma2 = 1  (residual)
#>   tau2   = 1  (random effect)
#>   lambda = 1  (sigma2 / tau2)
#>   ICC    = 0.5000  (tau2*tr(X'X) / (tau2*tr(X'X) + sigma2*P))
#>   dfU    = 25.00 <= 30.00 (cap rate = 0.60)
#> 
#> Coefficients:
#>                   Estimate Std. Error (Boot) z value   Pr(>z) 
#> intercept:Basis1     0.014      6.427  3.254    0.00   0.4991  
#>     speed:Basis1     2.851      0.456  0.432    6.25 2.118e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# \donttest{
# Example 2. Orthodont data (nlme)
if (requireNamespace("nlme", quietly = TRUE)) {
  Y <- matrix(nlme::Orthodont$distance, 4, 27)
  male <- ifelse(nlme::Orthodont$Sex[seq(1, 108, 4)] == "Male", 1, 0)
  A <- rbind(intercept = 1, male = male)

  # Scan dfU cap rates to choose an appropriate value
  nmfre.dfU.scan(1:10/10, Y, A, rank = 1)

  # Fit with chosen rate
  res <- nmfre(Y, A, rank = 1, df.rate = 0.2)
  summary(res)
}# }
#> NMF-RE: Y(4,27) = X(4,1) [C(1,2) A + U(1,27)]
#> Iterations: 2 (converged, epsilon = 1e-05)
#> R-squared (cor^2):    0.5739 (XB+blup), 0.4167 (XB)
#> R-squared (uncentered):     0.9937 (XB+blup), 0.9915 (XB)
#> R-squared (centered): 0.4139 (XB+blup), 0.2133 (XB)
#> 
#> Variance components:
#>   sigma2 = 1  (residual)
#>   tau2   = 0.9961  (random effect)
#>   lambda = 1.004  (sigma2 / tau2)
#>   ICC    = 0.0588  (tau2*tr(X'X) / (tau2*tr(X'X) + sigma2*P))
#>   dfU    = 5.40 <= 5.40 (cap rate = 0.20)
#> 
#> Coefficients:
#>                   Estimate Std. Error (Boot) z value   Pr(>z) 
#> intercept:Basis1    90.393      2.471  2.452   36.58   <2e-16 ***
#>      male:Basis1     9.606      3.056  2.977    3.14 0.0008356 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1