Solves $$Y \approx X\,\Theta\,A,\qquad X \ge 0,\;\Theta\in\mathbb{R}^{Q\times D}, \;A\in\mathbb{R}^{D\times N},$$ where the covariate matrix \(A\) and the coefficient matrix \(\Theta\) may be signed. Internally \(A = A_{+} - A_{-}\) and \(\Theta = C_{+} - C_{-}\) with \(A_{\pm}, C_{\pm} \ge 0\) (sign-splitting trick, Ding et al. 2010), and the problem is solved by a Direct Multiplicative Update algorithm whose iteration cost is \(O(Q D^2)\), independent of \(N\).
Only \(X\) is structurally constrained to be non-negative (Semi-NMF sense of Ding, Li, & Jordan 2010). In particular, \(Y\) may contain negative entries, in which case the response is fit in the least-squares sense without any non-negativity requirement on \(Y\).
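The sign-splitting trick above is easy to reproduce in base R: `pmax()` gives the positive and negative parts of a signed matrix, and their difference recovers the original exactly. A minimal illustration (not package code):

```r
# Sign-splitting: any signed matrix A decomposes as A = A_plus - A_minus
# with both parts non-negative (Ding et al. 2010).
set.seed(1)
A <- matrix(stats::rnorm(6), 2, 3)  # signed covariate matrix
A_plus  <- pmax(A, 0)               # positive part, max(A, 0)
A_minus <- pmax(-A, 0)              # negative part, max(-A, 0)
stopifnot(all(A_plus >= 0), all(A_minus >= 0))
all.equal(A, A_plus - A_minus)      # the split is exact
```

This is why only non-negative update rules are needed internally: the signed problem in \(A\) and \(\Theta\) becomes a non-negative problem in \(A_{\pm}\) and \(C_{\pm}\).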
When \(A \ge 0\) (so \(A_{-} = 0\)), the result reduces to `nmfkc(Y, A, rank)` with Euclidean loss, up to reordering of components.
Arguments
- Y
Real-valued \(Q_{\mathrm{obs}} \times N\) response matrix. Unlike `nmfkc`, negative entries are allowed.
- A
Real-valued \(D \times N\) covariate matrix (signed). A single matrix is passed; its positive and negative parts \(A_{+} = \max(A, 0)\) and \(A_{-} = \max(-A, 0)\) are computed internally. When using Random Fourier Features (Rahimi & Recht 2007) as \(A\), supply the RFF parameters via the hidden `pars` argument so that `predict()` can regenerate features for new data (see the `pars` entry in `...` below).
- rank
Integer. Number of latent components \(Q\) in \(X\).
- epsilon
Relative convergence tolerance on the objective (default `1e-4`).
- maxit
Maximum number of iterations (default `5000`).
- verbose
Logical. Print dimensions at start (default `TRUE`).
- ...
Additional arguments:
  - `Q`: alias for `rank`.
  - `X.restriction`: constraint applied to columns of \(X\) after every update, with the scale absorbed into \(C_{+}, C_{-}\). One of `"colSums"` (default, \(\mathrm{colSums}(X) = 1\)), `"colSqSums"`, `"totalSum"`, `"none"`, `"fixed"`.
  - `X.init`: initialization strategy for the basis matrix \(X\) (\(Q_{\mathrm{obs}} \times Q\)). Accepts the same menu as `nmfkc`: `"kmeans"` (default), `"kmeansar"`, `"nndsvd"`, `"runif"`, or a user-supplied \(Q_{\mathrm{obs}} \times Q\) non-negative numeric matrix. String methods delegate to the shared internal helper `init_X_method()` (see `nmfkc` for the definition of each method). For signed \(Y\), `"kmeans"` cluster centers may contain negative entries; they are clipped to zero to satisfy \(X \ge 0\), and any column that collapses to all zeros is re-filled with small \(\mathrm{Uniform}(0, 0.1)\) noise.
  - `C.init`: explicit initial \(Q \times D\) coefficient matrix \(\Theta\) (signed). Split internally.
  - `warm.start`: logical (default `TRUE`). If `TRUE` and \(Y \ge 0\), runs `nmfkc(Y, A = rbind(A_+, A_-), rank = Q)` internally to seed \(X, C_{+}, C_{-}\). The user's `X.init`, `seed`, `nstart`, and `X.restriction` are forwarded to the internal `nmfkc` call so that initialization choices propagate consistently between the warm start and the signed MU loop. Ignored when \(Y\) has negative entries (warm start is disabled; `X.init` is used directly by the signed branch instead).
  - `seed`: RNG seed for random initialization (default `123`).
  - `prefix`: name prefix for rows of \(C\) and columns of \(X\) (default `"Basis"`).
  - `pars`: optional list `list(omega, b, D, beta)` of Random Fourier Feature parameters (Rahimi & Recht 2007; `omega`: frequency matrix, `b`: phase offset, `D`: feature dimension, `beta`: bandwidth). When supplied, it is stored in the returned object so that `summary()` can report \(\beta\) and downstream `predict()` calls can regenerate RFF features for new data. If `A` is not RFF features, leave this `NULL`.
  - `Y.weights`: optional non-negative weight matrix (\(Q_{\mathrm{obs}} \times N\)) or vector (length \(N\)), analogous to the `weights` argument of `lm`. The loss becomes \(\sum_{ij} W_{ij} \, (Y_{ij} - (XCA)_{ij})^2\) (`lm()`-style, linear in \(W\)). Logical (`TRUE`/`FALSE`) matrices are also accepted. Typical usage by `nmfkc.signed.cv`/`nmfkc.signed.ecv` passes a binary mask \(W \in \{0,1\}\) to hold out test elements; real-valued weights for observation-level importance weighting are also supported. Default `NULL`: if `Y` has `NA`, a binary mask is auto-constructed (0 for `NA`, 1 elsewhere); otherwise no weighting is applied.
  - `nstart`: number of random restarts. Signed models have more local minima than non-negative ones because \(\Theta = C_{+} - C_{-}\) can take both positive and negative values. Since `nmfkc.signed()` itself does not loop over restarts (callers control it), set the outer-loop size by, e.g., running the function several times with different `seed` values and keeping the fit with the smallest `$objfunc`. A restart budget of 10-50 is recommended for publication-grade runs on signed data.
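The default `Y.weights` behavior for missing data amounts to a masked least-squares objective. The following base-R sketch is illustrative only (a plausible stand-in for the internal logic, not the package's actual code):

```r
# Auto-constructed binary mask: 0 where Y is NA, 1 elsewhere,
# so missing / held-out cells contribute nothing to the weighted loss.
set.seed(1)
Y <- matrix(stats::rnorm(12), 3, 4)
Y[2, 3] <- NA                     # one missing response cell
W  <- ifelse(is.na(Y), 0, 1)      # binary mask, same shape as Y
Y0 <- ifelse(is.na(Y), 0, Y)      # zero-fill so NA cells drop out cleanly
Yhat <- matrix(0, 3, 4)           # placeholder fitted values
loss <- sum(W * (Y0 - Yhat)^2)    # weighted objective, linear in W
```

The same mechanism supports cross-validation: `nmfkc.signed.cv`/`nmfkc.signed.ecv` simply pass a 0/1 mask over the test elements instead of setting them to `NA`.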
Value
An object of class `c("nmfkc.signed", "nmfkc")` with
- `X`: \(Q_{\mathrm{obs}} \times Q\) basis matrix (non-negative, column-normalized according to `X.restriction`).
- `Cp`, `Cn`: \(Q \times D\) non-negative parts of \(\Theta\), so that \(\Theta = C_{+} - C_{-}\).
- `C`: \(C_{+} - C_{-}\) (= \(\Theta\)), signed.
- `B`: \(C \, A\), \(Q \times N\) (signed).
- `objfunc.iter`: objective values per iteration.
- `objfunc`: final objective.
- `r.squared`: \(\mathrm{cor}(Y, \widehat Y)^2\).
- `mae`: mean absolute error.
- `iter`: number of iterations performed.
- `runtime`: elapsed seconds.
- `Y.signed`: logical; whether \(Y\) contained negative entries during fitting.
- `pars`: RFF generating parameters, if supplied.
- `call`: the matched call.
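The returned components satisfy simple identities (`C = Cp - Cn`, `B = C %*% A`, fitted values `X %*% B`) that can be checked by hand. The sketch below uses random stand-in matrices, not an actual fit, and computes `r.squared` and `mae` exactly as defined above:

```r
# Reconstruct fitted values and fit statistics from the listed components.
set.seed(1)
Qobs <- 8; Q <- 3; D <- 5; N <- 40
X  <- matrix(stats::runif(Qobs * Q), Qobs, Q)  # non-negative basis
Cp <- matrix(stats::runif(Q * D), Q, D)        # positive part of Theta
Cn <- matrix(stats::runif(Q * D), Q, D)        # negative part of Theta
A  <- matrix(stats::rnorm(D * N), D, N)        # signed covariates
C  <- Cp - Cn                                  # signed coefficients Theta
B  <- C %*% A                                  # Q x N scores (signed)
Yhat <- X %*% B                                # fitted values X Theta A
Y  <- Yhat + 0.1 * matrix(stats::rnorm(Qobs * N), Qobs, N)  # noisy "data"
r.squared <- cor(as.vector(Y), as.vector(Yhat))^2
mae <- mean(abs(Y - Yhat))
```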
References
Ding, C. H. Q., Li, T., & Jordan, M. I. (2010). Convex and semi-nonnegative matrix factorizations. IEEE TPAMI, 32(1), 45–55.
Rahimi, A., & Recht, B. (2007). Random features for large-scale kernel machines. Advances in NIPS, 20.
Examples
# \donttest{
set.seed(1)
## Example 1: signed A (e.g., hand-built RFF features), non-negative Y
## Build simple signed features Z = sqrt(2/D) * cos(omega^T U + b):
U <- matrix(stats::rnorm(5 * 40), 5, 40) # raw input
D <- 20 # feature dim
omega <- matrix(stats::rnorm(5 * D), 5, D) # random freqs
b <- stats::runif(D, 0, 2 * pi) # phase
Z <- sqrt(2 / D) *
  cos(t(omega) %*% U + matrix(b, D, 40)) # D x 40, signed
Y <- matrix(abs(stats::rnorm(8 * 40)), 8, 40)
res1 <- nmfkc.signed(Y, A = Z, rank = 3, maxit = 200)
#> Y(8,40) ~ X(8,3) %*% C(3,20) %*% A(20,40) [signed covariate]
## Example 2: signed Y (regression)
Y2 <- matrix(stats::rnorm(8 * 40), 8, 40) # signed response
res2 <- nmfkc.signed(Y2, A = Z, rank = 3, maxit = 200,
                     warm.start = FALSE)
#> Y(8,40) ~ X(8,3) %*% C(3,20) %*% A(20,40) [signed covariate], Y signed
# }