Multivariate Graph-Regularized Regression — multivariate_graph

Fits a multivariate regression model with graph Laplacian regularization. This method encourages isoforms that are similar (e.g., share exons) to have similar prediction weight vectors.

Usage

multivariate_graph_reg(
  X,
  Y,
  similarity_matrix = NULL,
  lambda1 = NULL,
  lambda_graph = NULL,
  alpha = 0.5,
  nlambda = 15,
  nlambda_graph = 5,
  lambda_graph_seq = NULL,
  normalize_laplacian = FALSE,
  nfolds = 5,
  standardize = FALSE,
  verbose = FALSE,
  seed = 123,
  par = FALSE,
  n.cores = NULL
)

Arguments

X: matrix, design matrix of SNP dosages (n x p)
Y: matrix, isoform expression with one isoform per column (n x q)
similarity_matrix: matrix, q x q symmetric matrix of pairwise isoform similarities (e.g., Jaccard index of shared exons). Values should be >= 0. If NULL, the identity matrix is used; its Laplacian is zero, so the graph penalty vanishes (no graph regularization, equivalent to lasso).
lambda1: numeric or NULL, L1 penalty parameter. If NULL, selected by CV.
lambda_graph: numeric or NULL, graph penalty parameter. If NULL, selected by CV.
alpha: numeric, elastic net mixing (0 = ridge, 1 = lasso). Default 0.5.
nlambda: int, number of lambda1 values to try in CV
nlambda_graph: int, number of lambda_graph values to try in CV
lambda_graph_seq: numeric vector, specific lambda_graph values to try. If NULL, auto-generated.
normalize_laplacian: logical, use normalized Laplacian. Default FALSE.
nfolds: int, number of CV folds
standardize: logical, standardize X before fitting
verbose: logical, print progress
seed: int, random seed
par: logical, use parallel processing. Default FALSE.
n.cores: int, number of cores for parallel processing. Default NULL (auto-detect).
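
As a rough illustration of how the inputs fit together, the sketch below builds a Jaccard similarity matrix from exon sets and simulates X and Y with the documented shapes. The exon annotation, the simulated effect sizes, and the helper `jaccard` are hypothetical. The call at the end runs only if `multivariate_graph_reg` is already available in the session (for example, after attaching the package that provides it), and whether the function needs column names on X is an assumption.

```r
# Hypothetical exon membership: each isoform is described by its set of exon IDs
exons <- list(
  iso1 = c("e1", "e2", "e3"),
  iso2 = c("e1", "e2", "e4"),
  iso3 = c("e5", "e6")
)

# Jaccard index: shared exons divided by all exons used by either isoform
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# q x q symmetric similarity matrix with values in [0, 1]
S <- sapply(exons, function(a) sapply(exons, function(b) jaccard(a, b)))

# Simulated data with the documented shapes: X is n x p, Y is n x q
set.seed(1)
n <- 100
p <- 20
q <- length(exons)
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), n, p)  # dosages in {0, 1, 2}
colnames(X) <- paste0("snp", seq_len(p))
B_true <- matrix(0, p, q)
B_true[1:3, ] <- 0.5                                    # three causal SNPs
Y <- X %*% B_true + matrix(rnorm(n * q), n, q)
colnames(Y) <- names(exons)

# Fit only when the function is available in the current session
if (exists("multivariate_graph_reg")) {
  fit <- multivariate_graph_reg(
    X = X,
    Y = Y,
    similarity_matrix = S,
    nfolds = 5,
    seed = 123
  )
  str(fit, max.level = 1)
}
```

Here iso1 and iso2 share two of four exons (similarity 0.5), while iso3 shares none with either, so the graph penalty only ties the first two isoforms together.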

Value

isotwas_model object containing:

transcripts: list of transcript_model objects
best_lambda1: optimal L1 penalty
best_lambda_graph: optimal graph penalty
laplacian: the Laplacian matrix used

Details

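The optimization problem is: $$\min_B \frac{1}{2n}\|Y - XB\|_F^2 + \lambda_1 \|B\|_1 + \frac{\lambda_g}{2} \text{tr}(B^T B L)$$

where B is the p x q coefficient matrix (one column of SNP weights per isoform), the two penalty weights are set by lambda1 and lambda_graph, respectively, and L is the q x q graph Laplacian computed from the similarity matrix S: $$L = D - S$$ with D being the degree matrix (diagonal, with the row sums of S on the diagonal).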
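
The Laplacian construction can be sketched in a few lines of R. The helper name `graph_laplacian` is hypothetical, and the symmetric normalization shown for `normalize = TRUE` is a common convention assumed here, not necessarily the one applied when normalize_laplacian = TRUE.

```r
# Hypothetical helper: build L = D - S from a symmetric, non-negative S
graph_laplacian <- function(S, normalize = FALSE) {
  stopifnot(is.matrix(S), nrow(S) == ncol(S),
            isSymmetric(unname(S)), all(S >= 0))
  d <- rowSums(S)                       # degrees: row sums of S
  L <- diag(d, nrow = length(d)) - S    # unnormalized Laplacian D - S
  if (normalize) {
    # Assumed convention: D^{-1/2} (D - S) D^{-1/2}, with 0 for isolated nodes
    d_inv_sqrt <- ifelse(d > 0, 1 / sqrt(d), 0)
    L <- L * tcrossprod(d_inv_sqrt)
  }
  L
}

S <- matrix(c(1.0, 0.5, 0.0,
              0.5, 1.0, 0.0,
              0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)

L <- graph_laplacian(S)
L_norm <- graph_laplacian(S, normalize = TRUE)
L_identity <- graph_laplacian(diag(3))  # identity similarity gives a zero Laplacian
```

Each row of the unnormalized Laplacian sums to zero, and the identity similarity matrix yields L = 0, which is why that choice removes the graph penalty.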

The graph penalty encourages isoforms with high similarity to have similar coefficient vectors across SNPs.
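
To see why, note that for a symmetric S and L = D - S the trace term can be written as a weighted sum of squared distances between coefficient vectors, where b_j denotes the j-th column of B: $$\text{tr}(B^T B L) = \frac{1}{2} \sum_{j=1}^{q} \sum_{k=1}^{q} S_{jk} \|b_j - b_k\|_2^2$$

Pairs with large S_jk are therefore penalized most for differing, and pairs with S_jk = 0 are not tied together at all. The short check below confirms the identity numerically on made-up values; the matrices are illustrative only.

```r
set.seed(42)
p <- 6
q <- 3
S <- matrix(c(1.0, 0.5, 0.0,
              0.5, 1.0, 0.2,
              0.0, 0.2, 1.0), nrow = q, byrow = TRUE)
L <- diag(rowSums(S)) - S
B <- matrix(rnorm(p * q), p, q)   # arbitrary p x q coefficient matrix

# Trace form used in the objective
trace_form <- sum(diag(t(B) %*% B %*% L))

# Equivalent pairwise form: similarity-weighted squared distances between columns
pairwise_form <- 0
for (j in seq_len(q)) {
  for (k in seq_len(q)) {
    pairwise_form <- pairwise_form + 0.5 * S[j, k] * sum((B[, j] - B[, k])^2)
  }
}

# When every isoform has the same coefficient vector, the penalty is zero
B_same <- matrix(rep(rnorm(p), q), p, q)
penalty_same <- sum(diag(t(B_same) %*% B_same %*% L))
```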

The similarity matrix encodes prior knowledge about isoform relationships. For isoform-level TWAS, this is typically derived from shared exon structure: isoforms sharing more exons are expected to have more similar cis-regulatory effects, as they share more of the same genetic signal.

When par = TRUE, cross-validation is parallelized over all (lambda_graph, fold) combinations, providing up to nlambda_graph * nfolds parallel tasks (25 with the default values of 5 and 5).
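
The task layout can be pictured as a grid with one row per (lambda_graph, fold) pair. The sketch below is an assumption about how such a grid might be enumerated and dispatched, not the function's internal code: the lambda_graph values are made up, `run_task` is a placeholder that only counts held-out samples, and `parallel::mclapply` is one of several possible backends.

```r
nlambda_graph <- 5
nfolds <- 5
n <- 100

# Hypothetical grid of graph penalties (the real sequence is auto-generated)
lambda_graph_seq <- 10^seq(-2, 1, length.out = nlambda_graph)

# One task per (fold, lambda_graph) combination
tasks <- expand.grid(fold = seq_len(nfolds), lambda_graph = lambda_graph_seq)

# Random fold assignment, fixed by a seed for reproducibility
set.seed(123)
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# Placeholder task: a real one would fit on the training rows and
# score predictions on the held-out rows
run_task <- function(i) {
  held_out <- which(fold_id == tasks$fold[i])
  data.frame(tasks[i, ], n_test = length(held_out))
}

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
n_cores <- min(n_cores, nrow(tasks))   # more workers than tasks gives no benefit

results <- do.call(
  rbind,
  parallel::mclapply(seq_len(nrow(tasks)), run_task, mc.cores = n_cores)
)
```

Because the tasks are independent, requesting more cores than nlambda_graph * nfolds cannot speed up this stage further.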