Fits a multivariate regression model with graph Laplacian regularization. This method encourages isoforms that are similar (e.g., share exons) to have similar prediction weight vectors.

Usage

multivariate_graph_reg(
  X,
  Y,
  similarity_matrix = NULL,
  lambda1 = NULL,
  lambda_graph = NULL,
  alpha = 0.5,
  nlambda = 15,
  nlambda_graph = 5,
  lambda_graph_seq = NULL,
  normalize_laplacian = FALSE,
  nfolds = 5,
  standardize = FALSE,
  verbose = FALSE,
  seed = 123,
  par = FALSE,
  n.cores = NULL
)

Arguments

X

matrix, design matrix of SNP dosages (n samples x p SNPs)

Y

matrix, isoform expression with one isoform per column (n samples x q isoforms)

similarity_matrix

matrix, q x q symmetric matrix of pairwise isoform similarities (e.g., Jaccard index of shared exons). Values should be >= 0. If NULL, the identity matrix is used; its Laplacian is zero, so there is no graph regularization and the fit is equivalent to lasso.

lambda1

numeric or NULL, L1 penalty parameter. If NULL, selected by CV.

lambda_graph

numeric or NULL, graph penalty parameter. If NULL, selected by CV.

alpha

numeric, elastic net mixing (0 = ridge, 1 = lasso). Default 0.5.

nlambda

int, number of lambda1 values to try in CV. Default 15.

nlambda_graph

int, number of lambda_graph values to try in CV. Default 5.

lambda_graph_seq

numeric vector, specific lambda_graph values to try. If NULL, auto-generated.

normalize_laplacian

logical, use normalized Laplacian. Default FALSE.

nfolds

int, number of CV folds. Default 5.

standardize

logical, standardize X before fitting

verbose

logical, print progress

seed

int, random seed

par

logical, use parallel processing. Default FALSE.

n.cores

int, number of cores for parallel processing. Default NULL (auto-detect).

Value

isotwas_model object containing:

  • transcripts: list of transcript_model objects

  • best_lambda1: optimal L1 penalty

  • best_lambda_graph: optimal graph penalty

  • laplacian: the Laplacian matrix used

Details

The optimization problem is: $$\min_B \frac{1}{2n}||Y - XB||_F^2 + \lambda_1 ||B||_1 + \frac{\lambda_g}{2} \text{tr}(B^T B L)$$

where B is the p x q coefficient matrix (one column per isoform) and L is the q x q graph Laplacian computed from the similarity matrix S: $$L = D - S$$ with D being the degree matrix (diagonal, with the row sums of S on the diagonal).
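As a minimal sketch of the Laplacian construction described above, the following builds L = D - S for a toy 3 x 3 similarity matrix. The matrix values are made up for illustration. The normalized variant shown is the common symmetric normalization and is an assumption here: the documentation does not state which form normalize_laplacian = TRUE uses.

```r
# Toy similarity matrix for three hypothetical isoforms (illustrative values)
S <- matrix(c(1.0, 0.5, 0.0,
              0.5, 1.0, 0.2,
              0.0, 0.2, 1.0), nrow = 3, byrow = TRUE)

# Degree matrix: diagonal with the row sums of S
D <- diag(rowSums(S))

# Unnormalized graph Laplacian, L = D - S
L <- D - S

# ASSUMPTION: symmetric normalization D^{-1/2} L D^{-1/2};
# the package may use a different normalized form
d_inv_sqrt <- 1 / sqrt(rowSums(S))
L_norm <- diag(d_inv_sqrt) %*% L %*% diag(d_inv_sqrt)

# Every row of the unnormalized Laplacian sums to zero
print(rowSums(L))
```

Note that the diagonal of S cancels in D - S, so the self-similarity values do not affect the unnormalized Laplacian.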

The graph penalty encourages isoforms with high similarity to have similar coefficient vectors across SNPs.
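The pairwise reading of the graph penalty follows from a standard identity for symmetric S, where b_j denotes column j of B (the coefficient vector of isoform j): $$\text{tr}(B^T B L) = \frac{1}{2} \sum_{j,k} S_{jk} ||b_j - b_k||_2^2$$ The sketch below checks this numerically on random data; the dimensions and similarity values are arbitrary choices for illustration.

```r
set.seed(1)
p <- 4  # number of SNPs (illustrative)
q <- 3  # number of isoforms (illustrative)

# Random coefficient matrix, one column per isoform
B <- matrix(rnorm(p * q), nrow = p, ncol = q)

# Symmetric, non-negative toy similarity matrix
S <- matrix(c(0.0, 0.5, 0.0,
              0.5, 0.0, 0.2,
              0.0, 0.2, 0.0), nrow = q, byrow = TRUE)
L <- diag(rowSums(S)) - S

# Trace form of the penalty, as in the objective
trace_form <- sum(diag(t(B) %*% B %*% L))

# Pairwise form: similarity-weighted squared distances between columns
pairwise_form <- 0
for (j in seq_len(q)) {
  for (k in seq_len(q)) {
    pairwise_form <- pairwise_form +
      0.5 * S[j, k] * sum((B[, j] - B[, k])^2)
  }
}

print(all.equal(trace_form, pairwise_form))
```

Pairs with S_jk = 0 contribute nothing, so only isoforms with non-zero similarity are pulled toward each other.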

The similarity matrix encodes prior knowledge about isoform relationships. For isoform-level TWAS, this is typically derived from shared exon structure: isoforms sharing more exons are expected to have more similar cis-regulatory effects, as they share more of the same genetic signal.
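One way such a similarity matrix could be built is the Jaccard index of exon sets, as mentioned for the similarity_matrix argument. The following is a sketch only: the isoform names, exon labels, and the jaccard() helper are hypothetical and are not part of the package.

```r
# Hypothetical exon membership for three isoforms of one gene
exons <- list(
  iso1 = c("e1", "e2", "e3"),
  iso2 = c("e1", "e2", "e4"),
  iso3 = c("e5", "e6")
)

# Jaccard index: shared exons divided by all exons in either isoform
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

q <- length(exons)
S <- matrix(0, nrow = q, ncol = q,
            dimnames = list(names(exons), names(exons)))
for (j in seq_len(q)) {
  for (k in seq_len(q)) {
    S[j, k] <- jaccard(exons[[j]], exons[[k]])
  }
}

print(S)
```

The result is symmetric with values in [0, 1], which satisfies the stated requirements for similarity_matrix, and its columns would need to be in the same order as the columns of Y.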

When par=TRUE, cross-validation is parallelized over all (lambda_graph, fold) combinations, providing up to nlambda_graph * nfolds parallel tasks.
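To make the task count concrete, the sketch below enumerates the (lambda_graph, fold) combinations for the default settings. It only illustrates the counting; how the package actually schedules these tasks is not shown in this page, and the column names are hypothetical.

```r
# Defaults taken from the Usage section
nlambda_graph <- 5
nfolds <- 5

# One row per (lambda_graph index, fold) combination
tasks <- expand.grid(
  lambda_graph_index = seq_len(nlambda_graph),
  fold = seq_len(nfolds)
)

# Number of independent tasks available for parallel execution
print(nrow(tasks))
```

With the defaults this gives 25 tasks, so requesting more than 25 cores would not be expected to speed up cross-validation further.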