Multivariate Graph-Regularized Regression
Fits a multivariate regression model with graph Laplacian regularization. This method encourages isoforms that are similar (e.g., share exons) to have similar prediction weight vectors.
Usage
multivariate_graph_reg(
  X,
  Y,
  similarity_matrix = NULL,
  lambda1 = NULL,
  lambda_graph = NULL,
  alpha = 0.5,
  nlambda = 15,
  nlambda_graph = 5,
  lambda_graph_seq = NULL,
  normalize_laplacian = FALSE,
  nfolds = 5,
  standardize = FALSE,
  verbose = FALSE,
  seed = 123,
  par = FALSE,
  n.cores = NULL
)
Arguments
- X
matrix, design matrix of SNP dosages (n x p)
- Y
matrix, matrix of isoform expression across columns (n x q)
- similarity_matrix
matrix, q x q symmetric matrix of pairwise isoform similarities (e.g., Jaccard index of shared exons). Values should be >= 0. If NULL, the identity matrix is used; the Laplacian L = D - S is then zero, so there is no graph regularization (equivalent to lasso).
- lambda1
numeric or NULL, L1 penalty parameter. If NULL, selected by CV.
- lambda_graph
numeric or NULL, graph penalty parameter. If NULL, selected by CV.
- alpha
numeric, elastic net mixing (0 = ridge, 1 = lasso). Default 0.5.
- nlambda
int, number of lambda1 values to try in CV
- nlambda_graph
int, number of lambda_graph values to try in CV
- lambda_graph_seq
numeric vector, specific lambda_graph values to try. If NULL, auto-generated.
- normalize_laplacian
logical, use normalized Laplacian. Default FALSE.
- nfolds
int, number of CV folds
- standardize
logical, standardize X before fitting
- verbose
logical, print progress
- seed
int, random seed
- par
logical, use parallel processing. Default FALSE.
- n.cores
int, number of cores for parallel processing. Default NULL (auto-detect).
Value
isotwas_model object containing:
transcripts: list of transcript_model objects
best_lambda1: optimal L1 penalty
best_lambda_graph: optimal graph penalty
laplacian: the Laplacian matrix used
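A minimal sketch of a call on simulated data may help make the arguments and return value concrete. The data, effect sizes, and similarity values below are invented for illustration. The call is guarded with exists() because it assumes the function has already been loaded into the session, and accessing the result with `$` assumes the returned object is list-like, which this page does not state.

```r
set.seed(123)
n <- 100  # samples
p <- 20   # SNPs
q <- 3    # isoforms

# Simulated SNP dosages (0/1/2) and isoform expression (illustrative only)
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
B_true <- matrix(0, nrow = p, ncol = q)
B_true[1:3, ] <- 0.5
Y <- X %*% B_true + matrix(rnorm(n * q), nrow = n, ncol = q)

# Hypothetical q x q isoform similarity matrix (symmetric, non-negative)
S <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.3,
              0.1, 0.3, 1.0),
            nrow = q, byrow = TRUE)

# Only run the fit if the function is available in this session
if (exists("multivariate_graph_reg")) {
  fit <- multivariate_graph_reg(
    X, Y,
    similarity_matrix = S,
    nfolds = 5,
    seed = 123
  )
  # Assumes list-style access to the components listed under Value
  print(fit$best_lambda1)
  print(fit$best_lambda_graph)
}
```

Leaving lambda1 and lambda_graph as NULL, as here, asks for both penalties to be selected by cross-validation.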
Details
The optimization problem is: $$\min_B \frac{1}{2n}||Y - XB||_F^2 + \lambda_1 ||B||_1 + \frac{\lambda_g}{2} \text{tr}(B^T B L)$$
where L is the graph Laplacian computed from the similarity matrix: $$L = D - S$$ with D being the degree matrix (diagonal with row sums of S).
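The Laplacian construction can be sketched in a few lines of base R. The similarity values are hypothetical, and the normalized form shown at the end is one common convention (symmetric normalization); it is an assumption here, not necessarily what normalize_laplacian = TRUE computes.

```r
# Hypothetical 3 x 3 isoform similarity matrix (illustrative values only)
S <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.3,
              0.1, 0.3, 1.0),
            nrow = 3, byrow = TRUE)

# Degree matrix: diagonal matrix holding the row sums of S
D <- diag(rowSums(S))

# Unnormalized graph Laplacian L = D - S
L <- D - S

# Each row of L sums to zero; the diagonal of S cancels out of L
print(rowSums(L))

# An identity similarity matrix gives a zero Laplacian (no graph penalty)
L_identity <- diag(rowSums(diag(3))) - diag(3)

# Assumed normalized variant: D^{-1/2} L D^{-1/2}
D_inv_sqrt <- diag(1 / sqrt(rowSums(S)))
L_sym <- D_inv_sqrt %*% L %*% D_inv_sqrt
```

Because the diagonal of S cancels in D - S, only the off-diagonal similarities affect the penalty.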
The graph penalty encourages isoforms with high similarity to have similar coefficient vectors across SNPs.
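One way to see why is that, for a symmetric S, the trace term equals half the similarity-weighted sum of squared distances between the isoforms' coefficient vectors (the columns of B). The sketch below checks this numerically on made-up values; B and S are hypothetical and not produced by the function.

```r
set.seed(1)
p <- 4  # SNPs
q <- 3  # isoforms

# Hypothetical coefficient matrix: one column of SNP weights per isoform
B <- matrix(rnorm(p * q), nrow = p, ncol = q)

# Hypothetical symmetric, non-negative similarity matrix
S <- matrix(c(0.0, 0.8, 0.1,
              0.8, 0.0, 0.3,
              0.1, 0.3, 0.0),
            nrow = q, byrow = TRUE)
L <- diag(rowSums(S)) - S

# Trace form of the penalty term: tr(B^T B L)
trace_form <- sum(diag(t(B) %*% B %*% L))

# Pairwise form: (1/2) * sum_{j,k} S[j, k] * ||B[, j] - B[, k]||^2
pairwise_form <- 0
for (j in seq_len(q)) {
  for (k in seq_len(q)) {
    pairwise_form <- pairwise_form + 0.5 * S[j, k] * sum((B[, j] - B[, k])^2)
  }
}

# The two forms agree up to floating-point error
print(all.equal(trace_form, pairwise_form))
```

In the pairwise form, a large S[j, k] makes any difference between columns j and k of B costly, while a pair with zero similarity is left unconstrained by the graph term.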
The similarity matrix encodes prior knowledge about isoform relationships. For isoform-level TWAS, this is typically derived from shared exon structure: isoforms sharing more exons are expected to have more similar cis-regulatory effects, as they share more of the same genetic signal.
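As an illustration of how such a matrix might be built, the sketch below computes the Jaccard index of shared exons from a binary isoform-by-exon membership matrix. The membership matrix and isoform names are hypothetical; real exon annotations would come from a transcript annotation source, and this is not necessarily how the package authors construct the matrix.

```r
# Hypothetical membership matrix: rows are isoforms, columns are exons,
# 1 means the isoform contains that exon
exon_membership <- rbind(
  iso1 = c(1, 1, 1, 0),
  iso2 = c(1, 1, 0, 0),
  iso3 = c(0, 0, 1, 1)
)

# Number of exons shared by each pair of isoforms (intersection size)
shared <- exon_membership %*% t(exon_membership)

# Union size = |A| + |B| - |A and B|
n_exons <- rowSums(exon_membership)
union_size <- outer(n_exons, n_exons, "+") - shared

# Jaccard index: intersection over union (assumes every isoform has >= 1 exon)
similarity_matrix <- shared / union_size
print(round(similarity_matrix, 3))
```

Here iso1 and iso2 share two of three distinct exons (Jaccard 2/3), while iso2 and iso3 share none (Jaccard 0), so the graph penalty would tie iso1 and iso2 together and leave iso2 and iso3 uncoupled.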
When par=TRUE, cross-validation is parallelized over all (lambda_graph, fold) combinations, providing up to nlambda_graph * nfolds parallel tasks.
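The task count can be illustrated by enumerating the grid directly. This is a sketch of the general pattern, not the package's internal code: the lambda_graph values and the run_task() worker are hypothetical placeholders, and lapply() stands in for whichever parallel backend is actually used.

```r
nlambda_graph <- 5
nfolds <- 5

# Hypothetical log-spaced grid of graph penalties
lambda_graph_seq <- 10^seq(-2, 1, length.out = nlambda_graph)

# One task per (lambda_graph, fold) combination
tasks <- expand.grid(lambda_graph = lambda_graph_seq,
                     fold = seq_len(nfolds))
print(nrow(tasks))  # nlambda_graph * nfolds tasks

# Placeholder worker: a real one would fit on the training folds and
# score the held-out fold for the given lambda_graph
run_task <- function(i) {
  list(lambda_graph = tasks$lambda_graph[i],
       fold = tasks$fold[i],
       cv_error = NA_real_)
}

# Sequential stand-in; a parallel apply could replace lapply() here
results <- lapply(seq_len(nrow(tasks)), run_task)
```

With the default nlambda_graph = 5 and nfolds = 5 this gives 25 independent tasks, so requesting more than 25 cores would not be expected to speed things up further.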