Multivariate Multi-Task LASSO — multivariate

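Fits a multi-task LASSO model for multivariate isoform prediction. By default, the method encourages joint feature selection across isoforms using L2,1 ("L21") regularization, which imposes row sparsity on the coefficient matrix. Trace-norm and lasso penalties are also available through the RMTL backend.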

Usage

multivariate_mtlasso(
  X,
  Y,
  regularization = c("L21", "Trace", "Lasso"),
  lambda = NULL,
  lambda_seq = NULL,
  nlambda = 20,
  lambda_min_ratio = 0.01,
  Lam2 = 0,
  nfolds = 5,
  standardize = FALSE,
  verbose = FALSE,
  seed = 123,
  par = FALSE,
  n.cores = 1,
  backend = c("fast", "rmtl")
)

Arguments

X

matrix, design matrix of SNP dosages (n x p)

Y

matrix, matrix of isoform expression across columns (n x q)

regularization

character, type of multi-task regularization:

"L21": L2,1 norm (row sparsity) - same SNPs selected across isoforms
"Trace": trace norm (low-rank) - isoforms share latent structure (RMTL only)
"Lasso": standard lasso applied jointly (RMTL only)

lambda

numeric or NULL, regularization parameter. If NULL, it is selected by cross-validation (CV).

lambda_seq

numeric vector, sequence of lambda values to try in CV. If NULL, a sequence is generated automatically.

nlambda

int, number of lambda values if lambda_seq is NULL

lambda_min_ratio

numeric, ratio of the smallest to the largest lambda in the sequence

Lam2

numeric, ridge penalty parameter (for RMTL backend). Default 0.

nfolds

int, number of CV folds

standardize

logical, standardize X before fitting. Default FALSE.

verbose

logical, print progress

seed

int, random seed

par

logical, use parallel processing for CV folds

n.cores

int, number of cores for parallel processing

backend

character, implementation to use:

"fast": Custom fast implementation optimized for shared X (default)
"rmtl": Use RMTL package (slower, but supports Trace/Lasso regularization)

Value

isotwas_model object containing:

transcripts: list of transcript_model objects with weights, R2, pvalues
best_lambda: optimal lambda from CV
regularization: type of regularization used
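
A minimal usage sketch on simulated data, using only the arguments documented above. The simulated dimensions, the column names, and the `$best_lambda` accessor are assumptions made for illustration; the call is guarded so the snippet still runs when the function is not attached.

```r
set.seed(123)
n <- 100  # samples
p <- 20   # SNPs
q <- 3    # isoforms

# Simulated SNP dosages in {0, 1, 2} (illustrative data, not from the package)
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
colnames(X) <- paste0("snp", seq_len(p))

# Three SNPs with shared effects across all isoforms, plus noise
W_true <- matrix(0, p, q)
W_true[1:3, ] <- matrix(rnorm(3 * q), 3, q)
Y <- X %*% W_true + matrix(rnorm(n * q), n, q)
colnames(Y) <- paste0("isoform", seq_len(q))

# Only call the function if it is available in the session
if (exists("multivariate_mtlasso")) {
  fit <- multivariate_mtlasso(
    X = X,
    Y = Y,
    regularization = "L21",
    nlambda = 20,
    nfolds = 5,
    backend = "fast",
    seed = 123
  )
  # Assumes the returned object supports `$` access to the documented fields
  print(fit$best_lambda)
}
```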

Details

The L21 penalty enforces that the same SNPs are selected or excluded across all isoforms, which is appropriate when SNPs are expected to have shared effects on multiple isoforms of the same gene.

With L21 regularization, the model solves: $$\min_W \frac{1}{2n}||Y - XW||_F^2 + \lambda ||W||_{2,1}$$

where $W$ is the p x q coefficient matrix (SNPs in rows, isoforms in columns) and $||W||_{2,1} = \sum_j ||W_j||_2$ is the sum of the L2 norms of its rows $W_j$, encouraging entire rows (SNPs) to be zero or non-zero together.
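
As a small worked example, the L2,1 norm can be computed directly in base R. The sketch below also derives the smallest lambda at which every row of W is zero for this objective, and builds a log-spaced sequence from it using `nlambda` and `lambda_min_ratio`. The helper names `l21_norm` and `lambda_path` are hypothetical, and the log spacing is an assumption; the package may generate its sequence differently.

```r
# L2,1 norm: sum over rows (SNPs) of the Euclidean norm of each row
l21_norm <- function(W) sum(sqrt(rowSums(W^2)))

W <- rbind(c(3, 4),   # row norm 5
           c(0, 0),   # row norm 0 (SNP excluded for all isoforms)
           c(1, 0))   # row norm 1
l21_norm(W)  # 5 + 0 + 1 = 6

# For the objective above, W = 0 is optimal whenever
# lambda >= max_j ||X_j' Y||_2 / n, where X_j is the j-th column of X.
lambda_path <- function(X, Y, nlambda = 20, lambda_min_ratio = 0.01) {
  n <- nrow(X)
  G <- crossprod(X, Y)                       # p x q matrix X'Y
  lambda_max <- max(sqrt(rowSums(G^2))) / n
  # Log-spaced from lambda_max down to lambda_max * lambda_min_ratio
  exp(seq(log(lambda_max),
          log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

set.seed(1)
X <- matrix(rnorm(50 * 8), 50, 8)
Y <- matrix(rnorm(50 * 2), 50, 2)
lam <- lambda_path(X, Y)
```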

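Because every isoform is regressed on the same design matrix X, the fast backend precomputes X'X once and reuses it for all isoforms, which makes it faster than general-purpose MTL solvers that handle each task's design matrix separately.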
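
To illustrate why a shared X helps, here is a minimal proximal-gradient sketch for the L21 objective that computes X'X and X'Y once and reuses them in every iteration. This is not the package's implementation; the function names `row_soft_threshold` and `l21_prox_grad`, the fixed step size, and the stopping rule are all assumptions chosen for clarity.

```r
# Row-wise (group) soft-thresholding: shrinks each row's L2 norm by t,
# setting the whole row to zero when its norm is at most t
row_soft_threshold <- function(W, t) {
  norms <- sqrt(rowSums(W^2))
  shrink <- pmax(0, 1 - t / pmax(norms, .Machine$double.eps))
  W * shrink  # recycling applies shrink[i] to every entry of row i
}

l21_prox_grad <- function(X, Y, lambda, n_iter = 500, tol = 1e-8) {
  n <- nrow(X)
  XtX <- crossprod(X) / n     # p x p, computed once and shared by all isoforms
  XtY <- crossprod(X, Y) / n  # p x q, computed once
  # Step size 1/L, with L the largest eigenvalue of X'X / n
  L <- max(eigen(XtX, symmetric = TRUE, only.values = TRUE)$values)
  W <- matrix(0, ncol(X), ncol(Y))
  for (i in seq_len(n_iter)) {
    grad <- XtX %*% W - XtY   # gradient of (1/2n) ||Y - XW||_F^2
    W_new <- row_soft_threshold(W - grad / L, lambda / L)
    converged <- max(abs(W_new - W)) < tol
    W <- W_new
    if (converged) break
  }
  W
}

set.seed(1)
n <- 200; p <- 10; q <- 3
X <- matrix(rnorm(n * p), n, p)
W_true <- matrix(0, p, q)
W_true[1:2, ] <- 1            # two SNPs affect all isoforms
Y <- X %*% W_true + matrix(rnorm(n * q), n, q)

W_hat <- l21_prox_grad(X, Y, lambda = 0.1)
# Rows of W_hat are either entirely zero or entirely non-zero
rowSums(W_hat != 0)
```

Each iteration costs a p x p by p x q product regardless of the sample size n, since the data enter only through the precomputed cross-products.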