
Computes the optimal number of features/groups to keep for each joint component for an sO2PLS run. Adapted from the OmicsPLS::crossval_sparsity() function, with the output restructured to make plotting easier.

Usage

so2pls_crossval_sparsity(
  omicspls_input,
  n,
  nx,
  ny,
  nr_folds = 10,
  keepx_seq = NULL,
  keepy_seq = NULL,
  groupx = NULL,
  groupy = NULL,
  tol = 1e-10,
  max_iterations = 100,
  seed = NULL
)

Arguments

omicspls_input

A named list of length 2, produced by get_input_omicspls().

n

Integer, number of joint PLS components. Must be positive.

nx

Integer, number of orthogonal components in X. Negative values are interpreted as 0.

ny

Integer, number of orthogonal components in Y. Negative values are interpreted as 0.

nr_folds

Integer, number of folds for the cross-validation. Default value is 10.

keepx_seq

Numeric vector, the numbers of features/groups to keep that will be tested by cross-validation for each of the joint components of X. The sparsity of each joint component is selected sequentially.

keepy_seq

Numeric vector, the numbers of features/groups to keep that will be tested by cross-validation for each of the joint components of Y. The sparsity of each joint component is selected sequentially.

groupx

Character vector, group name of each X-feature. Its length must be equal to the number of features in X. The order of the group names must correspond to the order of the features. If NULL, no groups are considered. Default value is NULL.

groupy

Character vector, group name of each Y-feature. Its length must be equal to the number of features in Y. The order of the group names must correspond to the order of the features. If NULL, no groups are considered. Default value is NULL.
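
The ordering requirement on groupx and groupy is easy to get wrong when the group annotation comes from a separate table. The following is a minimal base R sketch of one way to build such a vector. The feature IDs, the group names and the group_lookup object are all hypothetical and only serve as an illustration.

```r
# Hypothetical feature IDs, in the same order as the features of X
x_features <- c("gene_a", "gene_b", "gene_c", "gene_d", "gene_e")

# Hypothetical lookup from feature ID to group name (e.g. a pathway),
# deliberately stored in a different order than the features
group_lookup <- c(
  gene_c = "pathway_2",
  gene_a = "pathway_1",
  gene_e = "pathway_2",
  gene_b = "pathway_1",
  gene_d = "pathway_3"
)

# Index the lookup by feature ID so the groups follow the feature order
groupx <- unname(group_lookup[x_features])

# Checks mirroring the documented requirements: a character vector with
# one group name per feature, and no feature left without a group
stopifnot(
  is.character(groupx),
  length(groupx) == length(x_features),
  !anyNA(groupx)
)

# With groups, the values tested in keepx_seq count groups rather than
# individual features, so they should not exceed the number of groups
n_groups <- length(unique(groupx))
keepx_seq <- seq_len(n_groups)
```

Indexing the lookup by feature ID, rather than sorting both objects, keeps the group vector aligned even if the feature order changes later.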

tol

Numeric, threshold below which the NIPALS method is deemed to have converged. Must be positive. Default value is 1e-10.

max_iterations

Integer, maximum number of iterations for the NIPALS method. Default value is 100.

seed

Integer, seed to use. Default is NULL, i.e. no seed is set inside the function.
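
To illustrate why the seed argument matters, here is a small base R sketch of random fold assignment. It is not the code used by this function or by OmicsPLS; assign_folds() is a hypothetical helper written only to show that the same seed reproduces the same split of samples into nr_folds folds.

```r
# Hypothetical helper: randomly assign n_samples samples to nr_folds folds.
# If seed is NULL, no seed is set and the split depends on the current
# state of the random number generator.
assign_folds <- function(n_samples, nr_folds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Repeat the fold labels to cover all samples, then shuffle them
  sample(rep(seq_len(nr_folds), length.out = n_samples))
}

folds_a <- assign_folds(25, nr_folds = 10, seed = 42)
folds_b <- assign_folds(25, nr_folds = 10, seed = 42)

# Same seed, same fold assignment
identical(folds_a, folds_b)

# Fold sizes differ by at most one sample
table(folds_a)
```

With seed = NULL, reproducibility relies on calling set.seed() before the function is run.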

Value

A list with the following elements:

  • Best: a vector giving, for each joint component, the number of features to keep from X and Y that yields the highest covariance between the joint components of X and Y (elements x1, y1, x2, y2, etc.), and the number of features to keep from X and Y yielding the highest covariance under the 1 standard error rule (elements x_1sd1, y_1sd1, x_1sd2, y_1sd2, etc.).

  • Covs: a list with as many elements as there are joint components (n). Each element is a matrix giving the average covariance between the joint components of X and Y obtained across the folds, for each tested value of keepx (columns) and of keepy (rows).

  • SEcov: a list with as many elements as there are joint components (n). Each element is a matrix giving the standard error of the covariance between the joint components of X and Y obtained across the folds, for each tested value of keepx (columns) and of keepy (rows).
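
The relationship between Covs, SEcov and Best can be sketched in base R for a single joint component. The matrices below are toy values, not results from a real run. The interpretation of the 1 standard error rule used here is an assumption: it picks the sparsest combination (smallest keepx + keepy) whose average covariance is within one standard error of the maximum. The exact rule and tie-breaking used by the function may differ.

```r
# Toy grid of tested values (hypothetical)
keepx_seq <- c(5, 10, 20)
keepy_seq <- c(2, 4)

# Toy average covariances: keepy in rows, keepx in columns, as in Covs
covs <- matrix(
  c(0.50, 0.70, 0.74,
    0.55, 0.73, 0.75),
  nrow = 2, byrow = TRUE,
  dimnames = list(as.character(keepy_seq), as.character(keepx_seq))
)

# Toy standard errors with the same layout, as in SEcov
se <- matrix(0.03, nrow = 2, ncol = 3, dimnames = dimnames(covs))

# Combination with the highest average covariance
best <- which(covs == max(covs), arr.ind = TRUE)[1, ]
best_x <- keepx_seq[best["col"]]
best_y <- keepy_seq[best["row"]]

# Assumed 1 standard error rule: keep all combinations whose covariance
# is within one SE (taken at the maximum) of the maximum covariance...
threshold <- max(covs) - se[best["row"], best["col"]]
candidates <- which(covs >= threshold, arr.ind = TRUE)

# ...and among them retain the one keeping the fewest features overall
n_kept <- keepx_seq[candidates[, "col"]] + keepy_seq[candidates[, "row"]]
sparsest <- candidates[which.min(n_kept), ]
x_1sd <- keepx_seq[sparsest["col"]]
y_1sd <- keepy_seq[sparsest["row"]]

c(x = best_x, y = best_y, x_1sd = x_1sd, y_1sd = y_1sd)
```

In this toy grid the maximum is reached with the largest keepx and keepy values, while the assumed 1 standard error rule selects a smaller keepx at a small cost in covariance.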