Target factory for feature preselection based on sPLS-DA
Source: R/prefiltering.R
Creates a list of targets to perform feature preselection on datasets from a MultiDataSet object with sPLS-DA (from the mixOmics package).
Usage
feature_preselection_splsda_factory(
  mo_data_target,
  group,
  to_keep_ns,
  to_keep_props = NULL,
  target_name_prefix = "",
  filtered_set_target_name = NULL,
  multilevel = NULL,
  seed_perf = NULL,
  seed_run = NULL,
  ...
)
Arguments
- mo_data_target
Symbol, the name of the target containing the MultiDataSet object.
- group
Character, the name of the column in the samples information data-frame to use as sample group.
- to_keep_ns
Named integer vector, the number of features to retain in each dataset to be prefiltered (each name should correspond to a dataset name). Each value should be less than the number of features in the corresponding dataset. Set to NULL in order to use to_keep_props instead.
- to_keep_props
Named numeric vector, the proportion of features to retain in each dataset to be prefiltered (each name should correspond to a dataset name). Each value should be > 0 and < 1. Will be ignored if to_keep_ns is not NULL.
- target_name_prefix
Character, a prefix to add to the name of the targets created by this target factory. Default value is "".
- filtered_set_target_name
Character, the name of the final target containing the filtered MultiDataSet object. If NULL, a name will automatically be supplied. Default value is NULL.
- multilevel
Character vector of length 1 or 3 to be used as information about repeated measurements. See get_input_splsda() for details. Default value is NULL (no repeated measurements).
- seed_perf
Named integer vector, the seed to use for the perf_splsda() function for each dataset. The length and names should match those of to_keep_ns or to_keep_props. If not named, the values will be used in order of the datasets in to_keep_ns or to_keep_props. Default value is NULL, i.e. no seed is set.
- seed_run
Named integer vector, the seed to use for the run_splsda() function for each dataset. The length and names should match those of to_keep_ns or to_keep_props. If not named, the values will be used in order of the datasets in to_keep_ns or to_keep_props. Default value is NULL, i.e. no seed is set.
- ...
Further arguments passed to the perf_splsda() function.
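The naming rules for to_keep_props and for the seed arguments can be illustrated with a small base-R sketch. The helpers check_props() and align_seeds() are hypothetical and are not part of moiraine; they only mirror the constraints described above and make no claim about how the package checks its inputs internally.

```r
## Hypothetical helpers (not part of moiraine): they only mirror the
## constraints documented in the Arguments section.

## Check that proportions are named and strictly between 0 and 1
check_props <- function(to_keep_props) {
  stopifnot(
    is.numeric(to_keep_props),
    !is.null(names(to_keep_props)),
    all(to_keep_props > 0 & to_keep_props < 1)
  )
  invisible(to_keep_props)
}

## Attach dataset names to a seed vector: named seeds must use the same
## names as the datasets, unnamed seeds are taken in dataset order
align_seeds <- function(seeds, to_keep) {
  if (is.null(seeds)) {
    return(NULL)
  }
  stopifnot(length(seeds) == length(to_keep))
  if (is.null(names(seeds))) {
    names(seeds) <- names(to_keep)
  } else {
    stopifnot(setequal(names(seeds), names(to_keep)))
    seeds <- seeds[names(to_keep)]
  }
  seeds
}

to_keep_props <- c("rnaseq" = 0.3, "metabolome" = 0.5)
check_props(to_keep_props)

## Unnamed seeds follow the order of the datasets in to_keep_props
seeds_unnamed <- align_seeds(c(1L, 2L), to_keep_props)

## Named seeds are matched by dataset name, whatever their order
seeds_named <- align_seeds(c("metabolome" = 2L, "rnaseq" = 1L), to_keep_props)
```

In this sketch both calls to align_seeds() assign seed 1 to the rnaseq dataset and seed 2 to the metabolome dataset, which is the behaviour the seed_perf and seed_run descriptions suggest.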
Value
A list of target objects. With target_name_prefix = "" and filtered_set_target_name = NULL, the following targets are created:
- splsda_spec: generates a grouped tibble where each row corresponds to one dataset to be filtered, with the columns specifying each dataset name, and associated values from to_keep_ns and to_keep_props.
- individual_splsda_input: a dynamic branching target that runs the get_input_splsda() function for each dataset.
- individual_splsda_perf: a dynamic branching target that runs the perf_splsda() function for each dataset.
- individual_splsda_run: a dynamic branching target that runs the run_splsda() function for each dataset, using the results from individual_splsda_perf to guide the number of latent components to construct.
- filtered_set_slpsda: a target to retain from the original MultiDataSet object only features selected in each sPLS-DA run.
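To see how target_name_prefix and filtered_set_target_name might affect the names listed above, here is a minimal base-R sketch. The helper expected_target_names() is hypothetical and not part of moiraine. It assumes that the prefix is simply pasted in front of each default name, and that a non-NULL filtered_set_target_name is used as given. The actual naming logic of the factory may differ.

```r
## Hypothetical helper (not part of moiraine): lists the target names one
## would expect from the description above, assuming the prefix is pasted
## in front of each default name
expected_target_names <- function(target_name_prefix = "",
                                  filtered_set_target_name = NULL) {
  intermediate <- c(
    "splsda_spec",
    "individual_splsda_input",
    "individual_splsda_perf",
    "individual_splsda_run"
  )

  ## Default name of the final target, spelled as in the list above
  if (is.null(filtered_set_target_name)) {
    filtered_set_target_name <- paste0(
      target_name_prefix,
      "filtered_set_slpsda"
    )
  }

  c(paste0(target_name_prefix, intermediate), filtered_set_target_name)
}

## Names with the default arguments
default_names <- expected_target_names()

## Names with a prefix and a custom name for the final target
custom_names <- expected_target_names("prefilt_", "mo_set_filtered")
```

Under these assumptions, a distinct prefix per call would keep target names unique when the factory is used more than once in the same pipeline.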
Examples
if (FALSE) { # \dontrun{
## in the _targets.R
library(targets)
library(moiraine)

list(
  ## add code here to load the different datasets

  ## the following target creates a MultiDataSet object from previously
  ## created omics sets (geno_set, trans_set, etc)
  tar_target(
    mo_set,
    create_multiomics_set(geno_set, trans_set, metabo_set, pheno_set)
  ),

  feature_preselection_splsda_factory(
    mo_set,
    group = "outcome_group",
    to_keep_ns = c("rnaseq" = 1000, "metabolome" = 500),
    filtered_set_target_name = "mo_set_filtered",
    folds = 10 ## example of an argument passed to perf_splsda
  ),

  ## Another example using to_keep_props. This is an alternative to the call
  ## above: keeping both in one pipeline would duplicate target names, unless
  ## target_name_prefix and filtered_set_target_name are changed
  feature_preselection_splsda_factory(
    mo_set,
    group = "outcome_group",
    to_keep_ns = NULL,
    to_keep_props = c("rnaseq" = 0.3, "metabolome" = 0.5),
    filtered_set_target_name = "mo_set_filtered",
    folds = 10 ## example of an argument passed to perf_splsda
  )
)
} # }