Applies a transformation to a dataset from a MultiDataSet object
Source: R/transformation.R
Applies a transformation to a dataset from a MultiDataSet object.
Implemented transformations are: Variance Stabilising Normalisation (from the vsn package), Variance Stabilising Transformation (from the DESeq2 package, for count data only), log-transformation, and appropriate feature-wise normalisation through the bestNormalize package.
Usage
transform_dataset(
  mo_data,
  dataset,
  transformation,
  return_multidataset = FALSE,
  return_matrix_only = FALSE,
  verbose = TRUE,
  log_base = 2,
  pre_log_function = zero_to_half_min,
  method,
  ...
)
Arguments
- mo_data
  A MultiDataSet-class object.
- dataset
  Character, name of the dataset to transform.
- transformation
  Character, transformation to be applied. Possible values are: 'vsn', 'vst-deseq2', 'logx', 'best-normalize-auto' or 'best-normalize-manual'. See Details.
- return_multidataset
  Logical, should a MultiDataSet object with the original data replaced by the transformed data be returned? If FALSE, the output of the function depends on return_matrix_only. Default value is FALSE.
- return_matrix_only
  Logical, should only the transformed matrix be returned? If TRUE, the function will return a matrix. If FALSE, the function instead returns a list with the transformed data as well as other information relevant to the transformation. Ignored if return_multidataset is TRUE. Default value is FALSE.
- verbose
  Logical, should information about the transformation be printed? Default value is TRUE.
- log_base
  Numeric, the base with respect to which logarithms are computed. Default value is 2. Only used if transformation = 'logx'.
- pre_log_function
  Function that will be applied to the matrix before the log transformation (e.g. to apply an offset to the values to avoid issues with zeros). Default value is the zero_to_half_min() function. Only used if transformation = 'logx'.
- method
  Character, the normalisation method to apply if transformation = 'best-normalize-manual'. See possible values in transform_bestNormalise_manual(). Ignored for other transformations.
- ...
  Further arguments passed to the bestNormalize::bestNormalize() function or the method function from the bestNormalize package.
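The role of log_base and pre_log_function can be illustrated with a minimal base R sketch. Both functions below are hypothetical stand-ins written for this illustration, not the package's code; in particular, replacing zeros with half of the smallest non-zero value is only an assumption about what zero_to_half_min() does, based on its name.

```r
# Hypothetical pre-log function (assumed behaviour, not the package's code):
# replace zeros with half of the smallest non-zero value in the matrix.
replace_zeros_half_min <- function(mat) {
  min_nonzero <- min(mat[mat != 0], na.rm = TRUE)
  mat[!is.na(mat) & mat == 0] <- min_nonzero / 2
  mat
}

# Hypothetical sketch of a log transformation preceded by a pre-log step.
log_transform_sketch <- function(mat,
                                 log_base = 2,
                                 pre_log_function = replace_zeros_half_min) {
  log(pre_log_function(mat), base = log_base)
}

# Toy matrix with one zero value (features in rows, samples in columns).
mat <- matrix(
  c(0, 4, 8, 16),
  nrow = 2,
  dimnames = list(c("feature_1", "feature_2"), c("sample_1", "sample_2"))
)

# The zero is replaced by 4 / 2 = 2 before taking log2.
res <- log_transform_sketch(mat)
print(res)
```

Passing a different function as pre_log_function (for example one that adds a constant offset) changes only the step applied before the logarithm.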
Value
- If return_multidataset = TRUE: a MultiDataSet::MultiDataSet object, in which the original data for the transformed dataset has been replaced.
- If return_multidataset = FALSE and return_matrix_only = TRUE: a matrix with the transformed data.
- If return_multidataset = FALSE and return_matrix_only = FALSE: a list with two elements, transformed_data containing a matrix of transformed data, and info_transformation containing information about the transformation (depends on the transformation applied).
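The precedence between the two return flags can be summarised as a small decision function. This is a hypothetical sketch of the logic described above, not the package's implementation; output_type_sketch() and the labels it returns exist only for this illustration.

```r
# Hypothetical helper: reports which kind of output the flags select.
# return_multidataset takes precedence; return_matrix_only is only
# consulted when return_multidataset is FALSE.
output_type_sketch <- function(return_multidataset = FALSE,
                               return_matrix_only = FALSE) {
  if (return_multidataset) {
    "MultiDataSet"
  } else if (return_matrix_only) {
    "matrix"
  } else {
    "list(transformed_data, info_transformation)"
  }
}

# With both flags TRUE, return_matrix_only is ignored.
print(output_type_sketch(return_multidataset = TRUE, return_matrix_only = TRUE))
```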
Details
Currently implemented transformations and recommendations based on dataset type:
- vsn: Variance Stabilising Normalisation, implemented in the vsn::justvsn() function from the vsn package. This method was originally developed for microarray intensities. This transformation is recommended for microarray, metabolome, chemical or other intensity-based datasets. In practice, applies the transform_vsn() function.
- vst-deseq2: Variance Stabilising Transformation, implemented in the DESeq2::varianceStabilizingTransformation() function from the DESeq2 package. This method is applicable to count data only. This transformation is recommended for RNAseq or similar count-based datasets. In practice, applies the transform_vst() function.
- logx: log-transformation (defaults to log2, but the base can be specified). In practice, applies the transform_logx() function.
- best-normalize-auto: the most appropriate normalisation method is automatically selected from a number of options, as implemented in the bestNormalize::bestNormalize() function from the bestNormalize package. This transformation is recommended for phenotypes that are each measured on a different scale (since the transformation method selected can differ across features), preferably with a reasonable number of features (less than 100) to avoid long computation times. In practice, applies the transform_bestNormalise_auto() function.
- best-normalize-manual: applies the same transformation (specified through the method argument) to each feature of a dataset. This transformation is recommended for phenotype data in which the different phenotypes are measured on the same scale. The different normalisation methods are:
  - "arcsinh_x": data is transformed as log(x + sqrt(x^2 + 1));
  - "boxcox": Box-Cox transformation;
  - "center_scale": data is centered and scaled;
  - "exp_x": data is transformed as exp(x);
  - "log_x": data is transformed as log_b(x + a) (a and b either selected automatically per variable or passed as arguments);
  - "orderNorm": Ordered Quantile technique;
  - "sqrt_x": data is transformed as sqrt(x + a) (a selected automatically per variable or passed as argument);
  - "yeojohnson": Yeo-Johnson transformation.
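Two of the manual methods listed above can be reproduced in base R to show what applying the same transformation to each feature means. This is a minimal sketch, not the bestNormalize implementation: it assumes features are stored in rows and samples in columns, and it takes "centered and scaled" to mean subtracting the feature mean and dividing by the feature standard deviation.

```r
# Toy phenotype matrix: features in rows, samples in columns (assumed layout).
mat <- matrix(
  c(1, 2, 3,
    10, 20, 60),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(c("phenotype_1", "phenotype_2"), paste0("sample_", 1:3))
)

# "arcsinh_x": log(x + sqrt(x^2 + 1)), which is the inverse hyperbolic sine
# and therefore matches base R's asinh().
arcsinh_manual <- log(mat + sqrt(mat^2 + 1))
print(all.equal(arcsinh_manual, asinh(mat)))

# "center_scale" applied feature by feature (row-wise): each row ends up
# with mean 0 and standard deviation 1. apply() returns features in
# columns, so the result is transposed back.
center_scale_rows <- t(apply(mat, 1, function(x) (x - mean(x)) / sd(x)))
print(round(center_scale_rows, 3))
```

Because each row is processed independently, features measured on very different scales end up comparable after the transformation.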