
This function calculates Pearson or Spearman correlations between all pairs of features in a matrix or dataframe much faster than the base R `cor` function. When `data2` is provided, it can also calculate correlations between every feature in `data` and every feature in `data2`. In the same call it can calculate the mutual rank (MR) of each correlation, as well as p-values and adjusted p-values, and it can automatically combine and flatten the result matrices. Selecting correlated features with an MR-based threshold, rather than by correlation coefficient or an arbitrary p-value cutoff, is more efficient and accurate for inferring functional associations in systems such as gene regulatory networks.

Usage

fcor(
  data,
  data2 = NULL,
  na_to_zero = TRUE,
  method = "spearman",
  mutualRank = TRUE,
  mutualRank_mode = "unsigned",
  pvalue = FALSE,
  adjust = "BH",
  flat = TRUE,
  remove_self = TRUE,
  remove_duplicate_pairs = TRUE
)

Arguments

data

a numeric dataframe/matrix with features on columns and samples/observations on rows. If `data2` is not provided, correlations are calculated between all pairs of features in `data`.

data2

an optional numeric dataframe/matrix with features on columns and samples/observations on rows. If provided, correlations are calculated between all features in `data` and all features in `data2`. `data` and `data2` must have the same number of rows, and the rows must correspond to the same samples/observations in the same order. Default is `NULL`.

na_to_zero

logical, whether to convert `NA` values to 0 in the output. Default is `TRUE`.

method

a character string indicating which correlation coefficient is to be computed. One of `"pearson"` or `"spearman"` (default).

mutualRank

logical, whether to calculate mutual ranks of correlations. Default is `TRUE`.

mutualRank_mode

a character string indicating whether to rank correlations by their `"signed"` or `"unsigned"` (default) values. In `"unsigned"` mode only the magnitude of a correlation matters, not its sign, so the function ranks the absolute values of the correlations.

pvalue

logical, whether to calculate p-values of correlations. Default is `FALSE`.

adjust

p-value correction method used when `pvalue = TRUE`. A character string, one of `"BH"` (default), `"bonferroni"`, `"holm"`, `"hochberg"`, `"hommel"`, or `"none"`.

flat

logical, whether to combine and flatten the result matrices. Default is `TRUE`.

remove_self

logical, whether to remove self-correlations from the flattened output when `data2` is provided. This is useful when `data2` contains some or all of the same features as `data`. Default is `TRUE`.

remove_duplicate_pairs

logical, whether to remove duplicate undirected feature pairs from the flattened output when `data2` is provided. This is useful when `data2` contains the same features as `data`, because pairs such as `geneA-geneB` and `geneB-geneA` may otherwise both be returned. Default is `TRUE`.
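The effect of `remove_self` and `remove_duplicate_pairs` can be illustrated with base R on a small pair table. This is only a sketch of the idea, not the function's internal code; the table and its column names (`feature1`, `feature2`, `cor`) are hypothetical and may not match the actual flattened output.

```r
# Hypothetical flattened pair table (column names are assumptions)
pairs <- data.frame(
  feature1 = c("geneA", "geneA", "geneB", "geneB"),
  feature2 = c("geneA", "geneB", "geneA", "geneB"),
  cor      = c(1, 0.8, 0.8, 1),
  stringsAsFactors = FALSE
)

# Drop self-correlations such as geneA-geneA
no_self <- pairs[pairs$feature1 != pairs$feature2, ]

# Build an order-independent key so geneA-geneB and geneB-geneA collide,
# then keep only the first occurrence of each undirected pair
key <- paste(
  pmin(no_self$feature1, no_self$feature2),
  pmax(no_self$feature1, no_self$feature2),
  sep = "-"
)
unique_pairs <- no_self[!duplicated(key), ]
```

Here `unique_pairs` keeps a single row for the undirected pair of `geneA` and `geneB`.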

Value

Depending on the input data and the value of `flat`, a dataframe or a list containing `cor` (correlation coefficients), `mr` (mutual ranks of the correlation coefficients), `p` (p-values of the correlation coefficients), and `p.adj` (adjusted p-values). If `data2` is not provided and `flat = TRUE`, the flattened output contains the upper triangle of the all-pairs correlation matrix. If `data2` is provided and `flat = TRUE`, the flattened output contains the feature pairs between `data` and `data2`.
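To make "the upper triangle of the all-pairs correlation matrix" concrete, the following base R sketch flattens a correlation matrix by hand. It uses base `cor` rather than `fcor`, and the column names `row`, `col`, and `cor` are chosen for illustration only; the actual flattened output may be laid out differently.

```r
# All-pairs Spearman correlation matrix with base R (7 x 7 for attitude)
m <- cor(datasets::attitude, method = "spearman")

# Row/column positions of the cells strictly above the diagonal
idx <- which(upper.tri(m), arr.ind = TRUE)

# One row per unordered feature pair, without self-correlations
flat <- data.frame(
  row = rownames(m)[idx[, "row"]],
  col = colnames(m)[idx[, "col"]],
  cor = m[idx]
)
```

With 7 features this gives 7 * 6 / 2 = 21 rows, one per unordered pair.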

Details

When `data2 = NULL`, the function performs the standard all-pairs correlation analysis among the features of `data`. When `data2` is provided, the function performs a rectangular correlation analysis between the features of `data` and the features of `data2`.
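The difference between the two modes can be seen with base R's `cor`, which accepts a second matrix in the same way. The sketch below does not call `fcor`; it only shows the shapes involved and that the rectangular result is a block of the full all-pairs matrix.

```r
x <- mtcars[, 1:4]   # 4 features
y <- mtcars[, 5:11]  # 7 features

# All-pairs analysis: square 11 x 11 matrix over every feature
square <- cor(mtcars, method = "spearman")

# Rectangular analysis: 4 x 7 matrix, features of x against features of y
rect <- cor(x, y, method = "spearman")

# The rectangular correlations equal the matching block of the square matrix
same_block <- isTRUE(all.equal(rect, square[colnames(x), colnames(y)]))
```

The correlation coefficients themselves are the same in both modes; only the set of pairs that is computed and returned differs.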

For Spearman correlation with `data2`, the two input matrices are internally combined before rank transformation so that feature-wise ranks are calculated consistently across the same samples/observations.
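The idea behind this step is that a Spearman correlation is a Pearson correlation computed on per-feature ranks. The base R sketch below illustrates that under the assumption that ties receive average ranks (the default of `rank`); it is not the package's internal implementation.

```r
x <- as.matrix(mtcars[, 1:4])
y <- as.matrix(mtcars[, 5:11])

# Combine both inputs, then rank each feature (column) across the samples
combined <- cbind(x, y)
ranks <- apply(combined, 2, rank)  # ties get average ranks

# Split the ranked matrix back into the two feature sets
rx <- ranks[, colnames(x), drop = FALSE]
ry <- ranks[, colnames(y), drop = FALSE]

# Pearson on ranks reproduces Spearman on the raw values
spearman_manual <- cor(rx, ry, method = "pearson")
spearman_base   <- cor(x, y, method = "spearman")
ranks_match <- isTRUE(all.equal(spearman_manual, spearman_base))
```

Because ranks are computed within each feature, combining the matrices does not change any feature's ranks; it only ensures both sets are ranked over the same rows in a single pass.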

When `mutualRank = TRUE` and `data2` is provided, the calculated MR values are based on the rectangular correlation space between `data` and `data2`. Therefore, these MR values are not necessarily identical to MR values obtained from a full all-pairs correlation matrix followed by post hoc filtering.
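Mutual rank is commonly defined as the geometric mean of the two directional ranks of a pair. The sketch below shows one plausible way to compute it on a rectangular matrix in `"unsigned"` mode, ranking each row over the features of `data2` and each column over the features of `data`. Whether `fcor` handles ties or ranking direction exactly this way is an assumption.

```r
# Rectangular correlation matrix: 4 features (rows) x 7 features (columns)
r <- cor(mtcars[, 1:4], mtcars[, 5:11], method = "spearman")
a <- abs(r)  # "unsigned" mode: rank by magnitude only

# Rank 1 = strongest correlation.
# Row-wise: rank of each data2 feature among all data2 features, per data feature
row_rank <- t(apply(-a, 1, rank))
# Column-wise: rank of each data feature among all data features, per data2 feature
col_rank <- apply(-a, 2, rank)

# Mutual rank: geometric mean of the two directional ranks
mr <- sqrt(row_rank * col_rank)
```

Because each rank is taken only over the features present in the rectangular matrix, the same pair would generally receive a different MR if it were ranked against all 11 features in a full all-pairs matrix.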

Examples

if (FALSE) { # \dontrun{
set.seed(1234)

# All-pairs correlation among features
data <- datasets::attitude
cor_all <- fcor(data = data)

# Correlation between two sets of features
data1 <- mtcars[, 1:4]
data2 <- mtcars[, 5:11]
cor_rect <- fcor(data = data1, data2 = data2)

# Correlation between selected features and all features
selected_data <- mtcars[, 1:4]
all_data <- mtcars
cor_selected_all <- fcor(data = selected_data, data2 = all_data)
} # }