cosifer.utils package

Submodules

cosifer.utils.data module

Data utilities.

cosifer.utils.data.get_synthetic_data(n_samples, n_features, precision_matrix=None, alpha=0.98, seed=1)

Generate synthetic data using a covariance matrix obtained by inverting a randomly generated precision matrix.

Parameters
  • n_samples (int) – number of samples to generate.

  • n_features (int) – number of features.

  • precision_matrix (np.ndarray, optional) – precision matrix used to generate the data; if None, one is generated randomly. Defaults to None.

  • alpha (float, optional) – [description]. Defaults to 0.98.

  • seed (int, optional) – seed for the random number generator. Defaults to 1.

Returns

a tuple with two elements. The first is a pd.DataFrame representing the data. The second is the precision matrix used to generate the data.

Return type

tuple
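
The generation step described above (invert a precision matrix to obtain a covariance matrix, then sample from it) can be sketched with NumPy alone. This is a minimal illustration, not the cosifer implementation: the helper name sample_from_precision, the zero mean, and the Gaussian sampling are assumptions, and the random generation of the precision matrix is left out.

```python
import numpy as np


def sample_from_precision(n_samples, precision_matrix, seed=1):
    """Draw zero-mean Gaussian samples whose covariance is inv(precision_matrix)."""
    rng = np.random.default_rng(seed)
    covariance = np.linalg.inv(precision_matrix)
    # Symmetrize to remove tiny asymmetries introduced by the inversion.
    covariance = (covariance + covariance.T) / 2.0
    mean = np.zeros(precision_matrix.shape[0])  # zero mean is an assumption
    return rng.multivariate_normal(mean, covariance, size=n_samples)


# A small symmetric positive-definite precision matrix for three features.
precision = np.array([
    [2.0, 0.5, 0.0],
    [0.5, 2.0, 0.5],
    [0.0, 0.5, 2.0],
])
samples = sample_from_precision(500, precision)
```

A zero entry in the precision matrix (here between the first and third feature) corresponds to conditional independence between the two features, which is why a precision matrix is a convenient ground truth for network inference.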

cosifer.utils.data.read_data(filepath, standardize=True, samples_on_rows=True, sep='\t', fillna=0.0, **kwargs)

Read data from file.

Parameters
  • filepath (str) – path to the file.

  • standardize (bool, optional) – toggle data standardization. Defaults to True.

  • samples_on_rows (bool, optional) – whether each row of the file represents a sample. Defaults to True.

  • sep (str, optional) – field separator. Defaults to '\t' (tab).

  • fillna (float, optional) – value used to fill NAs. Defaults to 0.0.

Returns

a dataframe parsed from the provided filepath.

Return type

pd.DataFrame
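
The parameters above suggest a read, fill, orient, standardize pipeline. The sketch below shows one way such a pipeline could look with pandas. It is not the cosifer code: the helper name read_table, the use of the first column as index, the order of the steps, and the z-score standardization with population standard deviation are all assumptions.

```python
import io

import pandas as pd


def read_table(source, standardize=True, samples_on_rows=True, sep="\t", fillna=0.0):
    """Read a delimited table and return a samples-by-features dataframe."""
    # Assumption: the first column holds the row labels.
    table = pd.read_csv(source, sep=sep, index_col=0)
    table = table.fillna(fillna)
    if not samples_on_rows:
        # Features are on rows, so transpose to get one sample per row.
        table = table.T
    if standardize:
        # Z-score each feature; a constant feature would produce NaNs here.
        table = (table - table.mean()) / table.std(ddof=0)
    return table


raw = "\tg1\tg2\ns1\t1.0\t2.0\ns2\t3.0\t\ns3\t5.0\t8.0\n"
table = read_table(io.StringIO(raw))
```

The missing value for sample s2 is filled with 0.0 before standardization, so it influences the mean and standard deviation of feature g2.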

cosifer.utils.data.read_gmt(filepath)

Read a GMT file.

Parameters

filepath (str) – path to a GMT file.

Returns

a dictionary containing sets of features.

Return type

dict
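
In the GMT format each line is tab-separated: a set name, a description, then the members of the set. A minimal parser along those lines, using only the standard library, might look as follows. The function name parse_gmt and the handling of malformed lines are assumptions, not the behavior of read_gmt.

```python
import os
import tempfile


def parse_gmt(filepath):
    """Parse a GMT file into a dictionary mapping set names to sets of features."""
    feature_sets = {}
    with open(filepath) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines (an assumption)
            name, _description, *members = fields
            feature_sets[name] = {member for member in members if member}
    return feature_sets


# Write a two-line example file, parse it, and clean up.
with tempfile.NamedTemporaryFile("w", suffix=".gmt", delete=False) as tmp:
    tmp.write("PATHWAY_A\tdescription\tTP53\tMDM2\tCDKN1A\n")
    tmp.write("PATHWAY_B\tdescription\tEGFR\tKRAS\n")
    path = tmp.name
gene_sets = parse_gmt(path)
os.remove(path)
```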

cosifer.utils.data.scale_graph(graph, threshold=0.0)

Min-max scale a matrix representing a graph, assuming positive values.

Parameters
  • graph (pd.DataFrame) – a dataframe representing a graph.

  • threshold (float, optional) – threshold to impose on the edge weights. Defaults to 0.0.

Returns

a dataframe representing the scaled graph.

Return type

pd.DataFrame
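
Min-max scaling maps the edge weights linearly onto [0, 1]. The sketch below works on a plain NumPy array and assumes that the threshold is applied to the scaled weights, with weights below it set to zero; the documentation does not say whether scale_graph applies the threshold before or after scaling, so treat that ordering as an assumption. The name min_max_scale is hypothetical.

```python
import numpy as np


def min_max_scale(weights, threshold=0.0):
    """Scale weights to [0, 1] and zero out scaled weights below the threshold."""
    weights = np.asarray(weights, dtype=float)
    low, high = weights.min(), weights.max()
    if high == low:
        # All weights are equal: avoid dividing by zero.
        scaled = np.zeros_like(weights)
    else:
        scaled = (weights - low) / (high - low)
    scaled[scaled < threshold] = 0.0  # assumed thresholding rule
    return scaled


adjacency = np.array([
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
])
scaled = min_max_scale(adjacency, threshold=0.3)
```

Here the weight 1.0 scales to 0.25 and is removed by the threshold, while 2.0 scales to 0.5 and is kept.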

cosifer.utils.stats module

Statistics utilities.

cosifer.utils.stats.benjamini_hochberg_correction(p_values, q_star)

Return the indices of the p-values for which the null hypothesis is rejected at the given significance level after a Benjamini-Hochberg correction. The implementation relies on statsmodels and is robust to NaN values.

Parameters
  • p_values (iterable) – p-values to be used for correction.

  • q_star (float) – false discovery rate.

Returns

indices of significant p-values.

Return type

list
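
For reference, the Benjamini-Hochberg step-up procedure can be written in a few lines of plain Python. This sketch is independent of statsmodels and of the function above: the name benjamini_hochberg is hypothetical, and dropping NaN values before counting the tests is an assumption about what "robust to NaN" means.

```python
import math


def benjamini_hochberg(p_values, q_star):
    """Return sorted indices of the p-values rejected by the BH step-up procedure."""
    # Assumption: NaNs are ignored and do not count as tests.
    indexed = sorted((p, i) for i, p in enumerate(p_values) if not math.isnan(p))
    m = len(indexed)
    cutoff_rank = 0
    for rank, (p, _) in enumerate(indexed, start=1):
        # Keep the largest rank whose p-value is below its own threshold.
        if p <= rank / m * q_star:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is rejected.
    return sorted(i for _, i in indexed[:cutoff_rank])


p_values = [0.001, 0.028, 0.025, 0.6, 0.9, float("nan")]
significant = benjamini_hochberg(p_values, 0.05)
```

With five valid tests the thresholds are 0.01, 0.02, 0.03, 0.04 and 0.05. The second smallest p-value (0.025) misses its threshold, but the third (0.028) passes, so the step-up rule rejects all three and the result is [0, 1, 2].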

cosifer.utils.stats.benjamini_yekutieli_correction(p_values, q_star)

Return the indices of the p-values for which the null hypothesis is rejected at the given significance level after a Benjamini-Yekutieli correction. The implementation relies on statsmodels and is robust to NaN values.

Parameters
  • p_values (iterable) – p-values to be used for correction.

  • q_star (float) – false discovery rate.

Returns

indices of significant p-values.

Return type

list
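
The Benjamini-Yekutieli procedure uses the same step-up rule as Benjamini-Hochberg but divides each threshold by the harmonic sum 1 + 1/2 + ... + 1/m, which keeps the false discovery rate controlled under arbitrary dependence between tests. A plain-Python sketch, with a hypothetical function name and the assumption that NaN values are dropped:

```python
import math


def benjamini_yekutieli(p_values, q_star):
    """Return sorted indices of the p-values rejected by the BY procedure."""
    # Assumption: NaNs are ignored and do not count as tests.
    indexed = sorted((p, i) for i, p in enumerate(p_values) if not math.isnan(p))
    m = len(indexed)
    harmonic = sum(1.0 / k for k in range(1, m + 1))
    cutoff_rank = 0
    for rank, (p, _) in enumerate(indexed, start=1):
        # Same step-up rule as BH, with thresholds shrunk by the harmonic sum.
        if p <= rank * q_star / (m * harmonic):
            cutoff_rank = rank
    return sorted(i for _, i in indexed[:cutoff_rank])


significant = benjamini_yekutieli([0.001, 0.028, 0.025, 0.6, 0.9], 0.05)
```

For these five p-values only the first one survives, which illustrates that this correction is more conservative than Benjamini-Hochberg at the same rate.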

cosifer.utils.stats.bonferroni_correction(p_values, q_star)

Return the indices of the p-values for which the null hypothesis is rejected at the given significance level after a Bonferroni correction. The implementation relies on statsmodels and is robust to NaN values.

Parameters
  • p_values (iterable) – p-values to be used for correction.

  • q_star (float) – significance level (the Bonferroni correction controls the family-wise error rate).

Returns

indices of significant p-values.

Return type

list
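
The Bonferroni correction compares every p-value against the significance level divided by the number of tests. A short sketch with a hypothetical function name, again assuming that NaN values are excluded from the count:

```python
import math


def bonferroni(p_values, alpha):
    """Return indices of the p-values that satisfy p <= alpha / m."""
    # Assumption: NaNs are ignored and do not count as tests.
    valid = [(i, p) for i, p in enumerate(p_values) if not math.isnan(p)]
    m = len(valid)
    return [i for i, p in valid if p <= alpha / m]


significant = bonferroni([0.001, 0.008, 0.025, 0.6, 0.9], 0.05)
```

With five tests the per-test threshold is 0.01, so only the first two p-values are retained.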

cosifer.utils.stats.from_precision_matrix_partial_correlations(precision, scaled=False)

Compute partial correlations from the precision matrix.

Parameters
  • precision (np.ndarray) – a precision matrix.

  • scaled (bool, optional) – flag to min-max scale the correlations. Defaults to False.

Returns

the partial correlation matrix.

Return type

np.ndarray
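
The standard relation between a precision matrix P and the partial correlations is rho_ij = -P_ij / sqrt(P_ii * P_jj) for i different from j. The sketch below implements that relation with NumPy. Setting the diagonal to 1 is a common convention and an assumption here, and the optional min-max scaling controlled by scaled is not covered.

```python
import numpy as np


def partial_correlations(precision):
    """Compute partial correlations from a precision matrix."""
    precision = np.asarray(precision, dtype=float)
    diagonal = np.sqrt(np.diag(precision))
    # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    partial = -precision / np.outer(diagonal, diagonal)
    np.fill_diagonal(partial, 1.0)  # convention assumed for the diagonal
    return partial


precision = np.array([
    [2.0, 0.5, 0.0],
    [0.5, 2.0, 0.5],
    [0.0, 0.5, 2.0],
])
partial = partial_correlations(precision)
```

The zero entry between the first and third variable stays zero, so the two are conditionally uncorrelated given the second variable.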

cosifer.utils.vector_quantization module

Vector quantization utilities.

cosifer.utils.vector_quantization.k_means_bic(X, clusters_centers, clusters_labels, sigma_eps=1.0)

Compute BIC for K-means clustering.

Parameters
  • X (np.ndarray) – clustered data.

  • clusters_centers (np.ndarray) – cluster centers.

  • clusters_labels (np.ndarray) – cluster labels.

  • sigma_eps (float, optional) – standard deviation. Defaults to 1.0.

Returns

BIC score.

Return type

float
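
The documentation does not state which BIC formula is used. One form that matches the parameters above assumes spherical Gaussian noise with known standard deviation sigma_eps around each center: the residual sum of squares divided by sigma_eps squared, plus a log(n) penalty for every center coordinate. The sketch below implements that assumed form under a hypothetical name and may differ from k_means_bic.

```python
import numpy as np


def k_means_bic_sketch(X, centers, labels, sigma_eps=1.0):
    """BIC under an assumed spherical Gaussian model with known sigma_eps."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n_samples, n_features = X.shape
    # Squared distance of every point to its assigned center.
    residual = np.sum((X - centers[labels]) ** 2)
    # Free parameters: one coordinate per center and feature (assumption).
    n_parameters = centers.shape[0] * n_features
    return residual / sigma_eps ** 2 + n_parameters * np.log(n_samples)


X = np.array([[0.0], [0.2], [5.0], [5.2]])
centers = np.array([[0.1], [5.1]])
labels = np.array([0, 0, 1, 1])
score = k_means_bic_sketch(X, centers, labels)
```

Under this form a lower score is better: adding clusters reduces the residual term but increases the penalty.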

cosifer.utils.vector_quantization.k_means_optimized_with_bic(X, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, **kwargs)

Find an optimal K-means model by minimizing the BIC score.

Parameters
  • X (np.ndarray) – data to cluster.

  • k_min (int, optional) – minimum number of clusters. Defaults to 3.

  • k_max (int, optional) – maximum number of clusters. Defaults to 9.

  • k_step (int, optional) – step between consecutive numbers of clusters to try. Defaults to 1.

  • sigma_eps (float, optional) – standard deviation. Defaults to 1.0.

  • n_init (int, optional) – number of K-means initializations. Defaults to 100.

Returns

a tuple containing two elements: the first is the optimal model, the second is a dictionary mapping the number of clusters to the BIC score.

Return type

tuple
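
The selection loop can be illustrated without any clustering library. In the sketch below a small NumPy version of Lloyd's algorithm stands in for whatever K-means estimator the function uses, k_max is treated as inclusive, and the BIC is the assumed form "scaled residual plus log(n) penalty per center coordinate". All names are hypothetical and none of this is taken from the cosifer source.

```python
import numpy as np


def k_means(X, k, n_init=5, n_iter=50, seed=0):
    """Tiny Lloyd's algorithm with restarts; returns (centers, labels, rss)."""
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _step in range(n_iter):
            distances = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = distances.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):  # keep the old center if a cluster empties
                    centers[j] = X[labels == j].mean(axis=0)
        # Final assignment and residual sum of squares for this restart.
        distances = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        rss = distances.min(axis=1).sum()
        if best is None or rss < best[2]:
            best = (centers, labels, rss)
    return best


def select_k_by_bic(X, k_min=3, k_max=9, k_step=1, sigma_eps=1.0):
    """Fit one model per k and keep the one with the lowest assumed BIC."""
    n_samples, n_features = X.shape
    scores, models = {}, {}
    for k in range(k_min, k_max + 1, k_step):  # k_max inclusive (assumption)
        centers, labels, rss = k_means(X, k)
        scores[k] = rss / sigma_eps ** 2 + k * n_features * np.log(n_samples)
        models[k] = (centers, labels)
    best_k = min(scores, key=scores.get)
    return models[best_k], scores


rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(loc, 0.3, size=(30, 1)) for loc in (0.0, 5.0, 10.0)])
(best_centers, best_labels), bic_scores = select_k_by_bic(X, k_min=2, k_max=5)
```

The returned dictionary has one BIC score per candidate number of clusters, mirroring the second element of the tuple described above.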

cosifer.utils.vector_quantization.k_means_vector_quantization(x, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, **kwargs)

Quantize a vector using K-means optimized via BIC score.

Parameters
  • x (np.ndarray) – array to quantize.

  • k_min (int, optional) – minimum number of clusters. Defaults to 3.

  • k_max (int, optional) – maximum number of clusters. Defaults to 9.

  • k_step (int, optional) – step between consecutive numbers of clusters to try. Defaults to 1.

  • sigma_eps (float, optional) – standard deviation. Defaults to 1.0.

  • n_init (int, optional) – number of K-means initializations. Defaults to 100.

Returns

the quantized vector.

Return type

np.ndarray
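
Once cluster centers are available, quantizing a vector amounts to mapping each value to its nearest center. The sketch below shows only that final step with fixed, hand-picked centers. Whether the function returns center values or cluster labels is not stated in the documentation, so returning center values is an assumption, and the name quantize is hypothetical.

```python
import numpy as np


def quantize(x, centers):
    """Replace every value in x with the closest of the given centers."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Index of the nearest center for each value.
    nearest = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return centers[nearest]


quantized = quantize([0.1, 0.2, 4.9, 5.3, 9.8], [0.0, 5.0, 10.0])
```

The five input values collapse onto the three levels 0.0, 5.0 and 10.0.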

Module contents