cosifer.utils package

Submodules

cosifer.utils.data module

Data utilities.

cosifer.utils.data.get_synthetic_data(n_samples, n_features, precision_matrix=None, alpha=0.98, seed=1)

Generate synthetic data using a covariance matrix obtained by inverting a randomly generated precision matrix.

Parameters

n_samples (int) – number of samples to generate.
n_features (int) – number of features.
precision_matrix (array-like, optional) – precision matrix used to generate the data. If None, a random precision matrix is generated. Defaults to None.
alpha (float, optional) – [description]. Defaults to 0.98.
seed (int, optional) – random seed. Defaults to 1.

Returns

a tuple with two elements: the first is a pd.DataFrame containing the generated data, the second is the precision matrix used to generate it.

Return type

tuple
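The generation scheme described above can be illustrated with a short NumPy sketch. It is not the cosifer implementation: the hypothetical `synthetic_data_sketch` builds its random precision matrix as `A @ A.T` plus a diagonal shift (an assumption), does not reproduce `alpha`, and returns a NumPy array instead of a `pd.DataFrame`.

```python
import numpy as np


def synthetic_data_sketch(n_samples, n_features, precision_matrix=None, seed=1):
    """Sample Gaussian data whose covariance is the inverse of a precision matrix."""
    rng = np.random.default_rng(seed)
    if precision_matrix is None:
        # Assumption: a random symmetric positive-definite precision matrix built
        # as A @ A.T plus a diagonal shift (not necessarily what cosifer does).
        a = rng.standard_normal((n_features, n_features))
        precision_matrix = a @ a.T + n_features * np.eye(n_features)
    # The covariance matrix is the inverse of the precision matrix;
    # symmetrize it to remove tiny floating-point asymmetries.
    covariance = np.linalg.inv(precision_matrix)
    covariance = (covariance + covariance.T) / 2
    # Zero-mean multivariate normal draws: one row per sample.
    data = rng.multivariate_normal(np.zeros(n_features), covariance, size=n_samples)
    return data, precision_matrix


data, precision = synthetic_data_sketch(n_samples=500, n_features=5)
print(data.shape)  # (500, 5)
```

Passing an explicit `precision_matrix` skips the random generation, so the same network structure can be reused across several simulated datasets.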

cosifer.utils.data.read_data(filepath, standardize=True, samples_on_rows=True, sep='\t', fillna=0.0, **kwargs)

Read data from file.

Parameters

filepath (str) – path to the file.
standardize (bool, optional) – toggle data standardization. Defaults to True.
samples_on_rows (bool, optional) – whether each row of the file represents a sample. Defaults to True.
sep (str, optional) – field separator. Defaults to '\t' (tab).
fillna (float, optional) – value used to fill NAs. Defaults to 0.

Returns

a dataframe parsed from the provided filepath.

Return type

pd.DataFrame
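A minimal pandas sketch of the reading steps listed above. It assumes the first column of the file holds row labels and that missing values are filled before each feature is z-scored; the hypothetical `read_data_sketch` also drops `**kwargs`, and the ordering and standardization details in cosifer may differ.

```python
import io

import pandas as pd


def read_data_sketch(filepath, standardize=True, samples_on_rows=True, sep="\t", fillna=0.0):
    # Assumption: the first column holds row labels (sample or feature names).
    df = pd.read_csv(filepath, sep=sep, index_col=0)
    if not samples_on_rows:
        df = df.T  # put samples on rows and features on columns
    df = df.fillna(fillna)
    if standardize:
        # Assumption: z-score each feature (column); pandas' std() uses ddof=1.
        df = (df - df.mean()) / df.std()
    return df


text = "sample\tg1\tg2\ns1\t1.0\t\ns2\t3.0\t4.0\n"
df = read_data_sketch(io.StringIO(text))
print(df.round(3))
```

Here the missing value of `g2` for `s1` is filled with 0 before standardization, so both columns end up at about ±0.707.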

cosifer.utils.data.read_gmt(filepath)

Read a GMT file.

Parameters

filepath (str) – path to a GMT file.

Returns

a dictionary containing sets of features.

Return type

dict
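GMT files conventionally store one feature set per tab-separated line: the set name, a description, then the member features. The hypothetical `read_gmt_sketch` below assumes that layout, takes an open file-like object rather than a path (call it on `open(filepath)`), and maps each set name to a Python `set`; whether cosifer keeps the description field is not stated here.

```python
import io


def read_gmt_sketch(handle):
    """Parse GMT content: name<TAB>description<TAB>feature1<TAB>feature2..."""
    feature_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, _description, *features = fields
        # Ignore empty fields produced by trailing tabs.
        feature_sets[name] = {feature for feature in features if feature}
    return feature_sets


gmt = "SET_A\tfirst set\tTP53\tEGFR\n\nSET_B\tna\tMYC\t\n"
sets = read_gmt_sketch(io.StringIO(gmt))
print(sorted(sets["SET_A"]))  # ['EGFR', 'TP53']
```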

cosifer.utils.data.scale_graph(graph, threshold=0.0)

Min-max scale a matrix representing a graph, assuming positive values.

Parameters

graph (pd.DataFrame) – a dataframe representing a graph.
threshold (float, optional) – threshold to impose on the edge weights. Defaults to 0.0.

Returns

a dataframe representing the scaled graph.

Return type

pd.DataFrame
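A sketch of min-max scaling a weighted adjacency matrix. The hypothetical `scale_graph_sketch` assumes the threshold is applied to the scaled weights, zeroing any weight below it, and that the matrix is not constant (otherwise the denominator is zero).

```python
import pandas as pd


def scale_graph_sketch(graph, threshold=0.0):
    values = graph.to_numpy(dtype=float)
    low, high = values.min(), values.max()
    # Min-max scale all weights into [0, 1]; assumes the matrix is not constant.
    scaled = (values - low) / (high - low)
    # Assumption: weights below the threshold are set to zero after scaling.
    scaled[scaled < threshold] = 0.0
    return pd.DataFrame(scaled, index=graph.index, columns=graph.columns)


graph = pd.DataFrame([[0, 4, 1], [4, 0, 2], [1, 2, 0]], index=list("abc"), columns=list("abc"))
scaled = scale_graph_sketch(graph, threshold=0.3)
print(scaled)
```

The weight between `a` and `c` scales to 0.25 and is then zeroed by the 0.3 threshold, while the strongest edge (`a`-`b`) becomes 1.0.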

cosifer.utils.stats module

Statistics utilities.

cosifer.utils.stats.benjamini_hochberg_correction(p_values, q_star)

Return the indices of the p-values for which the null hypothesis is rejected at the given false discovery rate, using the Benjamini-Hochberg correction. The implementation relies on statsmodels and is robust to NaN values.

Parameters

p_values (iterable) – p-values to be used for correction.
q_star (float) – false discovery rate.

Returns

indices of significant p-values.

Return type

list
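The Benjamini-Hochberg step-up rule can be written out directly with NumPy. The hypothetical `benjamini_hochberg_sketch` is a from-scratch illustration rather than the statsmodels call cosifer relies on (statsmodels exposes the same procedure as `multipletests(..., method="fdr_bh")`); it assumes NaN p-values are excluded from the number of tests `m`, and it returns the rejected indices in ascending order.

```python
import numpy as np


def benjamini_hochberg_sketch(p_values, q_star):
    p = np.asarray(p_values, dtype=float)
    valid = np.flatnonzero(~np.isnan(p))  # NaN p-values are ignored (assumption)
    m = valid.size
    if m == 0:
        return []
    order = valid[np.argsort(p[valid])]  # original indices, sorted by p-value
    ranks = np.arange(1, m + 1)
    # Step-up rule: find the largest rank k with p_(k) <= k * q_star / m ...
    passes = p[order] <= ranks * q_star / m
    if not passes.any():
        return []
    k = np.flatnonzero(passes).max() + 1
    # ... and reject the k smallest p-values.
    return sorted(order[:k].tolist())


print(benjamini_hochberg_sketch([0.034, 0.005, float("nan"), 0.03, 0.3], q_star=0.05))  # [0, 1, 3]
```

In this example 0.03 exceeds its own threshold (0.025 at rank 2) but is still rejected, because the larger p-value 0.034 passes at rank 3 (threshold 0.0375): that is the step-up behaviour.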

cosifer.utils.stats.benjamini_yekutieli_correction(p_values, q_star)

Return the indices of the p-values for which the null hypothesis is rejected at the given false discovery rate, using the Benjamini-Yekutieli correction. The implementation relies on statsmodels and is robust to NaN values.

Parameters

p_values (iterable) – p-values to be used for correction.
q_star (float) – false discovery rate.

Returns

indices of significant p-values.

Return type

list
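The Benjamini-Yekutieli variant only changes the thresholds: each Benjamini-Hochberg threshold is divided by c(m) = 1 + 1/2 + ... + 1/m, which keeps the false discovery rate controlled under arbitrary dependence between tests. The hypothetical `benjamini_yekutieli_sketch` below is a from-scratch NumPy illustration (statsmodels offers the procedure as `method="fdr_by"`) and assumes NaN p-values are excluded from `m`.

```python
import numpy as np


def benjamini_yekutieli_sketch(p_values, q_star):
    p = np.asarray(p_values, dtype=float)
    valid = np.flatnonzero(~np.isnan(p))  # NaN p-values are ignored (assumption)
    m = valid.size
    if m == 0:
        return []
    # Harmonic number c(m) = sum_{i=1}^{m} 1/i.
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    order = valid[np.argsort(p[valid])]  # original indices, sorted by p-value
    ranks = np.arange(1, m + 1)
    # Step-up rule with the stricter thresholds k * q_star / (m * c(m)).
    passes = p[order] <= ranks * q_star / (m * c_m)
    if not passes.any():
        return []
    k = np.flatnonzero(passes).max() + 1
    return sorted(order[:k].tolist())


print(benjamini_yekutieli_sketch([0.034, 0.005, float("nan"), 0.03, 0.3], q_star=0.05))  # [1]
```

With m = 4, c(4) = 25/12, so the thresholds shrink to k × 0.006 and only the smallest p-value survives.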

cosifer.utils.stats.bonferroni_correction(p_values, q_star)

Return the indices of the p-values for which the null hypothesis is rejected at the given significance level, using the Bonferroni correction. The implementation relies on statsmodels and is robust to NaN values.

Parameters

p_values (iterable) – p-values to be used for correction.
q_star (float) – significance level (family-wise error rate).

Returns

indices of significant p-values.

Return type

list
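Bonferroni rejects a hypothesis when its p-value is at most `q_star / m`, which controls the family-wise error rate rather than the false discovery rate. A minimal NumPy sketch follows; the hypothetical `bonferroni_sketch` assumes NaN p-values are excluded from `m` (statsmodels' equivalent is `method="bonferroni"`).

```python
import numpy as np


def bonferroni_sketch(p_values, q_star):
    p = np.asarray(p_values, dtype=float)
    valid = ~np.isnan(p)
    m = int(valid.sum())  # number of tests, NaNs excluded (assumption)
    # Reject H0 when p * m <= q_star; comparisons involving NaN are False.
    return np.flatnonzero(valid & (p * m <= q_star)).tolist()


print(bonferroni_sketch([0.01, 0.02, float("nan"), 0.004], q_star=0.05))  # [0, 3]
```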

cosifer.utils.stats.from_precision_matrix_partial_correlations(precision, scaled=False)

Compute partial correlations from the precision matrix.

Parameters

precision (np.ndarray) – a precision matrix.
scaled (bool, optional) – flag to min-max scale the correlations. Defaults to False.

Returns

the partial correlation matrix.

Return type

np.ndarray
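For a precision matrix P, the partial correlation between variables i and j given all the others is ρ_ij = -P_ij / sqrt(P_ii · P_jj). The hypothetical `partial_correlations_sketch` below applies that formula with NumPy and sets the diagonal to 1; it omits the `scaled` option.

```python
import numpy as np


def partial_correlations_sketch(precision):
    precision = np.asarray(precision, dtype=float)
    d = np.sqrt(np.diag(precision))
    # rho_ij = -P_ij / sqrt(P_ii * P_jj), computed for all pairs at once.
    partial = -precision / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return partial


P = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
print(partial_correlations_sketch(P))
```

In this chain-structured example, variables 0 and 2 have zero partial correlation even though both are connected to variable 1.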

cosifer.utils.vector_quantization module

Vector quantization utilities.

cosifer.utils.vector_quantization.k_means_bic(X, clusters_centers, clusters_labels, sigma_eps=1.0)

Compute BIC for K-means clustering.

Parameters

X (np.ndarray) – clustered data.
clusters_centers (np.ndarray) – cluster centers.
clusters_labels (np.ndarray) – cluster labels.
sigma_eps (float, optional) – standard deviation. Defaults to 1.0.

Returns

BIC score.

Return type

float
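The exact BIC formula used by cosifer is not given here. One common formulation, assumed in the hypothetical `k_means_bic_sketch`, treats each cluster as an isotropic Gaussian with fixed standard deviation `sigma_eps`, giving BIC = RSS / sigma_eps² + n·d·log(2π·sigma_eps²) + k·d·log(n), where RSS is the within-cluster sum of squares, n the number of points, d the dimension and k the number of clusters. Lower is better.

```python
import numpy as np


def k_means_bic_sketch(X, clusters_centers, clusters_labels, sigma_eps=1.0):
    """BIC of a K-means solution under a fixed-variance Gaussian model (assumed)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n points with a single feature
    n, d = X.shape
    centers = np.asarray(clusters_centers, dtype=float).reshape(-1, d)
    labels = np.asarray(clusters_labels, dtype=int)
    k = centers.shape[0]
    # Residual sum of squares between each point and its assigned center.
    rss = np.sum((X - centers[labels]) ** 2)
    # Gaussian log-likelihood with isotropic noise variance sigma_eps**2.
    log_likelihood = -rss / (2 * sigma_eps**2) - 0.5 * n * d * np.log(2 * np.pi * sigma_eps**2)
    n_params = k * d  # one d-dimensional center per cluster
    return -2 * log_likelihood + n_params * np.log(n)


X = np.array([[0.0], [0.2], [5.0], [5.2]])
bic_two = k_means_bic_sketch(X, [[0.1], [5.1]], [0, 0, 1, 1])
bic_one = k_means_bic_sketch(X, [[2.6]], [0, 0, 0, 0])
print(bic_two < bic_one)  # True
```

Splitting the four points into their two tight groups costs one extra center in the penalty but removes almost all of the residual error, so the two-cluster solution scores lower.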

cosifer.utils.vector_quantization.k_means_optimized_with_bic(X, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, **kwargs)

Find the optimal K-means model by minimizing the BIC score.

Parameters

X (np.ndarray) – data to cluster.
k_min (int, optional) – minimum number of clusters. Defaults to 3.
k_max (int, optional) – maximum number of clusters. Defaults to 9.
k_step (int, optional) – step between consecutive numbers of clusters. Defaults to 1.
sigma_eps (float, optional) – standard deviation. Defaults to 1.0.
n_init (int, optional) – number of K-means initializations. Defaults to 100.

Returns

a tuple containing two elements: the first is the optimal model, the second is a dictionary mapping each number of clusters to its BIC score.

Return type

tuple
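The model selection loop can be sketched as follows: fit K-means for each candidate k, keep the best of `n_init` random restarts, and return the model with the lowest BIC together with the per-k scores. The hypothetical `k_means_optimized_with_bic_sketch` uses a plain NumPy Lloyd's algorithm instead of a library K-means, adds a `seed` parameter, treats `k_max` as inclusive, represents the "model" as a `(k, centers, labels)` tuple, and scores fits with an assumed fixed-variance Gaussian BIC, RSS / sigma_eps² + n·d·log(2π·sigma_eps²) + k·d·log(n).

```python
import numpy as np


def _bic(X, centers, labels, sigma_eps):
    # Fixed-variance Gaussian BIC (assumed formulation); lower is better.
    n, d = X.shape
    rss = np.sum((X - centers[labels]) ** 2)
    return (rss / sigma_eps**2 + n * d * np.log(2 * np.pi * sigma_eps**2)
            + centers.shape[0] * d * np.log(n))


def _lloyd(X, k, rng, n_iter=100):
    # One K-means run (Lloyd's algorithm) started from k random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Move each center to the mean of its points; keep empty clusters in place.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels


def k_means_optimized_with_bic_sketch(X, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, seed=0):
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    rng = np.random.default_rng(seed)
    best, scores = None, {}
    for k in range(k_min, k_max + 1, k_step):  # assumption: k_max is inclusive
        runs = [_lloyd(X, k, rng) for _ in range(n_init)]
        # Keep the restart with the smallest within-cluster sum of squares.
        centers, labels = min(runs, key=lambda run: np.sum((X - run[0][run[1]]) ** 2))
        scores[k] = _bic(X, centers, labels, sigma_eps)
        if best is None or scores[k] < scores[best[0]]:
            best = (k, centers, labels)
    return best, scores


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.2, size=(50, 2)) for loc in ([0, 0], [10, 0], [0, 10])])
(best_k, centers, labels), scores = k_means_optimized_with_bic_sketch(X, k_min=1, k_max=6, n_init=50)
print(best_k)  # 3
```

With three tight, well-separated blobs, extra clusters barely reduce the residual error but each adds d·log(n) to the penalty, so the search settles on k = 3.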

cosifer.utils.vector_quantization.k_means_vector_quantization(x, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, **kwargs)

Quantize a vector using K-means optimized via BIC score.

Parameters

x (np.ndarray) – array to quantize.
k_min (int, optional) – minimum number of clusters. Defaults to 3.
k_max (int, optional) – maximum number of clusters. Defaults to 9.
k_step (int, optional) – step between consecutive numbers of clusters. Defaults to 1.
sigma_eps (float, optional) – standard deviation. Defaults to 1.0.
n_init (int, optional) – number of K-means initializations. Defaults to 100.

Returns

the quantized vector.

Return type

np.ndarray
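Vector quantization of this kind replaces each entry of a 1-D vector by the center of the cluster it falls into, choosing the number of clusters by BIC. The hypothetical `vector_quantization_sketch` assumes that meaning of "quantized" and, to stay short and deterministic, initializes a 1-D Lloyd's algorithm at evenly spaced quantiles instead of using `n_init` random restarts. Its BIC is an assumed fixed-variance Gaussian formulation, RSS / sigma_eps² + n·log(2π·sigma_eps²) + k·log(n).

```python
import numpy as np


def _kmeans_1d(x, k, n_iter=100):
    # Deterministic start at evenly spaced quantiles (assumption of this sketch).
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its points; keep empty clusters in place.
        new_centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return centers, labels


def vector_quantization_sketch(x, k_min=3, k_max=9, k_step=1, sigma_eps=1.0):
    x = np.asarray(x, dtype=float)
    n = x.size
    best_bic, quantized = np.inf, None
    for k in range(k_min, k_max + 1, k_step):  # assumption: k_max is inclusive
        centers, labels = _kmeans_1d(x, k)
        rss = np.sum((x - centers[labels]) ** 2)
        # Fixed-variance Gaussian BIC with one parameter per center (assumed).
        bic = rss / sigma_eps**2 + n * np.log(2 * np.pi * sigma_eps**2) + k * np.log(n)
        if bic < best_bic:
            # Replace every entry by the center of its cluster.
            best_bic, quantized = bic, centers[labels]
    return quantized


x = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 10.0, 10.1, 9.9])
print(vector_quantization_sketch(x))
```

The nine values collapse onto three levels near 0, 5 and 10, since any additional cluster would add log(n) to the score while removing almost no residual error.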
