cosifer.utils package
Submodules
cosifer.utils.data module
Data utilities.
-
cosifer.utils.data.get_synthetic_data(n_samples, n_features, precision_matrix=None, alpha=0.98, seed=1)
Generate synthetic data using a covariance matrix obtained by inverting a randomly generated precision matrix.
- Parameters
n_samples (int) – number of samples to generate.
n_features (int) – number of features per sample.
precision_matrix (np.ndarray, optional) – precision matrix used to generate the data; a random one is generated when None. Defaults to None.
alpha (float, optional) – [description]. Defaults to 0.98.
seed (int, optional) – seed for the random number generator. Defaults to 1.
- Returns
a tuple with two elements: the first is a pd.DataFrame representing the data, the second is the precision matrix used to generate the data.
- Return type
tuple
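The idea of sampling from a covariance obtained by inverting a precision matrix can be sketched without dependencies. This is a minimal illustration restricted to the 2x2 case, not cosifer's implementation; the helper names `invert_2x2` and `sample_from_precision` are hypothetical, and the role of `alpha` is not covered.

```python
import math
import random


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sample_from_precision(precision, n_samples, seed=1):
    """Draw zero-mean Gaussian samples whose covariance is inverse(precision)."""
    rng = random.Random(seed)
    cov = invert_2x2(precision)
    # Cholesky factor L of the covariance (cov = L L^T), written out for 2x2.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    samples = []
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlate the independent standard normals through L.
        samples.append((l11 * z1, l21 * z1 + l22 * z2))
    return samples, cov


precision = [[2.0, 0.8], [0.8, 1.0]]
data, covariance = sample_from_precision(precision, n_samples=5)
```

Fixing the seed makes the draw reproducible, which mirrors the purpose of the `seed` argument in the signature above.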
-
cosifer.utils.data.read_data(filepath, standardize=True, samples_on_rows=True, sep='\t', fillna=0.0, **kwargs)
Read data from file.
- Parameters
filepath (str) – path to the file.
standardize (bool, optional) – toggle data standardization. Defaults to True.
samples_on_rows (bool, optional) – flag indicating whether each row represents a sample. Defaults to True.
sep (str, optional) – field separator. Defaults to '\t'.
fillna (float, optional) – value used to fill NAs. Defaults to 0.0.
- Returns
a dataframe parsed from the provided filepath.
- Return type
pd.DataFrame
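The steps the parameters describe (parse a delimited table, fill missing values, optionally standardize each feature) can be sketched with the standard library. This is an illustration under assumptions, not the package's code: the helper `read_table` is hypothetical, it returns plain lists rather than a pd.DataFrame, and it assumes missing values are filled before a z-score standardization using the population standard deviation.

```python
import csv
import io
import statistics


def read_table(handle, standardize=True, sep="\t", fillna=0.0):
    """Parse a delimited table with a header row and one sample per row."""
    reader = csv.reader(handle, delimiter=sep)
    header = next(reader)
    # Empty cells are treated as missing and replaced by `fillna`.
    rows = [
        [float(cell) if cell.strip() else fillna for cell in row]
        for row in reader
    ]
    if standardize:
        columns = []
        for column in zip(*rows):
            mean = statistics.fmean(column)
            std = statistics.pstdev(column)
            # Constant columns are only centered, to avoid dividing by zero.
            columns.append(
                [(v - mean) / std if std > 0 else v - mean for v in column]
            )
        rows = [list(row) for row in zip(*columns)]
    return header, rows


text = "g1\tg2\n1.0\t4.0\n3.0\t\n"
header, rows = read_table(io.StringIO(text))
```

With the real function, the equivalent call would pass a file path instead of an in-memory handle.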
-
cosifer.utils.data.read_gmt(filepath)
Read a GMT file.
- Parameters
filepath (str) – path to a GMT file.
- Returns
a dictionary containing sets of features.
- Return type
dict
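In the GMT format each line holds a set name, a description, and then the members of the set, all tab-separated. A minimal parser along those lines might look as follows; `parse_gmt` is a hypothetical helper that reads from a file-like object, and whether cosifer stores the members as `set` objects or lists is an assumption here.

```python
import io


def parse_gmt(handle):
    """Map each set name to its features, skipping the description column."""
    feature_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # a record needs a name, a description and one feature
        name, _description, *features = fields
        # Drop empty fields left by trailing tabs.
        feature_sets[name] = {feature for feature in features if feature}
    return feature_sets


gmt_text = "PATHWAY_A\tna\tTP53\tMDM2\nPATHWAY_B\tna\tEGFR\tKRAS\tBRAF\n"
gene_sets = parse_gmt(io.StringIO(gmt_text))
```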
-
cosifer.utils.data.scale_graph(graph, threshold=0.0)
Min-max scale a matrix representing a graph, assuming positive values.
- Parameters
graph (pd.DataFrame) – a dataframe representing a graph.
threshold (float, optional) – threshold to impose on the edge weights. Defaults to 0.0.
- Returns
a dataframe representing the scaled graph.
- Return type
pd.DataFrame
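Min-max scaling maps the smallest weight to 0 and the largest to 1. The sketch below works on nested lists rather than a pd.DataFrame; `min_max_scale` is a hypothetical helper, and it assumes the threshold is applied after scaling, with weights not exceeding it set to zero. The actual function may apply the threshold differently.

```python
def min_max_scale(matrix, threshold=0.0):
    """Scale all entries to [0, 1], then zero out weak edges."""
    values = [v for row in matrix for v in row]
    low, high = min(values), max(values)
    span = high - low
    scaled = [
        [(v - low) / span if span > 0 else 0.0 for v in row]
        for row in matrix
    ]
    # Assumption: weights not exceeding the threshold are set to zero.
    return [[v if v > threshold else 0.0 for v in row] for row in scaled]


weights = [[0.0, 2.0], [4.0, 1.0]]
scaled = min_max_scale(weights, threshold=0.3)
```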
cosifer.utils.stats module
Statistics utils.
-
cosifer.utils.stats.benjamini_hochberg_correction(p_values, q_star)
Return the indices of the p-values for which the null hypothesis is rejected at the given significance level after a Benjamini-Hochberg correction. The implementation relies on statsmodels and is robust to NaN values.
- Parameters
p_values (iterable) – p-values to be used for correction.
q_star (float) – false discovery rate.
- Returns
indices of significant p-values.
- Return type
list
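The Benjamini-Hochberg step-up procedure sorts the p-values, finds the largest rank k with p_(k) <= k * q / m, and rejects every hypothesis up to that rank. The following dependency-free sketch illustrates the procedure; it is not the package's code (which delegates to statsmodels), and dropping NaN entries before ranking is an assumption about how NaNs are handled.

```python
import math


def benjamini_hochberg(p_values, q_star):
    """Return sorted indices of p-values rejected by the BH procedure."""
    # NaN entries are excluded from the ranking (assumed NaN handling).
    ranked = sorted((p, i) for i, p in enumerate(p_values) if not math.isnan(p))
    m = len(ranked)
    cutoff = 0
    for rank, (p, _) in enumerate(ranked, start=1):
        if p <= q_star * rank / m:
            cutoff = rank  # largest rank k with p_(k) <= k * q / m
    return sorted(i for _, i in ranked[:cutoff])


p_values = [0.01, 0.03, 0.035, float("nan"), 0.20]
significant = benjamini_hochberg(p_values, q_star=0.05)
# significant == [0, 1, 2]
```

Note that 0.03 exceeds its own threshold (0.025) but is still rejected, because the third-ranked p-value passes its threshold; this is the step-up behavior.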
-
cosifer.utils.stats.benjamini_yekutieli_correction(p_values, q_star)
Return the indices of the p-values for which the null hypothesis is rejected at the given significance level after a Benjamini-Yekutieli correction. The implementation relies on statsmodels and is robust to NaN values.
- Parameters
p_values (iterable) – p-values to be used for correction.
q_star (float) – false discovery rate.
- Returns
indices of significant p-values.
- Return type
list
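The Benjamini-Yekutieli procedure is a step-up procedure like Benjamini-Hochberg, but it divides each threshold by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m, which keeps the false discovery rate controlled under arbitrary dependence. A dependency-free sketch, not the statsmodels-backed implementation, with NaN entries assumed to be dropped before ranking:

```python
import math


def benjamini_yekutieli(p_values, q_star):
    """Return sorted indices of p-values rejected by the BY procedure."""
    ranked = sorted((p, i) for i, p in enumerate(p_values) if not math.isnan(p))
    m = len(ranked)
    # Harmonic correction factor c(m) = sum_{k=1..m} 1/k.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    cutoff = 0
    for rank, (p, _) in enumerate(ranked, start=1):
        if p <= q_star * rank / (m * c_m):
            cutoff = rank  # largest rank passing the corrected threshold
    return sorted(i for _, i in ranked[:cutoff])


p_values = [0.5, 0.001, 0.03, 0.011]
significant = benjamini_yekutieli(p_values, q_star=0.05)
# significant == [1, 3]
```

On the same input, the uncorrected Benjamini-Hochberg thresholds would also reject 0.03, which shows that this correction is the more conservative of the two.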
-
cosifer.utils.stats.bonferroni_correction(p_values, q_star)
Return the indices of the p-values for which the null hypothesis is rejected at the given significance level after a Bonferroni correction. The implementation relies on statsmodels and is robust to NaN values.
- Parameters
p_values (iterable) – p-values to be used for correction.
q_star (float) – significance level; the Bonferroni correction controls the family-wise error rate at this level.
- Returns
indices of significant p-values.
- Return type
list
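The Bonferroni correction compares every p-value against the significance level divided by the number of tests. A dependency-free sketch, with NaN entries assumed to be excluded from the test count (the package itself delegates to statsmodels):

```python
import math


def bonferroni(p_values, q_star):
    """Return indices of p-values not exceeding q_star / (number of tests)."""
    valid = [(i, p) for i, p in enumerate(p_values) if not math.isnan(p)]
    m = len(valid)
    return [i for i, p in valid if p <= q_star / m]


p_values = [0.001, 0.02, float("nan"), 0.012, 0.3]
significant = bonferroni(p_values, q_star=0.05)
# significant == [0, 3]
```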
-
cosifer.utils.stats.from_precision_matrix_partial_correlations(precision, scaled=False)
Compute partial correlations from the precision matrix.
- Parameters
precision (np.ndarray) – a precision matrix.
scaled (bool, optional) – flag to min-max scale the correlations. Defaults to False.
- Returns
the partial correlation matrix.
- Return type
np.ndarray
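For a precision matrix P, the partial correlation between variables i and j is -P[i][j] / sqrt(P[i][i] * P[j][j]). The sketch below applies that standard formula to nested lists; `partial_correlations` is a hypothetical helper, the diagonal is set to 1 by convention (an assumption), and the `scaled` option is not covered.

```python
import math


def partial_correlations(precision):
    """Compute the partial correlation matrix from a precision matrix."""
    n = len(precision)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                result[i][j] = 1.0  # convention assumed for the diagonal
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


precision = [[4.0, -2.0], [-2.0, 4.0]]
partial = partial_correlations(precision)
# partial == [[1.0, 0.5], [0.5, 1.0]]
```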
cosifer.utils.vector_quantization module
Vector quantization utilities.
-
cosifer.utils.vector_quantization.k_means_bic(X, clusters_centers, clusters_labels, sigma_eps=1.0)
Compute BIC for K-means clustering.
- Parameters
X (np.ndarray) – clustered data.
clusters_centers (np.ndarray) – cluster centers.
clusters_labels (np.ndarray) – cluster labels.
sigma_eps (float, optional) – standard deviation. Defaults to 1.0.
- Returns
BIC score.
- Return type
float
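The exact BIC formula used by the package is not stated here. One common formulation for K-means under a spherical Gaussian noise model with known standard deviation is RSS / sigma_eps^2 + k * d * log(n), where RSS is the within-cluster sum of squares, k the number of clusters, d the dimension and n the number of points. The sketch below implements that assumed formulation; `k_means_bic_sketch` is a hypothetical name.

```python
import math


def k_means_bic_sketch(points, centers, labels, sigma_eps=1.0):
    """BIC-style score: scaled residual sum of squares plus a complexity penalty."""
    n = len(points)
    k = len(centers)
    dim = len(centers[0])
    # Residual sum of squares of each point to its assigned center.
    rss = sum(
        sum((x - c) ** 2 for x, c in zip(point, centers[label]))
        for point, label in zip(points, labels)
    )
    # Assumed penalty: k * dim free parameters, each costing log(n).
    return rss / sigma_eps ** 2 + k * dim * math.log(n)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
centers = [(0.5,), (10.5,)]
labels = [0, 0, 1, 1]
score = k_means_bic_sketch(points, centers, labels)
```

Under this formulation a lower score is better: adding clusters reduces the residual term but increases the penalty.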
-
cosifer.utils.vector_quantization.k_means_optimized_with_bic(X, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, **kwargs)
Find an optimal K-means model minimizing the BIC score.
- Parameters
X (np.ndarray) – data to cluster.
k_min (int, optional) – minimum number of clusters. Defaults to 3.
k_max (int, optional) – maximum number of clusters. Defaults to 9.
k_step (int, optional) – step between consecutive numbers of clusters to evaluate. Defaults to 1.
sigma_eps (float, optional) – standard deviation. Defaults to 1.0.
n_init (int, optional) – number of K-means initializations. Defaults to 100.
- Returns
a tuple containing two elements: the first is the optimal model, the second is a dictionary mapping the number of clusters to the BIC score.
- Return type
tuple
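The selection loop can be sketched as: fit K-means for each candidate number of clusters, score each fit, and keep the one with the lowest score. The example below is a simplified illustration on scalar data with a deterministic initialization, not the package's implementation. The helpers are hypothetical, the BIC formula (RSS / sigma_eps^2 + k * log(n)) is an assumed formulation, `k_max` is assumed to be inclusive, and `n_init` restarts are omitted.

```python
import math


def k_means_1d(values, k, n_iter=50):
    """Lloyd's algorithm on scalars with evenly spaced initial centers."""
    low, high = min(values), max(values)
    centers = [low + (high - low) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: empty clusters keep their previous center.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


def bic(values, centers, labels, sigma_eps=1.0):
    """Assumed BIC formulation for one-dimensional K-means."""
    rss = sum((v - centers[label]) ** 2 for v, label in zip(values, labels))
    return rss / sigma_eps ** 2 + len(centers) * math.log(len(values))


def select_k(values, k_min=3, k_max=9, k_step=1, sigma_eps=1.0):
    """Return the best (centers, labels) pair and the BIC score per k."""
    models, scores = {}, {}
    for k in range(k_min, k_max + 1, k_step):  # k_max assumed inclusive
        centers, labels = k_means_1d(values, k)
        models[k] = (centers, labels)
        scores[k] = bic(values, centers, labels, sigma_eps)
    best_k = min(scores, key=scores.get)
    return models[best_k], scores


values = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
(best_centers, best_labels), bic_scores = select_k(values, k_min=2, k_max=5)
```

For these three well-separated groups, the lowest score is reached at three clusters: fewer clusters leave a large residual, while more clusters only add penalty.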
-
cosifer.utils.vector_quantization.k_means_vector_quantization(x, k_min=3, k_max=9, k_step=1, sigma_eps=1.0, n_init=100, **kwargs)
Quantize a vector using K-means optimized via BIC score.
- Parameters
x (np.ndarray) – array to quantize.
k_min (int, optional) – minimum number of clusters. Defaults to 3.
k_max (int, optional) – maximum number of clusters. Defaults to 9.
k_step (int, optional) – step between consecutive numbers of clusters to evaluate. Defaults to 1.
sigma_eps (float, optional) – standard deviation. Defaults to 1.0.
n_init (int, optional) – number of K-means initializations. Defaults to 100.
- Returns
the quantized vector.
- Return type
np.ndarray
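Once cluster centers are known, vector quantization replaces each value with its nearest center (the codebook entry). The sketch below shows only that final step with given centers; `quantize` is a hypothetical helper, and whether the package returns the center values or the cluster labels is an assumption.

```python
def quantize(values, centers):
    """Replace each value with its nearest center."""
    return [min(centers, key=lambda c: abs(v - c)) for v in values]


values = [0.1, 0.3, 4.9, 5.3, 9.8]
centers = [0.2, 5.1, 9.8]
quantized = quantize(values, centers)
# quantized == [0.2, 0.2, 5.1, 5.1, 9.8]
```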