Pipeline Functions
Lower-level functions used by the workflow classes.
Quality Control and Normalization
sc_QC
sc_QC(
X: DataFrame,
min_lib_size: float = 1000,
remove_outlier_cells: bool = True,
min_percent: float = 0.05,
max_mito_ratio: float = 0.1,
min_exp_avg: float = 0,
min_exp_sum: float = 0,
) -> pd.DataFrame
Main quality-control (QC) function in the scTenifold pipelines.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `DataFrame` | A single-cell RNA-seq DataFrame (rows: genes, columns: cells) | required |
| `min_lib_size` | `float` | Minimum library size of cells | `1000` |
| `remove_outlier_cells` | `bool` | Whether to remove outlier cells | `True` |
| `min_percent` | `float` | Minimum fraction of cells in which a gene must be expressed to be included in the analysis | `0.05` |
| `max_mito_ratio` | `float` | Maximum mitochondrial gene ratio allowed in the final DataFrame | `0.1` |
| `min_exp_avg` | `float` | Minimum average expression value required for each gene | `0` |
| `min_exp_sum` | `float` | Minimum summed expression value required for each gene | `0` |
Returns:
| Name | Type | Description |
|---|---|---|
| `X_modified` | `DataFrame` | The DataFrame after QC |
Source code in scTenifold/core/_QC.py
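As a rough illustration of how the thresholds above interact, the sketch below applies a library-size filter, a mitochondrial-ratio filter, and a gene-prevalence filter with plain pandas. It is not the code in scTenifold/core/_QC.py: the name `qc_sketch`, the `MT-` prefix used to spot mitochondrial genes, the order of the filters, and the inclusive comparisons are all assumptions, and outlier-cell removal and the `min_exp_avg`/`min_exp_sum` filters are left out.

```python
import pandas as pd


def qc_sketch(X, min_lib_size=1000, min_percent=0.05,
              max_mito_ratio=0.1, mito_prefix="MT-"):
    """Hypothetical QC sketch for a genes-by-cells count DataFrame."""
    # Cells: keep columns whose total counts reach min_lib_size.
    lib_size = X.sum(axis=0)
    X = X.loc[:, lib_size >= min_lib_size]

    # Cells: drop columns dominated by mitochondrial counts.
    # Assumption: mitochondrial genes are identified by a name prefix.
    is_mito = X.index.str.upper().str.startswith(mito_prefix.upper())
    mito_ratio = X.loc[is_mito].sum(axis=0) / X.sum(axis=0)
    X = X.loc[:, mito_ratio <= max_mito_ratio]

    # Genes: keep rows expressed in at least min_percent of remaining cells.
    frac_expressed = (X > 0).mean(axis=1)
    return X.loc[frac_expressed >= min_percent]


counts = pd.DataFrame(
    {"c1": [50, 900, 100, 0], "c2": [500, 600, 0, 5], "c3": [10, 200, 0, 0]},
    index=["MT-CO1", "GENE1", "GENE2", "GENE3"],
)
filtered = qc_sketch(counts)
```

In this toy input, `c3` fails the library-size threshold, `c2` fails the mitochondrial-ratio threshold, and `GENE3` is dropped because it is not expressed in the one remaining cell.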
cpm_norm
cpm_norm(
X: Union[ndarray, DataFrame],
) -> Union[np.ndarray, pd.DataFrame]
Counts-per-million normalize a genes-by-cells matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `Union[ndarray, DataFrame]` | Genes-by-cells count matrix. | required |
Returns:
| Type | Description |
|---|---|
| `Union[np.ndarray, pd.DataFrame]` | Normalized matrix with the same type and shape as `X`. |
Source code in scTenifold/core/_norm.py
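For readers unfamiliar with counts-per-million scaling, a minimal sketch of the usual definition follows: each cell (column) is divided by its total count and multiplied by one million. The helper name `cpm_sketch` is hypothetical, and the library's own handling of edge cases such as empty cells may differ.

```python
import numpy as np
import pandas as pd


def cpm_sketch(X):
    """Scale every column (cell) of a genes-by-cells matrix to sum to 1e6."""
    # Column totals: one library size per cell.
    totals = X.sum(axis=0)
    # Broadcasting (ndarray) or column alignment (DataFrame) divides each
    # column by its own total. A cell with zero total would yield NaN here.
    return X / totals * 1e6


counts_arr = np.array([[1.0, 0.0], [3.0, 10.0]])
cpm_arr = cpm_sketch(counts_arr)

counts_df = pd.DataFrame(counts_arr, index=["g1", "g2"], columns=["c1", "c2"])
cpm_df = cpm_sketch(counts_df)
```

Both calls return an object of the same type and shape as the input, which mirrors the contract described for `cpm_norm`.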
Tensor Decomposition
tensor_decomp
tensor_decomp(
networks: ndarray,
gene_names: Sequence[str],
method: str = "parafac",
n_decimal: int = 1,
K: int = 5,
tol: float = 1e-06,
max_iter: int = 1000,
random_state: int = 42,
**kwargs,
) -> pd.DataFrame
Perform tensor decomposition on PC networks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `networks` | `ndarray` | Concatenated networks, expected shape = (n_genes, n_genes, n_pcnets) | required |
| `gene_names` | `Sequence[str]` | The name of each gene in the network (order matters) | required |
| `method` | `str` | Tensor decomposition method; one of tensorly's decomposition methods is used: http://tensorly.org/stable/modules/api.html#module-tensorly.decomposition | `'parafac'` |
| `n_decimal` | `int` | Number of decimal places in the final DataFrame | `1` |
| `K` | `int` | Rank passed to the parafac function | `5` |
| `tol` | `float` | Convergence tolerance for the iterations | `1e-06` |
| `max_iter` | `int` | Maximum number of iterations | `1000` |
| `random_state` | `int` | Random seed used to reproduce the same result | `42` |
| `**kwargs` | | Keyword arguments used in the decomposition function | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `tensor_decomp_df` | `DataFrame` | The result of tensor decomposition, expected shape = (n_genes, n_genes) |
References
http://tensorly.org/stable/modules/api.html#module-tensorly.decomposition
Source code in scTenifold/core/_decomposition.py
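The shape conventions for `networks` and the returned DataFrame are easy to get wrong, so the sketch below shows one way to stack per-network matrices into an (n_genes, n_genes, n_pcnets) array and to label a genes-by-genes result. It uses NumPy and pandas only. The random matrices are made-up data, and the mean over the last axis is purely a placeholder standing in for the decomposition step, which is not reproduced here.

```python
import numpy as np
import pandas as pd

gene_names = ["g1", "g2", "g3"]
rng = np.random.default_rng(0)

# Four made-up genes-by-genes networks (dense, for illustration only).
nets = [rng.random((3, 3)) for _ in range(4)]

# Stack along a new last axis: shape becomes (n_genes, n_genes, n_pcnets).
networks = np.stack(nets, axis=-1)

# Placeholder for the decomposition: collapse the network axis so the
# result is genes-by-genes. This is NOT a tensor decomposition.
collapsed = networks.mean(axis=-1)

# Round and label rows/columns in the same order as gene_names.
n_decimal = 1
result = pd.DataFrame(np.round(collapsed, n_decimal),
                      index=gene_names, columns=gene_names)
```

Because rows and columns are labeled positionally, `gene_names` must follow the same gene order used when the networks were built.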
Knockout Internals
ko_propagation
ko_propagation(
B: ndarray,
x: ndarray,
ko_gene_id: Sequence[int],
degree: int = 1,
) -> np.ndarray
Propagate a gene knockout through an adjacency matrix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `B` | `ndarray` | Adjacency matrix (genes x genes); the diagonal is zeroed on a copy, so the input is not modified. | required |
| `x` | `ndarray` | Expression matrix (genes x cells). | required |
| `ko_gene_id` | `Sequence[int]` | Indices of the gene(s) to knock out. | required |
| `degree` | `int` | Maximum propagation depth. | `1` |
Returns:
| Type | Description |
|---|---|
| `np.ndarray` | Knocked-out expression matrix (non-negative). |
Source code in scTenifold/core/_ko.py
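To make the idea of propagating a knockout concrete, here is a simplified sketch, assuming the perturbation starts as the knocked-out gene's expression, is pushed through the diagonal-free adjacency matrix once per degree, and is subtracted from the expression before clipping at zero. The name `ko_propagation_sketch` and these update rules are assumptions for illustration; the function in scTenifold/core/_ko.py may track visited genes or differ in other details.

```python
import numpy as np


def ko_propagation_sketch(B, x, ko_gene_id, degree=1):
    """Hypothetical knockout propagation on a genes-by-cells matrix."""
    # Work on a copy so the caller's adjacency matrix is untouched.
    adj = np.array(B, dtype=float)
    np.fill_diagonal(adj, 0.0)

    # Initial perturbation: the full expression of the knocked-out gene(s).
    perturb = np.zeros(x.shape, dtype=float)
    perturb[ko_gene_id, :] = x[ko_gene_id, :]
    x_ko = x.astype(float) - perturb

    # Push the perturbation one step further per degree and subtract it.
    for _ in range(degree):
        perturb = adj @ perturb
        x_ko = x_ko - perturb

    # Expression cannot be negative.
    return np.clip(x_ko, 0.0, None)


B = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0]])
x = np.array([[4.0, 2.0],
              [3.0, 3.0],
              [1.0, 5.0]])
ko_d1 = ko_propagation_sketch(B, x, ko_gene_id=[0], degree=1)
ko_d2 = ko_propagation_sketch(B, x, ko_gene_id=[0], degree=2)
```

With `degree=1` only the direct neighbor of gene 0 is affected; with `degree=2` the effect also reaches gene 2 through gene 1.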
reconstruct_pcnets
reconstruct_pcnets(
nets: List[coo_matrix],
X_df: DataFrame,
ko_gene_id: Sequence[int],
degree: int = 1,
**kwargs,
) -> List[np.ndarray]
Rebuild PC networks from knocked-out expression for each input net.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nets` | `List[coo_matrix]` | PC networks (sparse) used to seed propagation. | required |
| `X_df` | `DataFrame` | Expression DataFrame (genes x cells). | required |
| `ko_gene_id` | `Sequence[int]` | Indices of the gene(s) to knock out. | required |
| `degree` | `int` | Propagation depth passed to :func: | `1` |
| `**kwargs` | | Forwarded to :func: | `{}` |
Returns:
| Type | Description |
|---|---|
| `List[np.ndarray]` | List of post-knockout PC networks. |
Source code in scTenifold/core/_ko.py