Tutorial 1: Data and Inputs
scTenifoldpy expects expression data as genes x cells. The public API accepts either a pandas.DataFrame in that shape or an AnnData-like object with X, var_names, and obs_names.
In [1]:
import pandas as pd
from scTenifold.data import get_test_df
x = get_test_df(n_cells=80, n_genes=120, random_state=0)
y = get_test_df(n_cells=80, n_genes=120, random_state=1)
x.shape, y.shape
Out[1]:
((120, 80), (120, 80))
Rows are genes and columns are cells. Gene names are required because downstream steps align networks by shared gene names.
In [2]:
x.index[:5].tolist(), x.columns[:5].tolist()
Out[2]:
(['MT-1', 'MT-2', 'MT-3', 'MT-4', 'MT-5'], ['Cell-1', 'Cell-2', 'Cell-3', 'Cell-4', 'Cell-5'])
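Because downstream steps align networks by shared gene names, it can help to check the overlap between two inputs before comparing them. The following is a minimal sketch using only pandas and NumPy; the helper name `shared_genes` and the toy matrices are illustrative assumptions, not part of the scTenifoldpy API.

```python
import numpy as np
import pandas as pd


def shared_genes(a: pd.DataFrame, b: pd.DataFrame) -> pd.Index:
    """Return gene names present in both genes x cells frames (hypothetical helper)."""
    for name, df in (("a", a), ("b", b)):
        if not df.index.is_unique:
            raise ValueError(f"gene names in {name} must be unique")
    # Index.intersection keeps only the labels found in both indexes
    return a.index.intersection(b.index, sort=False)


rng = np.random.default_rng(0)
# Toy genes x cells matrices with partially overlapping gene names
a = pd.DataFrame(
    rng.poisson(1.0, size=(4, 3)),
    index=["G1", "G2", "G3", "G4"],
    columns=["Cell-1", "Cell-2", "Cell-3"],
)
b = pd.DataFrame(
    rng.poisson(1.0, size=(3, 2)),
    index=["G2", "G4", "G5"],
    columns=["Cell-1", "Cell-2"],
)

common = shared_genes(a, b)
# Restrict both inputs to the same genes, in the same order
a_aligned, b_aligned = a.loc[common], b.loc[common]
print(common.tolist())
```

The two inputs may have different numbers of cells; only the gene axis needs to agree after alignment.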
AnnData inputs
AnnData stores observations as rows and variables as columns, so scTenifoldpy transposes adata.X internally to genes x cells. Use layer="counts" in the high-level API to read an AnnData layer instead of adata.X.
In [3]:
from scTenifold.core._networks import anndata_to_dataframe


class MiniAnnData:
    def __init__(self, matrix, obs_names, var_names):
        self.X = matrix
        self.obs_names = obs_names
        self.var_names = var_names
        self.layers = {}


adata = MiniAnnData(x.T.to_numpy(), obs_names=x.columns, var_names=x.index)
converted = anndata_to_dataframe(adata)
converted.equals(x)
Out[3]:
True
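To make the transposition and the `layer` option concrete, here is a rough sketch of what such a conversion could look like. The function `to_genes_by_cells` is a hypothetical stand-in written for illustration; it is not scTenifoldpy's implementation, and the sparse-matrix handling is an assumption about what a typical AnnData object might hold.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def to_genes_by_cells(adata, layer=None) -> pd.DataFrame:
    """Illustrative conversion of a cells x genes object to a genes x cells frame."""
    # Read the requested layer if given, otherwise the main matrix
    matrix = adata.X if layer is None else adata.layers[layer]
    # Sparse matrices expose toarray(); dense arrays pass through np.asarray
    if hasattr(matrix, "toarray"):
        matrix = matrix.toarray()
    matrix = np.asarray(matrix)
    # AnnData is cells x genes, so transpose and label rows with var_names
    return pd.DataFrame(
        matrix.T,
        index=list(adata.var_names),
        columns=list(adata.obs_names),
    )


counts = np.array([[1, 0, 3], [0, 2, 5]])  # 2 cells x 3 genes
toy = SimpleNamespace(
    X=np.log1p(counts),          # pretend X holds transformed values
    layers={"counts": counts},   # raw counts kept in a layer
    obs_names=["Cell-1", "Cell-2"],
    var_names=["G1", "G2", "G3"],
)

df = to_genes_by_cells(toy, layer="counts")
print(df.shape)
```

Reading from a layer is useful when `X` has already been normalized or log-transformed and the raw counts live elsewhere in the object.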
Remote example datasets
fetch_data() downloads example datasets from the companion data repository. It is useful for demos, but notebook tutorials keep remote downloads optional so the docs can build offline.
In [4]:
from scTenifold.data import fetch_data, list_data
# Requires network access:
# list_data()
# datasets = fetch_data("AD")
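One way to keep a remote download optional is to wrap it in a fallback to locally generated data. The sketch below is generic and does not call scTenifoldpy: `load_with_fallback`, `offline_fetch`, and `synthetic` are hypothetical names, and it assumes that download failures surface as `OSError` subclasses (as connection errors and timeouts do in the standard library), which may not hold for every loader.

```python
import numpy as np
import pandas as pd


def load_with_fallback(fetch, fallback):
    """Try a network-backed loader; use a local one if the download fails."""
    try:
        return fetch(), "remote"
    except OSError as exc:  # assumption: network errors derive from OSError
        print(f"remote fetch failed ({exc}); using local fallback")
        return fallback(), "local"


def offline_fetch():
    # Stand-in for a remote loader running without network access
    raise ConnectionError("no network")


def synthetic():
    # Small genes x cells frame generated locally
    rng = np.random.default_rng(0)
    return pd.DataFrame(
        rng.poisson(1.0, size=(5, 4)),
        index=[f"G{i}" for i in range(1, 6)],
        columns=[f"Cell-{j}" for j in range(1, 5)],
    )


data, source = load_with_fallback(offline_fetch, synthetic)
print(source, data.shape)
```

In a notebook, the `fetch` argument could be a small lambda around the remote loader and the `fallback` a call to a synthetic data generator, so the same cell runs with or without network access.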