metatime package
Submodules
metatime.annotator module
Automatic annotator for tumor single-cell data.
- metatime.annotator.annotator(dat: Union[pandas.core.frame.DataFrame, anndata._core.anndata.AnnData], mecnamedict: dict, gcol='overcluster')
Annotator that labels each cell group with its top-ranked (most enriched) cell state based on MeC scores.
- Parameters
dat – DataFrame with per-cell MeC scores and one grouping column gcol (e.g. a Leiden cluster assignment), or AnnData with per-cell scores and the column gcol in dat.obs. Malignant cells should be removed beforehand so that only tumor microenvironment cells remain, such as immune cells, fibroblasts and endothelial cells.
mecnamedict – dictionary for renaming dat columns into functional names; can be loaded from the pre-computed tumor MeC functional annotation.
gcol – column for grouping cells, e.g. a column holding the over-clustered cluster assignment.
- Returns
projmat (pd.DataFrame) – Dataframe with mec projection scores and newly added predicted label column ‘MetaTiME_’+gcol
gpred – median score of each gcol group
gpreddict – dictionary mapping each gcol group to its predicted label
Examples
>>> projmat, gpred, gpreddict = annotator(projmat, mecnamedict)
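The annotation rule (take the median MeC score per group, then pick the top-ranked MeC) can be sketched with pandas. The helper annotate_groups below is hypothetical and is not the library's implementation; it assumes projmat has one column per MeC id plus the grouping column gcol, and that only MeCs listed in mecnamedict take part. The labels "T cell" and "Macrophage" are toy values.

```python
import pandas as pd


def annotate_groups(projmat: pd.DataFrame, mecnamedict: dict, gcol: str = "overcluster"):
    """Label each gcol group with the MeC whose median score is highest (sketch)."""
    scores = projmat[list(mecnamedict)]
    # Median MeC score per group: one row per group, one column per MeC
    gpred = scores.groupby(projmat[gcol], observed=True).median()
    # Top-ranked MeC for each group, translated into its functional name
    top_mec = gpred.idxmax(axis=1)
    gpreddict = {group: mecnamedict[mec] for group, mec in top_mec.items()}
    out = projmat.copy()
    out["MetaTiME_" + gcol] = out[gcol].map(gpreddict)
    return out, gpred, gpreddict


projmat = pd.DataFrame(
    {"MeC_1": [1.0, 0.9, 0.1, 0.0], "MeC_2": [0.0, 0.2, 0.8, 1.0],
     "overcluster": ["0", "0", "1", "1"]},
    index=["c1", "c2", "c3", "c4"],
)
projmat, gpred, gpreddict = annotate_groups(projmat, {"MeC_1": "T cell", "MeC_2": "Macrophage"})
```

Using the median rather than the mean keeps a few outlier cells from flipping a cluster's label.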
- metatime.annotator.overcluster(adata: anndata._core.anndata.AnnData, resolution: float = 8, random_state: int = 0, clustercol: str = 'overcluster')
Over-cluster single-cell data to obtain cluster-level cell state annotation.
- Parameters
adata – scanpy object with adata.uns[‘neighbors’] computed. If adata.obsm[‘X_umap’] does not exist, UMAP coordinates are recomputed; otherwise, the existing UMAP coordinates are kept.
resolution – clustering resolution
random_state – clustering random state
clustercol – key to add to adata.obs that records cluster assignment
- Returns
scanpy object with clustering results.
- metatime.annotator.pdataToTable(pdata: anndata._core.anndata.AnnData, mectable: pandas.core.frame.DataFrame, gcol: str = 'overcluster')
Convert projected scores into two plain pandas DataFrames.
- Parameters
pdata – AnnData with the projected MeC score matrix in pdata.X, and one column gcol in pdata.obs
mectable – table for renaming MeC columns into functional names; can be loaded from the pre-computed tumor MeC functional annotation. Required columns: [‘MeC_id’, ‘Annotation’, ‘UseForCellTypeAnno’]
gcol – a column in pdata.obs for grouping cells, e.g. the over-clustered cluster assignment.
- Returns
projmat – a pandas DataFrame with MeC scores and a column for grouping cells; useful for annotating cell states.
mecscores – a pandas DataFrame of per-cell MeC scores, with columns named by the functional annotation of the MeC ids.
- Return type
tuple
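The two returned tables can be sketched with pandas from a cell-by-MeC score table and a grouping column. scores_to_tables is a hypothetical helper, not pdataToTable itself; in real use the inputs would presumably come from pdata.to_df() and pdata.obs[gcol], and keeping MeC ids in projmat (so that annotator can rename them through mecnamedict) is an assumption.

```python
import pandas as pd


def scores_to_tables(scores: pd.DataFrame, groups: pd.Series, mectable: pd.DataFrame,
                     gcol: str = "overcluster"):
    """Return (projmat, mecscores) from a cell-by-MeC score table (sketch)."""
    # projmat: MeC-id columns plus the grouping column, ready for annotation
    projmat = scores.copy()
    projmat[gcol] = groups.reindex(scores.index)
    # mecscores: the same scores, columns renamed from MeC ids to functional names
    names = dict(zip(mectable["MeC_id"], mectable["Annotation"]))
    mecscores = scores.rename(columns=names)
    return projmat, mecscores


scores = pd.DataFrame({"MeC_1": [0.5, -0.2], "MeC_2": [1.1, 0.3]}, index=["c1", "c2"])
groups = pd.Series(["0", "1"], index=["c1", "c2"])
mectable = pd.DataFrame({"MeC_id": ["MeC_1", "MeC_2"], "Annotation": ["T cell", "Macrophage"]})
projmat, mecscores = scores_to_tables(scores, groups, mectable)
```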
- metatime.annotator.saveToAdata(adata: anndata._core.anndata.AnnData, projmat: pandas.core.frame.DataFrame, gcol: str = 'overcluster', ANNOTATION_ONLY: bool = False)
Save annotation to adata.
- Parameters
adata (anndata.AnnData) – scRNA scanpy object
projmat (pd.DataFrame) – Dataframe with mec projection scores and newly added predicted label column ‘MetaTiME_’+gcol
gcol – A column in projmat for grouping cells. Typically a column for overclustered cluster assignment.
ANNOTATION_ONLY – Whether to add only the cluster-wise annotation to adata, or to also append the scores to adata.obs
- Returns
adata with per-cluster annotation column in adata.obs[[gcol]], and scores appended to adata.obs if ANNOTATION_ONLY==False
- Return type
anndata.AnnData
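The effect of ANNOTATION_ONLY on the obs table can be sketched as a pandas left join on cell barcodes. merge_annotation is a hypothetical helper, not saveToAdata's code; it assumes projmat shares its index with obs and that obs already holds the gcol column. A call would look like adata.obs = merge_annotation(adata.obs, projmat).

```python
import pandas as pd


def merge_annotation(obs: pd.DataFrame, projmat: pd.DataFrame, gcol: str = "overcluster",
                     annotation_only: bool = False) -> pd.DataFrame:
    """Join the predicted label (and optionally the scores) onto an obs table (sketch)."""
    label_col = "MetaTiME_" + gcol
    if annotation_only:
        cols = [label_col]
    else:
        # Every projmat column except gcol, which obs is assumed to hold already
        cols = [c for c in projmat.columns if c != gcol]
    # Left join on cell barcodes keeps every cell in obs
    return obs.join(projmat[cols], how="left")


obs = pd.DataFrame({"overcluster": ["0", "1"]}, index=["c1", "c2"])
projmat = pd.DataFrame(
    {"MeC_1": [0.9, 0.1], "overcluster": ["0", "1"],
     "MetaTiME_overcluster": ["T cell", "Macrophage"]},
    index=["c1", "c2"],
)
```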
- metatime.annotator.saveToPdata(pdata: anndata._core.anndata.AnnData, adata: anndata._core.anndata.AnnData, projmat: pandas.core.frame.DataFrame, gcol: str = 'overcluster', BORROW_ADATA_EMBEDDING=True)
Save annotation to pdata. Borrow embedding from adata for easy visualization, including adata.obsm[‘X_pca’], adata.obsm[‘X_umap’], adata.obsm[‘X_pca_harmony’]
- Parameters
pdata (anndata.AnnData) – scanpy object for per-cell projected score.
adata (anndata.AnnData) – scRNA scanpy object
projmat (pd.DataFrame) – Dataframe with mec projection scores and newly added predicted label column ‘MetaTiME_’+gcol
gcol – A column in projmat for grouping cells. Typically a column for overclustered cluster assignment.
BORROW_ADATA_EMBEDDING – Whether to borrow the PCA and UMAP embeddings from adata and write them to pdata, for easy visualization.
- Returns
pdata with per-cluster annotation column in pdata.obs[[gcol]].
- Return type
anndata.AnnData
metatime.config module
metatime.loaddata module
Functions for new scRNA data processing
- metatime.loaddata.adatapp(adata_input: anndata._core.anndata.AnnData, mode: str = 'pp', random_state=42, MAX_MITO_PERCENT=5, MINGENES=500, MINCOUNTS=1000, MINCELLS=5)
Pre-processing.
- Parameters
adata_input –
scanpy scRNA object. If adata.X is integer, or adata has a ‘counts’ layer (e.g. 10X data), standard preprocessing is applied: remove cells with <500 genes and cells with <1000 counts; remove genes detected in <5 cells; keep cells with <5% mitochondrial counts; normalize to 1e6; log-transform; compute PCA and UMAP. Normalized values are saved to the ‘norm_data’ layer.
If adata.X is continuous, such as Smart-seq data, remove cells with <500 genes and genes detected in <5 cells.
mode – choose from [‘pp’, ‘umap’]. pp: full pre-processing. umap: preprocessing has already been done; only compute highly variable genes, PCA, neighbors and UMAP.
random_state – umap random state
MAX_MITO_PERCENT – maximum mitochondrial percentage for a cell to be kept (default 5)
MINGENES – minimum number of genes per cell (default 500)
MINCOUNTS – minimum number of counts per cell (default 1000)
MINCELLS – minimum number of cells in which a gene must be detected (default 5)
- Returns
processed scanpy object
Examples
>>> adata = adatapp(adata)
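The count-data branch of the preprocessing can be sketched with plain NumPy to make the thresholds concrete. qc_normalize is a hypothetical helper, not adatapp's implementation. It assumes a dense cells-by-genes count matrix, treats genes whose names start with "MT-" as mitochondrial (the human gene naming convention), and uses log1p for the log step; PCA and UMAP are left out.

```python
import numpy as np


def qc_normalize(counts, gene_names, max_mito_percent=5.0, min_genes=500,
                 min_counts=1000, min_cells=5):
    """Filter and normalize a dense cells-by-genes count matrix (sketch)."""
    counts = np.asarray(counts, dtype=float)
    gene_names = np.asarray(gene_names).astype(str)
    # Per-cell QC metrics
    n_genes = (counts > 0).sum(axis=1)
    n_counts = counts.sum(axis=1)
    is_mito = np.char.startswith(gene_names, "MT-")  # assumed naming convention
    mito_pct = 100 * counts[:, is_mito].sum(axis=1) / np.maximum(n_counts, 1)
    keep_cells = (n_genes >= min_genes) & (n_counts >= min_counts) & (mito_pct < max_mito_percent)
    counts = counts[keep_cells]
    # Drop genes detected in fewer than min_cells of the remaining cells
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    counts, gene_names = counts[:, keep_genes], gene_names[keep_genes]
    # Scale each cell to 1e6 total counts, then log-transform
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return np.log1p(counts / totals * 1e6), gene_names, keep_cells


norm, genes, kept = qc_normalize(
    [[10, 0, 5], [1, 0, 0], [3, 3, 3]], ["A", "B", "MT-1"],
    max_mito_percent=50, min_genes=2, min_counts=5, min_cells=2,
)
```

In the toy call the thresholds are lowered so that a three-gene matrix can pass the filters; the second cell is dropped for having a single detected gene, and gene "B" is dropped for being detected in only one remaining cell.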
- metatime.loaddata.add_extra_metainfo(adata, meta)
Add special extra metainformation for sample datasets
- Parameters
adata (anndata.AnnData) – scanpy object with scRNA expression
meta (pd.DataFrame) – DataFrame with columns of metadata for each cell; its index holds cell barcodes matching adata.obs.index.
- Returns
scanpy object with new columns merged in obs.
- Return type
anndata.AnnData
Examples
>>> adata = testdata.add_extra_metainfo(adata, meta)
- metatime.loaddata.batchharmonize(adata, batchcols=[], random_state=0)
Harmonize batches, such as patient or sample, using Harmony. Neighbors and UMAP are then recalculated.
- Parameters
adata – scanpy object.
batchcols – batch columns in adata.obs
random_state – umap random_state
- Returns
adata with batch correction, and with neighbors and UMAP calculated
- metatime.loaddata.isRawCount(ds)
- Tell whether the matrix holds raw (integer) counts or floats.
If all values in the top 100 cells are whole numbers (value mod 1 equals 0), the matrix is determined to be raw counts.
- Parameters
ds – loom, or a plain 2D numpy matrix.
- Returns
0, not raw count. 1, is raw count.
- Return type
int
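The integer test described above can be sketched in a few lines of NumPy. is_raw_count is a hypothetical helper, not isRawCount itself; it assumes a dense array whose rows are cells (a scipy sparse matrix would need .toarray() on the sliced rows first), and does not handle loom input.

```python
import numpy as np


def is_raw_count(matrix, n_cells: int = 100) -> int:
    """Return 1 if the first n_cells rows contain only whole numbers, else 0 (sketch)."""
    top = np.asarray(matrix, dtype=float)[:n_cells]
    # x mod 1 == 0 exactly when x is a whole number
    return int(np.all(np.mod(top, 1) == 0))
```

Checking only the first rows keeps the test cheap on large matrices, at the cost of misclassifying data whose first cells happen to be integer-valued.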
- metatime.loaddata.load(file, preprocessing=False, delimiter=',')
Read an expression matrix table file into a scanpy object.
- If converting from a Seurat object, the table can be saved in R:
tmpdata = SeuratObject@assays$RNA@data
write.csv(t(as.matrix(tmpdata)), file = 'data.csv')
- Parameters
file – path to the expression file. If the suffix is .h5 or .h5ad, the data is loaded using scanpy; otherwise, the file is loaded as a text table.
preprocessing – whether to do preprocessing for the loaded data
delimiter – delimiter of the expression matrix table, if file is a text table
- Returns
loaded scRNA data in scanpy object
- Return type
adata
Examples
>>> adata = load(file, preprocessing=False)
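For the text-table path, reading the CSV written by the R snippet can be sketched with pandas. read_expression_table is a hypothetical helper, not load itself; it assumes cells are rows and genes are columns (the layout produced by t(as.matrix(tmpdata))) and that the first column holds cell barcodes.

```python
import io

import pandas as pd


def read_expression_table(path_or_buffer, delimiter: str = ",") -> pd.DataFrame:
    """Read a cells-by-genes text table whose first column holds cell barcodes (sketch)."""
    return pd.read_csv(path_or_buffer, sep=delimiter, index_col=0)


# Mimics the layout written by R's write.csv: quoted names, genes as columns, cells as rows
csv_text = '"","GeneA","GeneB"\n"cell1",0.5,1.2\n"cell2",0,3.4\n'
df = read_expression_table(io.StringIO(csv_text))
```

The resulting DataFrame could then be wrapped as anndata.AnnData(df), which takes obs_names from the index and var_names from the columns.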
metatime.mecmapper module
- metatime.mecmapper.annToDataFrame(adata_input, genescaling=False, layer='norm_data')
Extract expression matrix from scanpy object.
- Parameters
adata_input (anndata.AnnData) – Input scanpy object.
genescaling (bool) – Whether to z-scale the extracted feature matrix.
layer (str) – Layer of expression to extract from adata_input
- Returns
Extracted expression matrix from scanpy object.
- Return type
pd.DataFrame
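Gene-level z-scaling (the genescaling option) can be sketched as a column-wise standardization of the cell-by-gene DataFrame. zscale_columns is a hypothetical helper; using the population standard deviation (ddof=0) and mapping constant genes to zero are assumptions, not documented behavior of annToDataFrame.

```python
import pandas as pd


def zscale_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every column (gene) across rows (cells) (sketch)."""
    std = df.std(axis=0, ddof=0).replace(0, 1.0)  # constant columns become all zeros
    return (df - df.mean(axis=0)) / std


expr = pd.DataFrame({"GeneA": [1.0, 2.0, 3.0], "GeneB": [5.0, 5.0, 5.0]}, index=["c1", "c2", "c3"])
scaled = zscale_columns(expr)
```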
- metatime.mecmapper.projectMec(expr, mec, glstcol='TopGene20')
- Parameters
expr – expression matrix, cell by gene.
mec – MeC table; two formats are accepted. Format 1: one column, where each row is a MeC with its genes comma separated. Format 2: each row is a gene, each column is a MeC, and values are floats.
glstcol – used only when mec is in list format (format 1); the name of the column recording the comma-separated gene list for each MeC.
- Returns
scorepd, cell by signature.
Examples
>>> scorepd = mecmapper.projectMec(df, mec)
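The two accepted MeC formats suggest two ways of scoring cells, sketched below with pandas. Both helpers are hypothetical and show one plausible scoring choice each, not necessarily what projectMec computes: format 2 as a weighted sum over the genes shared by expr and the weight table, and format 1 as the mean expression of each MeC's listed genes that are present in expr.

```python
import pandas as pd


def project_weights(expr: pd.DataFrame, weights: pd.DataFrame) -> pd.DataFrame:
    """Format 2 sketch: weights is gene x MeC; returns a cell x MeC table."""
    shared = expr.columns.intersection(weights.index)
    return expr[shared] @ weights.loc[shared]


def project_gene_lists(expr: pd.DataFrame, mec: pd.DataFrame, glstcol: str = "TopGene20") -> pd.DataFrame:
    """Format 1 sketch: mean expression of each MeC's listed genes found in expr."""
    scores = {}
    for mec_id, genes in mec[glstcol].items():
        present = [g.strip() for g in genes.split(",") if g.strip() in expr.columns]
        scores[mec_id] = expr[present].mean(axis=1) if present else float("nan")
    return pd.DataFrame(scores, index=expr.index)


expr = pd.DataFrame({"G1": [1.0, 0.0], "G2": [2.0, 1.0], "G3": [0.0, 4.0]}, index=["c1", "c2"])
weights = pd.DataFrame({"MeC_1": [1.0, 0.5, 9.0], "MeC_2": [0.0, 2.0, 9.0]}, index=["G1", "G2", "G4"])
mec = pd.DataFrame({"TopGene20": ["G1, G3", "G2,G9"]}, index=["MeC_1", "MeC_2"])
by_weights = project_weights(expr, weights)
by_lists = project_gene_lists(expr, mec)
```

Genes missing from either side (G4 in the weights, G9 in the list) are simply ignored in this sketch.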
- metatime.mecmapper.projectMecAnn(adata_input, mec, genescaling=False, sigscaling=True, addon=False, layer='norm_data', glstcol='TopGene20')
Project single-cell expression in AnnData onto MeCs. Calls: projectMec, annToDataFrame
- Parameters
adata_input (anndata.AnnData) – Input scanpy object for gene expression
mec (pd.DataFrame) – MeC table; two formats are accepted. Format 1: one column, where each row is a MeC with its genes comma separated. Format 2: each row is a gene, each column is a MeC, and values are floats.
genescaling (bool) – Whether to scale expression on gene level. Recommended to be False.
sigscaling (bool) – Whether to scale projected scores across cells. Recommended and default is True.
addon (bool) – Whether to keep the original adata and append the signatures to obs, or to return an independent AnnData (pdata) with only the projected values (which saves memory).
layer (str) – Layer of expression to extract from adata_input
glstcol – used only when mec is in list format (format 1); the name of the column recording the comma-separated gene list for each MeC.
- Returns
If addon is False, return pdata where values are MeC-projected values. If addon is True, return adata where values are same as in adata_input, but with extra obs columns.
- Return type
anndata.AnnData
Examples
>>> pdata = mecmapper.projectMecAnn(adata, mec_score_topg, sigscaling=True, genescaling=False, addon=False)
- metatime.mecmapper.projectModuleAnn_aucell(adata, module, glstcol='TopGene20')
Alternative function that projects scRNA data onto MeCs using their top genes and AUCell. module has to be in list mode.
- Parameters
adata (anndata.AnnData) – Input scanpy object for gene expression
module (pd.DataFrame) – MeC table in list format (format 1): one column, where each row is a MeC with its genes comma separated.
glstcol – the name of the column in module recording the comma-separated gene list for each MeC.
- Returns
adata with aucell score stored in extra columns starting with ‘score_sig_’ in adata.obs
- Return type
anndata.AnnData
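The idea behind AUCell (rank genes within each cell, then measure how early a gene set appears among the top-ranked genes) can be sketched with pandas. aucell_like is a hypothetical, simplified re-implementation for illustration only; it is not the AUCell package or what projectModuleAnn_aucell calls, and its area formula, tie handling and 5% top-rank default are assumptions.

```python
import numpy as np
import pandas as pd


def aucell_like(expr: pd.DataFrame, genes, top_fraction: float = 0.05) -> pd.Series:
    """Simplified AUCell-style gene-set score per cell, scaled to [0, 1] (sketch)."""
    gene_set = [g for g in genes if g in expr.columns]
    if not gene_set:
        raise ValueError("none of the genes are present in expr")
    max_rank = max(1, int(np.ceil(top_fraction * expr.shape[1])))
    # Rank genes within each cell, 1 = most highly expressed (ties broken by column order)
    ranks = expr.rank(axis=1, ascending=False, method="first")[gene_set]
    # A gene at rank r adds (max_rank + 1 - r); genes beyond max_rank add nothing
    area = (max_rank + 1 - ranks).clip(lower=0).sum(axis=1)
    # Best case: the set occupies ranks 1..k
    k = min(len(gene_set), max_rank)
    best = sum(max_rank + 1 - r for r in range(1, k + 1))
    return area / best


expr = pd.DataFrame(
    [[9, 8, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 2, 3, 4, 5, 6, 7, 8]],
    columns=[f"G{i}" for i in range(10)], index=["c1", "c2"],
)
scores = aucell_like(expr, ["G0", "G1"], top_fraction=0.25)
```

Because the score depends only on within-cell ranks, it is insensitive to per-cell normalization, which is the main appeal of rank-based scoring.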
metatime.mecs module
Class and functions for meta-components.
- class metatime.mecs.MetatimeMecs(mec_score, mec_topg, mec_anno)
Bases: object
Class for MetaTiME MeCs.
- Parameters
modelpath – directory of model files
- Variables
mecscore – pandas dataframe, z-weights gene by component
mectopg – pandas dataframe, list of top genes only
mecanno – pandas dataframe, annotation for each mec.
- property feature: numpy.ndarray
get genes covered in the mec z-weight matrix
- static load_mec_precomputed(mecDIR='/home/docs/checkouts/readthedocs.org/user_builds/metatime/checkouts/latest/metatime/pretrained/mec/')
Load the pre-computed meta-component matrix and ordered top-gene list. Looks for the following pre-computed files in mecDIR:
MeC_allgene_average-weights.tsv, MeC_topgene.tsv
- Parameters
mecDIR – Path for model files
- Returns
Loaded MeCs
Examples
>>> mecmodel = mecs.MetatimeMecs.load_mec_precomputed()
- property nmecs: float
get number of meta-components
- metatime.mecs.getmecnamedict_ct(mectable, only_include_mecs_UseForCellStateAnno=True)
Collect list of meta components to be used for cell state annotation.
- Parameters
mectable – table that must have a UseForCellStateAnno column with values 0 or 1, where 1 means the MeC is used in cell state annotation. This helps remove pan-cell states such as a general mitochondrial activity component.
only_include_mecs_UseForCellStateAnno – filter mecs used for cell state annotation. Default: True
- Returns
a subsetted dictionary only containing MeCs used for cell state enrichment.
- Return type
dict
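The filtering step can be sketched with pandas as a mask on UseForCellStateAnno followed by an id-to-name dictionary. cellstate_mecnamedict is a hypothetical helper, not getmecnamedict_ct itself; the column names follow the description above, and the table contents are toy values.

```python
import pandas as pd


def cellstate_mecnamedict(mectable: pd.DataFrame, only_use_for_cellstate: bool = True) -> dict:
    """Map MeC ids to functional names, optionally keeping cell-state MeCs only (sketch)."""
    if only_use_for_cellstate:
        mectable = mectable[mectable["UseForCellStateAnno"] == 1]
    return dict(zip(mectable["MeC_id"], mectable["Annotation"]))


mectable = pd.DataFrame({
    "MeC_id": ["MeC_1", "MeC_2", "MeC_3"],
    "Annotation": ["T cell", "Mitochondrial activity", "Macrophage"],
    "UseForCellStateAnno": [1, 0, 1],
})
```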
- metatime.mecs.load_mecname(mode='table', mecDIR='/home/docs/checkouts/readthedocs.org/user_builds/metatime/checkouts/latest/metatime/pretrained/mec/')
Load functional annotation for pre-computed tumor microenvironment MeCs
- Parameters
mode – choose from [‘mecnamedict’, ‘table’, ‘meciddict’]. Loads manually assigned names, for easier interpretation, from the file MeC_anno_name.tsv under mecDIR. Required columns: [‘MeC_id’, ‘Annotation’, ‘UseForCellStateAnno’]. Required separator: tab. Rows with NA in the Annotation column are filtered out.
- Returns
functional annotation in the desired format
metatime.plmapper module
Functions for plotting after mapping.
- class metatime.plmapper.MidpointNormalize(vmin=None, vmax=None, midpoint=None, clip=False)
Bases: matplotlib.colors.Normalize
Normalise the colorbar separately for values above and below the midpoint. Useful for visualizing the signature continuum with a diverging colorbar. Borrowed from [chris35wills](http://chris35wills.github.io/matplotlib_diverging_colorbar/)
- Parameters
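midpoint – data value mapped to the center of the colormap (e.g. midpoint=0)
vmin – lower limit of the data range (e.g. vmin=vmin)
vmax – upper limit of the data range (e.g. vmax=vmax)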
Examples
>>> # Pass kwargs to scatterplot
>>> kwargs = {'color_map': 'RdBu_r', 'norm': MidpointNormalize(midpoint=0, vmin=vmin, vmax=vmax)}
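The mapping behind a midpoint normalization is two linear pieces: [vmin, midpoint] onto [0, 0.5] and [midpoint, vmax] onto [0.5, 1]. The NumPy-only function below is a hypothetical sketch of that mapping, not the MidpointNormalize class; it requires vmin < midpoint < vmax. Recent matplotlib versions also ship matplotlib.colors.TwoSlopeNorm, which implements the same two-slope idea.

```python
import numpy as np


def midpoint_normalize(values, vmin: float, vmax: float, midpoint: float = 0.0) -> np.ndarray:
    """Map [vmin, midpoint] onto [0, 0.5] and [midpoint, vmax] onto [0.5, 1] (sketch)."""
    # np.interp is piecewise linear and clamps values outside [vmin, vmax] to 0 or 1
    return np.interp(values, [vmin, midpoint, vmax], [0.0, 0.5, 1.0])


mapped = midpoint_normalize([-2.0, 0.0, 1.0, 4.0], vmin=-2.0, vmax=4.0)
```

With an asymmetric range such as [-2, 4], the midpoint still lands on the neutral center color, so positive and negative scores stay visually comparable.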
- metatime.plmapper.gen_mpl_labels(adata, groupby, exclude=(), ax=None, adjust_kwargs=None, text_kwargs=None)
Get the median location of each cluster. Borrowed from the scanpy GitHub forum.
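The label positions can be sketched as per-cluster medians of the 2D embedding. cluster_median_positions is a hypothetical helper, not gen_mpl_labels itself; in real use the coordinates would presumably come from adata.obsm['X_umap'] and the labels from adata.obs[groupby]. The min_cells filter is an assumed analogue of the MIN_CELL option of plot_annotation_on_data.

```python
import numpy as np
import pandas as pd


def cluster_median_positions(umap, labels, exclude=(), min_cells=1) -> pd.DataFrame:
    """Median (x, y) per cluster, skipping excluded and small clusters (sketch)."""
    coords = pd.DataFrame(np.asarray(umap)[:, :2], columns=["x", "y"])
    coords["group"] = np.asarray(labels)
    coords = coords[~coords["group"].isin(exclude)]
    # Keep clusters with at least min_cells cells
    sizes = coords.groupby("group").size()
    keep = sizes[sizes >= min_cells].index
    return coords[coords["group"].isin(keep)].groupby("group")[["x", "y"]].median()


pos = cluster_median_positions(
    [[0, 0], [2, 2], [10, 10], [12, 14], [100, 100]], ["a", "a", "b", "b", "c"], min_cells=2,
)
```

The median is used instead of the mean so that a few stray cells far from the cluster body do not drag the label away from it.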
- metatime.plmapper.plot_annotation_on_data(sdata, COL='MetaTiME', title=None, fontsize=8, MIN_CELL=5)
Plot annotated cells with non-overlapping label text. Calls: gen_mpl_labels. Occasionally, if sdata.uns[‘MetaTiME_overcluster_colors’] already exists, the colors may not be updated. In that case, reset the colors with del pdata.uns[‘MetaTiME_overcluster_colors’].
- Parameters
sdata (anndata.AnnData) – scanpy object for single cells, or scanpy object with projected signatures.
COL (str) – column of cluster assignment in sdata.obs
MIN_CELL (int) – minimum number of cells required to plot and label a cluster
- Returns
Matplotlib figure.
Examples
>>> plmapper.plot_annotation_on_data(pdata, title = 'MetaTiME')
- metatime.plmapper.plot_umap_mec(pdata: anndata._core.anndata.AnnData, meccol: str, mecnamedict: dict, use_MeC_name: bool = True, figfile: Optional[str] = None, figsize=(3, 3))
Plot signature continuum for a specific MeC.
- Parameters
pdata (anndata.AnnData) – scanpy object with projection score matrix in pdata.X
meccol – MeC id
mecnamedict – dictionary mapping MeC ids to functional names
- metatime.plmapper.plot_umap_mecproj_2condition(pdata, meccol, mecnamedict, cellscond1, cellscond2, use_MeC_name=True, figfile='../tmp.png')
Plot a comparison panel for data in pdata format (beta version).
Examples
>>> Xproj = pdata.to_df()
>>> use_MeC_name = True
>>> mec_col = 'score_1'
>>> mec_name = mecnamedict[mec_col]
>>> condcol = 'response'
>>> # cells from 2 conditions
>>> allcellscond1 = adata[adata.obs[condcol].isin(['Non-responder'])].obs.index
>>> allcellscond2 = adata[adata.obs[condcol].isin(['Responder'])].obs.index
>>> plmapper.plot_umap_mecproj_2condition(pdata, meccol='score_0', mecnamedict=mecnamedict,
...     cellscond1=allcellscond1, cellscond2=allcellscond2, use_MeC_name=True)
- metatime.plmapper.plot_umap_proj(adata: anndata._core.anndata.AnnData, Xproj: pandas.core.frame.DataFrame, mecnamedict, use_MeC_name=True, figfile=None, N_col=4, N_MECS=100)
Plotting function: plot the signature continuum score for each functional MeC in one large figure.
- Parameters
adata (anndata.AnnData) – scanpy object for scRNA data. Two formats are accepted: 1. scanpy object with gene-by-cell expression and extra MeC columns in adata.obs. When adata.var_names has fewer than 100 features, go with format 1. 2. scanpy object with only MeC scores in adata.X (projected-style format).
Xproj (pd.DataFrame) – Dataframe for projected scores
- Returns
matplotlib figure