metatime package

Submodules

metatime.annotator module

Automatic annotator for tumor single cell data.

metatime.annotator.annotator(dat: Union[pandas.core.frame.DataFrame, anndata._core.anndata.AnnData], mecnamedict: dict, gcol='overcluster')

Annotate each cell group with its top (first-ranked) enriched cell state based on MeC scores.

Parameters

dat – either a DataFrame of per-cell MeC scores plus one grouping column gcol (e.g. a Leiden cluster assignment), or an AnnData holding the scores, with gcol in dat.obs. Malignant cells should be removed beforehand so that only tumor microenvironment cells remain, such as immune cells, fibroblasts, and endothelial cells.
mecnamedict – dictionary for renaming dat columns to functional names; can be loaded from the pre-computed tumor MeC functional annotation.
gcol – column used to group cells, e.g. an over-clustered cluster assignment.

Returns

projmat (pd.DataFrame) – DataFrame with MeC projection scores and a newly added predicted label column 'MetaTiME_' + gcol
gpred – median score of each gcol group
gpreddict – dictionary mapping each gcol group to its predicted label

Examples

>>> projmat, gpred, gpreddict = annotator(projmat, mecnamedict)
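The group-level labeling described above (median score per gcol group, then the top-ranked state) can be sketched in pandas. This is a minimal illustration, not the package's implementation: it assumes the predicted label is simply the MeC column with the highest group median, and the helper name annotate_groups_by_median and the toy scores are hypothetical.

```python
import pandas as pd


def annotate_groups_by_median(projmat: pd.DataFrame, gcol: str = "overcluster"):
    """Hypothetical sketch: label each gcol group with its top-scoring MeC.

    projmat holds per-cell MeC scores plus one grouping column gcol.
    """
    scores = projmat.drop(columns=[gcol])
    # Median score of each MeC within each group (plays the role of gpred).
    gpred = scores.groupby(projmat[gcol]).median()
    # Top-ranked MeC per group (plays the role of gpreddict).
    gpreddict = gpred.idxmax(axis=1).to_dict()
    out = projmat.copy()
    # Broadcast the group label back to every cell in that group.
    out["MetaTiME_" + gcol] = out[gcol].map(gpreddict)
    return out, gpred, gpreddict


# Toy scores for two groups (illustrative values only).
projmat = pd.DataFrame({
    "T cell": [2.0, 1.5, -0.2, 0.1],
    "Fibroblast": [-0.5, 0.0, 1.8, 2.2],
    "overcluster": ["0", "0", "1", "1"],
})
out, gpred, gpreddict = annotate_groups_by_median(projmat)
```

The real annotator may apply additional rules before assigning a label; the sketch only shows the median-then-argmax idea.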

metatime.annotator.overcluster(adata: anndata._core.anndata.AnnData, resolution: float = 8, random_state: int = 0, clustercol: str = 'overcluster')

Over-cluster single-cell data to obtain cluster-level cell state annotation.

Parameters

adata – scanpy object with adata.uns['neighbors'] already computed. If adata.obsm['X_umap'] does not exist, UMAP coordinates are recomputed; otherwise, the existing UMAP coordinates are kept.
resolution – clustering resolution
random_state – clustering random state
clustercol – key to add to adata.obs that records cluster assignment

Returns

scanpy object with clustering results.

Return type

anndata.AnnData

metatime.annotator.pdataToTable(pdata: anndata._core.anndata.AnnData, mectable: pandas.core.frame.DataFrame, gcol: str = 'overcluster')

Convert projected scores into two pandas DataFrames.

Parameters

pdata – AnnData with per-cell MeC scores, and one column gcol in pdata.obs
mectable – table for renaming MeC columns to functional names; can be loaded from the pre-computed tumor MeC functional annotation. Required columns: ['MeC_id', 'Annotation', 'UseForCellTypeAnno']
gcol – column in pdata.obs for grouping cells, typically an over-clustered cluster assignment.

Returns

projmat – DataFrame with MeC scores and a column for grouping cells; useful for annotating cell states.
mecscores – DataFrame of per-cell MeC scores, with columns named by the functional annotation of the MeC ids.

Return type

tuple
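A sketch of the renaming step on plain pandas inputs, assuming pdata's scores have already been pulled out as a cells x MeC DataFrame (for example with pdata.to_df()) and that renaming uses the MeC_id to Annotation pairs in mectable. The helper scores_to_tables and the example MeC ids are hypothetical; how the real function uses the UseForCellTypeAnno column is not shown.

```python
import pandas as pd


def scores_to_tables(scores, groups, mectable, gcol="overcluster"):
    """Hypothetical sketch of pdataToTable on plain pandas inputs."""
    # Map MeC ids to functional names using the annotation table.
    id2name = dict(zip(mectable["MeC_id"], mectable["Annotation"]))
    mecscores = scores.rename(columns=id2name)
    # projmat: the renamed per-cell scores plus the grouping column (aligned by cell index).
    projmat = mecscores.copy()
    projmat[gcol] = groups
    return projmat, mecscores


# Illustrative inputs: two cells, two MeCs with made-up ids and names.
scores = pd.DataFrame({"score_0": [0.5, 1.0], "score_1": [2.0, -1.0]}, index=["c1", "c2"])
groups = pd.Series(["0", "1"], index=["c1", "c2"])
mectable = pd.DataFrame({
    "MeC_id": ["score_0", "score_1"],
    "Annotation": ["T cell", "Fibroblast"],
    "UseForCellTypeAnno": [1, 1],
})
projmat, mecscores = scores_to_tables(scores, groups, mectable)
```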

metatime.annotator.saveToAdata(adata: anndata._core.anndata.AnnData, projmat: pandas.core.frame.DataFrame, gcol: str = 'overcluster', ANNOTATION_ONLY: bool = False)

Save annotation to adata.

Parameters

adata (anndata.AnnData) – scRNA scanpy object
projmat (pd.DataFrame) – DataFrame with MeC projection scores and a newly added predicted label column 'MetaTiME_' + gcol
gcol – A column in projmat for grouping cells. Typically a column for over-clustered cluster assignment.
ANNOTATION_ONLY – Whether to add only the cluster-wise annotation to adata, or to also append the scores to adata.obs

Returns

adata with a per-cluster annotation column in adata.obs[[gcol]], and scores appended to adata.obs if ANNOTATION_ONLY is False

Return type

anndata.AnnData

metatime.annotator.saveToPdata(pdata: anndata._core.anndata.AnnData, adata: anndata._core.anndata.AnnData, projmat: pandas.core.frame.DataFrame, gcol: str = 'overcluster', BORROW_ADATA_EMBEDDING=True)

Save annotation to pdata. Borrows embeddings from adata for easy visualization, including adata.obsm['X_pca'], adata.obsm['X_umap'], and adata.obsm['X_pca_harmony'].

Parameters

pdata (anndata.AnnData) – scanpy object for per-cell projected score.
adata (anndata.AnnData) – scRNA scanpy object
projmat (pd.DataFrame) – DataFrame with MeC projection scores and a newly added predicted label column 'MetaTiME_' + gcol
gcol – A column in projmat for grouping cells. Typically a column for over-clustered cluster assignment.
BORROW_ADATA_EMBEDDING – Whether to copy the PCA and UMAP embeddings from adata into pdata, for easy visualization.

Returns

pdata with per-cluster annotation column in pdata.obs[[gcol]].

Return type

anndata.AnnData

metatime.config module

metatime.loaddata module

Functions for new scRNA data processing

metatime.loaddata.adatapp(adata_input: anndata._core.anndata.AnnData, mode: str = 'pp', random_state=42, MAX_MITO_PERCENT=5, MINGENES=500, MINCOUNTS=1000, MINCELLS=5)

Pre-processing.

Parameters

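adata_input –
scanpy scRNA object. If adata.X contains integers, or adata has a 'counts' layer (e.g. 10X data), standard preprocessing is applied: remove cells with <500 genes; remove cells with <1000 counts; remove genes detected in <5 cells; keep cells with <5% mitochondrial counts; normalize to 1e6; log-transform; compute PCA and UMAP. Normalized values are saved to the 'norm_data' layer.

If adata.X is continuous, such as Smart-seq data, remove cells with <500 genes and remove genes detected in <5 cells.
mode – one of ['pp', 'umap']. 'pp': full pre-processing. 'umap': preprocessing was already done; only compute highly variable genes, PCA, neighbors, and UMAP.
random_state – UMAP random state
MAX_MITO_PERCENT – maximum mitochondrial percentage for a cell to be kept (default 5, the threshold above)
MINGENES – minimum number of genes per cell (default 500, the threshold above)
MINCOUNTS – minimum total counts per cell (default 1000, the threshold above)
MINCELLS – minimum number of cells in which a gene must be detected (default 5, the threshold above)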

Returns

processed scanpy object

Return type

anndata.AnnData

Examples

>>> adata = adatapp(adata)
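The count-based filters listed for raw-count input can be illustrated with numpy alone. This sketch assumes the cell filters are applied before the gene filter, that "log" means log1p, and that normalization rescales each cell to a total of 1e6; the helper qc_and_normalize is hypothetical, and the PCA, UMAP and 'norm_data' layer handling done by adatapp are omitted.

```python
import numpy as np


def qc_and_normalize(counts, is_mito, min_genes=500, min_counts=1000,
                     min_cells=5, max_mito_percent=5.0, target_sum=1e6):
    """Hypothetical sketch of raw-count QC; counts is a dense cells x genes matrix."""
    counts = np.asarray(counts, dtype=float)
    is_mito = np.asarray(is_mito, dtype=bool)
    genes_per_cell = (counts > 0).sum(axis=1)
    counts_per_cell = counts.sum(axis=1)
    # Percentage of each cell's counts coming from mitochondrial genes.
    mito_pct = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(counts_per_cell, 1)
    keep_cells = ((genes_per_cell >= min_genes)
                  & (counts_per_cell >= min_counts)
                  & (mito_pct < max_mito_percent))
    # Gene filter, evaluated on the cells that survived.
    keep_genes = (counts[keep_cells] > 0).sum(axis=0) >= min_cells
    kept = counts[np.ix_(keep_cells, keep_genes)]
    # Rescale each cell to target_sum total counts, then log1p-transform.
    norm = kept / kept.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(norm), keep_cells, keep_genes


# Tiny toy matrix with lowered thresholds so the filters are visible.
counts = np.array([[5, 0, 3, 1], [0, 0, 1, 0], [2, 2, 2, 10]])
is_mito = [False, False, False, True]
X, keep_cells, keep_genes = qc_and_normalize(
    counts, is_mito, min_genes=2, min_counts=5, min_cells=1, max_mito_percent=50)
```

Here the second cell fails the gene threshold and the third fails the mitochondrial threshold, leaving one cell.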

metatime.loaddata.add_extra_metainfo(adata, meta)

Add extra metadata for sample datasets.

Parameters

adata (anndata.AnnData) – scanpy object with scRNA expression
meta (pd.DataFrame) – DataFrame with per-cell metadata columns. The index is the cell barcode, matching adata.obs.index.

Returns

scanpy object with new columns merged in obs.

Return type

anndata.AnnData

Examples

>>> adata = testdata.add_extra_metainfo(adata, meta)
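Merging per-cell metadata by barcode amounts to an index-aligned left join on adata.obs. A minimal pandas sketch, assuming columns already present in obs are left untouched (the real function may handle overlaps differently); merge_meta and the barcodes are hypothetical.

```python
import pandas as pd


def merge_meta(obs: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: left-join per-cell metadata onto obs by cell barcode."""
    # Only bring in columns obs does not already have (avoids join overlap errors).
    new_cols = meta.columns.difference(obs.columns)
    return obs.join(meta[new_cols], how="left")


obs = pd.DataFrame({"sample": ["s1", "s1", "s2"]}, index=["AAAC", "AAAG", "AAAT"])
meta = pd.DataFrame({"response": ["R", "NR"], "sample": ["x", "y"]}, index=["AAAT", "AAAC"])
merged = merge_meta(obs, meta)
```

With an AnnData object this would be applied as adata.obs = merge_meta(adata.obs, meta); barcodes missing from meta get NaN.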

metatime.loaddata.batchharmonize(adata, batchcols=[], random_state=0)

Harmonize batches, such as patient or sample, using Harmony. Neighbors and UMAP are then recalculated.

Parameters

adata – scanpy object.
batchcols – batch columns in adata.obs
random_state – umap random_state

Returns

adata with batch correction, and with neighbors and UMAP recalculated

Return type

anndata.AnnData

metatime.loaddata.isRawCount(ds)

Determine whether the matrix holds raw counts or floats: if all values in the first 100 cells are whole numbers (value mod 1 == 0), the matrix is treated as raw (integer) counts.

Parameters: ds – loom object, or a 2D numpy matrix.
Returns: 0 if not raw counts; 1 if raw counts.
Return type: int
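The integer check described above can be sketched in numpy for a dense 2D array. This assumes "first 100 cells" means the first 100 rows; the hypothetical is_raw_count does not handle loom or sparse inputs.

```python
import numpy as np


def is_raw_count(mat, n_cells=100):
    """Hypothetical sketch: return 1 if the first n_cells rows are all whole numbers, else 0."""
    sub = np.asarray(mat[:n_cells], dtype=float)
    # A value is a whole number when its remainder modulo 1 is zero.
    return int(np.all(np.mod(sub, 1) == 0))
```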

metatime.loaddata.load(file, preprocessing=False, delimiter=',')

Read an expression matrix table file into a scanpy object.

If converting from a Seurat object, the table can be saved in R:

    tmpdata = SeuratObject@assays$RNA@data
    write.csv(t(as.matrix(tmpdata)), file = 'data.csv')

Parameters

file – path to the expression file. If the suffix is .h5 or .h5ad, data is loaded using scanpy; otherwise, the file is read as a text table.
preprocessing – whether to do preprocessing for the loaded data
delimiter – delimiter of the expression matrix table, if file is a text table

Returns

loaded scRNA data in scanpy object

Return type

anndata.AnnData

Examples

>>> adata = load(file, preprocessing=False)
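For the text-table branch, the CSV written by the R snippet in this entry has cells as rows (because of t()) and barcodes in the first column. A hedged sketch of reading such a table with pandas; read_expression_table is hypothetical, and the .h5/.h5ad branch, which uses scanpy, is not shown.

```python
import io

import pandas as pd


def read_expression_table(path_or_buf, delimiter=","):
    """Hypothetical sketch: read a cells x genes text table whose first column holds barcodes."""
    return pd.read_csv(path_or_buf, sep=delimiter, index_col=0)


# In-memory stand-in for a file produced by R's write.csv (quoted header and row names).
csv_text = '"","GeneA","GeneB"\n"cell1",0,2.5\n"cell2",1.2,0\n'
df = read_expression_table(io.StringIO(csv_text))
```

The resulting cells x genes DataFrame could then be wrapped as anndata.AnnData(df).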

metatime.mecmapper module

metatime.mecmapper.annToDataFrame(adata_input, genescaling=False, layer='norm_data')

Extract the expression matrix from a scanpy object.

Parameters

adata_input (anndata.AnnData) – Input scanpy object.
genescaling (bool) – Whether to z-scale the extracted feature matrix.
layer (str) – Layer of expression to extract from adata_input

Returns: Extracted expression matrix from the scanpy object.
Return type: pd.DataFrame

metatime.mecmapper.projectMec(expr, mec, glstcol='TopGene20')

Parameters

expr – expression matrix, cell by gene.
mec – MeC table; two formats are accepted. Format 1: one column, where each row is a MeC with a comma-separated gene list. Format 2: each row is a gene, each column is a MeC, and values are floats.
glstcol – Used only when mec is in list format (format 1); the name of the column holding the comma-separated gene list for each MeC.

Returns

scorepd, a cell-by-signature score table.

Return type

pd.DataFrame

Examples

>>> scorepd = mecmapper.projectMec(df, mec)
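The two accepted mec formats suggest two ways of scoring cells. The sketch below is one plausible reading, not the package's scoring rule: format 1 is scored as the mean expression of each MeC's listed genes, and format 2 as a weighted sum over the genes shared by expr and the weight matrix. project_sketch and the toy data are hypothetical.

```python
import pandas as pd


def project_sketch(expr: pd.DataFrame, mec: pd.DataFrame, glstcol: str = "TopGene20") -> pd.DataFrame:
    """Hypothetical sketch: score cells (rows of expr) against MeCs in either format."""
    if glstcol in mec.columns:
        # Format 1 (assumed rule): mean expression of each MeC's listed genes found in expr.
        scores = {}
        for mec_id, genes in mec[glstcol].items():
            shared = [g.strip() for g in genes.split(",") if g.strip() in expr.columns]
            scores[mec_id] = expr[shared].mean(axis=1)
        return pd.DataFrame(scores, index=expr.index)
    # Format 2 (assumed rule): genes x MeCs weights, matrix product over shared genes.
    shared = expr.columns.intersection(mec.index)
    return expr[shared] @ mec.loc[shared]


expr = pd.DataFrame({"G1": [1.0, 0.0], "G2": [3.0, 2.0], "G3": [0.0, 4.0]}, index=["c1", "c2"])
mec_list = pd.DataFrame({"TopGene20": ["G1,G2", "G3, GX"]}, index=["MeC_0", "MeC_1"])
mec_weights = pd.DataFrame({"MeC_0": [1.0, 0.5], "MeC_1": [0.0, 2.0]}, index=["G2", "G3"])
s1 = project_sketch(expr, mec_list)
s2 = project_sketch(expr, mec_weights)
```

Genes listed in a MeC but absent from expr (GX above) are simply skipped in this sketch.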

metatime.mecmapper.projectMecAnn(adata_input, mec, genescaling=False, sigscaling=True, addon=False, layer='norm_data', glstcol='TopGene20')

Project single-cell expression in AnnData onto MeCs. Calls: projectMec, annToDataFrame.

Parameters

adata_input (anndata.AnnData) – Input scanpy object for gene expression
mec (pd.DataFrame) – MeC table; two formats are accepted. Format 1: one column, where each row is a MeC with a comma-separated gene list. Format 2: each row is a gene, each column is a MeC, and values are floats.
genescaling (bool) – Whether to scale expression at the gene level. Recommended to be False.
sigscaling (bool) – Whether to scale projected scores across cells. Recommended; the default is True.
addon (bool) – Whether to keep the original adata and append the signatures to obs, or to return an independent AnnData (pdata) with only the projected values (which saves memory).
layer (str) – Layer of expression to extract from adata_input
glstcol – Used only when mec is in list format (format 1); the name of the column holding the comma-separated gene list for each MeC.

Returns

If addon is False, return pdata where values are MeC-projected values. If addon is True, return adata where values are same as in adata_input, but with extra obs columns.

Return type

anndata.AnnData

Examples

>>> pdata = mecmapper.projectMecAnn(adata, mec_score_topg, sigscaling=True, genescaling=False, addon=False)

metatime.mecmapper.projectModuleAnn_aucell(adata, module, glstcol='TopGene20')

Alternative function that projects scRNA data onto MeCs using the top genes of each MeC and AUCell. module must be in list mode.

Parameters

adata (anndata.AnnData) – Input scanpy object for gene expression
module (pd.DataFrame) – MeC table in list format (format 1): one column, where each row is a MeC with a comma-separated gene list.
glstcol – the name of the column in module holding the comma-separated gene list for each MeC.

Returns

adata with AUCell scores stored in extra columns starting with 'score_sig_' in adata.obs

Return type

anndata.AnnData

metatime.mecmapper.scale(df): Standardize features (z-score scaling).
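A minimal sketch of per-feature standardization (z-scoring each column), which is what scaling projected scores across cells would amount to when cells are rows. Whether the package uses the population (ddof=0) or sample standard deviation is an assumption here; zscore_columns is hypothetical. Constant columns produce NaN because their standard deviation is zero.

```python
import pandas as pd


def zscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: shift each column to mean 0 and scale it to (population) std 1."""
    return (df - df.mean()) / df.std(ddof=0)


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 10.0, 13.0]})
z = zscore_columns(df)
```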

metatime.mecs module

Class and functions for meta-components (MeCs).

class metatime.mecs.MetatimeMecs(mec_score, mec_topg, mec_anno)

Bases: object

Class for MetaTiME Mecs.

Parameters

modelpath – directory of model files

Variables

mecscore – pandas dataframe, z-weights gene by component
mectopg – pandas dataframe, top genes of each mec only
mecanno – pandas dataframe, annotation for each mec.

property feature: numpy.ndarray: get genes covered in the mec z-weight matrix

static load_mec_precomputed(mecDIR='/home/docs/checkouts/readthedocs.org/user_builds/metatime/checkouts/latest/metatime/pretrained/mec/')

Load the pre-computed meta-component matrix and ordered top-gene list. Looks for the following precomputed files in mecDIR:

MeC_allgene_average-weights.tsv, MeC_topgene.tsv

Parameters: mecDIR – Path for model files
Returns: Loaded MeCs
Return type: MetatimeMecs

Examples

>>> mecmodel = mecs.MetatimeMecs.load_mec_precomputed()

property nmecs: float: get number of meta-components

metatime.mecs.getmecnamedict_ct(mectable, only_include_mecs_UseForCellStateAnno=True)

Collect list of meta components to be used for cell state annotation.

Parameters

mectable – must have a UseForCellStateAnno column with values 0 or 1, where 1 means the MeC is used in cell state annotation. This helps remove pan-cell states, such as a general mitochondrial activity component.
only_include_mecs_UseForCellStateAnno – filter mecs used for cell state annotation. Default: True

Returns

a subsetted dictionary only containing MeCs used for cell state enrichment.

Return type

dict
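The filtering described above can be sketched as a boolean mask on mectable followed by an id-to-name mapping. The helper mecnamedict_for_cell_states, the MeC ids and the annotations are illustrative, and the assumption that the dictionary maps MeC_id to Annotation follows the renaming role of mecnamedict elsewhere in this reference.

```python
import pandas as pd


def mecnamedict_for_cell_states(mectable: pd.DataFrame, only_use_for_cell_state=True) -> dict:
    """Hypothetical sketch: map MeC_id -> Annotation, optionally keeping only flagged MeCs."""
    table = mectable
    if only_use_for_cell_state:
        # Keep MeCs flagged 1; drops pan-cell programs flagged 0.
        table = table[table["UseForCellStateAnno"] == 1]
    return dict(zip(table["MeC_id"], table["Annotation"]))


mectable = pd.DataFrame({
    "MeC_id": ["score_0", "score_1", "score_2"],
    "Annotation": ["T cell", "Mitochondrial activity", "Fibroblast"],
    "UseForCellStateAnno": [1, 0, 1],
})
names = mecnamedict_for_cell_states(mectable)
```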

metatime.mecs.load_mecname(mode='table', mecDIR='/home/docs/checkouts/readthedocs.org/user_builds/metatime/checkouts/latest/metatime/pretrained/mec/')

Load functional annotation for pre-computed tumor microenvironment MeCs

Parameters: mode – one of ['mecnamedict', 'table', 'meciddict']. Loads manually assigned names, for easier interpretation, from the file MeC_anno_name.tsv under mecDIR. Required columns: ['MeC_id', 'Annotation', 'UseForCellStateAnno']. Required separator: tab. Rows with NA in the Annotation column are filtered out.
mecDIR – Path for model files
Returns: functional annotation in the format selected by mode
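A sketch of the file-reading rules listed above (tab separator, NA annotations dropped), using an in-memory string instead of MeC_anno_name.tsv. load_mec_annotation is hypothetical, the 'mecnamedict' branch assumes an MeC_id to Annotation mapping, and the 'meciddict' mode is not covered.

```python
import io

import pandas as pd


def load_mec_annotation(source, mode="table"):
    """Hypothetical sketch: read a tab-separated MeC annotation table and drop NA annotations."""
    table = pd.read_csv(source, sep="\t")
    # pandas parses the literal string "NA" as missing by default.
    table = table.dropna(subset=["Annotation"])
    if mode == "table":
        return table
    if mode == "mecnamedict":
        return dict(zip(table["MeC_id"], table["Annotation"]))
    raise ValueError(f"mode {mode!r} is not covered by this sketch")


tsv = ("MeC_id\tAnnotation\tUseForCellStateAnno\n"
       "score_0\tT cell\t1\n"
       "score_1\tNA\t0\n"
       "score_2\tFibroblast\t1\n")
table = load_mec_annotation(io.StringIO(tsv))
names = load_mec_annotation(io.StringIO(tsv), mode="mecnamedict")
```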

metatime.plmapper module

Functions for plotting after MeC mapping.

class metatime.plmapper.MidpointNormalize(vmin=None, vmax=None, midpoint=None, clip=False)

Bases: matplotlib.colors.Normalize

Normalise the colorbar separately above and below the midpoint. Useful for visualizing the signature continuum with a diverging colorbar. Borrowed from [chris35wills](http://chris35wills.github.io/matplotlib_diverging_colorbar/)

Parameters

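vmin – lower bound of the data range
vmax – upper bound of the data range
midpoint – value mapped to the center of the colormap (e.g. midpoint=0)
clip – passed to matplotlib.colors.Normalize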

Examples

>>> # Pass kwargs to a scatter plot
>>> kwargs = {'color_map': 'RdBu_r', 'norm': MidpointNormalize(midpoint=0, vmin=vmin, vmax=vmax)}
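The commonly circulated version of the chris35wills recipe maps [vmin, midpoint] to [0, 0.5] and [midpoint, vmax] to [0.5, 1] with np.interp. A numpy-only sketch of that mapping, without the matplotlib.colors.Normalize subclass around it; midpoint_normalize is a hypothetical name. Values outside [vmin, vmax] are clamped to 0 or 1 by np.interp.

```python
import numpy as np


def midpoint_normalize(values, vmin, vmax, midpoint=0.0):
    """Piecewise-linear map: [vmin, midpoint] -> [0, 0.5] and [midpoint, vmax] -> [0.5, 1]."""
    return np.interp(values, [vmin, midpoint, vmax], [0.0, 0.5, 1.0])


# Asymmetric range: the negative side is compressed differently from the positive side.
mapped = midpoint_normalize([-2, 0, 4], vmin=-2, vmax=4)
```

Recent matplotlib versions also ship matplotlib.colors.TwoSlopeNorm, which serves the same purpose.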

metatime.plmapper.gen_mpl_labels(adata, groupby, exclude=(), ax=None, adjust_kwargs=None, text_kwargs=None): Get the median location of each cluster, for placing labels. Borrowed from the scanpy GitHub forum.
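Computing a label anchor per cluster reduces to a grouped median of the 2D embedding. A pandas sketch, assuming coords would come from something like adata.obsm['X_umap'] and groups from adata.obs[groupby]; cluster_label_positions is hypothetical, and the text drawing and overlap adjustment are omitted.

```python
import numpy as np
import pandas as pd


def cluster_label_positions(coords, groups, exclude=()):
    """Hypothetical sketch: median (x, y) embedding position of each group, skipping excluded groups."""
    df = pd.DataFrame(np.asarray(coords)[:, :2], columns=["x", "y"])
    df["group"] = np.asarray(groups)
    df = df[~df["group"].isin(list(exclude))]
    return df.groupby("group")[["x", "y"]].median()


coords = [[0, 0], [2, 2], [10, 10], [12, 14], [11, 12]]
groups = ["a", "a", "b", "b", "b"]
pos = cluster_label_positions(coords, groups)
```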

metatime.plmapper.plot_annotation_on_data(sdata, COL='MetaTiME', title=None, fontsize=8, MIN_CELL=5)

Plot annotated cells with non-overlapping labels. Calls: gen_mpl_labels. If sdata.uns['MetaTiME_overcluster_colors'] already exists, the colors may occasionally not be updated; in that case, reset them with del sdata.uns['MetaTiME_overcluster_colors'].

Parameters

sdata (anndata.AnnData) – scanpy object for single cells, or scanpy object with projected signature scores.
COL (str) – column of cluster assignment in sdata.obs
MIN_CELL (int) – minimum number of cells for a cluster to be plotted and marked

Returns

Matplotlib figure.

Examples

>>> plmapper.plot_annotation_on_data(pdata, title = 'MetaTiME')

metatime.plmapper.plot_umap_mec(pdata: anndata._core.anndata.AnnData, meccol: str, mecnamedict: dict, use_MeC_name: bool = True, figfile: Optional[str] = None, figsize=(3, 3))

Plot signature continuum for a specific MeC.

Parameters

pdata (anndata.AnnData) – scanpy object with projection score matrix in pdata.X
meccol – MeC id
mecnamedict – dictionary mapping MeC ids to functional names

metatime.plmapper.plot_umap_mecproj_2condition(pdata, meccol, mecnamedict, cellscond1, cellscond2, use_MeC_name=True, figfile='../tmp.png')

Plot a comparison panel of MeC projection scores for cells from two conditions, in pdata format (beta version).

Examples

>>> use_MeC_name = True
>>> mec_col = 'score_1'
>>> condcol = 'response'
>>> # cells from 2 conditions
>>> allcellscond1 = adata.obs.index[adata.obs[condcol].isin(['Non-responder'])]
>>> allcellscond2 = adata.obs.index[adata.obs[condcol].isin(['Responder'])]
>>> plmapper.plot_umap_mecproj_2condition(pdata, meccol=mec_col, mecnamedict=mecnamedict,
...     cellscond1=allcellscond1, cellscond2=allcellscond2, use_MeC_name=use_MeC_name)

metatime.plmapper.plot_umap_proj(adata: anndata._core.anndata.AnnData, Xproj: pandas.core.frame.DataFrame, mecnamedict, use_MeC_name=True, figfile=None, N_col=4, N_MECS=100)

Plotting function: plot the signature continuum score for each functional MeC in one large figure.

Parameters

adata (anndata.AnnData) – scanpy object for scRNA data. Two formats are accepted: 1. scanpy object with gene by cell expression, with extra MeC columns in adata.obs (when adata.var_names has fewer than 100 features, go with format 1); 2. scanpy object with only MeC scores in adata.X (projected-style format).
Xproj (pd.DataFrame) – Dataframe for projected scores

Returns

Return type

matplotlib figure

metatime.plmapper.plot_umap_proj_pdata(pdata, mecnamedict, use_MeC_name=True, figfile=None, N_col=4): Deprecated. Save all projection images in one large figure; input is in pdata format.
