metatime package

Submodules

metatime.annotator module

Automatic annotator for tumor single cell data.

metatime.annotator.annotator(dat: Union[pandas.core.frame.DataFrame, anndata._core.anndata.AnnData], mecnamedict: dict, gcol='overcluster')[source]

Annotator that marks each group with its top-ranked enriched cell state, based on MeC scores.

Parameters
  • dat – dataframe with gene by cell scores and one grouping column gcol (e.g. leiden class), or anndata with gene by cell scores and one column gcol in dat.obs. Malignant cells should be removed beforehand so that only tumor microenvironment cells remain, such as immune cells, fibroblasts, and endothelial cells.

  • mecnamedict – dictionary for renaming dat columns into functional names. Can be loaded from the pre-computed tumor MeC functional annotation.

  • gcol – column for grouping cells, e.g. a column for overclustered cluster assignment.

Returns

  • projmat (pd.DataFrame) – Dataframe with mec projection scores and newly added predicted label column ‘MetaTiME_’+gcol

  • gpred – median score of each gcol group

  • gpreddict – dictionary mapping each gcol group to its predicted label

Examples

>>> projmat, gpred, gpreddict = annotator(projmat, mecnamedict)
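The grouping logic described for annotator can be illustrated with a minimal, dependency-free sketch. It assumes that the label of a group is simply the MeC with the highest median score in that group; the real function may apply additional rules. The helper name annotate_groups and the MeC names in the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import median


def annotate_groups(scores, groups, mecnamedict):
    """Toy sketch: label each group by the MeC with the highest median score.

    scores: dict cell -> dict MeC id -> score
    groups: dict cell -> group label (the gcol value)
    """
    # Collect the per-group list of scores for every MeC.
    per_group = defaultdict(lambda: defaultdict(list))
    for cell, mec_scores in scores.items():
        for mec, value in mec_scores.items():
            per_group[groups[cell]][mec].append(value)

    # Median score of each group (plays the role of gpred).
    gpred = {
        g: {mec: median(vals) for mec, vals in mecs.items()}
        for g, mecs in per_group.items()
    }

    # Top-ranked MeC per group, renamed via mecnamedict (plays the role of gpreddict).
    gpreddict = {
        g: mecnamedict[max(meds, key=meds.get)] for g, meds in gpred.items()
    }
    return gpred, gpreddict


# Hypothetical toy data: three cells, two MeCs, two overclusters.
scores = {
    "c1": {"score_0": 2.0, "score_1": 0.1},
    "c2": {"score_0": 1.5, "score_1": 0.3},
    "c3": {"score_0": 0.2, "score_1": 1.8},
}
groups = {"c1": "0", "c2": "0", "c3": "1"}
names = {"score_0": "T_cytotoxic", "score_1": "Macrophage"}

gpred, gpreddict = annotate_groups(scores, groups, names)
print(gpreddict)
```

Using the median rather than the mean keeps a few extreme cells from deciding the label of a whole cluster.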
metatime.annotator.overcluster(adata: anndata._core.anndata.AnnData, resolution: float = 8, random_state: int = 0, clustercol: str = 'overcluster')[source]

Overcluster single cell data to get cluster level cell state annotation

Parameters
  • adata – scanpy object with adata.uns[‘neighbors’] computed. If adata.obsm[‘X_umap’] does not exist, UMAP coordinates are computed; otherwise the existing UMAP coordinates are kept.

  • resolution – clustering resolution

  • random_state – clustering random state

  • clustercol – key to add to adata.obs that records cluster assignment

Returns

scanpy object with clustering results.

metatime.annotator.pdataToTable(pdata: anndata._core.anndata.AnnData, mectable: pandas.core.frame.DataFrame, gcol: str = 'overcluster')[source]

Convert projected scores to two simple pandas dataframes

Parameters
  • pdata – anndata with gene by mec scores, and one column gcol in pdata.obs

  • mectable – table for renaming score columns into functional names. Can be loaded from the pre-computed tumor MeC functional annotation. Required columns: [‘MeC_id’, ‘Annotation’, ‘UseForCellTypeAnno’]

  • gcol – a column in pdata.obs for grouping cells. Typically a column for overclustered cluster assignment.

Returns

  • projmat – a pandas dataframe with MeC scores and a column for grouping cells. Useful for annotating cell states.

  • mecscores – a pandas dataframe with per-cell MeC scores. Columns use the functional annotation of MeC ids.

Return type

tuple
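As a rough illustration of the two returned tables, the sketch below builds both from plain dictionaries. It is not the library's implementation: the helper to_tables, the dictionary-based layout, and the toy MeC names are assumptions made for the example.

```python
def to_tables(scores, obs_groups, mectable, gcol="overcluster"):
    """Toy sketch of splitting projected scores into two tables.

    scores:     dict cell -> dict MeC id -> score
    obs_groups: dict cell -> group label
    mectable:   list of rows with 'MeC_id' and 'Annotation' keys
    """
    id_to_name = {row["MeC_id"]: row["Annotation"] for row in mectable}

    # projmat: raw MeC ids plus the grouping column, ready for annotation.
    projmat = {
        cell: {**vals, gcol: obs_groups[cell]} for cell, vals in scores.items()
    }

    # mecscores: per-cell scores with columns renamed to functional names.
    mecscores = {
        cell: {id_to_name.get(mec, mec): v for mec, v in vals.items()}
        for cell, vals in scores.items()
    }
    return projmat, mecscores


# Hypothetical toy inputs.
scores = {"c1": {"score_0": 2.0, "score_1": 0.1}}
obs_groups = {"c1": "0"}
mectable = [
    {"MeC_id": "score_0", "Annotation": "T_cytotoxic"},
    {"MeC_id": "score_1", "Annotation": "Macrophage"},
]

projmat, mecscores = to_tables(scores, obs_groups, mectable)
print(projmat["c1"], mecscores["c1"])
```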

metatime.annotator.saveToAdata(adata: anndata._core.anndata.AnnData, projmat: pandas.core.frame.DataFrame, gcol: str = 'overcluster', ANNOTATION_ONLY: bool = False)[source]

Save annotation to adata.

Parameters
  • adata (anndata.AnnData) – scRNA scanpy object

  • projmat (pd.DataFrame) – Dataframe with mec projection scores and newly added predicted label column ‘MetaTiME_’+gcol

  • gcol – A column in projmat for grouping cells. Typically a column for overclustered cluster assignment.

  • ANNOTATION_ONLY – Whether to add only the cluster-wise annotation to anndata, or to also append scores to adata.obs

Returns

adata with per-cluster annotation column in adata.obs[[gcol]], and scores appended to adata.obs if ANNOTATION_ONLY is False

Return type

anndata.AnnData

metatime.annotator.saveToPdata(pdata: anndata._core.anndata.AnnData, adata: anndata._core.anndata.AnnData, projmat: pandas.core.frame.DataFrame, gcol: str = 'overcluster', BORROW_ADATA_EMBEDDING=True)[source]

Save annotation to pdata. Borrow embedding from adata for easy visualization, including adata.obsm[‘X_pca’], adata.obsm[‘X_umap’], adata.obsm[‘X_pca_harmony’]

Parameters
  • pdata (anndata.AnnData) – scanpy object for per-cell projected score.

  • adata (anndata.AnnData) – scRNA scanpy object

  • projmat (pd.DataFrame) – Dataframe with mec projection scores and newly added predicted label column ‘MetaTiME_’+gcol

  • gcol – A column in projmat for grouping cells. Typically a column for overclustered cluster assignment.

  • BORROW_ADATA_EMBEDDING – Whether to borrow pca and umap embeddings from adata to write in pdata. For easy visualization.

Returns

pdata with per-cluster annotation column in pdata.obs[[gcol]].

Return type

anndata.AnnData

metatime.config module

metatime.loaddata module

Functions for new scRNA data processing

metatime.loaddata.adatapp(adata_input: anndata._core.anndata.AnnData, mode: str = 'pp', random_state=42, MAX_MITO_PERCENT=5, MINGENES=500, MINCOUNTS=1000, MINCELLS=5)[source]

Pre-processing.

Parameters
  • adata_input – scanpy scRNA object. Two cases are handled:

    If adata.X is integer, or adata has a ‘counts’ layer (e.g. from 10X data), standard pre-processing is applied: remove cells with <500 genes and <1000 counts, remove genes detected in fewer than 5 cells, keep cells with mitochondrial percentage <5%, normalize to 1e6, log-transform, then compute PCA and UMAP. Normalized values are saved to the ‘norm_data’ layer.

    If adata.X is continuous (e.g. Smartseq data): remove cells with <500 genes and remove genes detected in fewer than 5 cells.

  • mode – choose from [‘pp’, ‘umap’]. ‘pp’: full pre-processing. ‘umap’: pre-processing was already done; only compute highly_variable genes, PCA, neighbors, and UMAP.

  • random_state – umap random state

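  • MAX_MITO_PERCENT – maximum mitochondrial percentage allowed per cell (default 5)

  • MINGENES – minimum number of genes per cell (default 500)

  • MINCOUNTS – minimum number of counts per cell (default 1000)

  • MINCELLS – minimum number of cells in which a gene must be detected (default 5)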

Returns

processed scanpy object

Examples

>>> adata = adatapp(adata)
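The count-data branch of the pre-processing can be sketched on a plain list-of-lists matrix. This is a simplified illustration and not the scanpy-based implementation: the helper qc_and_normalize is hypothetical, mitochondrial genes are assumed to be those whose name starts with "MT-", and a cell is assumed to be kept only if it passes every threshold.

```python
import math


def qc_and_normalize(counts, genes, min_genes=500, min_counts=1000,
                     min_cells=5, max_mito_percent=5, target_sum=1e6):
    """Toy QC filter and log-normalization on a cells x genes count matrix."""
    # Assumption: mitochondrial genes are identified by the "MT-" prefix.
    is_mito = [g.upper().startswith("MT-") for g in genes]

    kept_cells = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for v in row if v > 0)
        mito = sum(v for v, m in zip(row, is_mito) if m)
        mito_pct = 100.0 * mito / total if total else 0.0
        if (n_genes >= min_genes and total >= min_counts
                and mito_pct < max_mito_percent):
            kept_cells.append(row)

    # Keep genes detected in at least min_cells of the remaining cells.
    keep_gene = [
        sum(1 for row in kept_cells if row[j] > 0) >= min_cells
        for j in range(len(genes))
    ]
    kept_genes = [g for g, keep in zip(genes, keep_gene) if keep]

    # Scale each cell to target_sum total counts, then apply log(1 + x).
    normalized = []
    for row in kept_cells:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        normalized.append(
            [math.log1p(v * scale) for v, keep in zip(row, keep_gene) if keep]
        )
    return normalized, kept_genes


# Tiny demo with relaxed thresholds so the toy matrix is not filtered away.
genes = ["CD3E", "MT-CO1", "LYZ"]
counts = [
    [10, 0, 5],   # passes all filters
    [8, 1, 91],   # passes all filters (1% mitochondrial)
    [1, 9, 0],    # dropped: 90% mitochondrial
    [0, 0, 1],    # dropped: only one gene detected
]
normalized, kept_genes = qc_and_normalize(
    counts, genes, min_genes=2, min_counts=10, min_cells=2, max_mito_percent=5
)
print(kept_genes, len(normalized))
```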
metatime.loaddata.add_extra_metainfo(adata, meta)[source]

Add special extra metainformation for sample datasets

Parameters
  • adata (anndata.AnnData) – scanpy object with scRNA expression

  • meta (pd.DataFrame) – Dataframe with columns of metadata for each cell. The index holds cell barcodes, matching adata.obs.index.

Returns

scanpy object with new columns merged in obs.

Return type

anndata.AnnData

Examples

>>> adata = testdata.add_extra_metainfo(adata, meta)

metatime.loaddata.batchharmonize(adata, batchcols=[], random_state=0)[source]

Harmonize batches, such as patient or sample, using Harmony. Recalculates neighbors and UMAP.

Parameters
  • adata – scanpy object.

  • batchcols – batch columns in adata.obs

  • random_state – umap random_state

Returns

adata with batch correction applied and neighbors and UMAP calculated

metatime.loaddata.isRawCount(ds)[source]

Tell whether the matrix holds raw counts or floats.

If all values in the top 100 cells are whole numbers (remainder 0), the matrix is treated as raw (integer) counts.

Parameters

ds – loom object, or a 2D numpy matrix.

Returns

0 if not raw count, 1 if raw count.

Return type

int
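The integer check described above can be sketched as follows. This is an illustrative re-implementation on nested lists, assuming "top 100 cells" means the first 100 rows; the name is_raw_count is hypothetical.

```python
def is_raw_count(matrix, n_cells=100):
    """Return 1 if the first n_cells rows hold only whole numbers, else 0."""
    for row in matrix[:n_cells]:
        for value in row:
            # A non-zero remainder modulo 1 means the value is not an integer.
            if value % 1 != 0:
                return 0
    return 1


raw_like = [[0, 3, 1], [2, 0, 0]]
normalized_like = [[0.0, 1.5], [2.0, 0.25]]
print(is_raw_count(raw_like), is_raw_count(normalized_like))
```

Whole-number floats such as 2.0 also count as raw under this check, which suits count matrices that were stored with a floating-point dtype.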

metatime.loaddata.load(file, preprocessing=False, delimiter=',')[source]

Read in an expression matrix table file as a scanpy object.

If converting from a Seurat object, the table can be saved in R:

    tmpdata = SeuratObject@assays$RNA@data
    write.csv(t(as.matrix(tmpdata)), file = 'data.csv')

Parameters
  • file – file path of the expression file. If the suffix is .h5 or .h5ad, data is loaded using scanpy. Otherwise, the file is loaded as a text table.

  • preprocessing – whether to do preprocessing for the loaded data

  • delimiter – delimiter of the expression matrix table, if file is a text table

Returns

loaded scRNA data in scanpy object

Return type

adata

Examples

>>> adata = load(file, preprocessing=False)
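The suffix-based dispatch and the text-table layout can be sketched without scanpy. The helpers pick_reader and read_table are hypothetical, and the table layout (cells as rows, genes as columns, empty first header field) is assumed from the R export shown above.

```python
import csv
import io
from pathlib import Path


def pick_reader(path):
    """Decide how a file would be read, based on its suffix."""
    suffix = Path(path).suffix.lower()
    return "scanpy" if suffix in {".h5", ".h5ad"} else "table"


def read_table(text, delimiter=","):
    """Parse a cell-by-gene text table into cell names, gene names, values."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    genes = rows[0][1:]                 # header row, minus the corner field
    cells = [r[0] for r in rows[1:]]    # first column holds cell barcodes
    values = [[float(x) for x in r[1:]] for r in rows[1:]]
    return cells, genes, values


# Hypothetical two-cell, two-gene table.
text = ",CD3E,LYZ\ncell1,1.2,0\ncell2,0,3.4\n"
cells, genes, values = read_table(text)
print(pick_reader("data.h5ad"), pick_reader("data.csv"), cells, genes)
```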

metatime.mecmapper module

metatime.mecmapper.annToDataFrame(adata_input, genescaling=False, layer='norm_data')[source]

Extract expression matrix from scanpy object.

Parameters
  • adata_input (anndata.AnnData) – Input scanpy object.

  • genescaling (bool) – Whether to z-scale extracted feature matrix.

  • layer (str) – Layer of expression to extract from adata_input

Returns

Extracted expression matrix from scanpy object.

Return type

pd.DataFrame

metatime.mecmapper.projectMec(expr, mec, glstcol='TopGene20')[source]

Parameters
  • expr – expression matrix, cell by gene.

  • mec – MeC table. Two formats are accepted. Format 1: one column, where each row is a MeC with a comma-separated gene list. Format 2: each row is a gene, each column is a MeC, and values are floats.

  • glstcol – used only when mec is in the list format (format 1); the name of the column that holds the comma-separated gene list for each MeC.

Returns

scorepd, cell by signature.

Examples

>>> scorepd = mecmapper.projectMec(df, mec)
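For the matrix format (format 2), one plausible reading of "projection" is a weighted sum of expression over the genes shared with the MeC weight table. The sketch below shows that reading only; the actual scoring in projectMec may differ, and the helper project and all toy values are hypothetical.

```python
def project(expr, weights):
    """Toy projection: score[cell][mec] = sum over shared genes of expr * weight.

    expr:    dict cell -> dict gene -> expression value
    weights: dict gene -> dict MeC id -> weight (format 2 style)
    """
    mecs = sorted({m for w in weights.values() for m in w})
    scores = {}
    for cell, profile in expr.items():
        # Only genes present in both the expression data and the MeC table count.
        shared = [g for g in profile if g in weights]
        scores[cell] = {
            m: sum(profile[g] * weights[g].get(m, 0.0) for g in shared)
            for m in mecs
        }
    return scores


expr = {
    "c1": {"CD8A": 2.0, "LYZ": 0.0, "ACTB": 5.0},
    "c2": {"CD8A": 0.0, "LYZ": 3.0, "ACTB": 5.0},
}
weights = {
    "CD8A": {"MeC_0": 1.0, "MeC_1": -0.5},
    "LYZ": {"MeC_0": -0.5, "MeC_1": 1.0},
}
scorepd = project(expr, weights)
print(scorepd)
```

ACTB is ignored here because it has no entry in the weight table, which mirrors restricting the projection to genes covered by the MeCs.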
metatime.mecmapper.projectMecAnn(adata_input, mec, genescaling=False, sigscaling=True, addon=False, layer='norm_data', glstcol='TopGene20')[source]

Project single cell expression in AnnData to MeCs. Calls: projectMec, annToDataFrame.

Parameters
  • adata_input (anndata.AnnData) – Input scanpy object for gene expression

  • mec (pd.DataFrame) – MeC table. Two formats are accepted. Format 1: one column, where each row is a MeC with a comma-separated gene list. Format 2: each row is a gene, each column is a MeC, and values are floats.

  • genescaling (bool) – Whether to scale expression on gene level. Recommended to be False.

  • sigscaling (bool) – Whether to scale projected scores across cells. Recommended; default is True.

  • addon (bool) – Whether to keep the original adata and append the signatures in obs, or return an independent anndata (pdata) with only projected values (which saves memory).

  • layer (str) – Layer of expression to extract from adata_input

  • glstcol – used only when mec is in the list format (format 1); the name of the column that holds the comma-separated gene list for each MeC.

Returns

If addon is False, return pdata where values are MeC-projected values. If addon is True, return adata where values are same as in adata_input, but with extra obs columns.

Return type

anndata.AnnData

Examples

>>> pdata = mecmapper.projectMecAnn(adata, mec_score_topg, sigscaling=True, genescaling=False, addon=False)

metatime.mecmapper.projectModuleAnn_aucell(adata, module, glstcol='TopGene20')[source]

Alternative function that projects scRNA data using top genes from MeCs and AUCell. module has to be in list mode.

Parameters
  • adata (anndata.AnnData) – Input scanpy object for gene expression

  • module (pd.DataFrame) – MeC table in list format (format 1): one column, where each row is a MeC with a comma-separated gene list.

  • glstcol – the name of the column that holds the comma-separated gene list for each MeC.

Returns

adata with aucell score stored in extra columns starting with ‘score_sig_’ in adata.obs

Return type

anndata.AnnData

metatime.mecmapper.scale(df)[source]

Standardize features by scaling.
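Standardization is commonly done as a column-wise z-score, which the sketch below illustrates on a list of rows. Whether the library uses the population or the sample standard deviation is not stated here, so the choice of pstdev is an assumption, as is the helper name zscale_columns.

```python
import math
from statistics import mean, pstdev


def zscale_columns(table):
    """Column-wise z-score: subtract the column mean, divide by the column SD."""
    columns = list(zip(*table))
    scaled_columns = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        # Constant columns have zero SD; map them to 0 instead of dividing.
        scaled_columns.append([(v - mu) / sd if sd else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


table = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaled = zscale_columns(table)
print(scaled)
```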

metatime.mecs module

Class and functions for meta components

class metatime.mecs.MetatimeMecs(mec_score, mec_topg, mec_anno)[source]

Bases: object

Class for MetaTiME Mecs.

Parameters

modelpath – directory of model files

Variables
  • mecscore – pandas dataframe, z-weights gene by component

  • mectopg – pandas dataframe, list for top gene only

  • mecanno – pandas dataframe, annotation for each mec.

property feature: numpy.ndarray

get genes covered in the mec z-weight matrix

static load_mec_precomputed(mecDIR='/home/docs/checkouts/readthedocs.org/user_builds/metatime/checkouts/latest/metatime/pretrained/mec/')[source]

Load pre-computed meta-component matrix and ordered list. Looks for the following precomputed files in mecDIR:

  • MeC_allgene_average-weights.tsv

  • MeC_topgene.tsv

Parameters

mecDIR – Path for model files

Returns

Loaded MeCs

Return type

MetatimeMecs

Examples

>>> mecmodel = mecs.MetatimeMecs.load_mec_precomputed()

property nmecs: float

get number of meta-components

metatime.mecs.getmecnamedict_ct(mectable, only_include_mecs_UseForCellStateAnno=True)[source]

Collect list of meta components to be used for cell state annotation.

Parameters
  • mectable – must have a UseForCellStateAnno column with values 0 or 1, where 1 means the MeC is used in cell state annotation. This helps remove pan-cell states such as the general mitochondrial activity component.

  • only_include_mecs_UseForCellStateAnno – filter mecs used for cell state annotation. Default: True

Returns

a subsetted dictionary only containing MeCs used for cell state enrichment.

Return type

dict
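The filtering step can be pictured as a dictionary comprehension over the annotation table. The sketch below uses a list of row dictionaries instead of a pandas dataframe; the helper mec_name_dict and the example annotations are hypothetical.

```python
def mec_name_dict(mectable, only_use_for_annotation=True,
                  flag_col="UseForCellStateAnno"):
    """Map MeC id -> functional name, optionally keeping only flagged MeCs."""
    rows = [
        row for row in mectable
        if not only_use_for_annotation or row[flag_col] == 1
    ]
    return {row["MeC_id"]: row["Annotation"] for row in rows}


mectable = [
    {"MeC_id": "score_0", "Annotation": "T_cytotoxic", "UseForCellStateAnno": 1},
    {"MeC_id": "score_1", "Annotation": "Mitochondrial", "UseForCellStateAnno": 0},
    {"MeC_id": "score_2", "Annotation": "Macrophage", "UseForCellStateAnno": 1},
]

filtered = mec_name_dict(mectable)
everything = mec_name_dict(mectable, only_use_for_annotation=False)
print(filtered, len(everything))
```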

metatime.mecs.load_mecname(mode='table', mecDIR='/home/docs/checkouts/readthedocs.org/user_builds/metatime/checkouts/latest/metatime/pretrained/mec/')[source]

Load functional annotation for pre-computed tumor microenvironment MeCs

Parameters

mode – choose from [‘mecnamedict’, ‘table’, ‘meciddict’]. Loads the manually assigned names, for easier interpretation, from the file MeC_anno_name.tsv under mecDIR. Required columns: [‘MeC_id’, ‘Annotation’, ‘UseForCellStateAnno’]. Required separator: tab. Rows whose Annotation column is NA are filtered out.

Returns

functional annotation in the desired format
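The file format described above (tab-separated, three required columns, NA annotations dropped) can be sketched with the standard csv module. Only the ‘table’ and ‘mecnamedict’ modes are illustrated, since the layout of ‘meciddict’ is not described here; the helper read_mec_annotation and the file contents are hypothetical.

```python
import csv
import io


def read_mec_annotation(text, mode="table"):
    """Parse a tab-separated MeC annotation table and drop NA annotations."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    table = [row for row in reader if row["Annotation"] not in ("", "NA")]
    if mode == "table":
        return table
    if mode == "mecnamedict":
        return {row["MeC_id"]: row["Annotation"] for row in table}
    raise ValueError(f"mode not covered by this sketch: {mode}")


# Hypothetical file contents.
text = (
    "MeC_id\tAnnotation\tUseForCellStateAnno\n"
    "score_0\tT_cytotoxic\t1\n"
    "score_1\tNA\t0\n"
    "score_2\tMitochondrial\t0\n"
)

table = read_mec_annotation(text)
mecnamedict = read_mec_annotation(text, mode="mecnamedict")
print(len(table), mecnamedict)
```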

metatime.plmapper module

Functions for plotting after mapper

class metatime.plmapper.MidpointNormalize(vmin=None, vmax=None, midpoint=None, clip=False)[source]

Bases: matplotlib.colors.Normalize

Normalise the colorbar separately for above-midpoint and below-midpoint values. Useful for visualizing the signature continuum with a diverging colorbar. Borrowed from [chris35wills](http://chris35wills.github.io/matplotlib_diverging_colorbar/)

Parameters
  • midpoint – data value mapped to the center of the colormap, e.g. midpoint=0

  • vmin – lower bound of the data range

  • vmax – upper bound of the data range

Examples

# Pass kwargs to scatterplot

>>> kwargs = {'color_map': 'RdBu_r', 'norm': MidpointNormalize(midpoint=0, vmin=vmin, vmax=vmax)}
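The idea behind a midpoint normalization is a piecewise-linear map that sends vmin to 0, midpoint to 0.5, and vmax to 1, so that the midpoint always lands on the center color of a diverging colormap. The scalar sketch below shows that mapping without matplotlib; the function midpoint_normalize is hypothetical and assumes vmin < midpoint < vmax.

```python
def midpoint_normalize(value, vmin, vmax, midpoint=0.0):
    """Map value to [0, 1] with separate linear scales below and above midpoint."""
    if value <= midpoint:
        # Lower half: [vmin, midpoint] -> [0, 0.5]
        return 0.5 * (value - vmin) / (midpoint - vmin)
    # Upper half: [midpoint, vmax] -> [0.5, 1]
    return 0.5 + 0.5 * (value - midpoint) / (vmax - midpoint)


# With an asymmetric range, 0 still maps to the center of the colormap.
for v in (-2, -1, 0, 3, 6):
    print(v, midpoint_normalize(v, vmin=-2, vmax=6))
```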

metatime.plmapper.gen_mpl_labels(adata, groupby, exclude=(), ax=None, adjust_kwargs=None, text_kwargs=None)[source]

Get the locations of cluster medians. Borrowed from the scanpy GitHub forum.

metatime.plmapper.plot_annotation_on_data(sdata, COL='MetaTiME', title=None, fontsize=8, MIN_CELL=5)[source]

Plot annotated cells with non-overlapping labels. Calls: gen_mpl_labels. Occasionally, if sdata.uns[‘MetaTiME_overcluster_colors’] already exists, the colors may not be updated. In that case, reset them with del sdata.uns[‘MetaTiME_overcluster_colors’].

Parameters

  • sdata (anndata.AnnData) – scanpy object for single cells, or scanpy object with projected signatures.

  • COL (str) – column of cluster assignment in sdata.obs

  • MIN_CELL (int) – minimum number of cells to plot and mark

Returns

Matplotlib figure.

Examples

>>> plmapper.plot_annotation_on_data(pdata, title = 'MetaTiME')

metatime.plmapper.plot_umap_mec(pdata: anndata._core.anndata.AnnData, meccol: str, mecnamedict: dict, use_MeC_name: bool = True, figfile: Optional[str] = None, figsize=(3, 3))[source]

Plot signature continuum for a specific MeC.

Parameters
  • pdata (anndata.AnnData) – scanpy object with projection score matrix in pdata.X

  • meccol – MeC id

  • mecnamedict – dictionary for renaming MeC ids into functional names

metatime.plmapper.plot_umap_mecproj_2condition(pdata, meccol, mecnamedict, cellscond1, cellscond2, use_MeC_name=True, figfile='../tmp.png')[source]

Plot a comparison panel for the pdata format (beta version).

Examples

>>> Xproj = pdata.to_df()
>>> use_MeC_name = True
>>> mec_col = 'score_1'
>>> mec_name = mecnamedict[mec_col]
>>> condcol = 'response'
>>> # cells from 2 conditions
>>> allcellscond1 = adata[adata.obs[condcol].isin(['Non-responder']), ].obs.index
>>> allcellscond2 = adata[adata.obs[condcol].isin(['Responder']), ].obs.index
>>> plmapper.plot_umap_mecproj_2condition(pdata, meccol='score_0', mecnamedict=mecnamedict,
...     cellscond1=allcellscond1, cellscond2=allcellscond2, use_MeC_name=True)

metatime.plmapper.plot_umap_proj(adata: anndata._core.anndata.AnnData, Xproj: pandas.core.frame.DataFrame, mecnamedict, use_MeC_name=True, figfile=None, N_col=4, N_MECS=100)[source]

Plotting function: plot the signature continuum score for each functional MeC in one large figure.

Parameters
  • adata (anndata.AnnData) – scanpy object for scRNA data. Two formats are accepted. Format 1: scanpy object with gene by cell expression, with extra MeC columns in adata.obs. When adata.var_names has fewer than 100 features, format 1 is used. Format 2: scanpy object with only gene by MeC scores in adata.X (projected style format).

  • Xproj (pd.DataFrame) – Dataframe for projected scores

Returns

matplotlib figure

metatime.plmapper.plot_umap_proj_pdata(pdata, mecnamedict, use_MeC_name=True, figfile=None, N_col=4)[source]

Save all projection images in one large figure. Input is in pdata format. Deprecated.

metatime.testdata module

Module contents