MetaTiME for Cell State Annotator and Cell Signature Mapper

installation for colab

[4]:

##Calling matplotlib first help avoid error in colab when plotting
import matplotlib
import matplotlib.pyplot as plt
!wget -q https://www.dropbox.com/s/gnmgfxa85basvwq/logo.png
img = matplotlib.image.imread('./logo.png', 0)

logo = plt.imshow( img ) # for test the colab matplotlib. a logo shall be plotted:)
_=plt.axis('off')

## install
!pip install scanpy
!pip install metatime
!pip install anndata
!pip install adjustText
!pip install leidenalg
!pip install harmonypy

imports

[1]:

import sys
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 200
import seaborn as sns
import scanpy as sc


import importlib as imp
import metatime
import matplotlib
import matplotlib.pyplot as plt
"""
## For local testing in dev
METATIME_DIR = '../../metatime/'
if METATIME_DIR not in sys.path:
    sys.path.append(METATIME_DIR)
file = '../notebooks_dev/BCC_GSE123813_aPD1_with_metainfo.h5ad'
##
"""

from metatime import config
from metatime import loaddata
from metatime import mecmapper
from metatime import mecs
from metatime import annotator
from metatime import plmapper
from metatime import dmec
import matplotlib

Get sample data

[ ]:

#!wget -O testdata.h5ad https://www.dropbox.com/s/0sj6gtppsfr2ouo/NSCLC_GSE117570_res.pp.h5ad?dl=0

!wget -O testdata.h5ad https://www.dropbox.com/s/f791mv3tnhs9fii/BCC_GSE123813_aPD1_with_metainfo.h5ad

Load scRNA data

[ ]:

file = 'testdata.h5ad'

Or provide path to local file

[3]:

# file = '../notebooks_dev/BCC_GSE123813_aPD1_with_metainfo.h5ad' # Yost et al. 2019 Nat Med.

Load data

[3]:

adata = loaddata.load(file = file)

 Loaded ../notebooks_dev/BCC_GSE123813_aPD1_with_metainfo.h5ad

Processing and Harmonize

If your data contains count matrix, we provide a wrapped function for pre-processing the data.

Otherwise, if the data is already depth-normalized, log-transformed, and cells are filtered, we can skip this step.

[21]:

# adata = loaddata.adatapp( adata )

The sample data has multiple patients , and we can use batch correction on patients.

[4]:

batchcols = ['patient']
adata = loaddata.batchharmonize( adata, batchcols = batchcols, random_state = 0 )

[Log ] hamonize with batch  ['patient']

2023-03-14 11:18:26,769 - harmonypy - INFO - Iteration 1 of 10
2023-03-14 11:19:26,441 - harmonypy - INFO - Iteration 2 of 10
2023-03-14 11:20:26,466 - harmonypy - INFO - Iteration 3 of 10
2023-03-14 11:21:32,953 - harmonypy - INFO - Iteration 4 of 10
2023-03-14 11:22:26,570 - harmonypy - INFO - Iteration 5 of 10
2023-03-14 11:23:18,462 - harmonypy - INFO - Iteration 6 of 10
2023-03-14 11:24:01,069 - harmonypy - INFO - Iteration 7 of 10
2023-03-14 11:24:29,524 - harmonypy - INFO - Iteration 8 of 10
2023-03-14 11:25:14,176 - harmonypy - INFO - Iteration 9 of 10
2023-03-14 11:26:07,662 - harmonypy - INFO - Iteration 10 of 10
2023-03-14 11:26:58,323 - harmonypy - INFO - Stopped before convergence

[Log ] Recomputing umap

It is recommended that malignant cells are identified first and removed for best practice in cell state annotation.

In the BCC data, the cluster of malignant cells are identified with inferCNV. We can use the pre-saved column ‘isTME’ to keep Tumor Microenvironment cells

[5]:

adata = adata[adata.obs['isTME']]

We can over-cluster the cells which is useful for fine-grained cell state annotation.

[6]:

adata = annotator.overcluster( adata ) # this generates a 'overcluster' columns in adata.obs

/homes6/yiz/Tools/pip/pippy3/lib/python3.7/site-packages/scanpy/tools/_leiden.py:160: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  categories=natsorted(map(str, np.unique(groups))),

Load MetaTiME trained model for TME

Next, let’s load the pre-computed MetaTiME MetaComponents (MeCs), and their functional annotation.

[7]:

# Load the pre-trained MeCs
mecmodel = mecs.MetatimeMecs.load_mec_precomputed()

# Load functional annotation for MetaTiME-TME
mectable = mecs.load_mecname( mecDIR = config.SCMECDIR, mode ='table' )
mecnamedict = mecs.getmecnamedict_ct(mectable)

Project on MeC space

We can project each cell onto this functional MeC space.

[8]:

pdata = mecmapper.projectMecAnn(adata, mecmodel.mec_score)

/liulab/yiz/proj/MetaTiME/metatime/mecmapper.py:151: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  adataproj = anndata.AnnData( projected, obs = adata.obs )

Per-cell score projected on MeC space

Convert to table to look at per-cell projection scores

[9]:

projmat, mecscores = annotator.pdataToTable(pdata, mectable, gcol = 'overcluster')
mecscores

[9]:

	0: Pan_Interferon-response	1: Pan_Proliferation-S	2: Pan_Heat-stress	3: DC_pDC	4: B_Plasma	5: Pan_Proliferation-G2-M	6: DC_cDC2-MHCII	7: T_Treg-T-co-signaling	8: T_NK-cytotoxic	9: M_Monocyte-CD16	...	77: B_Plasma-JCHAIN	78: T_CD4T-IL7R	MeC_79	MeC_80	81: Stroma_Fibroblast-VEGF	MeC_82	83: Pan_Myeloid-fibroblast	MeC_84	MeC_85	overcluster
bcc.su001.pre.tcell_AAACCTGCAGGGATTG	-0.819890	-0.688164	-1.293394	-0.623175	-0.619621	-0.014213	-0.421470	-0.575049	0.806688	-0.228148	...	-0.503919	1.365941	0.532829	1.303657	-0.015501	-0.200811	2.227211	-2.101272	0.964769	103
bcc.su001.pre.tcell_AAACGGGCATAGACTC	0.497730	-0.781702	-1.236870	-0.225077	-0.407063	-0.196667	-0.708806	0.148677	1.226127	0.229111	...	-0.964028	-0.869057	-0.948729	0.932158	-1.617307	-0.813145	-0.541694	-1.985215	0.821645	71
bcc.su001.pre.tcell_AAACGGGTCATACGGT	-0.853597	0.389129	2.834898	-0.235956	0.469343	0.946867	-0.673814	0.078563	1.469857	0.388126	...	-0.713808	1.094122	1.178537	-2.930747	-2.534621	1.399297	-0.640663	2.175453	-1.674753	9
bcc.su001.pre.tcell_AAAGATGAGACAGGCT	1.643644	-0.217423	-0.643116	0.088754	-0.523546	-1.255705	-0.849190	0.658166	-0.770305	1.756781	...	-1.132153	1.547834	-1.547482	0.496351	-1.177201	-0.764956	0.551764	0.479404	-0.165305	8
bcc.su001.pre.tcell_AAAGATGTCTGAGTGT	-0.904096	-0.855004	-1.025405	-0.417323	-0.265433	-0.517993	-0.371883	-0.238967	-1.225588	0.211766	...	0.743283	2.142204	0.523307	0.401327	-0.243886	-0.139409	1.541129	-0.646087	-0.965725	40
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
bcc.su012.post.tcell_TTTGCGCGTCTTTCAT	-0.655006	-0.323290	1.357172	-0.252120	-0.477362	-0.562883	-0.469015	-0.594926	-0.978787	0.131918	...	0.224142	1.476642	1.258007	-0.838055	0.543334	0.264970	-0.528499	1.782975	-2.250962	19
bcc.su012.post.tcell_TTTGGTTCAGCCAGAA	-1.002944	-0.177266	1.441417	-0.175024	-0.643992	-1.178595	-1.009185	0.391318	0.190629	1.112531	...	0.111604	0.727014	0.063260	0.221161	1.290879	0.011502	-0.428551	1.685099	-0.775345	15
bcc.su012.post.tcell_TTTGTCAAGCACCGTC	-0.705487	-0.384236	1.495082	-0.411420	-0.401349	-0.957136	-1.487738	0.004184	-0.690182	0.843381	...	-0.196389	-0.427286	-0.727919	-0.666853	1.105166	3.032268	-0.136370	1.011591	-0.914623	15
bcc.su012.post.tcell_TTTGTCAAGTGAACGC	0.205688	-0.372909	1.382011	0.256174	-0.391342	0.236742	-1.236226	1.282568	0.039292	0.352591	...	-1.496709	-0.348647	-0.667981	0.685011	-1.942186	-0.779483	-0.759375	0.198692	0.043316	12
bcc.su012.post.tcell_TTTGTCAGTCCCTTGT	-0.059757	0.314007	1.000176	-0.132242	0.371538	-0.351378	-0.189296	3.665382	-0.018753	0.335091	...	-0.520862	-0.083217	2.131293	-0.706136	0.002483	-0.122560	-0.354572	1.812793	-0.942595	3

38836 rows × 87 columns

Cell state annotator and save the results to scRNA objects

Based on per-cell projection scores, compute the per-cluster cell state scores, and find out the top 1st enriched cell state.

Save the cluster-wise cell state annotation.

Or, it can also be saved to the scanpy object “pdata”, containing only the scores, which is smaller to save.

[10]:

projmat, gpred, gpreddict = annotator.annotator( projmat,  mecnamedict, gcol = 'overcluster')

adata = annotator.saveToAdata( adata, projmat )
pdata = annotator.saveToPdata( pdata, adata, projmat )

/liulab/yiz/proj/MetaTiME/metatime/annotator.py:190: UserWarning: Columns in projmat already overlap columns in adata.obs. Overwriting adata.obs.
  warnings.warn('Columns in projmat already overlap columns in adata.obs. Overwriting adata.obs. '

New columns added to adata and pdata: - MetaTiME_overcluster

[11]:

## Simplify the cell states as on UMAP plot by updating:
##  adata, pdata, mecnamedict
adata.obs['MetaTiME'] = adata.obs['MetaTiME_overcluster'].str.split(': ').str.get(1)
pdata.obs['MetaTiME'] = pdata.obs['MetaTiME_overcluster'].str.split(': ').str.get(1)
mecnamedict_simp = mecnamedict.copy()
_=[mecnamedict_simp.update({t: mecnamedict[t].split(': ')[1]}) for t in mecnamedict.keys()]

New columns added to adata and pdata: - MetaTiME

A simplified naming dictory: - mecnamedict_simp

Plot per-cluster, Top1 enriched cell states

[12]:

fig,ax = plmapper.plot_annotation_on_data( adata, COL = 'MetaTiME', fontsize=7 )

/homes6/yiz/Tools/pip/pippy3/lib/python3.7/site-packages/anndata/_core/anndata.py:1235: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  df[key] = c

../_images/notebooks_metatime_annotator_38_1.png

Check the list of MeCs

[13]:

mecnamedict_simp

[13]:

{'MeC_0': 'Pan_Interferon-response',
 'MeC_1': 'Pan_Proliferation-S',
 'MeC_2': 'Pan_Heat-stress',
 'MeC_3': 'DC_pDC',
 'MeC_4': 'B_Plasma',
 'MeC_5': 'Pan_Proliferation-G2-M',
 'MeC_6': 'DC_cDC2-MHCII',
 'MeC_7': 'T_Treg-T-co-signaling',
 'MeC_8': 'T_NK-cytotoxic',
 'MeC_9': 'M_Monocyte-CD16',
 'MeC_11': 'T_NK-T-cytotoxic',
 'MeC_12': 'T_CD8T-GZMK-CCL5',
 'MeC_13': 'Myeloid_Mast',
 'MeC_14': 'Pan_Proliferation',
 'MeC_15': 'Pan_Ribosome',
 'MeC_16': 'T_CD4T-MAIT-Th17',
 'MeC_17': 'M_Monocyte-CD14',
 'MeC_18': 'B_B',
 'MeC_19': 'M_Macrophage-C1Q',
 'MeC_20': 'T_Tfh-CXCL13',
 'MeC_21': 'T_CD4T-cytoskeleton',
 'MeC_22': 'Stroma_Fibroblast-collagen',
 'MeC_23': 'Stroma_Endothelial',
 'MeC_24': 'Pan_Metallothionein',
 'MeC_25': 'T_Cytotoxic-GZMB',
 'MeC_26': 'M_Macrophage-SPP1',
 'MeC_27': 'Stroma_Fibro-endo-EMT',
 'MeC_28': 'Pan_Beta-catenin',
 'MeC_29': 'M_Monocyte-CD14-S100A12',
 'MeC_30': 'B_B-Ig-kappa',
 'MeC_31': 'Pan_Protein-processing',
 'MeC_32': 'T_CD8T-CD3',
 'MeC_33': 'Pan_Cycling-MYC',
 'MeC_34': 'T_NK-T-IFNG-CCL4',
 'MeC_35': 'T_T-Immune-synapse',
 'MeC_36': 'M_Macrophage-SPP1-C1Q',
 'MeC_37': 'M_Macrophage-lipid-PPARG',
 'MeC_38': 'Stroma_Endothelial-CCL21',
 'MeC_39': 'Myeloid_Platelet',
 'MeC_40': 'T_CD8Tex-CXCL13',
 'MeC_41': 'Stroma_Myofibroblast',
 'MeC_42': 'M_Macrophage-IL1-NFKB',
 'MeC_43': 'T_T-naive',
 'MeC_45': 'M_Macrophage-IL1-JUN',
 'MeC_46': 'Myeloid_Erythrocytes',
 'MeC_47': 'Pan_Glycolysis',
 'MeC_48': 'T_NK-T-CCL5',
 'MeC_49': 'M_Macrophage_RNASE1-C1Q',
 'MeC_50': 'B_B-PAX5',
 'MeC_51': 'DC_DC-Mature',
 'MeC_52': 'M_Monocyte-LYZ',
 'MeC_53': 'Pan_MHCI',
 'MeC_54': 'Pan_Ribosome-RPS18',
 'MeC_55': 'DC_cDC1',
 'MeC_56': 'T_T-CREM',
 'MeC_57': 'Pan_AP1-JUN',
 'MeC_58': 'Pan_MHCII',
 'MeC_59': 'B_B-Ig-lambda',
 'MeC_61': 'Pan_AP1-FOS',
 'MeC_63': 'Pan_MHCII-HLA-DRA',
 'MeC_64': 'Pan_AP1-IER2',
 'MeC_65': 'T_T-co-signaling',
 'MeC_66': 'Pan_CCL2',
 'MeC_67': 'Pan_Heat-stress-HSPA1A',
 'MeC_68': 'Pan_NR4A-signaling',
 'MeC_69': 'T_CD4T-CXCR4',
 'MeC_74': 'M_Monocyte-CD14-proliferation',
 'MeC_75': 'T_NK-T-Cytotoxic-GZMB',
 'MeC_76': 'Pan_CXCL1-CXCL2',
 'MeC_77': 'B_Plasma-JCHAIN',
 'MeC_78': 'T_CD4T-IL7R',
 'MeC_81': 'Stroma_Fibroblast-VEGF',
 'MeC_83': 'Pan_Myeloid-fibroblast'}

Plot signature continuum for a single MeC

[14]:

fig=plmapper.plot_umap_mec( pdata, 'MeC_3' ,mecnamedict, figfile =None)

../_images/notebooks_metatime_annotator_42_0.png

Plot signature continuum for all functional MeCs

[15]:

fig=plmapper.plot_umap_proj_pdata( pdata, mecnamedict , figfile =None)

../_images/notebooks_metatime_annotator_44_0.png

Comparing conditions

The first way is to count cells newly annotated by MetaTiME. A count table can be returned in all ways grouping the cells. The conditions here are dataset - specific.

[16]:

condcol = 'treatment'
clustercol = 'MetaTiME'
samplecol = 'patient'

cellcts = dmec.count_cell_condition( adata.obs, countbycols = [condcol, clustercol, samplecol] )
#cellcts.to_csv('./cell_counts.txt',sep='\t')
cellcts

[16]:

	patient	su001	su002	su003	su004	su005	su006	su007	su008	su009	su010	su012
treatment	MetaTiME
post	B_B	3447.0	0.0	8.0	19.0	55.0	18.0	2.0	127.0	2.0	0.0	0.0
	B_B-Ig-kappa	12.0	3.0	0.0	3.0	8.0	537.0	0.0	27.0	0.0	0.0	2.0
	B_B-Ig-lambda	7.0	1.0	1.0	2.0	3.0	76.0	2.0	32.0	0.0	0.0	0.0
	B_B-PAX5	273.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	B_Plasma	15.0	4.0	0.0	5.0	16.0	164.0	2.0	23.0	8.0	0.0	2.0
...	...	...	...	...	...	...	...	...	...	...	...	...
pre	T_T-CREM	72.0	4.0	7.0	33.0	64.0	83.0	126.0	12.0	324.0	7.0	100.0
	T_T-Immune-synapse	47.0	0.0	6.0	22.0	28.0	35.0	82.0	4.0	146.0	6.0	130.0
	T_T-naive	1.0	0.0	2.0	13.0	13.0	2.0	22.0	1.0	113.0	0.0	13.0
	T_Tfh-CXCL13	25.0	2.0	9.0	16.0	74.0	335.0	94.0	87.0	408.0	0.0	465.0
	T_Treg-T-co-signaling	200.0	6.0	8.0	52.0	79.0	300.0	166.0	90.0	338.0	3.0	353.0

76 rows × 11 columns

Writing the cell counts to table

[22]:

cellcts.to_csv('./cell_counts.txt',sep='\t')

MetaTiME also provides functions to test differential signatures cluster-wise.

[17]:

condcol = 'treatment' # the column in adata.obs to mark the condition
cond1 = ['pre'] # Condition 1 in adata.obs[ condcol ],
cond2 = ['post'] # Condition 2 in adata.obs[ condcol ],
clustercol = 'MetaTiME' # MetaTiME annotated cluster labels


# get the cell barcodes for two conditions
allcellscond1= adata[adata.obs[ condcol ].isin(cond1), ].obs.index
allcellscond2= adata[adata.obs[ condcol ].isin(cond2), ].obs.index

[18]:

print(f'Compare cells based on {condcol} with conditions {cond1} and {cond2}')
print(f'Cells in Condition {cond1} : {len(allcellscond1)}')
print(f'Cells in Condition {cond2} : {len(allcellscond2)}')

Compare cells based on treatment with conditions ['pre'] and ['post']
Cells in Condition ['pre'] : 13710
Cells in Condition ['post'] : 25126

Get A table listing all signatures enriching in each cluster, as candidate for testing differential signature.

[19]:

mec_enriched = dmec.enrich_mec(
    adata=pdata, labelcol =clustercol, mecnamedict=mecnamedict_simp)

Testing all candidate signatures in each cell cluster.

[20]:

diffmec, diffmecsig, diffmec_full = dmec.dmec(adata, pdata, allcellscond1, allcellscond2, clustercol, mec_enriched, mecnamedict_simp,
                                              test_clusters='all', test_method = 'ttest' )

Plotting the differential signature dot plot.

X-axis: difference of mean signature scores between conditions. Y-axis: -⁡log(p-value) from the test statistic chosen. The significant cluster-wise differential signature is marked as “EnrichedMeC@ClusterName”; when the enriched MeC is the same as cluster name, the signature is marked “ClusterName”. Size of the dots is proportionally to the mean signature score.

[27]:

sns.set_theme(style='white')
fig_topdiff = dmec.plot_topdiff(diffmec, fontsize=6)

../_images/notebooks_metatime_annotator_60_0.png

Save all the significant differential signatures@Cluster

[24]:

diff, diff_sig = dmec.topdiff_df(diffmec)

[25]:

diff_sig[:3]

[25]:

	-logp	effect	effect1	meanb	meana	scorecol	cluster1	mec
M_Monocyte-CD14-proliferation@M_Macrophage-IL1-NFKB	47.236925	-1.885173	-1.885173	0.486959	2.372133	MeC_74	M_Macrophage-IL1-NFKB	M_Monocyte-CD14-proliferation
M_Macrophage-IL1-NFKB	43.831255	-2.024648	-2.024648	4.083235	6.107882	MeC_42	M_Macrophage-IL1-NFKB	M_Macrophage-IL1-NFKB
T_NK-T-IFNG-CCL4@M_Macrophage-IL1-NFKB	38.030503	-1.774028	-1.774028	0.264443	2.038471	MeC_34	M_Macrophage-IL1-NFKB	T_NK-T-IFNG-CCL4

Saving this table

[36]:

diff_sig.to_csv('diff_sig_allcluster.txt',sep='\t')